<?xml version="1.0" standalone="yes"?> <Paper uid="W91-0114"> <Title>SHARED PREFERENCES</Title> <Section position="3" start_page="0" end_page="109" type="metho"> <SectionTitle> 2 Preferences in Understanding and Generation </SectionTitle> <Paragraph position="0"> Natural language understanding is a mapping from utterances to meanings, while generation goes in the opposite direction. Given a set String of input strings (of a given language) and a set Int of interpretations or meanings, we can represent understanding as a relation U ⊆ String × Int, and generation as G ⊆ Int × String. U and G are relations, rather than functions, since they allow for ambiguity: multiple meanings for an utterance and multiple ways of expressing a meaning.1 A minimal requirement for a reversible system is that U and G be inverses of each other. For all s ∈ String and i ∈ Int:

(s, i) ∈ U ⟺ (i, s) ∈ G    (1)

Intuitively, preferences are ways of controlling the ambiguity of U and G by ranking some interpretations (for U) or strings (for G) more highly than others. Formally, then, we can view preferences as total orders on the objects in question (we will capitalize the term when using it in this technical sense).2 Thus, for any s ∈ String, an understanding Preference P_int will order the pairs {(s, i) | (s, i) ∈ U}, while a generation Preference

1The definitions of U and G allow for strings with no interpretations and meanings with no strings. Since any meaning can presumably be expressed in any language, we may want to further restrict G so that everything is expressible: ∀i ∈ Int (∃s ∈ String [(s, i) ∈ G]).</Paragraph> <Paragraph position="1"> 2We use total orders rather than partial orders to avoid having to deal with incommensurate structures. The requirement of commensurability is not burdensome in practice, even though many heuristics apparently don't apply to certain structures. For example, a heuristic favoring low attachment of post-modifiers doesn't clearly tell us how to rank a sentence without post-modifiers, but we can insert such sentences into a total order by observing that they have all modifiers attached as low as possible.</Paragraph> <Paragraph position="2"> P_str will rank {(i, s) | (i, s) ∈ G}.3 Thus we can view the task of understanding as enumerating the interpretations of a string in the order given by P_int. Similarly, generation will produce strings in the order given by P_str. Using U_{P_int} and G_{P_str}</Paragraph> <Paragraph position="3"> to denote the result of combining U and G with these preferences, we have, for all s ∈ String and i ∈ Int:</Paragraph> <Paragraph position="5">
U_{P_int}(s) = ⟨i_1, ..., i_m⟩, where U(s) = {i_1, ..., i_m} and (s, i_1) ≤_{P_int} ... ≤_{P_int} (s, i_m)    (2)

G_{P_str}(i) = ⟨s_1, ..., s_k⟩, where G(i) = {s_1, ..., s_k} and (i, s_1) ≤_{P_str} ... ≤_{P_str} (i, s_k)    (3)

Alternatively, we note that any Preference P induces an equivalence relation =_P which groups together the objects that are equal under P.4 We can therefore view the task of Generation and Understanding as the enumeration of P's equivalence classes in order, without worrying about order within classes (note that Formulae 2 and 3 specify the order only of pairs where one member is less than the other under P). The question now arises of what the relation between understanding Preferences and generation Preferences should be. Understanding heuristics are intended to find the meaning that the speaker is most likely to have intended for an utterance, and generation heuristics should select the string that is most likely to communicate a given meaning to the hearer. 
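As a concrete, if toy, rendering of the definitions above, the following Python sketch models U and G as sets of pairs satisfying Formula 1 and enumerates a string's interpretations under an understanding Preference in the spirit of Formula 2; the example strings, interpretations, and preference key are invented for illustration and are not from the paper.

# Toy versions of the sets String and Int.
String = {"the old men and women"}
Int = {"old(men) & women", "old(men & women)"}

# Understanding and generation as relations (sets of pairs).
U = {("the old men and women", "old(men) & women"),
     ("the old men and women", "old(men & women)")}
G = {(i, s) for (s, i) in U}          # Formula (1): G is the inverse of U

def understand(s, pref_key):
    """Enumerate the interpretations of s in the order given by an
    understanding Preference (cf. Formula 2); pref_key orders (s, i) pairs."""
    return [i for (_, i) in sorted(((s2, i) for (s2, i) in U if s2 == s),
                                   key=pref_key)]

# A toy Preference: prefer the narrow-scope reading of the adjective.
best_first = understand("the old men and women",
                        lambda pair: 0 if pair[1] == "old(men) & women" else 1)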
We would expect these Preferences to be inverses of each other: if s is the best way to express meaning i, then i should be the most likely interpretation of s. If we don't accept this condition, we will generate sentences that we expect the listener to misinterpret. Therefore we define class(Preference, pair) to be the equivalence class that pair is assigned to under Preference's ordering,5 and link the first 3Note that this definition allows Preferences to work 'across derivations.' For example, it allows P_int to rank pairs (s, i), (s', i') where s ≠ s'. It permits a Preference to say that i is a better interpretation for s than i' is for s'. It is not clear if this sort of power is necessary, and the algorithms below require only that Preferences be able to rank different interpretations (strings) for a given string (interpretation).</Paragraph> <Paragraph position="6"> 4Any order P on a set of objects D partitions D into a set of equivalence classes by assigning each x ∈ D to the set {u | u ≤_P x ∧ x ≤_P u}.</Paragraph> <Paragraph position="7"> 5class(Preference, pair) is defined as the number of classes containing items that rank more highly than pair under Preference.</Paragraph> <Paragraph position="8"> (most highly ranked) classes under P_int and P_str as follows:

class(P_str, (i, s)) = 0 ⟹ class(P_int, (s, i)) = 0    (4)</Paragraph> <Paragraph position="10"> It is also reasonable to require that opposing sets of preferences in understanding be reflected in generation. If string s1 has two interpretations i1 and i2, with i1 being preferred to i2, and string s2 has the same two interpretations with the preferences reversed, then s1 should be a better way of expressing i1 than s2 is, and vice versa for i2:

(s1, i1) <_{P_int} (s1, i2) ∧ (s2, i2) <_{P_int} (s2, i1) ⟹ (i1, s1) <_{P_str} (i1, s2) ∧ (i2, s2) <_{P_str} (i2, s1)    (5)</Paragraph> <Paragraph position="12"> Formula 4 provides a tight coupling of heuristics for understanding and generating the most preferred structures, but it doesn't provide any way to share Preferences for secondary readings.</Paragraph> <Paragraph position="13"> Formula 5 offers a way to share heuristics for secondary interpretations, but it is quite weak and would be highly inefficient to use. To employ it during generation to choose between s1 and s2 as ways of expressing i1, we would have to run the understanding system on both s1 and s2 to see if we could find another interpretation i2 that both strings share but with opposite rankings relative to i1.</Paragraph> <Paragraph position="14"> If we want to share Preferences for secondary readings, we will need to make stronger assumptions. The question of ranking secondary interpretations brings us onto treacherous ground since most common heuristics (e.g., preferring low attachment) specify only the best reading and don't help choose between secondary and tertiary readings. Furthermore, native speakers don't seem to have clear intuitions about the relative ranking of lesser readings. Finally, there is some question about why we should care about non-primary readings, since the best interpretation or string is normally what we want. 
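As a small illustration of the coupling just stated, the following Python sketch computes the equivalence classes induced by a Preference (footnote 5's class function, with 0 as the best class) and checks the Formula 4 condition on finite data; representing a Preference as a "less than or equal" test is an assumption of this sketch, not the paper's implementation.

def classes(items, leq):
    """Partition items into the equivalence classes of the total order leq,
    best class first, so the class of x is the index of its class (0 = best)."""
    remaining = list(items)
    result = []
    while remaining:
        best = [x for x in remaining if all(leq(x, y) for y in remaining)]
        result.append(best)
        remaining = [x for x in remaining if x not in best]
    return result

def class_of(pair, items, leq):
    """Footnote 5: number of classes containing items ranked above pair."""
    for n, cls in enumerate(classes(items, leq)):
        if pair in cls:
            return n

def satisfies_formula_4(i, s, gen_pairs, leq_str, und_pairs, leq_int):
    """If (i, s) falls in the first class under P_str, then (s, i) must fall
    in the first class under P_int."""
    return (class_of((i, s), gen_pairs, leq_str) != 0
            or class_of((s, i), und_pairs, leq_int) == 0)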
However, it is important to deal with secondary preferences, in part for systematic completeness, but mostly because secondary readings are vital in any attempt to deal with figurative language - humor, irony, and metaphor - which depends on the interplay between primary and secondary readings.</Paragraph> <Paragraph position="16"> To begin to develop a theory of secondary Preferences, we will simply stipulate that the heuristics in question are shared 'across the board' between understanding and generation. The simplest way to do this is to extend Formula 4 into a biconditional, and require it to hold of all classes (we will reconsider this stipulation in Section 5).</Paragraph> <Paragraph position="17"> For all s ∈ String and i ∈ Int, we have:

class(P_int, (s, i)) = class(P_str, (i, s))    (6)

Since Preferences now work in either direction, we can simplify our notation and represent them as total orderings of a set T of trees, where each node of each tree is annotated with syntactic and semantic information, and, for any t ∈ T, str(t) returns the string in String that t dominates (i.e., spans), and sem(t) returns the interpretation in Int for the root node of t. For a Preference P on T and trees t1, t2, we stipulate:

t1 ≤_P t2 ⟺ (str(t1), sem(t1)) ≤_{P_int} (str(t2), sem(t2))    (7)

t1 ≤_P t2 ⟺ (sem(t1), str(t1)) ≤_{P_str} (sem(t2), str(t2))    (8)</Paragraph> <Paragraph position="19"> We close this Section by noting a property of Preferences that will be important in Section 4: an ordered list of Preferences can be combined into a new Preference by using each item in the list to refine the ordering specified by the previous ones. That is, the second Preference orders pairs that are equal under the first Preference, and the third Preference applies to those that are still equal under the second Preference, etc. If P1 ... Pn are Preferences, we define a new Complex Preference P_<1,...,n> as follows:

t1 <_{P_<1,...,n>} t2 ⟺ t1 <_{P1} t2 ∨ (t1 =_{P1} t2 ∧ t1 <_{P_<2,...,n>} t2)    (9)</Paragraph> </Section> <Section position="4" start_page="109" end_page="112" type="metho"> <SectionTitle> 3 An Algorithm for Sharing Preferences </SectionTitle> <Paragraph position="0"> If we consider ways of sharing Preferences between understanding and generation, the simplest one is to produce all possible interpretations (strings), and then sort them using the Preference. This is, of course, inefficient in cases where we are interested in only the more highly ranked possibilities. We can do better if we are willing to make a few assumptions about the structure of Preferences and the understanding and generation routines. The crucial requirement on Preferences is that they be 'upwardly monotonic' in the following sense: if t1 is preferred to t2, then it is also preferred to any tree containing t2 as a subtree. Using subtree(t1, t2) to mean that t1 is a subtree of t2, we stipulate

t1 <_P t2 ∧ subtree(t2, t3) ⟹ t1 <_P t3    (10)</Paragraph> <Paragraph position="2"> Without such a requirement, there is no way to cut off unpromising paths, since we can't predict the ranking of a complete structure from that of its constituents. Finally, we assume that both understanding and generation are agenda-driven procedures that work by creating, combining, and elaborating trees.6 Under these assumptions, the following high-level algorithm can be wrapped around the underlying parsing and generation routines to cause the output to be enumerated in the order given by a Preference P. 
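Before turning to the pseudo-code, two notions introduced above, the Complex Preference of Formula 9 and the upward-monotonicity requirement in (10), can be sketched in Python; representing each Preference as a comparator returning -1, 0, or 1, and the subtree test, are assumptions of this sketch rather than the paper's formulation.

def complex_preference(prefs):
    """Combine an ordered list of comparators into one (cf. Formula 9)."""
    def combined(t1, t2):
        for p in prefs:
            r = p(t1, t2)
            if r != 0:        # the first Preference that distinguishes t1 and t2 wins
                return r
        return 0              # equal under every component Preference
    return combined

def is_upward_monotonic(pref, trees, subtree):
    """Check stipulation (10) on a finite set of trees: if t1 < t2 and t2 is
    a subtree of t3, then t1 < t3."""
    return all(pref(t1, t3) < 0
               for t1 in trees for t2 in trees for t3 in trees
               if pref(t1, t2) < 0 and subtree(t2, t3))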
In the pseudo-code below, mode specifies the direction of processing and input is a string (if mode is understanding) or a semantic representation (if mode is generation).</Paragraph> <Paragraph position="3"> execute_item removes an item from the agenda and executes it, returning 0 or more new trees.</Paragraph> <Paragraph position="4"> generate_items takes a newly formed tree, a set of previously existing trees, and the mode, and adds a set of new actions to the agenda. (The underlying understanding or generation algorithm is hidden inside generate_items.) The variable active holds the set of trees that are currently being used to generate new items, while frozen holds those that won't be used until later. complete_tree is a termination test that returns True if a tree is complete for the mode in question (i.e., if it has a full semantic interpretation for understanding, or dominates a complete string for generation).</Paragraph> <Paragraph position="5"> The global variable classes holds a list of equivalence classes used by equiv_class (defined below), while level holds the number of the equivalence class currently being enumerated. Thaw&restart is called each time level is incremented to generate new agenda items for trees that may belong to that class.</Paragraph> <Paragraph position="6"> ALGORITHM 1

6A wide variety of NLP algorithms can be implemented in this manner, particularly such recent reversible generation algorithms as [Shieber, van Noord, Moore, and Pereira, 1989] and [Calder, Reape, and Zeevat, 1989].</Paragraph> <Paragraph position="8">
level := 1;
frozen := initialize_agenda(input, mode);
{end of global declarations}
while frozen do begin
  solutions := get_complete_trees(frozen, level, mode);
  agenda := thaw&restart(frozen, level, agenda, mode);
  while agenda do begin
    new_trees := execute_item(agenda);
    while new_trees do begin
      new_tree := pop(new_trees);
      if equiv_class(P, new_tree) > level then
        push(new_tree, frozen);
      else if complete_tree(new_tree, mode) then
        push(new_tree, solutions);
      else
        generate_items(new_tree, active, agenda, mode);
    end;
  end;
  {agenda exhausted for this level}
  {solutions may need partitioning}
  while solutions do begin
    complete_tree := pop(solutions);
    if equiv_class(P, complete_tree) > level then
      push(complete_tree, frozen);
    else
      output(complete_tree, level);
  end;
  {increment level to output next class}
  level := level + 1;
end;

The function equiv_class keeps track of the equivalence classes induced by the Preferences. Given an input tree, it returns the number of the equivalence class that the tree belongs to. Since it must construct the equivalence classes as it goes along, it may return different values on different calls with the same argument (for example, it will always return 1 the first time it is called, even though the tree in question may end up in a lower class). However, successive calls to equiv_class will always return a non-decreasing series of values, so that a given tree is guaranteed to be ranked no more highly than the value returned (it is this property of equiv_class that forces the extra pass over the completed trees in the algorithm above: a tree that was assigned to class n when it was added to solutions may have been demoted to a lower class in the interim as more trees were examined). Less_than and equal take a Preference and a pair of trees and return True if the first tree is less than (equal to) the second under the Preference. 
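The pseudo-code for equiv_class itself does not survive in this copy of the paper; the Python sketch below is a reconstruction consistent with the description above and with the helpers described next (create_class, insert, select_member), not the authors' own code.

classes = []                         # global list of equivalence classes, best first

def equiv_class(less_than, equal, tree):
    """Return the (1-based) number of the class tree currently belongs to,
    creating and inserting new classes as needed.  Later insertions can only
    push existing classes further down, so successive calls for the same tree
    return a non-decreasing series of values."""
    for idx, cls in enumerate(classes):
        rep = cls[0]                          # select_member: any representative
        if equal(tree, rep):
            cls.append(tree)
            return idx + 1
        if less_than(tree, rep):
            classes.insert(idx, [tree])       # create_class + insert, shifting the rest down
            return idx + 1
    classes.append([tree])                    # tree ranks below every existing class
    return len(classes)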
Create_class takes a tree and creates a new class whose only member is that tree, while insert adds a class to classes in the indicated position (shifting other classes down, if necessary), and select_member returns an arbitrary member of a class.</Paragraph> <Paragraph position="9"> function equiv_class (P: Preference, T: Tree)</Paragraph> <Paragraph position="11"> To see that the algorithm enumerates trees in the order given by <_P, note that the first iteration outputs trees which are minimal under <_P. Now consider any tree t_n which is output on a subsequent iteration N. For all other trees t_m output on that iteration, t_n =_P t_m. Furthermore, t_n contains a subtree t_sub which was frozen for all levels up to N. Using T(J) to denote the set of trees output on iteration J, we have: ∀I, 1 ≤ I < N [∀ t_i ∈ T(I) [t_i <_P t_sub]], whence, by stipulation 10, t_i <_P t_n. Thus t_n is greater than or equal to all trees which were enumerated before it. To calculate the time complexity of the algorithm, note that it calls equiv_class once for each tree created by the underlying understanding or generation algorithm (and once for each complete interpretation). Equiv_class, in turn, must potentially compare its argument with each existing equivalence class. Assuming that the comparison takes constant time, the complexity of the algorithm depends on the number k of equivalence classes <_P induces: if the underlying algorithm is O(f(n)), the overall complexity is O(f(n)) × k. Depending on the Preference, k could be a small constant, or itself proportional to f(n), in which case the complexity would be O(f(n)²).</Paragraph> </Section> <Section position="5" start_page="112" end_page="116" type="metho"> <SectionTitle> 4 Optimization of Preferences </SectionTitle> <Paragraph position="0"> As we make more restrictive assumptions about Preferences, more efficient algorithms become possible. Initially, we assumed only that Preferences specified total orders on trees, i.e., that they would take two trees as input and determine if one was less than, greater than, or equal to the other.7 Given such an unrestricted view of Preferences, we can do no better than producing all interpretations (strings) and then sorting them. This simple approach is fine if we want all possibilities, especially if we assume that there won't be a large number of them, so that standard n² or n log n sorting algorithms (see [Aho, Hopcroft, and Ullman, 1983]) won't be much of an additional burden. However, this approach is inefficient if we are interested in only some of the possibilities. Adding the monotonicity restriction 10 permits Algorithm 1, which is more efficient in that it postpones the creation of (successors of) lower ranked trees. However, we are still operating with a very general view of what Preferences are, and further improvements are possible when we look at individual Preferences in detail. In this section, we will consider heuristics for lexical selection, scope, and anaphora resolution. We do not make any claims for the usefulness of these heuristics as such, but take them as concrete examples that show the importance of considering the computational properties of Preferences.</Paragraph> <Paragraph position="1"> Note that Algorithm 1 is stated in terms of a single Preference. 
It is possible to combine multiple Preferences into a single one using Formula 9, 7We also assume that this test takes constant time.</Paragraph> <Paragraph position="2"> and we are currently investigating other methods of combination. Since the algorithms below are highly specialized, they cannot be combined with other Preferences using Formula 9. The ultimate goal of this research, however, is to integrate such specialized algorithms with a more sophisticated version of Algorithm 1.</Paragraph> <Section position="1" start_page="112" end_page="112" type="sub_section"> <SectionTitle> 4.1 Lexical Choice </SectionTitle> <Paragraph position="0"> One simple preferencing scheme involves assigning integer weights to lexical items and syntactic rules. Items or rules with higher weights are less common and are considered only if lower ranked items fail. When combined with restriction 10, this weighting scheme yields a Preference <_wt that ranks trees according to their lexical and rule weights. Using max_wt(T) to denote the most heavily weighted lexical item or rule used in the construction of T, we have:</Paragraph> <Paragraph position="2">
T1 ≤_wt T2 ⟺ max_wt(T1) ≤ max_wt(T2)    (11)

The significant property here is that the equivalence classes under <_wt can be computed without directly comparing trees. Given a lexical item with weight n, we know that any tree containing it must be in class n or lower. Noting that our algorithm works by generate-and-test (trees are created and then ranked by equiv_class), we can achieve a modest improvement in efficiency by not creating trees with level n lexical items or rules until it is time to enumerate that equivalence class. We can implement this change for both generation and understanding by adding level as a parameter to both initialize_agenda and generate_items, and changing the functions they call to consider only rules and lexical items at or below level. How much of an improvement this yields will depend on how many classes we want to enumerate and how many lexical items and rules there are below the last class enumerated.</Paragraph> </Section> <Section position="2" start_page="112" end_page="114" type="sub_section"> <SectionTitle> 4.2 Scope </SectionTitle> <Paragraph position="0"> Scope is another place where we can improve on the basic algorithm. We start by considering scoping during Understanding. Given a sentence s with operators (quantifiers) o1...on, assigning a scope amounts to determining a total order on o1...on.8 If a scope Preference can do 8Note that this ordering is not a Preference. A Preference will be a total ordering of trees, each of which contains such a scope ordering, i.e., a scope Preference will be an ordering of orderings of operators.</Paragraph> <Paragraph position="1"> no more than compare and rank pairs of scopings, then the simple generate-and-test algorithm will require O(n!) steps to find the best scoping since it will potentially have to examine every possible ordering. However, the standard heuristics for assigning scope (e.g., give "strong" quantifiers wide scope, respect left-to-right order in the sentence) can be used to directly assign the preferred ordering of o1...on. If we assume that secondary readings are ranked by how closely they match the preferred scoping, a Preference <_sc can be defined. 
In the following, (o_i, o_j) ∈ Sc(s) means that o_i precedes o_j in scoping Sc of sentence s, and Sc_best(s) is the preferred ordering of the operators in s given by the heuristics:</Paragraph> <Paragraph position="3">
Sc1 ≤_sc Sc2 ⟺ |{(o_i, o_j) ∈ Sc1(s) | (o_j, o_i) ∈ Sc_best(s)}| ≤ |{(o_i, o_j) ∈ Sc2(s) | (o_j, o_i) ∈ Sc_best(s)}|    (12)

Given such a Preference, we can generate the scopings of a sentence more efficiently by first producing the preferred reading (the first equivalence class), then all scopes that have one pair of operators switched (the second class), then all those with two pairs out of order, etc. In the following algorithm, ops is the set of operators in the sentence, and sort is any sorting routine. switched? is a predicate returning True if its two arguments have already been switched (i.e., if its first argument was to the right of its second in Sc_best(s)), while switch(o1, o2, ord) is a function that returns a new ordering which is the same as ord except that o2 precedes o1 in it.</Paragraph> <Paragraph position="4"> {the best scoping}</Paragraph> <Paragraph position="8"> While Algorithm 1 would require O(n!) steps to generate the first scoping, this algorithm will output the best scoping in the n² or n log n steps that it takes to do the sort (cf. [Aho, Hopcroft, and Ullman, 1983]), while each additional scoping is produced in constant time.9 The algorithm is profligate in that it generates all possible orderings of quantifiers, many of which do not correspond to legal scopings (see [Hobbs and Shieber, 1987]). It can be tightened up by adding a legality test before a scoping is output. When we move from Understanding to Generation, following Formula 6, we see that the task is to take an input semantics with scoping Sc and enumerate first all strings that have Sc as their best scoping, then all those with Sc as the second best scoping, etc. Equivalently, we enumerate first strings whose scopings exactly match Sc, then those that match Sc except for one pair of operators, then those matching except for two pairs, etc. We can use Algorithm 1 to implement this efficiently if we replace each of the two conditional calls to equiv_class. Instead of first computing the equivalence class and then testing whether it is less than level, we call the following function class_less_than: {True iff candidate ranked at level or below} {Target is the desired scoping} function class_less_than(candidate, target, level)</Paragraph> <Paragraph position="10"> 9switched? can be implemented in constant time if we record the position of each operator in the original scoping Sc_best. Then switched?(o1, o2) returns True iff position(o2) < position(o1).</Paragraph> <Paragraph position="12"> To estimate the complexity of class_less_than, note that if no switches are encountered, test_order will make one pass through targ_rest (= targ) in O(n) steps, where n is the length of targ.</Paragraph> <Paragraph position="13"> Each switch encountered results in a call to simple_test, O(n) steps, plus a call to test_order on the full list targ for another O(n) steps. The overall complexity is thus O((j + 1) × n), where level = j is the number of switches permitted. Note that class_less_than tests a candidate string's scoping only against the target scope, without having to inspect other possible strings or other possible scopings for the string. We therefore do not need to consider all strings that can have Sc as a scoping in order to find the most highly ranked ones that do. 
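Since the scoping pseudo-code and the body of class_less_than are largely missing from this copy, the following Python sketch gives one possible rendering of the idea: rank scopings by the number of operator pairs switched relative to Sc_best, and test a candidate against the target scoping only. Unlike the paper's algorithm, the enumeration below naively regenerates permutations, so it is illustrative rather than constant time per additional scoping.

from itertools import permutations

def switches(scoping, reference):
    """Number of operator pairs whose relative order disagrees with reference."""
    pos = {o: k for k, o in enumerate(reference)}
    return sum(1 for a_idx, a in enumerate(scoping)
                 for b in scoping[a_idx + 1:]
                 if pos[a] > pos[b])

def scopings_in_order(sc_best):
    """Yield (class, scoping) pairs: the best scoping first, then all scopings
    with one switched pair, then two, and so on."""
    max_level = len(sc_best) * (len(sc_best) - 1) // 2
    for level in range(max_level + 1):
        for perm in permutations(sc_best):
            if switches(perm, sc_best) == level:
                yield level, perm

def class_less_than(candidate, target, level):
    """True iff candidate's operators disagree with target on at most `level`
    pairs; candidate may contain only a subset of target's operators."""
    pos = {o: k for k, o in enumerate(target)}
    cand = [o for o in candidate if o in pos]
    return switches(cand, sorted(cand, key=pos.get)) <= level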
Furthermore, class_less_than will work on partial constituents (it doesn't require that cand have the same number of operators as targ), so unpromising paths can be pruned early.</Paragraph> </Section> <Section position="3" start_page="114" end_page="116" type="sub_section"> <SectionTitle> 4.3 Anaphora </SectionTitle> <Paragraph position="0"> Next we consider the problem of anaphoric reference. From the standpoint of Understanding, resolving an anaphoric reference can be viewed as a matter of finding a Preference ordering of all the possible antecedents of the pronoun. Algorithm 1 would have to produce a separate interpretation for each object that had been mentioned in the discourse and then rank them all.</Paragraph> <Paragraph position="1"> This would clearly be extremely inefficient in any discourse more than a couple of sentences long. Instead, we will take the anaphora resolution algorithm from [Rich and Luperfoy, 1988], [Luperfoy and Rich, 1991] and show how it can be viewed as an implementation of a Complex Preference, allowing for a more efficient implementation. Under this algorithm, anaphora resolution is entrusted to Experts of three kinds: a Proposer finds likely candidate antecedents, Filters provide a quick way of rejecting many candidates, and Rankers perform more expensive tests to choose among the rest. Recency is a good example of a Proposer; antecedents are often found in the last couple of sentences, so we should start with the most recent sentences and work back.</Paragraph> <Paragraph position="2"> Gender is a typical Filter; given a use of "he", we can remove from consideration all non-male objects that the Proposers have offered. Semantic plausibility or Syntactic parallelism are Rankers; they are more expensive than the Filters and assign a rational-valued score to each candidate rather than giving a yes/no answer.</Paragraph> <Paragraph position="3"> When we translate these experts into our framework, we see that Proposers are Preferences that can efficiently generate their equivalence classes in rank order, rather than having to sort a pre-existing set of candidates. This is where our gain in efficiency will come: we can work back through the Proposer's candidates in order, confident that any candidates we haven't seen must be ranked lower than those we have seen. Filters represent a special class of Preference that partition candidates into only two classes: those that pass and those that are rejected. Furthermore, we are interested only in candidates that all Filters assign to the first class. If we simply combine n Filters into a Complex Preference using Formula 9, the result is not a Filter since it partitions the input into 2^n classes. We therefore define a new simple Filter F_(1,...,n) that assigns its input to class 1 iff F1...Fn all do. Finally, Rankers are Preferences of the kind we've been discussing so far.</Paragraph> <Paragraph position="4"> When we observe that the effect of running a Proposer and then removing all candidates that the Filters reject is equivalent to first running the Filter and then using the Proposer to refine its first class,10 we see that the algorithm above, when run with Proposer Pr, Filters F1...Fn and Rankers R1...Rj, implements the Complex Preference P_<F_(1,...,n), Pr, R1,...,Rj>, defined in accordance with Formula 9. 
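As a toy illustration (not from Rich and Luperfoy), the three kinds of Experts might be rendered in Python as follows; the particular Proposer, Filter, and Ranker, and the dictionary representation of pronouns and discourse entities, are assumptions of this sketch.

def recency_proposer(discourse):
    """A Proposer: yield one equivalence class of candidate (pronoun, entity)
    pairs per sentence, most recent sentence first."""
    def propose(pronoun):
        for sentence in reversed(discourse):
            yield [(pronoun, entity) for entity in sentence["entities"]]
    return propose

def combine_filters(filters):
    """F_(1,...,n): a candidate is kept iff every individual Filter passes it."""
    return lambda cand: all(f(cand) for f in filters)

def gender_filter(cand):
    """A Filter: pass if the pronoun's gender is unspecified or matches."""
    pronoun, entity = cand
    return pronoun["gender"] is None or pronoun["gender"] == entity["gender"]

def parallelism_ranker(cand):
    """A Ranker: a score rather than a yes/no answer; higher is better."""
    _, entity = cand
    return 1.0 if entity.get("role") == "subject" else 0.5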
We thus have the following algorithm, where next_class takes a Proposer and a pronoun as input and returns its next equivalence class of candidate antecedents for the pronoun.</Paragraph> <Paragraph position="6">
while cands do begin
  cands := next_class(Proposer, pronoun);
  for cand in cands do begin
    for Filter in Filters do begin
      if not(Filter(cand)) then remove(cand, filtered_cands);
    end;
  end;
  {filtered_cands now contains class n under}
  {P_<F_(1,...,n), Pr>. Rankers R1...Rj}
  {may split it into several classes}
  refine&output(filtered_cands, Rankers);
end;
    output(cand, class);
  end;
end; {Refine&Output}

10In both cases, the result is: p1 ∩ f1, ..., pn ∩ f1, where p1...pn are the equivalence classes induced by the Proposer, and f1 is the Filter's first equivalence class.

Moving to Generation, we use this Preference to decide when to use a pronoun. Following Formula 6, we want to use a pronoun to refer to object x at level n iff that pronoun would be interpreted as referring to x in class n during Understanding. First we need a test occurs?(Proposer, x) that will return True iff Proposer will eventually output x in some equivalence class. For example, a Recency Proposer will never suggest a candidate that hasn't occurred in the antecedent discourse, so there is no point in considering a pronoun to refer to such an object. Next, we note that the candidates that the Proposer returns are really pairs consisting of a pronoun and an antecedent, and that Filters work by comparing the features of the pronoun (gender, number, etc.) with those of the antecedent. We can implement Filters to work by unifying the (syntactic) features of the pronoun with the (syntactic and semantic) features of the antecedent, returning either a more fully-specified set of features for the pronoun, or ⊥ if unification fails. We can now take a syntactically underspecified pronoun and x and use the Filter to choose the appropriate set of features. We are now assured that the Proposer will suggest x at some point, and that x will pass all the Filters.</Paragraph> <Paragraph position="7"> Having established that x is a reasonable candidate for pronominal reference, we need to determine what class x will be assigned to as an antecedent. Rankers such as Syntactic Parallelism must look at the full syntactic structure,11 so we must generate complete sentences before doing the final ranking. Given a sentence s containing pronoun p with antecedent x, we can determine the equivalence class of (p, x) by running the Proposer until (p, x) appears, then running the Filters on all other candidates, passing all the survivors and (p, x) to refine&output, and then seeing what class (p, x) is returned in. Alternatively, if we only want to check whether (p, x) is in a certain class n or not, we can run the resolution algorithm given above until n classes have been enumerated, quitting if (p, x) is not in it. (See the next section for a discussion of this algorithm's obvious weaknesses.) 11The definitions we've given so far do not specify how Preferences should rank "unfinished" structures, i.e., those that don't contain all the information the Preference requires. One obvious solution is to assign incomplete structures to the first equivalence class; as the structures become complete, they can be moved down into lower classes if necessary. Under such a strategy, Preferences such as Syntactic Parallelism will return high scores on the incomplete constituents, but these scores will be meaningless, since many of the resulting complete structures will be placed into lower classes. 
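Putting the toy experts above together, a minimal Python sketch of the resolution loop and of the generation-side check just described might look as follows; treating each ranked survivor as its own class, and performing the occurs-style test by enumeration, are simplifications of this sketch rather than the paper's algorithm.

def resolve(pronoun, propose, filters, rankers, max_classes=None):
    """Yield (class_number, candidate) pairs in Preference order; for
    simplicity each ranked survivor is treated as its own class here."""
    n = 0
    for cands in propose(pronoun):
        survivors = [c for c in cands if all(f(c) for f in filters)]
        for cand in sorted(survivors, key=lambda c: tuple(-r(c) for r in rankers)):
            n += 1
            yield n, cand
            if max_classes is not None and n >= max_classes:
                return

def pronoun_ok(x, pronoun, propose, filters, rankers, level):
    """Generation-side check (cf. Formula 6): use a pronoun for x only if
    (pronoun, x) shows up within the first `level` classes."""
    return any(cand == (pronoun, x)
               for _, cand in resolve(pronoun, propose, filters, rankers,
                                      max_classes=level))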
</Paragraph> </Section> </Section> </Paper>