<?xml version="1.0" standalone="yes"?> <Paper uid="P95-1034"> <Title>Two-Level, Many-Paths Generation</Title> <Section position="4" start_page="252" end_page="253" type="metho"> <SectionTitle> 3 Issues in Lexical Choice </SectionTitle> <Paragraph position="0"> The process of selecting words that will lexicalize each semantic concept is intrinsically linked with syntactic, semantic, and discourse structure issues.2 Multiple constraints apply to each lexical decision, often in a highly interdependent manner. However, while some lexical decisions can affect future (or past) lexical decisions, others are purely local, in the sense that they do not affect the lexicalization of other semantic roles. Consider the case of time adjuncts that express a single point in time, and assume that the generator has already decided to use a prepositional phrase for one of them. There are several forms of such adjuncts, e.g., She left {at five | on Monday | in February}.</Paragraph> <Paragraph position="1"> In terms of their interactions with the rest of the sentence, these manifestations of the adjunct are identical. The use of different prepositions is an interlexical constraint between the semantic and syntactic heads of the PP that does not propagate outside the PP. Consequently, the selection of the preposition can be postponed until the very end.</Paragraph> <Paragraph position="2"> Existing generation models, however, select the preposition according to defaults, randomly among possible alternatives, or by explicitly encoding the lexical constraints. The PENMAN generation system (Penman, 1989) defaults the preposition choice for point-time adjuncts to at, the most commonly used preposition in such cases. The FUF/SURGE (Elhadad, 1993) generation system is an example where prepositional lexical restrictions in time adjuncts are encoded by hand, producing fluent expressions but at the cost of a larger grammar.</Paragraph> <Paragraph position="3"> Collocational restrictions are another example of lexical constraints. Phrases such as three straight victories, which are frequently used in sports reports to express historical information, can be decomposed semantically into the head noun plus its modifiers. However, when ellipsis of the head noun is considered, a detailed corpus analysis of actual basketball game reports (Robin, 1995) shows that the forms won/lost three straight X, won/lost three consecutive X, and won/lost three straight are regularly used, but the form *won/lost three consecutive is not. To achieve fluent output within the knowledge-based generation paradigm, lexical constraints of this type must be explicitly identified and represented.</Paragraph> <Paragraph position="4"> Both the above examples indicate the presence of (perhaps domain-dependent) lexical constraints that are not explainable on semantic grounds. In the case of prepositions in time adjuncts, the constraints are institutionalized in the language, but still nothing about the concept MONTH relates to the use of the preposition in with month names instead of, say, on (Herskovits, 1986). Furthermore, lexical constraints are not limited to the syntagmatic, interlexical constraints discussed above.</Paragraph> <Paragraph position="5"> 2 We consider lexical choice as a general problem for both open and closed class words, not limiting it to the former only, as is sometimes done in the generation literature.</Paragraph>
<Paragraph position="6"> For a generator to be able to produce sufficiently varied text, multiple renditions of the same concept must be accessible. The generator is then faced with paradigmatic choices among alternatives that, without sufficient information, may look equivalent. These include choices among synonyms (and near-synonyms) and choices among alternative syntactic realizations of a semantic role. However, it is possible that not all the alternatives actually share the same level of fluency or currency in the domain, even if they are rough paraphrases.</Paragraph> <Paragraph position="7"> In short, knowledge-based generators are faced with multiple, complex, and interacting lexical constraints,3 and the integration of these constraints is a difficult problem, to the extent that the need for a different specialized architecture for lexical choice in each domain has been suggested (Danlos, 1986).</Paragraph> <Paragraph position="8"> However, compositional approaches to lexical choice have been successful whenever detailed representations of lexical constraints can be collected and entered into the lexicon (e.g., (Elhadad, 1993; Kukich et al., 1994)). Unfortunately, most of these constraints must be identified manually, and even when automatic methods for the acquisition of some types of this lexical knowledge exist (Smadja and McKeown, 1991), the extracted constraints must still be transformed into the generator's representation language by hand. This narrows the scope of the lexicon to a specific domain; the approach fails to scale up to unrestricted language. When the goal is domain-independent generation, we need to investigate methods for producing reasonable output in the absence of a large part of the information traditionally available to the lexical chooser.</Paragraph> <Paragraph position="9"> 3 Including constraints not discussed above, originating for example from discourse structure, the user models for the speaker and hearer, and pragmatic needs.</Paragraph> </Section> <Section position="5" start_page="253" end_page="253" type="metho"> <SectionTitle> 4 Current Solutions </SectionTitle> <Paragraph position="0"> Two strategies have been used in lexical choice when knowledge gaps exist: selection of a default,4 and random choice among alternatives. Default choices have the advantage that they can be carefully chosen to mask knowledge gaps to some extent. For example, PENMAN defaults article selection to the and tense to present, so it will produce The dog chases the cat in the absence of definiteness information.</Paragraph> <Paragraph position="1"> Choosing the is a good tactic, because the works with mass, count, singular, plural, and occasionally even proper nouns, while a does not. On the down side, the's only outnumber a's and an's by about two-to-one (Knight and Chander, 1994), so guessing the will frequently be wrong. Another ploy is to give preference to nominalizations over clauses. This generates sentences like They plan the statement of the filing for bankruptcy, avoiding disasters like They plan that it is said to file for bankruptcy. Of course, we also miss out on sparkling renditions like They plan to say that they will file for bankruptcy.
The alternative of randomized decisions offers increased paraphrasing power but also the risk of producing some non-fluent expressions; we could generate sentences like The dog chased a cat and A dog will chase the cat, but also An earth circles a sun.</Paragraph> <Paragraph position="2"> To sum up, defaults can help against knowledge gaps, but they take time to construct, limit paraphrasing power, and only return a mediocre level of quality. We seek methods that can do better.</Paragraph> </Section> <Section position="6" start_page="253" end_page="253" type="metho"> <SectionTitle> 5 Statistical Methods </SectionTitle> <Paragraph position="0"> Another approach to the problem of incomplete knowledge is the following. Suppose that according to our knowledge bases, input I may be rendered as sentence A or sentence B. If we had a device that could invoke new, easily obtainable knowledge to score the input/output pair (I, A) against (I, B), we could then choose A over B, or vice-versa. An alternative to this is to forget I and simply score A and B on the basis of fluency. This essentially assumes that our generator produces valid mappings from I, but may be unsure as to which is the correct rendition.</Paragraph> <Paragraph position="1"> At this point, we can make another approximation: modeling fluency as likelihood. In other words, how often have we seen A and B in the past? If A has occurred fifty times and B none at all, then we choose A. But if A and B are long sentences, then probably we have seen neither. In that case, further approximations are required. For example, does A contain frequent three-word sequences? Does B? Following this reasoning, we are led into statistical language modeling. We built a language model for the English language by estimating bigram and trigram probabilities from a large collection of 46 million words of Wall Street Journal material.5 We smoothed these estimates according to class membership for proper names and numbers, and according to an extended version of the enhanced Good-Turing method (Church and Gale, 1991) for the remaining words. The latter smoothing operation not only optimally regresses the probabilities of seen n-grams but also assigns a non-zero probability to all unseen n-grams, which depends on how likely their component m-grams (m < n, i.e., words and bigrams) are. The resulting conditional probabilities are converted to log-likelihoods for reasons of numerical accuracy and used to estimate the overall probability P(S) of any English sentence S = w_1 w_2 ... w_l according to a Markov assumption, i.e.,</Paragraph> <Paragraph position="2"> $P(S) \approx P(w_1) \prod_{i=2}^{l} P(w_i \mid w_{i-1})$ for the bigram model, and $P(S) \approx P(w_1) \, P(w_2 \mid w_1) \prod_{i=3}^{l} P(w_i \mid w_{i-2}, w_{i-1})$ for the trigram model.</Paragraph> <Paragraph position="3"> Because both equations would assign lower and lower probabilities to longer sentences and we need to compare sentences of different lengths, a heuristic strictly increasing function of sentence length, f(l) = 0.5l, is added to the log-likelihood estimates.</Paragraph> <Paragraph position="4"> 5 Available from the ACL Data Collection Initiative, as CD ROM 1.</Paragraph> </Section>
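To make the scoring scheme concrete, here is a minimal sketch of length-corrected bigram scoring. The probability table and the unseen-bigram constant are invented placeholders standing in for the smoothed WSJ-trained model described above; only the arithmetic, summed log-probabilities plus the f(l) = 0.5l length bonus, follows the text.

```python
import math

# Toy conditional probabilities P(w_i | w_{i-1}); "<s>" marks sentence start.
# These numbers are illustrative placeholders, not estimates from the
# 46M-word WSJ collection used in the paper.
BIGRAM_P = {
    ("<s>", "the"): 0.20, ("the", "dog"): 0.01, ("dog", "chases"): 0.005,
    ("chases", "the"): 0.15, ("the", "cat"): 0.01,
}
UNSEEN_P = 1e-7  # stand-in for the smoothed probability of an unseen bigram

def score(sentence, length_bonus=0.5):
    """Length-corrected log-likelihood: the sum of bigram log-probabilities
    plus the heuristic f(l) = 0.5 * l, so that sentences of different
    lengths can be compared."""
    words = sentence.lower().split()
    logp = 0.0
    prev = "<s>"
    for w in words:
        logp += math.log(BIGRAM_P.get((prev, w), UNSEEN_P))
        prev = w
    return logp + length_bonus * len(words)

for s in ["The dog chases the cat", "Cat the dog the chases"]:
    print(f"{score(s):10.3f}  {s}")
```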
<Section position="7" start_page="253" end_page="254" type="metho"> <SectionTitle> 6 First Experiment </SectionTitle> <Paragraph position="0"> Our first goal was to integrate the symbolic knowledge in the PENMAN system with the statistical knowledge in our language model. We took a semantic representation generated automatically from a short Japanese sentence. We then used PENMAN to generate 3,456 English sentences corresponding to the 3,456 (= 2^7 * 3^3) possible combinations of the values of seven binary and three ternary features that were unspecified in the semantic input. These features were relevant to the semantic representation, but their values were not extractable from the Japanese sentence, and thus each of their combinations corresponded to a particular interpretation among the many possible in the presence of incompleteness in the semantic input. Specifying a feature forced PENMAN to make a particular linguistic decision. For example, adding (:identifiability-q t) forces the choice of determiner, while the :lex feature offers explicit control over the selection of open-class words. A literal translation of the input sentence was something like As for new company, there is plan to establish in February. Here are three randomly selected translations; note that the object of the &quot;establishing&quot; action is unspecified in the Japanese input, but PENMAN supplies a placeholder it when necessary, to ensure grammaticality:</Paragraph> <Paragraph position="1"> A new company will have in mind that it is establishing it on February.</Paragraph> <Paragraph position="2"> The new company plans the launching on February.</Paragraph> <Paragraph position="3"> New companies will have as a goal the launching at February.</Paragraph> <Paragraph position="4"> We then ranked the 3,456 sentences using the bigram version of our statistical language model, with the hope that good renditions would come out on top. Here is an abridged list of outputs, log-likelihood scores heuristically corrected for length, and rankings:
... establishment in February. [-13.821412]
4 The new company plans to establish it in February. [-14.121367]
...
60 The new companies plan the establishment on February. [-16.350112]
61 The new companies plan the launching in February. [-16.530286]
...
400 The new companies have as a goal the foundation at February. [-23.836556]
401 The new companies will have in mind to establish it at February. [-23.842337]
...</Paragraph> <Paragraph position="5"> While this experiment shows that statistical models can help make choices in generation, it fails as a computational strategy. Running PENMAN 3,456 times is expensive, but nothing compared to the cost of exhaustively exploring all combinations in larger input representations corresponding to sentences typically found in newspaper text. Twenty or thirty choice points typically multiply into millions or billions of potential sentences, and it is infeasible to generate them all independently. This leads us to consider other algorithms.</Paragraph> </Section> <Section position="8" start_page="254" end_page="256" type="metho"> <SectionTitle> 7 Many-Paths Generation </SectionTitle> <Paragraph position="0"> Instead of explicitly constructing all possible renditions of a semantic input and running PENMAN on them, we use a more efficient data structure and control algorithm to express possible ambiguities. The data structure is a word lattice--an acyclic state transition network with one start state, one final state, and transitions labeled by words. Word lattices are commonly used to model uncertainty in speech recognition (Waibel and Lee, 1990) and are well adapted for use with n-gram models.</Paragraph> <Paragraph position="1"> As we discussed in Section 3, a number of generation difficulties can be traced to the existence of constraints between words and phrases.
Our generator operates on lexical islands, which do not interact with other words or concepts.6 How to identify such islands is an important problem in NLG: grammatical rules (e.g., agreement) may help group words together, and collocational knowledge can also mark the boundaries of some lexical islands (e.g., nominal compounds). When no explicit information is present, we can resort to treating single words as lexical islands, essentially adopting a view of maximum compositionality. Then, we rely on the statistical model to correct this approximation, by identifying any violations of the compositionality principle on the fly during actual text generation.</Paragraph> <Paragraph position="2"> The type of the lexical islands and the manner by which they have been identified do not affect the way our generator processes them. Each island corresponds to an independent component of the final sentence. Each individual word in an island specifies a choice point in the search and causes the creation of a state in the lattice; all continuations of alternative lexicalizations for this island become paths that leave this state. Choices between alternative lexical islands for the same concept also become states in the lattice, with arcs leading to the sub-lattices corresponding to each island.</Paragraph> <Paragraph position="3"> Once the semantic input to the generator has been transformed to a word lattice, a search component identifies the N highest scoring paths from the start to the final state, according to our statistical language model. We use a version of the N-best algorithm (Chow and Schwartz, 1989), a Viterbi-style beam search algorithm that allows extraction of more than just the best scoring path. Hatzivassiloglou and Knight (1995) give more details on our search algorithm and the method we applied to estimate the parameters of the statistical model.</Paragraph> <Paragraph position="4"> Our approach differs from traditional top-down generation in the same way that top-down and bottom-up parsing differ. In top-down parsing, backtracking is employed to exhaustively examine the space of possible alternatives. Similarly, traditional control mechanisms in generation operate top-down, either deterministically (Meteer et al., 1987; Tomita and Nyberg, 1988; Penman, 1989) or by backtracking to previous choice points (Elhadad, 1993). This mode of operation can unnecessarily duplicate a lot of work at run time, unless sophisticated control directives are included in the search engine (Elhadad and Robin, 1992). In contrast, in bottom-up parsing and in our generation model, a special data structure (a chart or a lattice, respectively) is used to efficiently encode multiple analyses, and to allow structure sharing between many alternatives, eliminating repeated search.</Paragraph> <Paragraph position="5"> When the generator has complete knowledge, the word lattice will degenerate to a string, e.g.: [Lattice figure: a single path, the -> large -> Federal -> deficit -> fell] Suppose we are uncertain about definiteness and number. We can generate a lattice with eight paths instead of one: [Lattice figure: eight paths combining article and number choices over large Federal deficit(s) fell; * stands for the empty string.] But we run the risk that the n-gram model will pick a non-grammatical path like a large Federal deficits fell. So we can produce the following lattice instead: [Lattice figure: the same choices restructured so that article and number agree] In this case, we use knowledge about agreement to constrain the choices offered to the statistical model, from eight paths down to six.
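As an illustration of the data structure and search just described, the following self-contained sketch encodes a small word lattice and extracts its N highest-scoring paths. The states, words, and scores are invented, and for brevity the extractor uses a best-first search over partial paths rather than the Viterbi-style beam search of Chow and Schwartz (1989), though it returns the N best paths in the same sense.

```python
import heapq
import math

# Lattice: state -> list of (word, next_state). "" is an empty transition.
# The topology and vocabulary are illustrative, not the system's actual lattice.
LATTICE = {
    0: [("the", 1), ("a", 2), ("", 1)],
    1: [("deficit", 3), ("deficits", 3)],  # both numbers reachable after "the"
    2: [("deficit", 3)],                   # "a" combines only with singular
    3: [("fell", 4)],
    4: [],                                 # final state
}
FINAL = 4

def bigram_logp(prev, word):
    # Placeholder scorer; a real system would consult smoothed corpus counts.
    good = {("<s>", "the"), ("the", "deficit"), ("the", "deficits"),
            ("deficit", "fell"), ("deficits", "fell")}
    return math.log(0.1) if (prev, word) in good else math.log(1e-6)

def n_best(n):
    """Best-first search over partial paths (neg_logp, state, prev, words).

    Edge costs are non-negative (negated log-probabilities), so complete
    paths pop off the heap in order of decreasing total probability."""
    heap = [(0.0, 0, "<s>", ())]
    results = []
    while heap and len(results) < n:
        neg, state, prev, words = heapq.heappop(heap)
        if state == FINAL:
            results.append((-neg, " ".join(words)))
            continue
        for word, nxt in LATTICE[state]:
            if word == "":  # empty transition: no word emitted, no cost
                heapq.heappush(heap, (neg, nxt, prev, words))
            else:
                heapq.heappush(heap, (neg - bigram_logp(prev, word),
                                      nxt, word, words + (word,)))
    return results

for logp, sent in n_best(3):
    print(f"{logp:8.2f}  {sent}")
```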
Notice that the six-path lattice has more states and is more complex than the eight-path one. Also, the n-gram length is critical. When long-distance features control grammaticality, we cannot rely on the statistical model. Fortunately, long-distance features like agreement are among the first that go into any symbolic generator. This is our first example of how symbolic and statistical knowledge sources contain complementary information, which is why there is a significant advantage to combining them.</Paragraph> <Paragraph position="6"> Now we need an algorithm for converting generator inputs into word lattices. Our approach is to assign word lattices to each fragment of the input, in a bottom-up compositional fashion. For example, consider the following semantic input, which is written in the PENMAN-style Sentence Plan Language (SPL) (Penman, 1989), with concepts drawn from the SENSUS ontology (Knight and Luk, 1994), and may be rendered in English as It is easy for Americans to obtain guns:</Paragraph> <Paragraph position="7"> [SPL expression not preserved in the extracted text.]</Paragraph> <Paragraph position="8"> We process semantic subexpressions in a bottom-up order, e.g., A2, G, P, E, and finally A. The grammar assigns what we call an e-structure to each subexpression. An e-structure consists of a list of distinct syntactic categories, paired with English word lattices: (<syn, lat>, <syn, lat>, ...). As we climb up the input expression, the grammar glues together various word lattices. The grammar is organized around semantic feature patterns rather than English syntax--rather than having one S -> NP-VP rule with many semantic triggers, we have one AGENT-PATIENT rule with many English renderings.</Paragraph> <Paragraph position="9"> Here is a sample rule:</Paragraph> <Paragraph position="10"> [Rule not preserved in the extracted text.]</Paragraph> <Paragraph position="11"> Given an input semantic pattern, we locate the first grammar rule that matches it, i.e., a rule whose left-hand-side features except :rest are contained in the input pattern. The feature :rest is our mechanism for allowing partial matchings between rules and semantic inputs. Any input features that are not matched by the selected rule are collected in :rest, and recursively matched against other grammar rules.</Paragraph> <Paragraph position="12"> For the remaining features, we compute new e-structures using the rule's right-hand side. In this example, the rule gives four ways to make a syntactic S, two ways to make an infinitive, and one way to make an NP. Corresponding word lattices are built out of elements that include: * (seq x y ...)--create a lattice by sequentially gluing together the lattices x, y, and ...</Paragraph> <Paragraph position="13"> * (or x y ...)--create a lattice by branching on x, y, and ...</Paragraph> <Paragraph position="14"> * (wrd w)--create the smallest lattice: a single arc labeled with the word w.</Paragraph> <Paragraph position="15"> * (xn <syn>)--if the e-structure for the semantic material under the xn feature contains <syn, lat>, return the word lattice lat; otherwise fail.</Paragraph> <Paragraph position="16"> Any failure inside an alternative right-hand side of a rule causes that alternative to fail and be ignored. When all alternatives have been processed, results are collected into a new e-structure. If two or more word lattices can be created from one rule, they are merged with a final or.</Paragraph> <Paragraph position="17"> Because our grammar is organized around semantic patterns, it nicely concentrates all of the material required to build word lattices.
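The lattice-building elements above can be made concrete with a short sketch. The representation (integer states plus labeled edges) is an assumption for illustration, not the system's actual encoding; the or combinator is renamed alt because or is reserved in Python, and xn is omitted since it needs the surrounding e-structure machinery.

```python
import itertools

_counter = itertools.count()  # fresh state numbers for new lattices

class Lattice:
    def __init__(self, start, final, edges):
        # edges is a set of (src_state, word, dst_state); "" is an epsilon arc
        self.start, self.final, self.edges = start, final, edges

def wrd(w):
    """Smallest lattice: a single arc labeled with the word w."""
    s, f = next(_counter), next(_counter)
    return Lattice(s, f, {(s, w, f)})

def seq(*lats):
    """Glue lattices sequentially, linking each final state to the next
    start state with an epsilon arc."""
    edges = set()
    for a, b in zip(lats, lats[1:]):
        edges.add((a.final, "", b.start))
    for l in lats:
        edges |= l.edges
    return Lattice(lats[0].start, lats[-1].final, edges)

def alt(*lats):
    """Branch on the alternatives: a new start and final state wired to
    every alternative lattice with epsilon arcs."""
    s, f = next(_counter), next(_counter)
    edges = set()
    for l in lats:
        edges |= l.edges
        edges.add((s, "", l.start))
        edges.add((l.final, "", f))
    return Lattice(s, f, edges)

# E.g., the article/number fragment: {the | a | *} {deficit | deficits}
frag = seq(alt(wrd("the"), wrd("a"), wrd("")),
           alt(wrd("deficit"), wrd("deficits")))
print(len(frag.edges), "edges from state", frag.start, "to state", frag.final)
```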
Unfortunately, it forces us to restate the same syntactic constraint in many places. A second problem is that sequential composition does not allow us to insert new words inside old lattices, as needed to generate sentences like John looked it up. We have extended our notation to allow such constructions, but the full solution is to move to a unification-based framework, in which e-structures are replaced by arbitrary feature structures with syn, sem, and lat fields. Of course, this requires extremely efficient handling of the disjunctions inherent in large word lattices.</Paragraph> </Section> <Section position="9" start_page="256" end_page="256" type="metho"> <SectionTitle> 8 Results </SectionTitle> <Paragraph position="0"> We implemented a medium-sized grammar of English based on the ideas of the previous section, for use in experiments and in the JAPANGLOSS machine translation system. The system converts a semantic input into a word lattice, sending the result to one of three sentence extraction programs: * RANDOM--follows a random path through the lattice.</Paragraph> <Paragraph position="1"> * DEFAULT--follows the topmost path in the lattice. All alternatives are ordered by the grammar writer, so that the topmost lattice path corresponds to various defaults. In our grammar, defaults include singular noun phrases, the definite article, nominal direct objects, in versus on, active voice, that versus who, the alphabetically first synonym for open-class words, etc. * STATISTICAL--a sentence extractor based on word bigram probabilities, as described in Sections 5 and 7.</Paragraph> <Paragraph position="2"> For evaluation, we compare English outputs from these three sources. We also look at lattice properties and execution speed. Space limitations prevent us from tracing the generation of many long sentences--we show instead a few short ones. Note that the sample sentences shown for the RANDOM extraction model are not of the quality that would normally be expected from a knowledge-based generator, because of the high degree of ambiguity (unspecified features) in our semantic input. This incompleteness can in turn be attributed in part to the lack of such information in the Japanese source text and in part to our own desire to find out how much of the ambiguity can be automatically resolved with our statistical model.</Paragraph> </Section> <Section position="10" start_page="256" end_page="256" type="metho"> <SectionTitle> RANDOM EXTRACTION </SectionTitle> <Paragraph position="0"> Her incriminates for him to thieve an automobiles.</Paragraph> <Paragraph position="1"> She am accusing for him to steal autos.
She impeach that him thieve that there was the auto.</Paragraph> </Section> <Section position="11" start_page="256" end_page="256" type="metho"> <SectionTitle> DEFAULT EXTRACTION </SectionTitle> <Paragraph position="0"> She accuses that he steals the auto.</Paragraph> </Section> <Section position="12" start_page="256" end_page="256" type="metho"> <SectionTitle> STATISTICAL BIGRAM EXTRACTION </SectionTitle> <Paragraph position="0"/> </Section> <Section position="13" start_page="256" end_page="256" type="metho"> <SectionTitle> RANDOM EXTRACTION </SectionTitle> <Paragraph position="0"> Procurals of guns by Americans were easiness.</Paragraph> <Paragraph position="1"> A procurements of guns by a Americans will be an effortlessness.</Paragraph> <Paragraph position="2"> It is easy that Americans procure that there is gun.</Paragraph> </Section> <Section position="14" start_page="256" end_page="256" type="metho"> <SectionTitle> DEFAULT EXTRACTION </SectionTitle> <Paragraph position="0"> The procural of the gun by the American is easy.</Paragraph> </Section> <Section position="15" start_page="256" end_page="257" type="metho"> <SectionTitle> STATISTICAL BIGRAM EXTRACTION </SectionTitle> <Paragraph position="0"/> </Section> <Section position="16" start_page="257" end_page="257" type="metho"> <SectionTitle> RANDOM EXTRACTION </SectionTitle> <Paragraph position="0"> You may be obliged to eat that there was the poulet.</Paragraph> <Paragraph position="1"> An consumptions of poulet by you may be the requirements.</Paragraph> <Paragraph position="2"> It might be the requirement that the chicken are eaten by you.</Paragraph> </Section> <Section position="17" start_page="257" end_page="257" type="metho"> <SectionTitle> DEFAULT EXTRACTION </SectionTitle> <Paragraph position="0"> That the consumption of the chicken by you is obligatory is possible.</Paragraph> </Section> <Section position="18" start_page="257" end_page="257" type="metho"> <SectionTitle> STATISTICAL BIGRAM EXTRACTION </SectionTitle> <Paragraph position="0"> 1 You may have to eat chicken.</Paragraph> <Paragraph position="1"> 2 You might have to eat chicken.</Paragraph> <Paragraph position="2"> 3 You may be required to eat chicken. 4 You might be required to eat chicken. 5 You may be obliged to eat chicken.</Paragraph> <Paragraph position="3"> TOTAL EXECUTION TIME: 58.78 CPU seconds. A final (abbreviated) example comes from interlingua expressions produced by the semantic analyzer of JAPANGLOSS, involving long sentences characteristic of newspaper text. 
Note that although the lattice is not much larger than in the previous examples, it now encodes many more paths.</Paragraph> </Section> <Section position="19" start_page="257" end_page="257" type="metho"> <SectionTitle> LATTICE CREATED </SectionTitle> <Paragraph position="0"/> </Section> <Section position="20" start_page="257" end_page="257" type="metho"> <SectionTitle> RANDOM EXTRACTION </SectionTitle> <Paragraph position="0"> Subsidiary on an Japan's of Perkin Elmer Co.'s hold a stocks's majority, and as for a beginnings, production of an stepper and an dry etching devices which were applied for an construction of microcircuit microchip was planned.</Paragraph> </Section> <Section position="21" start_page="257" end_page="257" type="metho"> <SectionTitle> STATISTICAL BIGRAM EXTRACTION </SectionTitle> <Paragraph position="0"> Perkin Elmer Co.'s Japanese subsidiary holds majority of stocks, and as for the beginning, production of steppers and dry etching devices that will be used to construct microcircuit chips are planned.</Paragraph> <Paragraph position="1"> TOTAL EXECUTION TIME: 106.28 CPU seconds.</Paragraph> </Section> </Paper>