<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1116">
  <Title>Generation that Exploits Corpus-Based Statistical Knowledge</Title>
  <Section position="10" start_page="708" end_page="709" type="concl">
    <SectionTitle>
8 Discussion
</SectionTitle>
    <Paragraph position="0"> We have presented a new generation grammar formalism capable of mapping meanings onto word lattices. It includes novel mechanisms for constructing and combining word lattices, and for rewriting meaning representations to handle a broad range of linguistic phenomena. The grammar accepts inputs along a continuum of semantic depth and requires only minimal syntactic detail, making it attractive for a variety of purposes.</Paragraph>
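The idea of mapping a meaning onto a word lattice can be pictured with a small sketch. The representation below is hypothetical, not Nitrogen's actual data structure: a lattice is a directed acyclic graph whose arcs carry words, so a single structure compactly encodes many candidate realizations.

```python
# Hypothetical word-lattice sketch (not Nitrogen's actual data structure):
# each node maps to a list of (word, next_node) arcs, so the DAG encodes
# many candidate sentences in one structure.
def lattice_paths(lattice, start, end):
    """Yield every word sequence along a path from start to end."""
    if start == end:
        yield []
        return
    for word, nxt in lattice.get(start, []):
        for rest in lattice_paths(lattice, nxt, end):
            yield [word] + rest

# Two determiner arcs and two verb-form arcs give four candidate sentences.
lattice = {
    0: [("the", 1), ("a", 1)],
    1: [("dog", 2)],
    2: [("runs", 3), ("ran", 3)],
}
sentences = [" ".join(p) for p in lattice_paths(lattice, 0, 3)]
# sentences contains four realizations, e.g. "the dog runs" and "a dog ran"
```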
    <Paragraph position="1"> Nitrogen's grammar is organized around semantic input patterns rather than the syntax of English. This distinguishes it from both unification grammar (Elhadad, 1993a; Shieber et al., 1989) and systemic-network grammar (Penman, 1989). Meanings can be expressed directly, or else be recast and recycled back through the generator. This recycling ultimately allows syntactic constraints to be localized, even though the grammar is not organized around English syntax.</Paragraph>
    <Paragraph position="2"> Nitrogen's algorithm operates bottom-up, efficiently encoding multiple analyses in a lattice data structure to allow structure sharing, analogous to the way a chart is used in bottom-up parsing. In contrast, traditional generation control mechanisms work top-down, either deterministically (Meteer et al., 1987; Penman, 1989) or by backtracking to previous choice points (Elhadad, 1993b). This unnecessarily duplicates work at run time, unless sophisticated control directives are included in the search engine (Elhadad and Robin, 1992). Recently, Kay (1996) has explored a bottom-up approach to generation as well, using a chart rather than a word lattice. Nitrogen's generation is robust and scalable. It can generate output even for unexpected or incomplete input, and is designed for broad coverage.</Paragraph>
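The structure-sharing payoff of the bottom-up, lattice-based control can be seen in miniature. Using a toy "sausage"-shaped lattice (an assumed simplification: one slot of alternative words per position, not Nitrogen's representation), combining sub-lattices grows the arc count additively while the number of encoded sentences grows multiplicatively:

```python
# Sketch of bottom-up lattice combination (assumed simplified "sausage"
# representation, not Nitrogen's): concatenating two lattices adds their
# arcs but multiplies their path counts -- the structure sharing that
# makes bottom-up generation cheap compared with enumerating sentences.
def concat(l1, l2):
    """Join two lattices, each a list of word-alternative slots."""
    return l1 + l2

def path_count(lattice):
    """Number of distinct sentences the lattice encodes."""
    n = 1
    for arcs in lattice:
        n *= len(arcs)
    return n

np_lattice = [["the", "a"], ["fast", "quick"], ["dog"]]   # 4 paths, 5 arcs
vp_lattice = [["runs", "ran"], ["home", "homeward"]]      # 4 paths, 4 arcs
s_lattice = concat(np_lattice, vp_lattice)
# Arcs grow additively (5 + 4 = 9); paths multiplicatively (4 * 4 = 16).
```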
    <Paragraph position="3"> It does not require the detailed, difficult-to-obtain knowledge bases that other NLG systems require, since it relies instead on corpus-based statistics to make a wide variety of linguistic decisions. Currently the quality of the output is limited by the use of only word bigram statistical information, which cannot handle long-distance agreement, or distinguish likely collocations from grammatical but unlikely structures. However, we plan to remedy these problems by using statistical information extracted from the Penn Treebank corpus (Marcus et al., 1994) to rank tagged lattices and parse forests.</Paragraph>
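The bigram limitation is easy to demonstrate. With toy log-probabilities (invented for illustration, not derived from any corpus), a bigram model assigns identical scores to sentences that differ only in long-distance agreement:

```python
# Toy bigram ranking (invented log-probabilities, for illustration only):
# a bigram model scores a sentence by summing adjacent-pair scores, so it
# cannot see agreement between words separated by intervening material.
bigram_logprob = {
    ("the", "dogs"): -1.0, ("dogs", "that"): -2.0, ("that", "bark"): -1.5,
    # The pairs below are identical: the model cannot prefer
    # "dogs ... run" over "dogs ... runs" at this distance.
    ("bark", "run"): -3.0, ("bark", "runs"): -3.0,
}

def score(words):
    """Sum bigram log-probabilities; unseen pairs get a floor score."""
    return sum(bigram_logprob.get(b, -10.0) for b in zip(words, words[1:]))

good = "the dogs that bark run".split()
bad = "the dogs that bark runs".split()
# score(good) == score(bad): the agreement error goes undetected
```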
    <Paragraph position="4"> Nitrogen's rule matching is much less expensive than graph unification, and lattices generated for sub-AMRs are cached and reused in subsequent references. The semantic roles used in the grammar formalism cover most common syntactic phenomena, though our grammar does not yet generate questions, or infer pronouns from explicit coreference. Nitrogen has been used extensively as part of a semantics-based Japanese-English MT system (Knight et al., 1995). Japanese analysis provides AMRs, which Nitrogen transforms into word lattices on the order of hundreds of nodes and thousands of arcs. These lattices compactly encode a number of syntactic variants that usually reach into the trillions and beyond. Most of these are somewhat ungrammatical or awkward, yet the statistical extractor rather successfully narrows them down to the top N best paths. An online demo is available at http://www.isi.edu/natural-language/mt/nitrogen/</Paragraph>
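The caching of sub-AMR lattices can be sketched as memoization, as a loose analogy rather than Nitrogen's implementation: when the same sub-meaning is referenced again, its lattice is served from the cache instead of being rebuilt.

```python
# Loose analogy for sub-AMR lattice caching (hypothetical builder, not
# Nitrogen's code): memoization ensures a repeated sub-meaning triggers
# only one lattice construction; later references reuse the result.
from functools import lru_cache

calls = {"n": 0}  # counts how often the builder actually runs

@lru_cache(maxsize=None)
def build_lattice(sub_amr):
    """Pretend lattice builder keyed by a sub-AMR string."""
    calls["n"] += 1
    return (sub_amr.upper(),)  # stand-in for a real lattice

build_lattice("(d / dog)")
build_lattice("(d / dog)")  # second reference: served from the cache
# calls["n"] is still 1
```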
  </Section>
</Paper>