<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1099"> <Title>Combining Multiple, Large-Scale Resources in a Reusable Lexicon for Natural Language Generation</Title> <Section position="2" start_page="0" end_page="607" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Every generation system needs a lexicon, and in almost every case it is acquired anew. Few efforts to build a rich, large-scale, and reusable generation lexicon have been reported in the literature. Most generation systems still rely on a small system lexicon with limited entries and hand-coded knowledge. Although such lexicons are reported to be sufficient for the specific domain in which a generation system works, they have some obvious deficiencies: (1) Hand-coding is time- and labor-intensive, and errors are likely to be introduced. (2) Even though some knowledge, such as the syntactic structures of a verb, is domain-independent, it is often re-encoded each time a new application is developed. (3) Hand-coding seriously restricts the scale and expressive power of generation systems. As natural language generation is used in increasingly ambitious applications, this situation calls for improvement.</Paragraph> <Paragraph position="1"> Generally, existing linguistic resources are not directly suitable for generation. First, most large-scale linguistic resources built so far were designed for language interpretation applications.</Paragraph> <Paragraph position="2"> They are indexed by words, whereas an ideal generation lexicon should be indexed by the semantic concepts to be conveyed: the input to a generation system is at the semantic level, processing during generation is based on semantic concepts, and the mapping in the generation process runs from concepts to words.
Second, the knowledge needed for generation is spread across a number of different resources, each containing a particular type of information, and these resources cannot currently be used together in a single system.</Paragraph> <Paragraph position="3"> In this paper, we present our work on building a rich, large-scale, and reusable lexicon for generation by combining multiple, heterogeneous linguistic resources. The resulting lexicon contains syntactic, semantic, and lexical knowledge, indexed by word senses as generation requires, including: A complete list of syntactic subcategorizations for each sense of a verb, to support surface realization.</Paragraph> <Paragraph position="4"> A large variety of transitivity alternations for each sense of a verb, to support paraphrasing. Frequencies of lexical items and verb subcategorizations, as well as selectional constraints, derived from a corpus to support lexical choice.</Paragraph> <Paragraph position="5"> Rich lexical relations between lexical concepts, including hyponymy, antonymy, and so on, to support lexical choice.</Paragraph> <Paragraph position="6"> The construction of the lexicon is semi-automatic, and the lexicon has been used for lexical choice and realization in a practical generation system. In Section 2, we describe how the generation lexicon is built by combining existing linguistic resources. In Section 3, we show how the lexicon is applied in an actual generation system. Finally, we present conclusions and future work.</Paragraph> </Section> </Paper>