<?xml version="1.0" standalone="yes"?> <Paper uid="W98-0718"> <Title>Usage of WordNet in Natural Language Generation</Title> <Section position="2" start_page="0" end_page="128" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> WordNet (Miller et al., 1990) has been successfully applied in many applications related to human language, such as word sense disambiguation, information retrieval, and text categorization; yet generation is among the fields in which the application of WordNet has rarely been explored. We demonstrate in this paper that, as a rich semantic net, WordNet is indeed a valuable resource for generation. We propose a corpus-based technique to adapt WordNet to a specific domain and present experiments in the basketball domain. We also discuss possible ways to use WordNet knowledge in the generation task and to augment WordNet with other types of knowledge.</Paragraph> <Paragraph position="1"> In Section 2, we explain why WordNet is useful for generation. In Section 3, we discuss the problems that must be solved to apply WordNet to generation successfully. In Section 4, we present techniques to solve these problems.</Paragraph> <Paragraph position="2"> Finally, we present future work and conclude.</Paragraph> <Paragraph position="3"> 2 Why a valuable resource for generation? WordNet is a potentially valuable resource for generation for four reasons. First, synonym sets in WordNet (synsets) can potentially provide a large number of lexical paraphrases. One major shortcoming of current generation systems is their poor expressive capability. A generation system usually provides few or no paraphrases because of the cost of hand-coding them in the lexicon. Synsets, however, make it possible to generate lexical paraphrases without tedious hand-coding in individual systems. 
For example, for the output sentence &quot;Jordan hit a jumper&quot;, we can generate the paraphrase &quot;Jordan hit a jump shot&quot; simply by replacing the word jumper in the sentence with its synonym jump shot, which is listed in its WordNet synset. However, such replacements are not always appropriate.</Paragraph> <Paragraph position="4"> For example, tally and rack up are listed as synonyms of the word score. Although sentences like &quot;Jordan scored 22 points&quot; are common in newspaper sports reports, sentences like &quot;Jordan tallied 22 points&quot; or &quot;Jordan racked up 22 points&quot; seldom occur. To apply WordNet to paraphrasing successfully, we need to develop techniques that can correctly identify whether synonyms are interchangeable in a given context. Second, as a semantic net linked by lexical relations, WordNet can be used for lexicalization in generation. Lexicalization maps the semantic concepts to be conveyed to appropriate words. It is usually achieved by step-wise refinement based on syntactic, semantic, and pragmatic constraints while traversing a semantic net (Danlos, 1987). Currently, most generation systems acquire their semantic net for lexicalization by building their own, whereas WordNet makes it possible to acquire such knowledge automatically from an existing resource.</Paragraph> <Paragraph position="6"> Third, the WordNet ontology can be used for building a domain ontology. Most current generation systems build their domain ontology manually from scratch. The process is time- and labor-intensive, and errors are likely to be introduced.</Paragraph> <Paragraph position="7"> The WordNet ontology has wide coverage, so it can potentially be used as a basis for building a domain ontology. The problem to be solved is how to adapt it to a specific domain.</Paragraph> <Paragraph position="8"> Finally, the fact that WordNet is indexed by concepts rather than merely by words makes it especially desirable for the generation task. 
Unlike language interpretation, generation takes as input the semantic concepts to be conveyed and maps them to appropriate words. Thus an ideal generation lexicon should be indexed by semantic concepts rather than by words. Most available linguistic resources are not suitable for direct use in generation because they lack a mapping between concepts and words. WordNet is by far the richest and largest database among all resources that are indexed by concepts. Other relatively large, concept-based resources such as the PENMAN ontology (Bateman et al., 1990) usually include only hyponymy relations, in contrast to the rich types of lexical relations present in WordNet.</Paragraph> <Paragraph position="9"> Once WordNet is tailored to the domain, the main problem is how to use its knowledge in the generation process. As we mentioned in Section 2, WordNet can potentially benefit generation in three aspects: producing a large number of lexical paraphrases, providing the semantic net for lexicalization, and providing a basis for building a domain ontology. A number of problems remain to be solved at this stage, including: (a) when using synsets to produce paraphrases, how do we determine whether two synonyms are interchangeable in a particular context? (b) while WordNet can provide the semantic net for lexicalization, the constraints for choosing a particular node during lexical choice still need to be established.</Paragraph> <Paragraph position="10"> (c) How should the WordNet ontology be used? The last problem concerns augmenting WordNet with other types of information. Although WordNet is a rich lexical database, it cannot contain all the types of information needed for generation; for example, syntactic information in WordNet is weak. 
It is therefore worthwhile to investigate the possibility of combining it with other resources.</Paragraph> <Paragraph position="11"> In the following section, we address the above issues in order and present our experimental results in the basketball domain.</Paragraph> </Section> </Paper>