<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1008"> <Title>Extracting Paraphrases from a Parallel Corpus</Title> <Section position="3" start_page="0" end_page="0" type="relat"> <SectionTitle> 2 Related Work on Paraphrasing </SectionTitle> <Paragraph position="0"> Many NLP applications must cope with the virtually unlimited variety of ways in which human language can express the same information. So far, three major approaches to collecting paraphrases have emerged: manual collection, utilization of existing lexical resources and corpus-based extraction of similar words.</Paragraph> <Paragraph position="1"> Manual collection of paraphrases is usually used in generation (Iordanskaja et al., 1991; Robin, 1994). Paraphrasing is an inevitable part of any generation task, because a semantic concept can be realized in many different ways.</Paragraph> <Paragraph position="2"> Knowledge of possible concept verbalizations can help to generate a text that best fits existing syntactic and pragmatic constraints. Traditionally, alternative verbalizations are derived from a manual corpus analysis and are therefore application-specific.</Paragraph> <Paragraph position="3"> The second approach -- utilization of existing lexical resources, such as WordNet -- overcomes the scalability problem associated with an application-specific collection of paraphrases. Lexical resources are used in statistical generation, summarization and question-answering. The question here is which types of WordNet relations can be considered paraphrases. In some applications, only synonyms are considered paraphrases (Langkilde and Knight, 1998); in others, looser definitions are used (Barzilay and Elhadad, 1997). These definitions are valid in the context of particular applications; in general, however, the correspondence between paraphrasing and types of lexical relations is not clear. The same question arises with automatically constructed thesauri (Pereira et al., 1993; Lin, 1998). 
While the extracted pairs are indeed similar, they are not paraphrases. For example, although &quot;dog&quot; and &quot;cat&quot; are recognized as the most similar concepts by the method of Lin (1998), it is hard to imagine a context in which these words would be interchangeable.</Paragraph> <Paragraph position="4"> The first attempt to derive paraphrasing rules from corpora was undertaken by Jacquemin et al. (1997), who investigated morphological and syntactic variants of technical terms. While these rules achieve high accuracy in identifying term paraphrases, the techniques used have not yet been extended to other types of paraphrasing. Statistical techniques were also used successfully by Lapata (2001) to identify paraphrases of adjective-noun phrases. In contrast, our method is not limited to a particular paraphrase type.</Paragraph> </Section> </Paper>