File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/relat/04/n04-1031_relat.xml
Size: 3,413 bytes
Last Modified: 2025-10-06 14:15:44
<?xml version="1.0" standalone="yes"?> <Paper uid="N04-1031"> <Title>Paraphrasing Predicates from Written Language to Spoken Language Using the Web</Title> <Section position="3" start_page="2" end_page="2" type="relat"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> Paraphrases are different expressions that convey the same or almost the same meaning. However, few paraphrases have exactly the same meaning; almost all differ subtly in aspects such as style or formality. Such a difference is called a connotational difference. This paper addresses one connotational difference: whether an expression is suitable or unsuitable for spoken language.</Paragraph> <Paragraph position="1"> Although many studies have addressed learning paraphrases, for example (Barzilay and Lee, 2003), only a few address the connotational differences between paraphrases. Among them is a series of works by Edmonds et al. and Inkpen et al. (Edmonds and Hirst, 2002; Inkpen and Hirst, 2001).</Paragraph> <Paragraph position="2"> Edmonds et al. proposed a computational model that represents connotational differences, and Inkpen et al. showed that the parameters of the model can be learned from a synonym dictionary. However, it is doubtful whether the connotational differences between paraphrases are sufficiently described in such a lexical resource.</Paragraph> <Paragraph position="3"> Note that this paper deals with Japanese. A predicate is a verb or an adjective.</Paragraph> <Paragraph position="4"> On the other hand, Inui et al. discussed readability, which is one of the connotational differences, and proposed a method of learning a readability ranking model of paraphrases from a tagged corpus (Inui and Yamamoto, 2001). The tagged corpus was built as follows: a large number of paraphrase pairs were prepared, and annotators tagged them according to their readability. 
However, they focused only on syntactic paraphrases, whereas this paper deals with lexical paraphrases.</Paragraph> <Paragraph position="5"> Several works try to learn paraphrase pairs from parallel or comparable corpora (Barzilay and McKeown, 2001; Shinyama et al., 2002; Barzilay and Lee, 2003; Pang et al., 2003). In our work, paraphrase pairs are learned not from corpora but from a dictionary. Our corpora are neither parallel nor comparable, and are used to distinguish UES and SES.</Paragraph> <Paragraph position="6"> Several studies compare two corpora of different styles, for example written and spoken corpora or British and American English corpora, and try to find expressions unique to one of the styles (Kilgarriff, 2001). However, those studies did not deal with paraphrases.</Paragraph> <Paragraph position="7"> Bulyko et al. also collected spoken language corpora from the Web (Bulyko et al., 2003). Their method used N-grams in a training corpus and differs from ours (our method is described in detail in Section 4).</Paragraph> <Paragraph position="8"> With respect to automatically collecting corpora of a desired style, Tambouratzis et al. proposed a method of dividing a Modern Greek corpus into Demotiki and Katharevousa, which are varieties of Modern Greek (Tambouratzis et al., 2000).</Paragraph> </Section> </Paper>