<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-1031">
  <Title>Paraphrasing Predicates from Written Language to Spoken Language Using the Web</Title>
  <Section position="2" start_page="0" end_page="2" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Information can be provided in various forms, one of which is speech. Speech is familiar to humans and can convey information effectively (Nadamoto et al., 2001; Hayashi et al., 1999). So far, however, little electronic information is provided as speech. In contrast, a large amount of information exists in text form, and it can be converted into speech by speech synthesis.</Paragraph>
    <Paragraph position="1"> Therefore, much attention has been given to applications that use speech synthesis, for example (Fukuhara et al., 2001).</Paragraph>
    <Paragraph position="2"> In order to enhance such applications, two problems need to be resolved. The first is that current speech synthesis technology is still insufficient, and many applications produce speech with unnatural accents and intonation. The second is that expressions used in written language differ considerably from those used in spoken language. For example, Ohishi pointed out that difficult words and compound nouns are used more often in written language than in spoken language (Ohishi, 1970).</Paragraph>
    <Paragraph position="3"> Consequently, these applications are prone to producing unnatural speech when their input is in written language.</Paragraph>
    <Paragraph position="4"> Although the first problem is well known, little attention has been given to the second. The second problem arises because the input text contains Unsuitable Expressions for Spoken language (UES). It can therefore be resolved by paraphrasing UES into Suitable Expressions for Spoken language (SES).</Paragraph>
    <Paragraph position="5"> This is a new application of paraphrasing. No similar attempts have been made, although a variety of other applications have been discussed so far, for example question answering (Lin and Pantel, 2001; Hermjakob et al., 2002; Duclaye and Yvon, 2003) and text simplification (Inui et al., 2003). [Figure: three circles representing expressions used in (1) written language and (2) spoken language, and (3) unnatural expressions.] The overlap between two circles represents expressions used in both written and spoken language. UES is the shaded portion: unnatural expressions and expressions used only in written language. SES is the non-shaded portion. The arrows represent paraphrasing UES into SES, and other paraphrasing is represented by broken arrows. Paraphrasing of unnatural expressions is not considered, since such expressions do not appear in the input text. Unnatural expressions are nevertheless taken into consideration because paraphrasing into them should be avoided.</Paragraph>
    <Paragraph position="6"> In order to paraphrase UES into SES, this paper proposes a method of learning paraphrase pairs of the form 'UES → SES'. The key idea of the method is to distinguish UES from SES based on their occurrence probabilities in written and spoken language corpora, which are automatically collected from the Web. The procedure is as follows: (step 1) Paraphrase pairs of predicates are learned from a dictionary using the method proposed by (Kaji et al., 2002).</Paragraph>
    <Paragraph position="7"> (step 2) Written and spoken language corpora are automatically collected from the Web.</Paragraph>
    <Paragraph position="8"> (step 3) From the paraphrase pairs learned in step 1, those of the form 'UES → SES' are selected using the corpora.</Paragraph>
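The selection in step 3 can be illustrated with a minimal sketch. It assumes a simple criterion that is not taken from this paper: a phrase is treated as UES when its add-one smoothed occurrence probability in the written corpus is at least `ratio` times its probability in the spoken corpus. The function names, the `ratio` threshold, and the toy English data are all hypothetical; the actual selection criterion is described in Section 5 and may differ.

```python
from collections import Counter


def occurrence_probability(phrase, counts, total, vocab_size):
    """Add-one smoothed relative frequency of a phrase in one corpus."""
    return (counts[phrase] + 1) / (total + vocab_size)


def select_ues_to_ses(pairs, written_corpus, spoken_corpus, ratio=2.0):
    """Keep (source, target) pairs whose source looks like UES and target like SES.

    `ratio` is a hypothetical threshold chosen for illustration only.
    """
    written = Counter(written_corpus)
    spoken = Counter(spoken_corpus)
    vocab_size = len(set(written) | set(spoken))
    n_written = len(written_corpus)
    n_spoken = len(spoken_corpus)

    def is_written_skewed(phrase):
        # True when the phrase is much more probable in written language.
        p_written = occurrence_probability(phrase, written, n_written, vocab_size)
        p_spoken = occurrence_probability(phrase, spoken, n_spoken, vocab_size)
        return p_written >= ratio * p_spoken

    return [
        (source, target)
        for source, target in pairs
        if is_written_skewed(source) and not is_written_skewed(target)
    ]


# Toy corpora: each item stands for one occurrence of a predicate (assumed data).
written_corpus = ["commence"] * 3 + ["start"] * 2 + ["purchase"] * 2
spoken_corpus = ["start"] * 4 + ["buy"] * 2

# Candidate pairs, standing in for the output of step 1.
candidate_pairs = [
    ("commence", "start"),
    ("purchase", "buy"),
    ("start", "commence"),
    ("buy", "start"),
]

selected_pairs = select_ues_to_ses(candidate_pairs, written_corpus, spoken_corpus)
print(selected_pairs)
```

In this sketch, a pair is kept only when its source is skewed toward the written corpus and its target is not, so the reverse pairs are filtered out. Unnatural expressions, which the paper also counts as UES, are not modeled here.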
    <Paragraph position="9"> This paper deals only with paraphrase pairs of predicates, although UES includes not only predicates but also other categories such as nouns.</Paragraph>
    <Paragraph position="10"> This paper is organized as follows. Section 2 reviews related work. Section 3 summarizes the method of Kaji et al. In Section 4, we describe the method of collecting corpora from the Web and report the experimental results. In Section 5, we describe the method of selecting suitable paraphrase pairs and report the experimental results. Our future work is described in Section 6, and we conclude in Section 7.</Paragraph>
  </Section>
</Paper>