<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0402">
  <Title>Paraphrasing of Japanese Light-verb Constructions Based on Lexical Conceptual Structure. Atsushi Fujita, Kentaro Furihata, Kentaro Inui</Title>
  <Section position="8" start_page="8" end_page="9" type="evalu">
    <SectionTitle>
5 Experiment
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="8" end_page="8" type="sub_section">
      <SectionTitle>
5.1 Paraphrase generation and evaluation
</SectionTitle>
      <Paragraph position="0"> To empirically evaluate our paraphrasing model and the LCSdic, and to clarify the remaining problems, we analyzed a set of automatically generated paraphrase candidates. The sentences used in the experiment were collected as follows. Step 1. From the 876,101 types of triplet &lt;n,c,v&gt; collected in Section 3.2, we extracted the 23,608 types whose components, n and &lt;c,v&gt;, are both listed in the LCSdic.</Paragraph>
      <Paragraph position="1"> Step 2. For each of the 245 most frequent &lt;n,c,v&gt; types, we extracted the 3 most frequent simple clauses containing it from the same corpus from which the triplets were collected in Section 3.2. This yielded 735 sentences. Step 3. We input these 735 sentences into our paraphrasing model, which automatically generated paraphrase candidates. When more than one LCS is assigned to a verb in the LCSdic, because the verb is polysemous or ergative, as with &quot;kaifuku-suru (recover),&quot; our model generates all possible paraphrase candidates. As a result, 825 paraphrase candidates were generated, at least one for each input.</Paragraph>
      <Paragraph position="2">  We manually classified the resultant 825 paraphrase candidates into 621 correct and 198 erroneous candidates. The remaining 6 candidates were not classified. The precision of the paraphrase generation was 75.8% (621 / 819).</Paragraph>
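      <Paragraph position="3"> The counts reported in this section can be cross-checked with a few lines of Python. This is only a minimal sketch: the constant and function names are illustrative and not part of the described system, and every figure is copied from the text above.

```python
# Counts reported in Section 5.1 (copied from the text, not recomputed).
TRIPLETS_SAMPLED = 245     # most frequent n,c,v triplets (Step 2)
CLAUSES_PER_TRIPLET = 3    # most frequent simple clauses per triplet
CANDIDATES = 825           # generated paraphrase candidates (Step 3)
CORRECT = 621
ERRONEOUS = 198
UNCLASSIFIED = 6


def precision(correct: int, erroneous: int) -> float:
    """Share of correct candidates among the classified ones."""
    return correct / (correct + erroneous)


input_sentences = TRIPLETS_SAMPLED * CLAUSES_PER_TRIPLET
classified = CORRECT + ERRONEOUS

# Sanity checks: the reported totals are mutually consistent.
assert input_sentences == 735
assert classified + UNCLASSIFIED == CANDIDATES

print(f"classified candidates: {classified}")
print(f"precision: {precision(CORRECT, ERRONEOUS):.1%}")
```

Note that the 6 unclassified candidates are excluded from the denominator, which is why precision is computed over 819 rather than 825 candidates.</Paragraph>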
    </Section>
    <Section position="2" start_page="8" end_page="9" type="sub_section">
      <SectionTitle>
5.2 Error analysis
</SectionTitle>
      <Paragraph position="0"> To clarify the causes of the erroneous paraphrases, we manually classified the 198 erroneous paraphrase candidates. Table 3 lists the error sources.</Paragraph>
      <Paragraph position="1">  The experiment came close to confirming that the right-first matching algorithm in our paraphrasing model operates correctly. Unfortunately, the matching rules produced some erroneous paraphrases in LCS transformation.</Paragraph>
      <Paragraph position="2"> Errors in predicate matching: To paraphrase (10s), &quot;CONTROL&quot; in one LCS must be matched with &quot;CONTROL&quot; in the other. Instead, x was incorrectly matched with z' and x' remained empty. The desired form of the LCS is [[chief]x' CONTROL [y' MOVE FROM [subordinate] TO [FILLED]]]. This error was caused by mismatching &quot;CONTROL&quot; with &quot;MOVE FROM TO.&quot; Although we regard some predicates as belonging to the same class, as described in Section 4.2.1, these groupings need to be considered carefully. In particular, &quot;MOVE FROM TO&quot; needs further investigation because it causes many errors whenever it has the &quot;FILLED&quot; argument.</Paragraph>
      <Paragraph position="3"> Errors in argument matching: Even if all the predicates are matched properly, errors can still arise from incorrect argument matching. With the present algorithm, z can be matched with y' if and only if z' contains &quot;FILLED.&quot; In the case of (12), however, the desired form of the LCS requires z to be matched with y', even though z' is empty. In contrast to the dative case in English, the dative case in Japanese is ambiguous: it can be either a complement to the verb or an adjunct. However, since LCS cannot determine whether a given dative phrase is a complement or an adjunct, z is occasionally filled with an adjunct by mistake. For example, &quot;medo-ni&quot; in (14s) should not fill z, because it acts as an adverb, even though it consists of a noun, &quot;medo (prospect),&quot; and the dative case particle. This was the most dominant type of error, accounting for 78 erroneous candidates.</Paragraph>
      <Paragraph position="4">  The ambiguity of dative cases in Japanese has been discussed in the linguistics literature and in some natural language processing tasks (Muraki, 1991). To date, however, a practical complement/adjunct classifier has not been established. We plan to address this topic in our future research. A preliminary investigation revealed that only certain groups of nouns can constitute both complements and adjuncts depending on the governing verb. Therefore, whether a word acts as a complement can generally be determined without considering its combination with the verb.</Paragraph>
      <Paragraph position="5">  Muraki (1991) classifies dative cases into 11 thematic roles that can be regarded as complements. In contrast, there is no typology of dative cases that act as adjuncts.  In our model, we assume that a triplet &lt;n,c,v&gt; consisting of a nominalized verb n and a light-verb tuple &lt;c,v&gt; from our vocabulary lists (see Section 3.2) always acts as an LVC. However, whether a given triplet can be paraphrased sometimes depends not only on the triplet itself but also on its context. For example, we regard &quot;imi-ga aru&quot; as an LVC, because the nominalized verb &quot;imi&quot; and the tuple &lt;&quot;ga&quot;, &quot;aru&quot;&gt; appear in the vocabulary lists. However, the &lt;n,c,v&gt; in (15s) does not act as an LVC, while the same triplet in (16s) does.</Paragraph>
      <Paragraph position="6">  imi-ga aru.</Paragraph>
      <Paragraph position="7"> meaning-NOM exist-PRES &quot;kennel&quot; has the meaning of doghouse. t. &quot;kennel&quot;-wa inugoya-o imi-suru.</Paragraph>
      <Paragraph position="8"> &quot;kennel&quot;-TOP doghouse-ACC mean-ACT, PRES &quot;kennel&quot; means doghouse.</Paragraph>
      <Paragraph position="9"> The above difference is caused by the polysemy of the nominalized verb &quot;imi,&quot; which denotes &quot;worth&quot; in the context of (15s) but &quot;meaning&quot; in (16s). Although incorporating word sense disambiguation using contextual clues would complicate our model, only a limited number of nominalized verbs are in fact polysemous. We therefore expect that we can enumerate them and use the list as a trigger for deciding whether the context needs to be taken into account. Namely, given an &lt;n,c,v&gt;, we would be able to classify it as (a) a main verb phrase, (b) a delicate case whose status depends on its context, or (c) an LVC.</Paragraph>
      <Paragraph position="10"> We could also adopt a different approach to avoiding incorrect paraphrase generation. As described in Section 5.1, our model generates all possible paraphrase candidates when more than one LCS is assigned to a verb. Similarly, our approach can be extended to (i) over-generate paraphrase candidates by considering not only the polysemy of the assigned LCS types but also that of nominalized verbs (see (15s) and (16s)) and whether the given &lt;n,c,v&gt; is an LVC, and (ii) revise or reject the incorrect candidates by using solid handcrafted rules or statistical language models.</Paragraph>
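      <Paragraph position="11"> The two-stage extension outlined above, over-generation followed by revision or rejection, might be sketched as follows. This is a hypothetical illustration under stated assumptions, not the implemented model: the Candidate class, the function names, the "LCS-A" label, and the toy rejection rule are all invented for the example, and the only linguistic fact used is that &quot;imi&quot; in the sense &quot;worth&quot; does not form an LVC.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """A paraphrase candidate and the readings assumed to produce it."""
    text: str
    noun_sense: str  # assumed sense of the nominalized verb
    lcs_type: str    # hypothetical label of the LCS entry used


def over_generate(noun_senses: Iterable[str], lcs_types: Iterable[str],
                  realize: Callable[[str, str], str]) -> List[Candidate]:
    """Step (i): build one candidate per (noun sense, LCS type) pair."""
    lcs_list = list(lcs_types)
    return [Candidate(realize(sense, lcs), sense, lcs)
            for sense in noun_senses for lcs in lcs_list]


def filter_candidates(candidates: Iterable[Candidate],
                      accept: Callable[[Candidate], bool]) -> List[Candidate]:
    """Step (ii): keep candidates accepted by a rule or a language model."""
    return [c for c in candidates if accept(c)]


def toy_realize(sense: str, lcs: str) -> str:
    # Placeholder surface form; a real system would generate a sentence.
    return f"imi-suru ({sense}, {lcs})"


def toy_rule(candidate: Candidate) -> bool:
    # Toy handcrafted rule: "imi" read as "worth" does not form an LVC.
    return candidate.noun_sense != "worth"


candidates = over_generate(["worth", "meaning"], ["LCS-A"], toy_realize)
kept = filter_candidates(candidates, toy_rule)
print([c.text for c in kept])
```

Passing the acceptance test as a function keeps step (ii) open to either option named above: a handcrafted rule, as in this toy case, or a threshold on a statistical language model score.</Paragraph>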
    </Section>
  </Section>
</Paper>