<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1611">
  <Title>Paraphrasing Japanese noun phrases using character-based indexing</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Data and preprocessing
</SectionTitle>
      <Paragraph position="0"> As input noun phrases, we used 53 queries excerpted from Japanese IR test collection BMIR-J21 (Kitani et al., 1998) based on the following criteria.</Paragraph>
      <Paragraph position="1"> * A query has two or more index terms.</Paragraph>
      <Paragraph position="2"> It is less likely to retrieve proper paraphrases with only one index term, since we adopt character-based indexing.</Paragraph>
      <Paragraph position="3"> * A query does not contain proper names.</Paragraph>
      <Paragraph position="4"> It is generally difficult to paraphrase proper names. We do not deal with proper name paraphrasing.</Paragraph>
      <Paragraph position="5"> * A query contains at most one Katakana word or number.</Paragraph>
      <Paragraph position="6"> The proposed method utilize characteristics of Kanzi characters, ideograms. It is obvious that the method does not work well for Kanzi -poor expressions. We searched paraphrases in three years worth of newspaper articles (Mainichi Shimbun) from 1991 to 1993. As described in section 2, each article is segmented into passages at punctuation marks and symbols. These passages are assigned a unique identifier and indexed, then stored in the GETA retrieval engine (IPA, 2003). We used the JUMAN morphological analyzer (Kurohashi and Nagao, 1998) for indexing the passages. As a result of preprocessing described above, we obtained 6,589,537 passages to retrieve. The average number of indexes of a passage was 12.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Qualitative evaluation
</SectionTitle>
      <Paragraph position="0"> Out of 53 input noun phrases, no paraphrase was obtained for 7 cases. Output paraphrases could be classified into the following categories.</Paragraph>
      <Paragraph position="1">  1BMIR-2 contains 60 queries.</Paragraph>
      <Paragraph position="2"> (1) The paraphrase has the same meaning as that of the input noun phrase.</Paragraph>
      <Paragraph position="3"> e.g.Fw(damage by cool summer) (cool summer damage)2 Note that this example is hardly obtained by the existing approaches such as syntactic transformation and word substitution with thesaurus.</Paragraph>
      <Paragraph position="4"> (2) The paraphrase does not have exactly the same meaning but has related meaning. This category is further divided into three subcategories.</Paragraph>
      <Paragraph position="5"> (2-a) The meaning of the paraphrase is more specific than that of the input noun phrase.</Paragraph>
      <Paragraph position="6"> e.g.(agricultural chemicals)-t~ N(insecticide and herbicide) (2-b) The meaning of the paraphrase is more general than that of the input noun phrase.</Paragraph>
      <Paragraph position="7"> e.g.A(stock movement)-Az8 w(movement of stock and exchange rate) (2-c) The paraphrase has related meaning to the input but is not categorized into above two. e.g.(drinks) -M2(international drink exhibition) (3) There is no relation between the paraphrase and the  input noun phrase.</Paragraph>
      <Paragraph position="8"> Among these categories, (1) and (2-a) are useful from a viewpoint of information retrieval. By adding the paraphrase of these classes to a query, we can expect the effective phrase expansion in queries.</Paragraph>
      <Paragraph position="9"> Since the paraphrase of (2-b) generalizes the concept denoted by the input, using these paraphrases for query expansion might degrade precision of the retrieval. However, they might be useful for the recall-oriented retrieval. The paraphrases of (2-c) have the similar property, since relatedness includes various viewpoints.</Paragraph>
      <Paragraph position="10"> The main reason of retrieval failure and irrelevant retrieval (3) are summarized as follows: * The system cannot generate a paraphrase, when there is no proper paraphrase for the input. In particular, this tends to be the case for single-word inputs, such as &amp;quot;(liquid crystal)&amp;quot; and &amp;quot;h(movie)&amp;quot;. But this does not imply the proposed method does not work well for single-words inputs. We had several interesting paraphrases for single-word inputs, such as &amp;quot;3;N(chemicals for agriculture and gardening)&amp;quot; for &amp;quot;(agricultural chemicals)&amp;quot;. * We used only three years worth of newspaper articles due to the limitation of computational resoruces. Sometimes, the system could not generate 2The left-hand side of the arrow is the input and the right-hand side is its paraphrase.</Paragraph>
      <Paragraph position="11"> the paraphrase of the input because of the limited size of the corpus.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Quantitative evaluation
</SectionTitle>
      <Paragraph position="0"> Since there is no test collection available to evaluate paraphrasing, we asked three judges to evaluate the output of the system subjectively. The judges classified the outputs into the categories introduced in 5.2. The evaluation was done on the 46 inputs which gave at least one output.</Paragraph>
      <Paragraph position="1"> Table 1 shows the results of judgments. Column &amp;quot;Q&amp;quot; denotes the query identifier, &amp;quot;Len.&amp;quot; denotes its length in morphemes, &amp;quot;#Para.&amp;quot; denotes the number of outputs and the columns (1) through (3) denote the number of outputs which are classified into each category by three judges.</Paragraph>
      <Paragraph position="2"> Therefore, the sum of these columns makes a triple of the number of outputs. The decimal numbers in the parentheses denote the generalized raw agreement indices of each category, which are calculated as given in (6) (Uebersax, 2001), where K is the number of judged cases, C is the number of categories, njk is the number of times category j is applied to case k, and nk is calculated by summing up over categories on case k; nk = summationtextCj=1 njk.</Paragraph>
      <Paragraph position="4"> In our case, K is the number of outputs (column &amp;quot;#Para.&amp;quot;), nk is the number of judges, 3, and j moves over (1) through (3).</Paragraph>
      <Paragraph position="5"> As discussed in 5.2, from the viewpoint of information retrieval, paraphrases of category (1) and (2-a) are useful for query expansion of phrasal index terms. Column &amp;quot;Acc.&amp;quot; denotes the ratio of paraphrases of category (1) and (2-a) to the total outputs. Column &amp;quot;Prec.&amp;quot; denotes non-interpolated average precision. Since the precision differs depending on the judge, the column is showing the average of the precisions given by three judges.</Paragraph>
      <Paragraph position="6"> We could obtain 45 paraphrases on average for each input. But the average accuracy is quite low, 10%, which means only one tenth of output is useful. Even though considering that all paraphrases not being in category (3) are useful, the accuracy only doubled. This means filtering conditions should be more rigid. However, looking at the agreement indices, we see that category (3) ranks very high. Therefore, we expect finding the paraphrases in category (3) is easy for a human. From all this, we conclude that the proposed method need to be improved in accuracy to be used for automatic query expansion in information retrieval, but it is usable to help users to modify their queries by suggesting possible paraphrases.</Paragraph>
      <Paragraph position="7"> Seeing the column &amp;quot;Len.&amp;quot;, we find that the proposed method does not work for complex noun phrases. The average length of input noun phrase is 4.5 morphemes.</Paragraph>
      <Paragraph position="8"> The longer input often results in less useful paraphrases.</Paragraph>
      <Paragraph position="9"> The number of outputs also decreases for longer inputs.</Paragraph>
      <Paragraph position="10"> We require all concepts mentioned in the input to have their counterparts in its paraphrases as described in 3.1.</Paragraph>
      <Paragraph position="11"> This condition seems to be strict for longer inputs. In addition, we need to take into account syntactic variations of longer inputs. Integrating syntactic transformation into the proposed method is one of the possible extensions to explore when dealing with longer inputs (Yoshikane et al., 2002).</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>