<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-1093">
  <Title></Title>
  <Section position="4" start_page="553" end_page="554" type="metho">
    <SectionTitle>
3. Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="553" end_page="553" type="sub_section">
      <SectionTitle>
3.1 Test Data
</SectionTitle>
      <Paragraph position="0"> We used the articles contained in &amp;quot;Nikkei Shinbun&amp;quot; for January and February in 1992 as the corpus for the experiments. The number of the articles is about 27,000, which contain about 7 million characters.</Paragraph>
      <Paragraph position="1"> Experiments were carried out using 400 compound nouns: 100 for 5-kanji words, 100 for 6-kanji words, 100 for 7-kanji words and 100 for 8-kanji words. The frequency of these word lengths is about the same in the corpus. Alter randomly selecting the test samples, we confirmed that they were all compound nouns.</Paragraph>
      <Paragraph position="2"> Numerical expressions appeared in 10% of the test samples, and such expressions were pre-processed as follows: &amp;quot;~'\]\-I-~'&amp;quot; --~ &amp;quot;~ pr-num/~\]\-\[- num/~ n&amp;quot; (C/~: about; ~: hundred; A.: eight; W: ten; ~-: dealer)</Paragraph>
    </Section>
    <Section position="2" start_page="553" end_page="553" type="sub_section">
      <SectionTitle>
3.2 Baseline
</SectionTitle>
      <Paragraph position="0"> Baselines have rarely been introduced in research on Japanese noun compounds. This paper introduces a baseline to facilitate our evaluation of the effectiveness of our method.</Paragraph>
      <Paragraph position="1"> The baseline we used is leftmost derivation. This is an extension of left branchprefereture in Lauer (1995). The baseline is also a well-known heuristic method to analyze Japanese noun phrases combined with &amp;quot;C/)&amp;quot; (such as &amp;quot;A(c)B~C&amp;quot;). As shown below, this heuristic method works well especially when the length of a compound noun is relatively short. Note that the baseline correctly analyzes &amp;quot;i~/E~\]j~,~-&amp;quot; if &amp;quot;~)~&amp;quot; is registered. However, the baseline actually fails because it cannot capture the unregistered word.</Paragraph>
    </Section>
    <Section position="3" start_page="553" end_page="553" type="sub_section">
      <SectionTitle>
3.3 Results and Comparison
</SectionTitle>
      <Paragraph position="0"> Table 2 shows the results of the proposed method. The first line indicates the number of samples for which the correct dependency structure was given as the single minimum cost solution. The second line indicates the accumulated number of samples for which the col~rect dependency structure was given as one of the minimum cost solutions. Table 3 shows the results of the baseline, and indicates the number of samples for which the correct  The result of baseline Comparing the two tables reveals that the proposed method is more accurate than the baseline. For longer word length, the difference is greater.</Paragraph>
      <Paragraph position="1"> Our result cannot be compared accurately with the existing result (Kobayashi et a/., 1995) because we used a different test corpus, and only the results on 4-, 5- and 6-kanji compound nouns were reported. However, the accuracy of their results on 6-kanji compound nouns is 53%, unless they combine their conceptual dependency model with a heuristic using the distance of modifier and modifee. After combining the model and the heuristic, accuracy improves to 70%, which is the same as ours.</Paragraph>
      <Paragraph position="2"> An 8-kanji compound noun usually contains four nouns. The performance of our method (accuracy of 58%) is encouraging, since most of the errors were caused by proper nouns. This problem can be solved using a pre-processor (explained below).</Paragraph>
    </Section>
    <Section position="4" start_page="553" end_page="554" type="sub_section">
      <SectionTitle>
3.4 Causes of Errors
</SectionTitle>
      <Paragraph position="0"> Forty-two percent of the error was caused by proper nouns, 16% by time expressions, and 15% by monetary expressions. This means that proper nouns are a major cause of the errors, as pointed out in previous research.</Paragraph>
      <Paragraph position="1"> There are several reasons for this: (1) an identical proper noun normally does not appear  many times in the corpus.</Paragraph>
      <Paragraph position="2"> (2) proper nouns sometimes cause cross-boundary errors at the initial morphological analysis.</Paragraph>
      <Paragraph position="3"> We can be optimistic about eliminating these three types of errors. If we use a preprocessor (for proper nouns, see Kitani et al., 1994), most of them can be eliminated.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>