<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1100">
  <Title>Lenient Default Unification for Robust Processing within Unification Based Grammar Formalisms</Title>
  <Section position="7" start_page="77" end_page="77" type="evalu">
    <SectionTitle>
6 Performance Evaluation
</SectionTitle>
    <Paragraph position="0"> We evaluated our robust parsing algorithm by measuring coverage and the degree of overgeneration on the Wall Street Journal portion of the Penn Treebank (Marcus et al., 1993). The training corpus consists of 5,903 sentences selected from the Wall Street Journal (Wall Street Journal 00 - 02), and we prepared two test corpora, TestSetA and TestSetB. TestSetA consists of 1,480 sentences (Wall Street Journal 03) and is used for measuring coverage. TestSetB consists of the 100 shortest sentences in TestSetA and is used for measuring the degree of overgeneration. Table 1 shows the average sentence length of each corpus. Here, 'coverage' means the ratio of the number of sentences covered by a grammar to the total number of sentences. We say that a sentence is covered when the parser can analyze it and the result includes trees that are consistent with the brackets and POS tags annotated in the Penn Treebank.</Paragraph>
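    <Paragraph> The coverage definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the evaluation code used in the experiments: the names coverage and is_covered are hypothetical, and the consistency check against the treebank brackets and POS tags is passed in as a caller-supplied predicate because its exact form is not specified here.

```python
from typing import Any, Callable, Iterable, Sequence, Tuple


def is_covered(parses: Sequence[Any], gold: Any,
               consistent: Callable[[Any, Any], bool]) -> bool:
    """A sentence counts as covered if at least one parse result is
    consistent with its gold annotation. An empty parse list (the parser
    failed) is therefore never covered."""
    return any(consistent(parse, gold) for parse in parses)


def coverage(results: Iterable[Tuple[Sequence[Any], Any]],
             consistent: Callable[[Any, Any], bool]) -> float:
    """Ratio of covered sentences to all sentences.

    `results` holds one (parses, gold) pair per sentence. The pair layout
    is an assumption made for this sketch.
    """
    results = list(results)
    if not results:
        raise ValueError("coverage is undefined for an empty corpus")
    covered = sum(1 for parses, gold in results
                  if is_covered(parses, gold, consistent))
    return covered / len(results)


# Toy data: parses and gold annotations are plain strings, and
# "consistent" is simple equality (a stand-in for the real check).
toy_results = [
    (["a", "b"], "b"),  # covered: one parse matches the gold annotation
    ([], "c"),          # not covered: the parser produced no result
    (["x"], "y"),       # not covered: no parse matches
    (["z"], "z"),       # covered
]
toy_coverage = coverage(toy_results, lambda parse, gold: parse == gold)
```

With the toy data, two of the four sentences are covered, so toy_coverage is 0.5. Keeping the consistency predicate as a parameter leaves the bracket and POS-tag comparison open to whatever matching criterion an evaluation adopts.</Paragraph>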
    <Paragraph position="1"> Grammar rules were extracted by offline parsing with the XHPSG grammar (Tateisi et al., 1998), which is a translation into HPSG of the manually developed XTAG English grammar (The XTAG Research Group, 1995). The growth in the number of extracted rules is shown on the left of Figure 3.</Paragraph>
    <Paragraph position="2"> The average cost per sentence in offline parsing was 8.11. This means that the total number of nodes and structure-sharings removed was, on average, fewer than 9 per sentence. The coverage of offline parsing on the training corpus was 95.4%.</Paragraph>
    <Paragraph position="3"> Coverage was measured using the XHPSG grammar together with the extracted rules. The coverage for TestSetA and TestSetB is shown in the middle and on the right of Figure 3, respectively. As seen in the figure, the coverage for the Wall Street Journal grew from 24.7% to 65.3% for TestSetA and from 64% to 88% for TestSetB.</Paragraph>
    <Paragraph position="4"> We measured the degree of overgeneration by counting the number of edges, using a parser based on the A* algorithm. Figure 4 shows the average number of edges when TestSetB was parsed.</Paragraph>
    <Paragraph position="5"> From this figure and Figure 3, we can observe that the coverage grew from 64% to 88% while generating just 87.99 more edges (the number of edges grew from 240.68 to 328.67 on average).</Paragraph>
    <Paragraph position="6"> From the experiments, we can say that our approach is effective in extending coverage with little overgeneration.</Paragraph>
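    <Paragraph> The trade-off reported for TestSetB can be restated with a small calculation. The following Python sketch is illustrative only: the function name overgeneration_tradeoff and the derived quantities (relative edge growth, extra edges per coverage point) are our own summary measures and are not defined in the experiments; only the four input figures come from the text.

```python
def overgeneration_tradeoff(cov_before: float, cov_after: float,
                            edges_before: float, edges_after: float) -> dict:
    """Summarize how many extra edges were generated per unit of coverage.

    Coverage values are percentages; edge counts are per-sentence averages.
    All derived measures are hypothetical conveniences for this sketch.
    """
    extra_edges = edges_after - edges_before
    gain_points = cov_after - cov_before
    if gain_points == 0:
        raise ValueError("coverage did not change")
    return {
        "extra_edges": round(extra_edges, 2),
        "relative_edge_growth_pct": round(100 * extra_edges / edges_before, 1),
        "coverage_gain_points": round(gain_points, 1),
        "extra_edges_per_point": round(extra_edges / gain_points, 2),
    }


# Figures reported for TestSetB: coverage 64% to 88%,
# average edges 240.68 to 328.67.
summary = overgeneration_tradeoff(64.0, 88.0, 240.68, 328.67)
```

For these inputs the sketch gives 87.99 extra edges, a relative edge growth of about 36.6%, a coverage gain of 24 percentage points, and roughly 3.67 extra edges per percentage point of coverage.</Paragraph>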
    <Paragraph position="7"> We examined the phenomena that cannot be analyzed by the original XHPSG grammar but can be analyzed with the extracted rules, using the first 200 sentences of Wall Street Journal 03 in the test set. Of these 200 sentences, the original XHPSG grammar covers 38 (19% of the sentences) and the XHPSG grammar with the extracted rules covers 131 (65.5% of the sentences). Table 2 shows the number of occurrences of each phenomenon that the original grammar fails to analyze ((A) in the table), and the number of occurrences of each phenomenon that the XHPSG grammar with the extracted rules still fails to analyze ((B) in the table). As seen in the table, more than 70% of the phenomena that the original grammar cannot analyze were analyzed by our method. Note that most of the phenomena that cannot be analyzed with the extracted rules involve missing lexical entries, inconsistencies between the grammar and the treebank, or complicated phenomena that are currently open problems in linguistics.</Paragraph>
    <Paragraph position="8"> Most of the failures due to missing lexical entries were caused by the lack of an entry for 'apostrophe s'. This means that just by adding lexical entries for 'apostrophe s', we can cover almost half of this type of error. Among the words listed in the table, the XHPSG grammar has no lexical entry for 'itself' or 'as (Adv)'. Because our method is concerned only with grammar rules, it cannot recover words that have no lexical entry. For example, if a sentence includes the word 'itself', the sentence cannot be recovered by our method.</Paragraph>
  </Section>
</Paper>