<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2144">
  <Title>HPSG-Style Underspecified Japanese Grammar with Wide Coverage</Title>
  <Section position="7" start_page="878" end_page="879" type="evalu">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> We implemented our parser and grammar in LiLFeS (Makino et al., 1998), a feature-structure description language developed by our group. We tested the grammar on 10,000 sentences randomly selected from the Japanese EDR Corpus (EDR, 1996). The EDR Corpus is a Japanese treebank annotated with morphological, structural, and semantic information. In our experiments, we used only the structural information, that is, the parse trees. Both the parse trees produced by our parser and the parse trees in the EDR Corpus are first converted into bunsetsu dependencies, which are then compared to calculate accuracy. Note that the internal structures of bunsetsus, e.g.</Paragraph>
    <Paragraph position="1"> structures of compound nouns, are not considered in our evaluations.</Paragraph>
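    <Paragraph>The conversion of parse trees into bunsetsu dependencies can be illustrated with a minimal sketch. It assumes, hypothetically, that a parse tree is encoded as nested pairs whose leaves are bunsetsu indices and that every constituent is head-final, so that the head bunsetsu of the left daughter depends on the head bunsetsu of the right daughter. The function name tree_to_dependencies and the tuple encoding are illustrative assumptions, not the formats used by the parser or by the EDR Corpus.

```python
from typing import Dict, Tuple, Union

# A tree is either a bunsetsu index (leaf) or a pair (left, right).
# This encoding is a hypothetical simplification for illustration.
Tree = Union[int, Tuple["Tree", "Tree"]]


def tree_to_dependencies(tree: Tree) -> Dict[int, int]:
    """Map each non-final bunsetsu index to the index of the bunsetsu it modifies."""
    deps: Dict[int, int] = {}

    def head_of(node: Tree) -> int:
        # A leaf is its own head.
        if isinstance(node, int):
            return node
        left, right = node
        left_head = head_of(left)
        right_head = head_of(right)
        # Head-final assumption: the left head modifies the right head.
        deps[left_head] = right_head
        return right_head

    head_of(tree)
    return deps


# Two bracketings of the same four bunsetsus yield different dependencies.
tree_a = ((0, 1), (2, 3))
tree_b = (0, (1, (2, 3)))
deps_a = tree_to_dependencies(tree_a)
deps_b = tree_to_dependencies(tree_b)
```

Under these assumptions, two trees can be compared dependency by dependency even when their bracketings differ, and structure inside a bunsetsu never enters the comparison because bunsetsus are the leaves.</Paragraph>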
    <Paragraph position="2"> We evaluated the following grammars: (a) the original underspecified grammar, (b) (a) + the constraint for wa-marked PPs, (c) (a) + the constraint for relative clauses with a comma, (d) (a) + the constraint for nominal time suffixes with a comma, and (e) (a) + all three constraints. We evaluated these grammars using the following three measures. Coverage: the percentage of sentences for which at least one parse tree is generated.</Paragraph>
    <Paragraph position="3"> Partial Accuracy: the percentage of correct dependencies between bunsetsus (excluding the last, obvious dependency) for the parsable sentences.</Paragraph>
    <Paragraph position="4"> Total Accuracy: the percentage of correct dependencies between bunsetsus (excluding the last dependency) over all sentences.</Paragraph>
    <Paragraph position="5"> Table 2 (caption fragment): ... from the Japanese EDR Corpus: (a-e) are the grammars corresponding, respectively, to Section 2 (a), Section 2 + Subsection 3.1 (b), Section 2 + Subsection 3.2 (c), Section 2 + Subsection 3.3 (d), and Section 2 + Section 3 (e).</Paragraph>
    <Paragraph position="6"> When calculating total accuracy, the dependencies for unparsable sentences are predicted by attaching every bunsetsu to the nearest bunsetsu. In other words, total accuracy can be regarded as a weighted average of partial accuracy and the accuracy of this nearest-attachment baseline.</Paragraph>
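    <Paragraph>The three measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation script used in the experiments: dependencies are assumed to be dictionaries from a modifier bunsetsu index to its head index, "nearest bunsetsu" is assumed to mean the immediately following bunsetsu, and the names nearest_baseline, count_correct, and evaluate are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Deps = Dict[int, int]  # modifier bunsetsu index -> head bunsetsu index


def nearest_baseline(n_bunsetsu: int) -> Deps:
    """Attach every bunsetsu except the last to its right neighbour (assumed baseline)."""
    return {i: i + 1 for i in range(n_bunsetsu - 1)}


def count_correct(pred: Deps, gold: Deps, n_bunsetsu: int) -> Tuple[int, int]:
    """Return (correct, scored), skipping the last, always-trivial dependency."""
    scored = range(max(n_bunsetsu - 2, 0))
    correct = sum(1 for i in scored if pred.get(i) == gold[i])
    return correct, len(scored)


def evaluate(samples: List[Tuple[Optional[Deps], Deps, int]]) -> Dict[str, float]:
    """Each sample is (parser output, or None if unparsable; gold deps; sentence length)."""
    parsed = 0
    part_correct = part_scored = 0
    total_correct = total_scored = 0
    for pred, gold, n in samples:
        if pred is not None:
            parsed += 1
            c, s = count_correct(pred, gold, n)
            part_correct += c
            part_scored += s
        else:
            # Unparsable sentence: fall back to the nearest-attachment baseline.
            c, s = count_correct(nearest_baseline(n), gold, n)
        total_correct += c
        total_scored += s
    return {
        "coverage": 100.0 * parsed / len(samples),
        "partial_accuracy": 100.0 * part_correct / part_scored if part_scored else 0.0,
        "total_accuracy": 100.0 * total_correct / total_scored if total_scored else 0.0,
    }


# Toy data: one parsed sentence (4 bunsetsus) and one unparsable sentence (3 bunsetsus).
results = evaluate([
    ({0: 3, 1: 3, 2: 3}, {0: 1, 1: 3, 2: 3}, 4),
    (None, {0: 1, 1: 2}, 3),
])
```

In this sketch, total accuracy pools the scored dependencies of parsed sentences with those of baseline-predicted sentences, so it equals the average of partial accuracy and baseline accuracy weighted by the number of scored dependencies in each group.</Paragraph>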
    <Paragraph position="7"> Table 2 lists the results of our experiments.</Paragraph>
    <Paragraph position="8"> A comparison of the results for (a) with those for (b-d) shows that all three constraints improve partial accuracy and total accuracy with little loss of coverage. Grammar (e), which combines the three constraints, still works with no side effects.</Paragraph>
    <Paragraph position="9"> We also measured the average parsing time per sentence for the original grammar (a) and the fully augmented grammar (e). The parser we adopted is a naive CKY-style parser. Table 3 gives the average parsing time per sentence for these two grammars. Pseudo-principles and further constraints on LEs/LETs also make parsing more time-efficient. Although they are sometimes considered slow in practical applications because of their heavy feature structures, we found that they actually improve speed. In (Torisawa and Tsujii, 1996), an efficient HPSG parser is proposed, and our preliminary experiments show that the parsing time of the efficient parser is about one third of that of the naive one. Thus, the average parsing time per sentence will be about 300 msec, and we believe our grammar will achieve a practical speed. Other techniques for speeding up the parser are proposed in (Makino et al., 1998).</Paragraph>
  </Section>
</Paper>