<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1204"> <Title>Deep Linguistic Analysis for the Accurate Identification of Predicate-Argument Relations</Title> <Section position="6" start_page="2" end_page="2" type="evalu"> <SectionTitle> 5 Experimental results </SectionTitle>

<Paragraph position="0"> In this section, we evaluate the accuracy of HPSG parsing using the November 2002 release of PropBank (Kingsbury and Palmer, 2002). An HPSG grammar was extracted from Sections 02-21, and a disambiguation model was trained using the same data. Table 2 shows the specifications of the grammar and the disambiguation model, where the size of the training data is the file size of the compressed training data, and the estimation time is the user time required for estimating the parameters of the disambiguation model. We prepared two grammars for the evaluation: G_penn was extracted from the Penn Treebank with the original algorithm (Miyao et al., 2004), and G_prop was extracted using the PropBank annotations for the argument/modifier distinction, by a method similar to that of Chen and Rambow (2003): constituents annotated with ARGn were treated as arguments during grammar extraction. In G_penn, prepositional phrases are basically treated as modifiers, since the original Penn Treebank gives no cue for distinguishing arguments from modifiers. Sections 02-21 were also used for developing the HPSG-to-PropBank mapping. Note that the PropBank annotation was used only for this purpose; it was not used for training the statistical disambiguation model. This differs from existing methods of identifying PropBank-style annotations, which train the identification model on the PropBank itself. In the following, Section 22 of the PropBank was used for the development of the parser, while Section 23 was used for the final evaluation.</Paragraph>

<Paragraph position="1"> The accuracy of HPSG parsing was measured against the core-argument annotations (i.e., ARG0, ..., ARG5) of the PropBank. Each predicate-argument relation output by the parser was represented as a tuple ⟨pred, arg_label, arg⟩, where pred is a predicate, arg_label is the label of an argument position (i.e., one of ARG0, ..., ARG5), and arg is the head word of the argument of pred. Each tuple was compared to the annotations in the PropBank. We used the mapping table described in Section 4 for mapping the argument labels of HPSG into the PropBank style.</Paragraph>

<Paragraph position="2"> Table 3 shows the accuracy of the semantic arguments output by the HPSG parser without mapping the HPSG output into the PropBank style, while Table 4 shows the accuracy with the HPSG-to-PropBank mapping. The LP/LR columns give labeled precision/recall, and the UP/UR columns give unlabeled precision/recall. "Labeled" here refers to the label of the argument position: a predicate-argument relation was judged correct if the whole tuple ⟨pred, arg_label, arg⟩ was output correctly. "Unlabeled" means that the head word of the argument was output correctly regardless of the argument position, i.e., pred and arg were correct (a minimal sketch of this computation follows this paragraph). The "Gold parses" row represents the accuracy attained when correct HPSG derivations are given, i.e., when Section 23 of the HPSG treebank is given; this is the upper bound of this measure in this evaluation.</Paragraph>
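To make the evaluation measure concrete, the following is a minimal sketch, not taken from the paper, of how labeled and unlabeled precision/recall over ⟨pred, arg_label, arg⟩ tuples can be computed; the function name, the toy predicate "give", and the example tuples are illustrative assumptions.

    def precision_recall(predicted, gold):
        """Return (LP, LR, UP, UR) for two collections of
        (pred, arg_label, arg) tuples."""
        pred_set, gold_set = set(predicted), set(gold)

        # Labeled: the whole tuple, including the argument label, must match.
        labeled_hits = len(pred_set & gold_set)

        # Unlabeled: only the predicate and the argument head word must match.
        drop_label = lambda t: (t[0], t[2])
        unlabeled_hits = len({drop_label(t) for t in pred_set}
                             & {drop_label(t) for t in gold_set})

        return (labeled_hits / len(pred_set),    # LP
                labeled_hits / len(gold_set),    # LR
                unlabeled_hits / len(pred_set),  # UP
                unlabeled_hits / len(gold_set))  # UR

    # Illustrative example: one label confusion (ARG1 vs. ARG2)
    # hurts LP/LR but leaves UP/UR unchanged.
    gold   = [("give", "ARG0", "John"), ("give", "ARG1", "book")]
    output = [("give", "ARG0", "John"), ("give", "ARG2", "book")]
    print(precision_recall(output, gold))  # (0.5, 0.5, 1.0, 1.0)

As the example shows, a label confusion lowers the labeled scores while the unlabeled scores stay perfect, which is exactly the gap the two measures are meant to expose.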
<Paragraph position="3"> First of all, we can see that labeled precision/recall increased significantly with the HPSG-to-PropBank mapping. This means that the low accuracy of the naive evaluation (Table 3) was mainly due to disagreements in the representation of semantic structures.</Paragraph>

<Paragraph position="4"> As shown in Table 4, although the PropBank was not used for training the disambiguation model, the labeled precision/recall attained by G_prop was superior to that of an existing study using the Collins parser (75.9/69.6) (Gildea and Hockenmaier, 2003), and the results approach those of an existing study on the same task using a CCG parser (76.1/73.5) (Gildea and Hockenmaier, 2003).</Paragraph>

<Paragraph position="5"> The results cannot be compared directly with work using LTAG (Chen and Rambow, 2003), because their target annotations were limited to those localized within an elementary tree; however, considering that their target annotations covered 87% of the core arguments, our results are competitive with theirs (82.57/71.41).</Paragraph> </Section> </Paper>