<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1010">
  <Title>Automatic Predicate Argument Analysis of the Penn TreeBank</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4. EVALUATION
</SectionTitle>
    <Paragraph position="0"> The current implementation of the tagger assigns predicate argument structures to all of the 6500 verbs that occur in the Penn Treebank. However, our evaluation of its accuracy is not yet so comprehensive. Our first preliminary evaluation of the performance of the tagger was based on a 5000 word section of the Penn TreeBank. The tagger was run on this, and the argument labeling was subsequently hand corrected by a linguistics graduate student, giving an accuracy rate of 81% out of 160 predicate argument structures. We have since automatically tagged and hand corrected an additional 660 predicate argument structures, with an accuracy rate of 86%, (556 structures), giving us a combined accuracy rate of 83.7%. There are over 100 verbs involved in the evaluation. The number of possible frames for the verbs in the second test ranges from 13 frames to 30, with the typical number being in the teens. Not all of these frames actually appear in the TreeBank data.</Paragraph>
    <Paragraph position="1"> These results compare favorably with the results reported by Gildea and Jurafsky of 80.7% on their development set, (76.9% on the test set.) Their data comes from the Framenet project, [Lowe, et al., 97], which has been in existence for several years, and consisted of over 900 verbs out of 1500 words and almost 50,000 sentences. The Framenet project also uses more fine-grained semantic role labels, although it should be possible to map from our Arg0, Arg1 labels to their labels. They used machine learning techniques applied to human annotated data, whereas our tagger does not currently use statistics at all, and is primarily rulebased. Once we have sufficient amounts of data annotated we plan to experiment with hybrid approaches.</Paragraph>
  </Section>
class="xml-element"></Paper>