<?xml version="1.0" standalone="yes"?>
<Paper uid="M92-1015">
<Title>The &quot;ALL TEMPLATES&quot; results of our &quot;official&quot; runs were as follows: RECALL PRECISION</Title>
<Section position="8" start_page="125" end_page="126" type="evalu">
<SectionTitle>GRAMMAR EVALUATION</SectionTitle>
<Paragraph position="0">To understand why some systems did better than others, we need some glass-box evaluation of individual components. As we know, it is very hard to define any glass-box evaluation which can be applied across systems.</Paragraph>
<Paragraph position="1">We have experimented with one aspect of this, grammar (parse) evaluation, which can at least be applied across those systems which generate a full sentence parse.</Paragraph>
<Paragraph position="2">We use as our standard for comparison the Univ. of Pennsylvania Tree Bank, which includes parse trees for a portion of the MUC terrorist corpus. We take our parse trees, restructure them (automatically) to conform better to the Penn parses, strip labels from brackets, and then compare the bracket structure to that of the Tree Bank. The result is a recall/precision score which should be meaningful across systems.</Paragraph>
<Paragraph position="3">We have experimented with a number of parsing strategies, and found that parse recall is well correlated with template recall [2].</Paragraph>
<Paragraph position="4">In principle, we would like to extend these comparisons to &quot;deeper&quot; relations, such as functional subject/object relations. These will be harder to define, but may be applicable over a broader range of systems.</Paragraph>
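<Paragraph position="5">The bracket comparison described above can be illustrated with a short sketch. The following Python fragment is an illustration only, not our actual scoring tool; the nested-list tree representation and the function names are assumptions for exposition. It computes unlabeled bracket recall and precision between a system parse and a Tree Bank parse, counting each distinct span once (so a unary chain, once its labels are stripped, contributes a single bracket).

def spans(tree, start=0):
    """Collect the (start, end) token span of every bracket in a tree."""
    out, pos = [], start
    if isinstance(tree, str):          # a leaf token covers one position
        return out, pos + 1
    for child in tree:
        child_spans, pos = spans(child, pos)
        out.extend(child_spans)
    out.append((start, pos))           # the bracket around this constituent
    return out, pos

def bracket_scores(system_tree, gold_tree):
    """Unlabeled bracket recall and precision against the Tree Bank parse."""
    sys_spans, _ = spans(system_tree)
    gold_spans, _ = spans(gold_tree)
    matched = len(set(sys_spans).intersection(gold_spans))
    return matched / len(gold_spans), matched / len(sys_spans)

# Example: only the root bracket matches, out of three brackets per tree.
gold = [["the", "man"], ["ate"]]
system = [["the"], ["man", "ate"]]
print(bracket_scores(system, gold))    # (0.333..., 0.333...)
</Paragraph>
</Section>
</Paper>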