<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2006">
  <Title>Evaluating the Accuracy of an Unlexicalized Statistical Parser on the PARC DepBank</Title>
  <Section position="8" start_page="47" end_page="47" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> We have demonstrated that an unlexicalized parser with minimal manual modification for WSJ text but no tuning of performance to optimize on this dataset alone, and no use of PTB - can achieve accuracy competitive with parsers employing lexicalized statistical models trained on PTB.</Paragraph>
    <Paragraph position="1"> We speculate that we achieve these results because our system is engineered to make minimal useoflexicalinformationbothinthegrammarand in parse ranking, because the grammar has been developed to constrain ambiguity despite this lack of lexical information, and because we can computethefullpackedparseforestforallthetestsen- null  tencesefficiently(withoutsacrificingspeedofprocessing with respect to other statistical parsers). These advantages appear to effectively offset the disadvantage of relying on a coarser, purely structural model for probabilistic parse selection. In future work, we hope to improve the accuracy of the system by adding lexical information to the statistical parse selection component without exploiting in-domain treebanks.</Paragraph>
    <Paragraph position="2"> Clearly, more work is needed to enable more accurate, informative, objective and wider comparison of extant parsers. More recent PTB-based parsers show small improvements over Collins' Model 3 using PARSEVAL, while Clark and Curran (2004) and Miyao and Tsujii (2005) report 84% and 86.7% F1-scores respectively for their own relational evaluations on section 23 of WSJ.</Paragraph>
    <Paragraph position="3"> However, it is impossible to meaningfully compare these results to those reported here. The reannotated DepBank potentially supports evaluations which score according to the degree of agreement  betweenthisandtheoriginalannotationand/ordevelopment of future consensual versions through collaborative reannotation by the research community. We have also highlighted difficulties for relational evaluation schemes and argued that presenting individual scores for (classes of) relations and features is both more informative and facilitates system comparisons.</Paragraph>
  </Section>
class="xml-element"></Paper>