<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1034">
  <Title>Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification</Title>
  <Section position="6" start_page="223" end_page="223" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> This paper presented a new method for identifying base NPs. Our treebank approach uses the simple technique of matching part-of-speech tag sequences, with the intention of capturing the simplicity of the corresponding syntactic structure. It employs two existing corpus-based techniques: the initial noun phrase grammar is extracted directly from an annotated corpus; and a benefit score calculated from errors on an improvement corpus selects the best subset of rules via a coarse- or fine-grained pruning algorithm.</Paragraph>
    <Paragraph position="1"> The overall results are surprisingly good, especially considering the simplicity of the method. It achieves 94% precision and recall on simple base NPs. It achieves 91% precision and recall on the more complex NPs of the Ramshaw &amp; Marcus corpus. We believe, however, that the base NP finder can be improved further. First, the longest-match heuristic of the noun phrase bracketer could be replaced by more sophisticated parsing methods that account for lexical preferences. Rule application, for example, could be disambiguated statistically using distributions induced during training. We are currently investigating such extensions. One approach closely related to ours -- weighted finite-state transducers (e.g. (Pereira and Riley, 1997)) -- might provide a principled way to do this. We could then consider applying our error-driven pruning strategy to rules encoded as transducers. Second, we have only recently begun to explore the use of local repair heuristics. While initial results are promising, the full impact of such heuristics on overall performance can be determined only if they are systematically learned and tested using available training data. Future work will concentrate on the corpus-based acquisition of local repair heuristics.</Paragraph>
    <Paragraph position="2"> In conclusion, the treebank approach to base NPs provides an accurate and fast bracketing method, running in time linear in the length of the tagged text. The approach is simple to understand, implement, and train. The learned grammar is easily modified for use with new corpora, as rules can be added or deleted with minimal interaction problems.</Paragraph>
    <Paragraph position="3"> Finally, the approach provides a general framework for developing other treebank grammars (e.g., for subject/verb/object identification) in addition to those for base NPs.</Paragraph>
    <Paragraph position="4"> Acknowledgments. This work was supported in part by NSF Grants IRI-9624639 and GER-9454149.</Paragraph>
    <Paragraph position="5"> We thank Mitre for providing their part-of-speech tagger.</Paragraph>
  </Section>
</Paper>