<?xml version="1.0" standalone="yes"?> <Paper uid="P02-1018"> <Title>A simple pattern-matching algorithm for recovering empty nodes and their antecedents</Title> <Section position="5" start_page="0" end_page="0" type="concl"> <SectionTitle> 4 Conclusion </SectionTitle> <Paragraph position="0"> This paper described a simple pattern-matching algorithm for restoring empty nodes in parse trees that do not contain them, and appropriately indexing these nodes with their antecedents. The pattern-matching algorithm combines simplicity with reasonable performance on the frequently occurring types of empty nodes.</Paragraph> <Paragraph position="1"> Performance drops considerably when using trees produced by the parser, even though this parser's precision and recall are around 0.9. Presumably this is because the pattern-matching technique requires that the parser correctly identify large tree fragments that encode long-range dependencies not captured by the parser. If the parser makes a single parsing error anywhere in the tree fragment matched by a pattern, the pattern will no longer match. This is not unlikely since the statistical model used by the parser does not model these larger tree fragments.</Paragraph> <Paragraph position="2"> This suggests that one might improve performance by integrating parsing, empty node recovery and antecedent finding in a single system, in which case the current algorithm might serve as a useful baseline.</Paragraph> <Paragraph position="3"> Alternatively, one might try to design a sloppy pattern-matching algorithm which in effect recognizes and corrects common parser errors in these constructions. Also, it is undoubtedly possible to build programs that can do better than this algorithm on special cases. 
For example, we constructed a Boosting classifier which does recover *U* and empty complementizers 0 more accurately than the pattern-matcher described here (although the pattern-matching algorithm does quite well on these constructions), but this classifier's performance averaged over all empty node types was approximately the same as that of the pattern-matching algorithm.</Paragraph> <Paragraph position="4"> As a comparison of tables 3 and 4 shows, the pattern-matching algorithm's biggest weakness is its inability to correctly distinguish co-indexed NP * (i.e., NP PRO) from free (i.e., unindexed) NP *.</Paragraph> <Paragraph position="5"> This seems to be a hard problem, and lexical information (especially the class of the governing verb) seems relevant. We experimented with specialized classifiers for determining if an NP * is co-indexed, but they did not perform much better than the algorithm presented here. (Also, while we did not systematically investigate this, there seem to be a number of errors in the annotation of free vs. co-indexed NP * in the treebank.)</Paragraph> <Paragraph position="6"> There are modifications and variations on this algorithm that are worth exploring in future work.</Paragraph> <Paragraph position="7"> We experimented with lexicalizing patterns, but the simple method we tried did not improve results. Inspired by results suggesting that the pattern-matching algorithm suffers from over-learning (e.g., testing on the training corpus), we experimented with more abstract skeletal patterns, which improved performance on some types of empty nodes but hurt performance on others, leaving overall performance approximately unchanged. Possibly there is a way to use both skeletal and the original kind of patterns in a single system.</Paragraph> </Section> </Paper>