<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1040">
<Title>Enriching the Output of a Parser Using Memory-Based Learning</Title>
<Section position="9" start_page="0" end_page="0" type="evalu">
<SectionTitle>9 Discussion</SectionTitle>
<Paragraph position="0"> The experiments described in the previous sections indicate that although statistical parsers do not explicitly output some of the information available in the corpus they were trained on (grammatical and semantic tags, empty nodes, non-local dependencies), this information can be recovered with reasonably high accuracy using pattern matching and machine learning methods.</Paragraph>
<Paragraph position="1"> For our task, using dependency structures rather than phrase trees has several advantages. First, after converting both the treebank trees and the parsers' outputs to graphs with head-modifier relations, our method needs very little information about the linguistic nature of the data, and is thus largely corpus- and parser-independent. Indeed, after the conversion, the only linguistically informed operation is the straightforward extraction of features indicating the presence of subject and object dependents and the finiteness of verb groups.</Paragraph>
<Paragraph position="2"> Second, using a dependency formalism facilitates a very straightforward evaluation of systems that produce structures more complex than trees. It is not clear whether the PARSEVAL evaluation can easily be extended to take non-local relations into account (see Johnson (2002) for examples of such an extension).</Paragraph>
<Paragraph position="3"> Finally, the independence from the details of the parser and the corpus suggests that our method can be applied to systems based on other formalisms, e.g., (Hockenmaier, 2003), to allow a meaningful dependency-based comparison of very different parsers. Furthermore, with the fine-grained set of dependency labels that our system provides, it is possible to map the resulting structures to other dependency formalisms, either automatically, where annotated corpora exist, or with a manually developed set of rules. Our preliminary experiments with Collins' parser and a corpus annotated with grammatical relations (Carroll et al., 2003) are promising: the system achieves a 76% precision/recall f-score after the parser's output is enriched with our method and transformed into grammatical relations using a set of 40 simple rules. This is very close to the performance reported by Carroll et al. (2003) for a parser specifically designed for the extraction of grammatical relations.</Paragraph>
<Paragraph position="4"> Despite the high-dimensional feature spaces, the large number of lexical features, and the lack of independence between features, we achieved high accuracy using a memory-based learner. TiMBL performed well on tasks for which more complicated, task-specific statistical models had previously been used (Blaheta and Charniak, 2000).</Paragraph>
<Paragraph position="5"> For all subtasks we used the same TiMBL settings: the simple feature-overlap metric and 5 nearest neighbours with majority voting. During further experiments with our method on other corpora, we found that quite different settings led to better performance; more careful and systematic parameter tuning, along with an analysis of the contribution of individual features, still needs to be addressed.</Paragraph>
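To make the memory-based setup above concrete, here is a minimal, self-contained sketch of IB1-style k-nearest-neighbour classification with a feature-overlap distance and majority voting, using k=5 as in the settings just described. This is an illustration only, not the actual TiMBL implementation, and the feature vectors and labels in the toy instance base are invented for demonstration.

```python
from collections import Counter

def overlap_distance(x, y):
    # Overlap metric: count the feature positions where two
    # instances disagree (all features weighted equally).
    return sum(1 for a, b in zip(x, y) if a != b)

def classify(memory, instance, k=5):
    # 'memory' holds (feature_vector, label) training pairs; a
    # memory-based learner stores them all without abstraction.
    nearest = sorted(memory, key=lambda ex: overlap_distance(ex[0], instance))[:k]
    # Majority voting among the k nearest neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented toy data: features might encode, e.g., a node's POS tag,
# the dependency relation to its head, and verb-group finiteness.
memory = [
    (("NN", "subj", "yes"), "SBJ"),
    (("NN", "obj", "no"), "OBJ"),
    (("VB", "subj", "yes"), "SBJ"),
    (("NN", "obj", "yes"), "OBJ"),
    (("PRP", "subj", "no"), "SBJ"),
]
print(classify(memory, ("NN", "subj", "no")))  # majority vote -> 'SBJ'
```

Note that with a plain overlap metric, classification cost grows with the size of the instance base; TiMBL keeps this tractable with tree-based indexing of the stored instances, which the sketch omits.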
<Paragraph position="6"> Finally, our method is not restricted to syntactic structures. It has been successfully applied to the identification of semantic relations (Ahn et al., 2004), using FrameNet as the training corpus.</Paragraph>
<Paragraph position="7"> For this task, we viewed semantic relations (e.g., Speaker, Topic, Addressee) as dependencies between a predicate and its arguments. Adding such semantic relations to syntactic dependency graphs was simply an additional graph transformation step.</Paragraph>
</Section>
</Paper>