File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/04/c04-1157_concl.xml

Size: 2,534 bytes

Last Modified: 2025-10-06 13:53:57

<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1157">
  <Title>Verb Phrase Ellipsis detection using Automatically Parsed Text</Title>
  <Section position="8" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Summary and Future work
</SectionTitle>
    <Paragraph position="0"> This paper has presented a robust system for VPE detection. The data is automatically tagged and parsed, syntactic features are extracted and machine learning is used to classify instances. This work offers clear improvement over previous work, and is the first to handle un-annotated free text, where VPE detection can be done with limited loss of performance compared to annotated data.</Paragraph>
    <Paragraph position="1"> + Three different machine learning algorithms are used: Memory Based Learning, and GIS-based and L-BFGS-based maximum entropy modeling. They give similar results, with L-BFGS-MaxEnt generally giving the highest performance.</Paragraph>
    <Paragraph position="2"> + Two different parsers were used, Charniak's parser and RASP, achieving similar results in experiments, with RASP results being slightly higher. RASP generates more fine-grained POS info, while Charniak's parser generates more reliable parse structures for identifying auxiliary-final VP's.</Paragraph>
    <Paragraph position="3"> + Experiments on the Treebank give 82% F1, with the most informative feature, empty VP's, giving 70% F1.</Paragraph>
    <Paragraph position="4"> + Re-parsing the Treebank gives 67% F1 for both parsers. Charniak's parser combined with Johnson's algorithm generates the empty VP feature with 32% F1.</Paragraph>
    <Paragraph position="5"> + Repeating the experiments by parsing parts of the BNC gives 71% F1, with the empty VP feature further reduced to 25% F1. Combining the datasets, final results of 71-72% F1 are obtained.</Paragraph>
    <Paragraph position="6"> Further work can be done on extracting grammatical relation information (Lappin et al., 1989; Cahill et al., 2002), or using those provided by RASP, to produce more complicated features. While the experiments suggest a performance barrier around 70%, it may be worthwhile to investigate the performance increases possible through the use of larger training sets. In the next stage of work, we will use machine learning methods for the task of finding antecedents. We will also perform a classification of the cases to determine what percentage can be dealt with using syntactic reconstruction, and how often more complicated approaches are required.</Paragraph>
  </Section>
</Paper>