<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1028">
  <Title>Closing the Gap: Learning-Based Information Extraction Rivaling Knowledge-Engineering Methods</Title>
  <Section position="13" start_page="0" end_page="0" type="concl">
    <SectionTitle>
Conclusion
</SectionTitle>
    <Paragraph position="0"> ...COLINDRES ARRIVED IN GUATEMALA ON 11 JANUARY&quot;. In this case, only the first occurrence of OQUELI COLINDRES should be used as a positive example for the human target slot. However, ALICE has no access to this information, because the MUC-4 training documents are not annotated: only templates are provided, and the text strings in a document are not marked. ALICE therefore currently uses every occurrence of &quot;OQUELI COLINDRES&quot; as a positive training example, which introduces noise into the training data. We believe that annotating the string occurrences in the training documents would provide higher-quality training data for the learning approach and hence further improve accuracy.</Paragraph>
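The labeling problem described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ALICE's actual code: the function names (find_occurrences, label_all_occurrences, label_annotated_occurrences), the representation of an occurrence as a (sentence index, character offset) pair, and the first example sentence are all hypothetical. Only the arrival sentence echoes the example quoted in the text.

```python
import re


def find_occurrences(sentences: list[str], filler: str) -> list[tuple[int, int]]:
    """Return (sentence index, character offset) for every exact match of a filler string."""
    pattern = re.compile(re.escape(filler))
    return [
        (i, match.start())
        for i, sentence in enumerate(sentences)
        for match in pattern.finditer(sentence)
    ]


def label_all_occurrences(sentences: list[str], filler: str) -> list[tuple[int, int]]:
    """Template-only supervision: every occurrence of the filler is treated as positive."""
    return find_occurrences(sentences, filler)


def label_annotated_occurrences(
    sentences: list[str], filler: str, annotated: set[tuple[int, int]]
) -> list[tuple[int, int]]:
    """Hypothetical string-level annotation: only occurrences an annotator marked are positive."""
    return [occ for occ in find_occurrences(sentences, filler) if occ in annotated]


if __name__ == "__main__":
    sentences = [
        "THE ATTACK TARGETED OQUELI COLINDRES .",  # invented context sentence
        "OQUELI COLINDRES ARRIVED IN GUATEMALA ON 11 JANUARY .",
    ]
    print(label_all_occurrences(sentences, "OQUELI COLINDRES"))
    # [(0, 20), (1, 0)]: the arrival sentence also becomes a (noisy) positive
    print(label_annotated_occurrences(sentences, "OQUELI COLINDRES", {(0, 20)}))
    # [(0, 20)]
```

Without marked strings, the labeler can only match the filler wherever it appears, so a sentence that says nothing about the event is still swept in as a positive example. String-level annotation removes that second, noisy example.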
    <Paragraph position="1"> Although part-of-speech taggers often report accuracies above 95%, the errors they do make can be fatal to sentence parsing. For example, taggers often confuse &quot;VBN&quot; (past participle) with &quot;VBD&quot; (past tense), which can change the entire parse tree. In addition, the MUC-4 corpus was provided as uppercase text, which degrades both the named entity recognizer and the part-of-speech tagger, since both rely on case information.</Paragraph>
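The effect of uppercase input on case-sensitive components can be illustrated with a single orthographic feature. The sketch below assumes a hypothetical case_shape feature of the kind taggers and recognizers commonly use. It is not taken from the components used with ALICE, and the example sentence is invented. On mixed-case text the feature separates capitalized names from lowercase words, but once the text is uppercased every alphabetic token maps to the same value, so the feature carries no information.

```python
def case_shape(token: str) -> str:
    """Map a token to a coarse case pattern (hypothetical feature for illustration)."""
    if token.isupper():
        return "ALLCAPS"
    if token.istitle():
        return "Capitalized"
    if token.islower():
        return "lower"
    return "other"  # digits, punctuation, and other mixed-case tokens


if __name__ == "__main__":
    mixed = "The attack targeted Oqueli Colindres".split()
    upper = [token.upper() for token in mixed]  # mimics uppercase-only input
    print([case_shape(t) for t in mixed])
    # ['Capitalized', 'lower', 'lower', 'Capitalized', 'Capitalized']
    print({case_shape(t) for t in upper})
    # {'ALLCAPS'}
```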
    <Paragraph position="2"> Learning approaches have been shown to perform on par with, or even outperform, knowledge-engineering approaches in many NLP tasks. However, the full-scale scenario template IE task has remained dominated by knowledge-engineering approaches. In this paper, we demonstrate that by combining state-of-the-art learning algorithms with full parsing, learning approaches can rival knowledge-engineering ones, bringing us a step closer to building full-scale IE systems in a domain-independent fashion with state-of-the-art accuracy.</Paragraph>
  </Section>
</Paper>