<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1028"> <Title>Closing the Gap: Learning-Based Information Extraction Rivaling Knowledge-Engineering Methods</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> WOUNDED WHEN A BOMB EXPLODED IN SAN JUAN BAUTISTA MUNICIPALITY. OFFICIALS SAID THAT SHINING PATH MEMBERS WERE RESPONSIBLE FOR THE ATTACK ... ... POLICE SOURCES STATED THAT THE BOMB ATTACK INVOLVING THE SHINING PATH CAUSED SERIOUS DAMAGES ... ... </SectionTitle> <Paragraph position="0"> to aid in extraction. Several benchmark data sets have been used to evaluate IE approaches on semi-structured texts (Soderland, 1999; Ciravegna, 2001; Chieu and Ng, 2002a).</Paragraph> <Paragraph position="1"> For the task of extracting information from free texts, a series of Message Understanding Conferences (MUC) provided benchmark data sets for evaluation. Several subtasks for IE from free texts have been identified. The named entity (NE) task extracts person names, organization names, location names, etc. The template element (TE) task extracts information centered around an entity, such as the acronym, category, and location of a company. The template relation (TR) task extracts relations between entities. Finally, the full-scale IE task, the scenario template (ST) task, deals with extracting generic information items from free texts. To tackle the full ST task, an IE system generally needs to merge information from multiple sentences, since the information needed to fill one template can be spread across them; discourse processing is therefore needed.</Paragraph> <Paragraph position="2"> The full-scale ST task is considerably harder than all the other IE tasks or subtasks outlined above.</Paragraph> <Paragraph position="3"> As is the case with many other natural language processing (NLP) tasks, there are two main approaches to IE, namely the knowledge-engineering approach and the learning approach.
Most early IE systems adopted the knowledge-engineering ap-</Paragraph> </Section> </Paper>