<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1113"> <Title>Generating extraction patterns from a large semantic network and an untagged corpus Thierry POIBEAU Thales and LIPN Domaine de Corbeville</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Related work </SectionTitle> <Paragraph position="0"> The foundations of IE, as defined in the introduction, are presented in (Pazienza, 1997). IE has established a now widely accepted linguistic architecture based on cascading automata and domain-specific knowledge (Appelt et al., 1993). However, several studies have highlighted the difficulty of defining the required resources; see Riloff (1995).</Paragraph> <Paragraph position="1"> To address this problem of portability, recent research has focused on using machine learning throughout the IE process (Muslea, 1999). A first trend was to apply machine learning methods directly to replace IE components. For instance, statistical methods have been successfully applied to the named-entity task. Among others, (Bikel et al., 1997) learn names by using a variant of hidden Markov models.</Paragraph> <Paragraph position="2"> Another research area that tries to avoid the time-consuming task of building IE resources is concerned with the generalization of extraction patterns from examples. (Muslea, 1999) gives an extensive description of the different approaches to that problem. AutoSlog (Riloff, 1993) was one of the very first systems to use a simple form of learning to build a dictionary of extraction patterns. Successors of AutoSlog such as Crystal (Soderland et al., 1995) mainly use decision trees and relational learning techniques to learn sets of rules during their extraction step. 
More recently, the SRV system (Freitag, 1998) and the Pinocchio system (Ciravegna, 2001) have used a combination of relational and basic statistical methods inspired by Naive Bayes for IE tasks.</Paragraph> <Paragraph position="3"> These approaches acquire knowledge from texts, but they must be complemented by a semantic expansion module. Several authors have presented experiments based on WordNet (Bagga et al., 1996).</Paragraph> <Paragraph position="4"> Our approach is original in that it is an integrated system that uses both a semantic network and a corpus to acquire knowledge and to overcome the limitations of each knowledge source. On the one hand, using a semantic network gives us broader coverage than a training corpus alone would (unlike Ciravegna's system, for example). On the other hand, the corpus ensures that the acquired resources are well adapted to the task (unlike Bagga's system, for example). The performance of the system will demonstrate this point (see Section 5 below).</Paragraph> </Section> </Paper>