<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0202"> <Title>Comparing Information Extraction Pattern Models</Title> <Section position="3" start_page="0" end_page="12" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> A common approach to Information Extraction (IE) is to use patterns which match against text and identify items of interest. Patterns are applied to text which has undergone various levels of linguistic analysis, such as phrase chunking (Soderland, 1999) and full syntactic parsing (Gaizauskas et al., 1996). The approaches use different definitions of what constitutes a valid pattern. For example, the AutoSlog system (Riloff, 1993) uses patterns which match certain grammatical categories, mainly nouns and verbs, in phrase-chunked text, while Yangarber et al. (2000) use subject-verb-object tuples derived from a dependency parse. An appropriate pattern language must encode enough information about the text to be able to accurately identify the items of interest. However, it should not contain so much information that it becomes complex and impractical to apply.</Paragraph> <Paragraph position="1"> Several recent approaches to IE have used patterns based on a dependency analysis of the input text (Yangarber, 2003; Sudo et al., 2001; Sudo et al., 2003; Bunescu and Mooney, 2005; Stevenson and Greenwood, 2005). These approaches have used a variety of pattern models (schemes for representing IE patterns based on particular parts of the dependency tree). For example, Yangarber (2003) uses just subject-verb-object tuples while Sudo et al. (2003) allow any subpart of the tree to act as an extraction pattern. The set of patterns allowed by the first model is a proper subset of those allowed by the second, so the first model captures less of the information contained in the dependency tree. Little analysis has been carried out into the appropriateness of each model. Sudo et al. 
(2003) compared three models in terms of their ability to identify event participants.</Paragraph> <Paragraph position="2"> The choice of pattern model affects the number of potential patterns. This has implications for the practical application of each approach, particularly when it is used for automatic acquisition of IE systems using learning methods (Yangarber et al., 2000; Sudo et al., 2003; Bunescu and Mooney, 2005). This paper evaluates the appropriateness of four pattern models in terms of the competing aims of expressive completeness (ability to represent information in text) and complexity (number of possible patterns). Each model is examined by comparing it against a corpus annotated with events and determining the proportion of those events that it is capable of representing. The remainder of this paper is organised as follows: a variety of dependency-tree-based IE pattern models are introduced (Sections 2 and 3).</Paragraph> <Paragraph position="3"> Section 4 describes experiments comparing each model, and the results are discussed in Section 5.</Paragraph> </Section> </Paper>