<?xml version="1.0" standalone="yes"?> <Paper uid="X98-1016"> <Title>Transforming Examples into Patterns for Information Extraction</Title> <Section position="3" start_page="98" end_page="99" type="intro"> <SectionTitle> 3 General and Specific Patterns </SectionTitle> <Paragraph position="0"> Before we describe our example-based strategy for building patterns, we examine the organization of the pattern base in more detail. We can group the patterns into &quot;layers&quot; according to their range of applicability: 1. Domain-independent: this layer contains the most generally applicable patterns. It includes many of the patterns for name recognition (for people, organizations, and locations, as well as temporal and numeric expressions, currencies, etc.), and the purely syntactic patterns for noun groups and verb groups. These patterns are useful in a wide range of tasks. 2. Domain-specific: the next layer contains domain-specific patterns, which are useful across a narrower range of scenarios but still have considerable generality. These include domain-specific name patterns, such as those for certain types of artifacts, as well as patterns for noun phrases that express relationships among entities, such as those between persons and organizations.</Paragraph> <Paragraph position="1"> 3. Scenario-specific: the last layer contains scenario-specific patterns, which have the narrowest applicability, such as the clausal patterns that capture relevant events.</Paragraph> <Paragraph position="2"> This stratification reflects the relative &quot;persistence&quot; of the patterns. The patterns at the lowest level, having the widest applicability, are built in as a core component of the system. These change little when the system is ported to a new domain. The mid-range patterns, applicable in certain commonly encountered domains, can be organized as domain-specific pattern libraries, which can be plugged in as required by the extraction task. 
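The layered organization described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the system's actual implementation: regular expressions stand in for the real patterns, and the names Pattern, PatternBase, CORE, DOMAIN_LIBRARIES, and every example pattern are hypothetical. The domain-independent core is always loaded, domain-specific libraries are plugged in by name, and scenario-specific patterns are supplied per scenario.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Pattern:
    name: str
    regex: str  # a regular expression standing in for a real extraction pattern


# Layer 1 (hypothetical examples): domain-independent core, always loaded.
CORE = [
    Pattern("currency", r"\$\d+(?:\.\d+)? (?:million|billion)"),
    Pattern("person-title", r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+"),
]

# Layer 2 (hypothetical examples): domain-specific libraries, keyed by domain name.
DOMAIN_LIBRARIES = {
    "business": [Pattern("company", r"\b[A-Z][a-z]+ (?:Inc|Corp|Co)\.")],
}


@dataclass
class PatternBase:
    layers: dict = field(default_factory=dict)

    @classmethod
    def build(cls, domains, scenario_patterns):
        """Assemble the core, the plugged-in domain libraries, and the scenario layer."""
        base = cls()
        base.layers["domain-independent"] = list(CORE)
        base.layers["domain-specific"] = [p for d in domains for p in DOMAIN_LIBRARIES[d]]
        base.layers["scenario-specific"] = list(scenario_patterns)
        return base

    def match(self, text):
        """Return (layer, pattern name, matched text) for every match, layer by layer."""
        hits = []
        for layer, patterns in self.layers.items():
            for p in patterns:
                for m in re.finditer(p.regex, text):
                    hits.append((layer, p.name, m.group(0)))
        return hits


# Usage: a hypothetical management-succession scenario on top of the business library.
succession = [Pattern("appoint-event", r"\bhas appointed [A-Z][a-z]+ [A-Z][a-z]+")]
base = PatternBase.build(["business"], succession)
for hit in base.match("Acme Corp. has appointed George Garrick as president."):
    print(hit)
# ('domain-specific', 'company', 'Acme Corp.')
# ('scenario-specific', 'appoint-event', 'has appointed George Garrick')
```

Keeping the core fixed and the domain libraries in a lookup table mirrors the persistence ordering: in this sketch, porting to a new scenario changes only the arguments passed to build.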
For example, for the &quot;business/economic news&quot; domain, we have patterns that capture: * entities: organization, company, person, location; * relations: person/organization, organization/location, parent/subsidiary organization.</Paragraph> <Paragraph position="3"> The scenario-specific patterns must be built on a per-scenario basis. This is accomplished through a set of graphical tools, which engage the user only at the level of surface representations, hiding the internal operation of the patterns. The user's input is reduced to * providing textual examples of events of interest, and * describing the corresponding output structures (LFs) which the example text should induce.</Paragraph> <Paragraph position="4"> In the remaining sections we discuss how the system can use this information to * automatically build patterns to map the user-specified text into the user-specified LF, based European Information Services operation has appointed George Garrick, .40 years</Paragraph> </Section></Paper>