File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/93/w93-0112_intro.xml
Size: 4,086 bytes
Last Modified: 2025-10-06 14:05:28
<?xml version="1.0" standalone="yes"?> <Paper uid="W93-0112"> <Title>Structural Methods for Lexical/Semantic Patterns</Title> <Section position="3" start_page="128" end_page="128" type="intro"> <SectionTitle> 2. Information Extraction </SectionTitle> <Paragraph position="0"> The recent Message Understanding Conferences (MUCs) and the ARPA TIPSTER project have posed a complex and fairly specific problem in information extraction (IE).</Paragraph> <Paragraph position="1"> The task is to create semantic templates, or frames, that correspond to newswire and newspaper articles. The expressiveness of the templates is restricted and somewhat skeletal, capturing the bare facts of the text and not its complete meaning. Hobbs ([8]) has argued effectively that the problem is not one of full text understanding, but specifically one of information extraction: many types of information, such as speaker attitude, intensional constructs, and facts not relevant to the chosen domain, are not required; only a representation-specific set of domain information is the target for extraction.</Paragraph> <Paragraph position="2"> These types of systems provide a useful groundwork for the study of text interpretation systems because the resulting knowledge structures are relatively easy to represent and manipulate. Although denotational structures for the type of factual information required in IE can be quite complex, they are still far more tractable than representations of speaker attitude, opaque contexts, or intensional constructions.</Paragraph> <Paragraph position="3"> For example, in the ongoing TIPSTER project, information is to be extracted in only two specific domains: one is joint ventures and business ownership, the other the microelectronics industry. The domains are further restricted by the particular hierarchy of predicate types used in the knowledge representation. 
Each domain has a set of templates (a particular implementation of frames) which rigidly define what types of facts and relations from the text are representable.</Paragraph> <Paragraph position="4"> 2.1. Mapping - IE: text → KR These information extraction tasks, as a subset of text understanding tasks, can be viewed as mapping problems, in which the goal is to find the proper representation, in terms of templates, for the source text. The mapping runs from the strings of the source text to a problem-dependent knowledge representation scheme.</Paragraph> <Paragraph position="5"> The template knowledge representation used in the TIPSTER/MUC tasks is based on a frame-like system commonly known as the entity-relation, or ER, model.</Paragraph> <Paragraph position="6"> The ER model codes information as multi-place relations. Typically, each type of relation has a fixed number of arguments, each of which is an entity in the model.</Paragraph> <Paragraph position="7"> Entities can either be atomic -- in the case of TIPSTER,</Paragraph> <Paragraph position="8"> atoms can be strings from text or items from a predetermined hierarchy of types -- or they can be composite, referring to other relational structures.</Paragraph> <Paragraph position="9"> Objects referenced in text often participate in more than one relationship. For example, the direct object of a sentence will often be the subject of a subordinate clause, either explicitly or by pronominal reference. In a strict ER model, this direct object would have to be represented twice, once for each clause. By a slight extension, atoms in the ER model can be generalized to objects which can take multiple references. Thus, no real atoms appear in relations, but only references to atoms or to other relations. This model is often termed an object-oriented model, but because that name is overloaded in so many fields, I prefer to call these reference-relation (RR) models. 
The important extension beyond the ER model is that relations themselves may be treated as objects of reference by other relations.</Paragraph> </Section> </Paper>