<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1079"> <Title>A Text Understander that Learns</Title> <Section position="3" start_page="0" end_page="476" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The approach we present here to learning new concepts as a result of understanding natural language texts builds on two different sources of evidence -- the prior knowledge of the domain the texts are about, and the grammatical constructions in which unknown lexical items occur. While there may be many reasonable interpretations when an unknown item occurs for the very first time in a text, their number rapidly decreases as more and more evidence is gathered. Our model tries to make explicit the reasoning processes behind this learning pattern.</Paragraph> <Paragraph position="1"> Unlike the current mainstream in automatic linguistic knowledge acquisition, which can be characterized as quantitative, surface-oriented bulk processing of large corpora of texts (Hindle, 1989; Zernik and Jacobs, 1990; Hearst, 1992; Manning, 1993), we propose here a knowledge-intensive model of concept learning from few, positive-only examples that is tightly integrated with the non-learning mode of text understanding. Both learning and understanding build on a given core ontology in the format of terminological assertions and, hence, make abundant use of terminological reasoning. The 'plain' text understanding mode can be considered as the instantiation and continuous filling of roles with respect to single concepts already available in the knowledge base. Under learning conditions, however, a set of alternative concept hypotheses has to be maintained for each unknown item, with each hypothesis denoting a newly created conceptual interpretation tentatively associated with the unknown item.</Paragraph> <Paragraph position="2"> The underlying methodology is summarized in Fig. 1. The text parser (for an overview, cf. Bröker et al. (1994)) yields information about the grammatical constructions in which an unknown lexical item (symbolized by the black square) occurs, in terms of the corresponding dependency parse tree. The kinds of syntactic constructions (e.g., genitive, apposition, comparative) in which unknown lexical items appear are recorded and later assessed relative to the credit they lend to a particular hypothesis. The conceptual interpretation of parse trees involving unknown lexical items in the domain knowledge base leads to the derivation of concept hypotheses, which are further enriched by conceptual annotations. These reflect structural patterns of consistency, mutual justification, analogy, etc. relative to concept descriptions already available in the domain knowledge base or in other hypothesis spaces. This kind of initial evidence, in particular its predictive &quot;goodness&quot; for the learning task, is represented by corresponding sets of linguistic and conceptual quality labels.</Paragraph> <Paragraph position="3"> [Table fragment (Concepts and Roles): terminological axioms and their semantics -- A ≐ C with A^I = C^I; a : C with a^I ∈ C^I; Q ≐ R with Q^I = R^I; a R b with (a^I, b^I) ∈ R^I]</Paragraph>
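<Paragraph position="4"> To make this data flow more concrete, the following is a minimal sketch in Python (the class names, quality-label inventories, and the example item are illustrative assumptions, not the notation of the system itself): each unknown lexical item is mapped to a hypothesis space holding its alternative conceptual readings, and every reading accumulates the linguistic and conceptual quality labels assigned to it.

from dataclasses import dataclass, field

# Hypothetical label inventories: linguistic labels come from the syntactic
# construction the unknown item occurs in, conceptual labels from structural
# patterns found in the knowledge base (consistency, mutual justification,
# analogy).
LINGUISTIC_LABELS = {"genitive", "apposition", "comparative"}
CONCEPTUAL_LABELS = {"consistent", "mutually-justified", "analogous"}

@dataclass
class ConceptHypothesis:
    """One tentative conceptual reading of an unknown lexical item."""
    concept: str
    linguistic: list = field(default_factory=list)
    conceptual: list = field(default_factory=list)

@dataclass
class HypothesisSpace:
    """All alternative readings maintained for a single unknown item."""
    unknown_item: str
    hypotheses: list = field(default_factory=list)

    def add_evidence(self, concept, label):
        # Attach a quality label to the hypothesis for `concept`,
        # creating that hypothesis on its first mention.
        assert label in LINGUISTIC_LABELS | CONCEPTUAL_LABELS, f"unknown quality label: {label}"
        for h in self.hypotheses:
            if h.concept == concept:
                break
        else:
            h = ConceptHypothesis(concept)
            self.hypotheses.append(h)
        bucket = h.linguistic if label in LINGUISTIC_LABELS else h.conceptual
        bucket.append(label)

# Example: a hypothetical unknown noun observed in two different constructions.
space = HypothesisSpace("Aquanaut")
space.add_evidence("Person", "genitive")       # e.g. "the Aquanaut's memory"
space.add_evidence("Hardware", "apposition")   # e.g. "the Aquanaut, a notebook"
space.add_evidence("Hardware", "consistent")   # knowledge base finds no conflict
print(space)
</Paragraph>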
<Paragraph position="5"> Multiple concept hypotheses for each unknown lexical item are organized in terms of corresponding hypothesis spaces, each of which holds different or further specialized conceptual readings.</Paragraph> <Paragraph position="6"> The quality machine estimates the overall credibility of single concept hypotheses by taking the available set of quality labels for each hypothesis into account. The final computation of a preference order for the entire set of competing hypotheses takes place in the qualifier, a terminological classifier extended by an evaluation metric for quality-based selection criteria. The output of the quality machine is a ranked list of concept hypotheses. The ranking yields, in decreasing order of significance, either the most plausible concept classes which classify the considered instance or more general concept classes subsuming the considered concept class (cf. Schnattinger and Hahn (1998) for details).</Paragraph>
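<Paragraph position="7"> A minimal sketch of this ranking step, continuing the illustration above (the numeric weights and the additive scoring scheme are simplifying assumptions of ours; the actual qualifier is a terminological classifier extended by an evaluation metric over quality labels):

# Hypothetical weights expressing how much credit each quality label lends
# to a hypothesis; e.g. an apposition is treated as stronger evidence than
# a genitive, and knowledge-base support adds further credit.
LABEL_WEIGHTS = {
    "genitive": 1.0,
    "apposition": 2.0,
    "comparative": 1.5,
    "consistent": 1.0,
    "mutually-justified": 2.0,
    "analogous": 1.5,
}

def credibility(hypothesis):
    """Overall credit of one concept hypothesis: the summed weights of all
    linguistic and conceptual quality labels attached to it."""
    return sum(LABEL_WEIGHTS.get(label, 0.0)
               for label in hypothesis.linguistic + hypothesis.conceptual)

def preference_order(space):
    """Ranked list of the competing hypotheses, most credible reading first."""
    return sorted(space.hypotheses, key=credibility, reverse=True)

for h in preference_order(space):
    print(f"{h.concept}: {credibility(h):.1f}")
# With the evidence gathered above, "Hardware" (3.0) outranks "Person" (1.0).
</Paragraph> </Section> </Paper>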