<?xml version="1.0" standalone="yes"?> <Paper uid="P82-1010"> <Title>ENGLISH WORDS AND DATA BASES: HOW TO BRIDGE THE GAP</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> I INTRODUCTION </SectionTitle> <Paragraph position="0"> If a question-answering system is to cover a non-trivial fragment of its natural input language and to allow for an arbitrarily structured data base, it cannot assume that the syntactic/semantic structure of an input question has much in common with the formal query that would express, in terms of the actual data base structure, what information is desired. An important decision in the design of a q.a. system is therefore how to embody in the system the necessary knowledge about the relation between English words and data base notions.</Paragraph> <Paragraph position="1"> Most existing programs, however, do not face this issue. They accept considerable constraints on both the input language and the possible data base structures, so that they can establish a fairly direct correspondence between the lexical items of the input language and the primitives of the data base; this correspondence makes it possible to translate input questions into query expressions in a rather straightforward fashion.</Paragraph> <Paragraph position="2"> One of the main goals in designing the PHLIQA1 system was to bridge the gap between free English input and an equally unconstrained data base structure. To deal with this problem systematically, the PHLIQA1 program distinguishes different levels of semantic analysis. At each of these levels, the meaning of the input question is represented by an expression of a formal logical language.
The levels differ in that each assumes a different set of semantic primitives.</Paragraph> <Paragraph position="3"> At the highest of these levels, the meaning of the question is represented by an expression of the English-oriented Formal Language (EFL); this language uses semantic primitives that correspond to the descriptive lexical items of English. The primitives of the lowest semantic level are the primitives of the data base (names of files, attributes, data items). The formal language used at this level is therefore called the Data Base Language (DBL).</Paragraph> <Paragraph position="4"> Between EFL and DBL, several other levels of meaning representation serve as intermediate steps. Because of the space limitations of the present paper, I must ignore these intermediate levels and thereby give a somewhat misleading picture of the PHLIQA set-up.</Paragraph> <Paragraph position="5"> Given the distinctions just introduced, the problem raised by the discrepancy between the English lexicon and the set of primitives of a given data base can be formulated as follows: one must devise a formal characterization of the relation between EFL and DBL, and use this characterization in an effective procedure that translates EFL queries into DBL queries. I will introduce PHLIQA's solution to this problem by discussing in detail some examples that display complications which Robert Moore suggested as topics for the panel discussion at this conference.</Paragraph> </Section> </Paper>