File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/82/c82-1058_metho.xml
Size: 13,956 bytes
Last Modified: 2025-10-06 14:11:30
<?xml version="1.0" standalone="yes"?> <Paper uid="C82-1058"> <Title>LOOK-UP FOR RELEVANT STATEMENTS INFERENCE LOOK-UP FOR ANSWER SYNTHESIS</Title> <Section position="1" start_page="0" end_page="357" type="metho"> <SectionTitle> NATURAL LANGUAGE UNDERSTANDING AND THE PERSPECTIVES OF QUESTION ANSWERING </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="357" type="sub_section"> <SectionTitle> Czechoslovakia </SectionTitle> <Paragraph position="0"> A method of automatic answering of questions in natural language, based only on input texts and a set of rules of inference, is described. A first experimental system, comprising a grammatico-semantic analysis of the input texts and questions, a procedure of inferencing, a search for appropriate answers to individual questions and a synthesis of the answers, is being implemented, mainly in the languages Q and PL/I. The output of the analysis, i.e. the underlying representations of the utterances of the input text, serves as the base of the knowledge representation scheme, on which the inference rules (mapping dependency trees into dependency trees) operate.</Paragraph> <Paragraph position="1"> The important, though partial, possibilities of automatic understanding of natural language have given rise to different kinds of experimental systems, ranging from sophisticated systems of machine translation through various kinds of modelling of dialogue (with robots, data bases, etc.) to question answering. 1 From a linguistic viewpoint the main challenge consists in attempting to transfer the burden of the communication between humans and computers to the latter, which should be able to react in an appropriate way to the user's input texts formulated in her or his native language, without serious restrictions. 
The necessity of thousands of human beings preparing data &quot;for computers&quot; (not only encoding messages, but also compiling data bases) should be removed.</Paragraph> <Paragraph position="2"> This challenge constitutes one of the central tasks of modern linguistics; an explicit description of the main features of the language system, which is necessary for these purposes, must be based on a sound theoretical framework suitable for the description of grammar as well as of the linguistically patterned aspects of semantics and pragmatics. Close cooperation of linguistics with logic, computer science and cognitive science has become urgent. This task also offers an effective way of checking the results of theoretical linguistics in various important fields.</Paragraph> <Paragraph position="3"> These considerations have led the group of algebraic linguistics in Prague (now belonging to the Department of Applied Mathematics, Faculty of Mathematics and Physics, Charles University) to start working on an experimental system based on the approach called TIBAQ (Text-and-Inference Based Answering of Questions).2 Its four main procedures are (1) grammatico-semantic analysis, (2) rules of inference, (3) identification of a full (direct) or partial answer, and (4) synthesis; see the overall scheme in Fig. 1.</Paragraph> </Section> </Section> <Section position="2" start_page="357" end_page="357" type="metho"> <SectionTitle> ANALYSIS, LOOK-UP FOR RELEVANT STATEMENTS, INFERENCE, LOOK-UP FOR ANSWER, SYNTHESIS </SectionTitle> <Paragraph position="0"> Fig. 1. An overall scheme of a system based on the method TIBAQ (further labels in the figure: meaning; is it a question?; set of statements; set of relevant statements; enriched set of relevant statements; answers). (1) The automatic grammatico-semantic analysis3 is being prepared in such a form that it can handle Czech and English polytechnical texts (papers, reports, monographs) in their usual shape, and also questions formulated in Czech. 
Thus there will be no need for the user to &quot;cope with the needs of the computer system&quot;. The procedure of analysis has the following characteristic properties distinguishing it from a mere parsing procedure: (i) The analysis procedure is based on a systematic theoretical account of the structure of natural language, the functional generative description; this linguistic approach, elaborated in the Prague group of algebraic linguistics, 4 makes use of the results of the empirical research carried out in the frame of European structural linguistics, and also of the methodological requirements formulated by Chomsky and the different wings that developed from his school. The resulting linguistic approach is used as a general base ensuring that the particular practical solutions (in ambiguity removal, etc.) chosen for a restricted area can be replaced by more generally valid sets of rules whenever it appears necessary to cross the boundaries of this narrow area (e.g. when applying the method to a new kind of text, to a new polytechnical domain, etc.). This is ensured thanks to the universal character of natural language and to the fact that the linguistic framework (if appropriately chosen) provides means for an adequate description of all its subdomains (cf. Hajičová and Sgall, 1980a).</Paragraph> <Paragraph position="1"> (ii) In connection with this requirement the analysis procedure is designed to transfer the input sentences from their outer form to a disambiguated notation of their meanings (which can be identified with their underlying structures, in the framework of functional generative description).</Paragraph> </Section> <Section position="3" start_page="357" end_page="357" type="metho"> <SectionTitle> LANGUAGE UNDERSTANDING AND QUESTION ANSWERING </SectionTitle> <Paragraph position="0"> 
The level of meaning of sentences includes such syntactic units as Actor, Objective, Addressee and other participants or cases, Manner, Instrument, Place, Direction and other free adverbial modifications, as well as lexical and morphological meanings (the latter including e.g. number, tense, modalities). This level is formulated as a linguistic counterpart of intensional structure, which makes it possible to define the concept of strict synonymy of expressions and to ensure an algorithmic transition to a postulated universal formal language of intensional logic 5 (among the trends that started with Montague, our account of meaning stands close to that by David Lewis, though the form of formal language we prefer has much in common with Tichý's framework). The representations of the meanings of sentences serve as the main components of knowledge representation in the semantic networks of the systems based on the method TIBAQ.</Paragraph> <Paragraph position="1"> They can be illustrated by the representation in Fig. 2.</Paragraph> <Paragraph position="2"> (iii) As can be seen from this representation, our approach works with dependency trees as the form of meanings of sentences.</Paragraph> <Paragraph position="3"> This allows us to work with relatively simple underlying structures in which such notions as &quot;head&quot; and &quot;modifier&quot;, or &quot;noun&quot; phrase vs. 
&quot;verb&quot; phrase, as well as the relations described by Fillmore as cases, find an economical treatment.</Paragraph> <Paragraph position="4"> (iv) Not only the roles of the elements of syntactic relations, but also the topic-focus articulation of sentences finds its proper place in the representations yielded by this procedure of analysis.</Paragraph> <Paragraph position="5"> Also the whole pragmatically based interplay of topic, focus, contextual boundness and communicative dynamism, as combined with the recursive properties of sentence structure, can in principle be rendered in the chosen form of representations of the meanings of sentences. 6 Analysis of written texts does not allow for a complete identification of all the items relevant for the topic-focus articulation, and the present form of our algorithms gives results which are not fully reliable, but the errors appear to be neither too numerous nor too grave for the given purpose. The main rules consist in understanding the parts of a sentence standing to the left of the finite verb as belonging to the topic, while the verb itself (if it is not semantically void, as the copula, or become, carry out, etc.) and the elements following it are classed as belonging to the focus in the Czech polytechnical texts. 7 Such a treatment appears sufficient for ensuring that those cases in which the topic-focus articulation is semantically relevant will be handled appropriately. This concerns the relative scopes of quantifiers in such sentences as &quot;Every car has several wheels&quot; and the &quot;holistic&quot; understanding of the topic e.g. 
in &quot;Smoking is dangerous&quot;, as well as Kuno's &quot;exhaustive listing&quot; and the difference between thetic and categorical judgements; even more important is the relevance of the boundary between topic and focus for the determination of the scope of negation, and thus also for the identification of presuppositions in some cases: &quot;Many arrows didn't hit the target&quot; does not imply that the target wasn't hit by many arrows, and &quot;The king of France didn't come to COLING 82&quot; does not presuppose the existence of a king of France. The relevance of topic and focus for natural language understanding is most clearly recognized in connection with the assignment of reference to definite noun phrases (and other expressions).</Paragraph> <Paragraph position="6"/> <!-- running page header: 360 P. SGALL --> <Paragraph position="7"> (v) The procedure of analysis provides also for a treatment of the interconnections between the individual assertions (which are stored in the shape of the meanings of sentences). This is done by means of two main devices: first, in the representation of each lexical meaning in the lexicon there is an indication of the relations of synonymy and hyponymy (subordination, superordination) of the given item to others, and also semantic features are used (for a partial modelling of the object domain pertinent to the treated area of polytechnical texts); 8 second, the relation between an object and the occurrences of expressions referring to it in the texts is handled by means of a register or concordance, supplying addresses of all the occurrences of a given unit in the whole set of knowledge representation. After having examined different means of implementation of the analysis procedure, esp. Kay's parser, Woods' ATN, the Grenoble system and others, we decided that among the systems actually available to us the framework elaborated in the T.A.U.M. group, based on Colmerauer's Q-systems, can best serve our aims. 
Thanks to our Canadian colleagues we were able to implement Q-systems (through Fortran) on such computers as IBM 360, EC 1040 (Robotron) and others (by means of a procedure put at our disposal by B. Thouin, who together with R. Kittredge introduced us to the intricacies of their systems).</Paragraph> <Paragraph position="8"> It appeared that Q-systems are a means flexible enough to be used for our purposes, in spite of the fact that several major differences can be found between the goals Q-systems were originally designed for and our goals: after a couple of years of experience our programmers (first of all Z. Kirschner and K. Oliva) are able to use Q-systems for a dependency-based analysis attempting to penetrate into the underlying structures of sentences (which is necessary also for translation between typologically different languages). The trees Q-systems were designed to operate on can be readily interpreted as standing close to our dependency trees (though instead of each of the nodes exemplified in Fig. 2 it is necessary to have a whole subtree composed of several nodes, since Q-language works only with elementary node labels).</Paragraph> <Paragraph position="9"> Moreover, it also became clear that Q-systems are a suitable means to handle inflectional languages exhibiting complicated systems of morphemic ambiguity and synonymy, 9 as well as the so-called free word order (which is not free at all, but determined by the topic-focus articulation, esp. by communicative dynamism, in a much more straightforward way than is the case in English). It is not necessary to work with individual rules for the different permutations of the elements of a sentence, since an approach working, roughly speaking, with an elementary dependency tree for every tentative clause (a finite verb and its neighbours on both sides) is possible, including the use of list variables for the irrelevant parts of the tree. 
10 The strong combinatoric power of Q-systems, as well as their relative transparency, made it possible to formulate a procedure of analysis which is still far from complete, but which already accounts for hundreds of kinds of phenomena from the syntax of Czech. These include a relatively complete analysis of the structure of noun phrases, achieved by means of checking the agreement of an adjective with its governing noun, and preferring a noun in the genitive case to be understood as an adjunct of an immediately preceding noun, whenever this is possible, while with the other oblique cases (simple and prepositional) there is a complex scale, elaborated by J. Panevová, deciding whether the given noun functions as an adjunct of this or that preceding noun or as a modifier of the verb (the indices of the given nouns, verbs and morphemic means are used to determine the specific dependency relation). The participants modifying the verb are identified with the help of lexical data concerning valency (obligatory and optional modifications and their usual morphemic forms). We have already mentioned the identification of topic and focus, achieved precisely on the basis of the &quot;free&quot; word order.</Paragraph> </Section> </Paper>
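The main word-order rule for topic and focus mentioned in the text (elements to the left of the finite verb go to the topic; the verb, unless semantically void, and the elements following it go to the focus) can be illustrated with a minimal sketch. This is not the Q-systems implementation described in the paper. The names `Token`, `Articulation`, `VOID_VERBS` and `split_topic_focus` are hypothetical, the input is assumed to be already tokenized with the finite verb marked, and the handling of a void verb (kept outside both parts) and of a clause without a finite verb (everything placed in the focus) are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical list of semantically void verbs; the text names the copula,
# "become" and "carry out" as examples.
VOID_VERBS = {"be", "become", "carry out"}


@dataclass
class Token:
    form: str                     # surface form as it appears in the clause
    lemma: str                    # dictionary form, used for the void-verb check
    is_finite_verb: bool = False  # assumed to be set by an earlier analysis step


@dataclass
class Articulation:
    topic: List[str]
    focus: List[str]
    void_verb: Optional[str] = None  # void verb kept out of both parts (assumption)


def split_topic_focus(tokens: List[Token]) -> Articulation:
    """Split one clause into topic and focus by the position of the finite verb."""
    verb_index = next(
        (i for i, tok in enumerate(tokens) if tok.is_finite_verb), None
    )
    if verb_index is None:
        # Assumption: without a finite verb the whole clause is treated as focus.
        return Articulation(topic=[], focus=[tok.form for tok in tokens])

    verb = tokens[verb_index]
    before = [tok.form for tok in tokens[:verb_index]]
    after = [tok.form for tok in tokens[verb_index + 1:]]

    if verb.lemma in VOID_VERBS:
        # A semantically void verb is not counted as part of the focus.
        return Articulation(topic=before, focus=after, void_verb=verb.form)
    return Articulation(topic=before, focus=[verb.form] + after)


if __name__ == "__main__":
    clause = [
        Token("Smoking", "smoking"),
        Token("is", "be", is_finite_verb=True),
        Token("dangerous", "dangerous"),
    ]
    print(split_topic_focus(clause))
```

The sketch uses an English clause from the text only for readability; the paper applies the rule to Czech polytechnical texts and notes that its results are not fully reliable.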