<?xml version="1.0" standalone="yes"?> <Paper uid="J79-1024"> <Title>American Journal of Computational Linguistics Microfiche 24 THE SQAP DATA BASE FOR NATURAL LANGUAGE INFORMATION</Title> <Section position="4" start_page="5" end_page="10" type="intro"> <SectionTitle> 0. Introduction </SectionTitle> <Paragraph position="0"> This paper describes the natural language data base structure used in the SQAP system (Swedish Question Answering system).</Paragraph> <Paragraph position="1"> Much of that system is already working, but the paper does not only describe the solutions to solved problems. Difficulties and unsolved problems are also presented, since I feel this is important to further progress.</Paragraph> <Paragraph position="2"> One of the goals of the SQAP project was to create a question-answering system capable of handling facts of many different kinds. The system should thus not be restricted to a small special application area.</Paragraph> <Paragraph position="3"> 1. Natural language representation There is an obvious need for computers with a capability to converse in natural human languages. Natural languages are more general-purpose than most artificial languages, which means that you can talk about a wider subject area if you use natural languages. Natural languages can be used by everyone without special training, so computers talking natural language can make more people able to use more different computer facilities. Finally, a rising part of computer usage in the future will be unintelligent processing of natural language texts, and such systems can be improved if the processing is not wholly unintelligent.</Paragraph> <Paragraph position="4"> There are also well-known difficulties with natural languages for computers. Natural language is closely connected to human knowledge. 
Therefore, natural language sentences can only be understood by a man or a computer with factual knowledge about the subject matter and with the ability to reason with those facts. To disambiguate such well-known examples as &quot;The pig was in the pen&quot; (Bar-Hillel 1964) or &quot;He went to the park with the girl&quot; (Schank 1969) the computer must have an underlying knowledge about various kinds of &quot;pens&quot;, about where &quot;the girl&quot; was previously, and so on.</Paragraph> <Paragraph position="5"> Also, the same thing can be said in many different ways, and a computer with natural language capabilities must be able to understand this, so that for example it can see the similarity between &quot;Find the mean income of unmarried women with at least two children.&quot; and &quot;Search through the personnel file. For each individual who is a woman, who is not married, and who has a number of children greater than two, accumulate income to calculate the mean.&quot; Therefore, a computer understanding natural language must have a data base with basic factual knowledge about the world in general or about the subject matter which the computer is to be used for.</Paragraph> <Paragraph position="6"> This data base is needed to understand ambiguous sentences, but also to interpret the sentences into executable data processing commands.</Paragraph> <Paragraph position="7"> The requirements on such a data base are: - You should be able to store a wide variety of different kinds of facts. Natural languages are very general-purpose, so the data base should also be general-purpose.</Paragraph> <Paragraph position="8"> - You should be able to use this data base to make deductions. The capability to do simple and natural deductions fast is more important than the capability to make very advanced and long-range deductions. 
Since the data base will be large, an important part of deduction will be the selection of the relevant facts and rules out of the large mass of facts not needed for one special deduction.</Paragraph> <Paragraph position="9"> The data base can be more or less close to natural language. A data base close to natural language makes input translation easier, and also the loss of nuances during the input translation will be smaller. But the data base must on the other hand have a logical structure which is suitable for deduction and fact searching.</Paragraph> <Paragraph position="10"> One model of natural language knowledge is the following: The knowledge consists of &quot;concepts&quot; and of rules relating these concepts to each other. A typical concept might be &quot;John&quot;, &quot;All young men&quot;, &quot;The event when John meets Mary in the park&quot; or &quot;The month of July, 1973&quot;. The concepts are related by rules, which can be very simple relations (like the relation between &quot;All young men&quot; and the property &quot;young&quot;) or complex patterns of concepts (like the rule &quot;If Mary is weak and tired, and she meets a strong brutal man, then she will be frightened.&quot;) These rules form a network linking all concepts together.</Paragraph> <Paragraph position="11"> This model of natural language is close to that often used by psychologists in trying to explain the working of the intelligence in the human mind.</Paragraph> <Paragraph position="12"> The SQAP system uses a data base of that kind. The model may at first seem simple and straightforward. When you try to produce a working question-answering system, you will however find that there are many difficulties and complications with such a data base. This report presents the most important of the problems we have met, and in some cases also our solutions. 
I believe that other producers of natural language systems will sooner or later encounter the same problems, and they may then benefit from our experience as presented in this paper.</Paragraph> <Paragraph position="13"> 2. Introduction to our data base During the 1960s, several researchers independently and simultaneously came up with the same basic idea of organizing such a data base: Sandewall 1965, Simmons 1971, Shapiro 1971. Some of them were influenced by the case grammar of Fillmore 1968. The idea is that the data base is organized into nodes, each node representing a concept. In natural language, prepositions are used to represent short, simple and direct relations between concepts: &quot;John is in the bed&quot;, &quot;The fire was lit by Mary&quot;. In the data base, the idea of prepositions is extended so that all simple and direct relations between concepts are represented by implicit prepositions. (Just as you could say that there is an implicit preposition &quot;by&quot; in the phrase &quot;Mary lit the fire&quot;.) More complex rules or relations between concepts are represented by extra concepts. Thus there is a concept for the event &quot;Mary lit the fire&quot; and this concept is related to &quot;Mary&quot;, &quot;the fire&quot; and &quot;act of lighting&quot; in a structure like that in figure 1. This structure has four concepts linked together by three &quot;prepositional&quot; relations: CASE, BY and OBJ. From now on in this paper, I will call such relations &quot;short relations&quot;. The data base is organized so that the deduction rules can follow the short relations in both directions, that is, go from &quot;Mary&quot; to &quot;Mary lit the fire&quot; or from &quot;Mary lit the fire&quot; to &quot;Mary&quot;.</Paragraph> <Paragraph position="14"> 3. Objects, events and predicates Noun phrases in natural language usually refer to one or a set of objects in the real world, like &quot;Stockholm&quot; or &quot;Every house in Sweden&quot; or &quot;The nice man with a bicycle&quot;. 
In our system each such concept is represented by a node in the data base, which could be called an object node.</Paragraph> <Paragraph position="15"> Each object node is associated with one or more predicate nodes expressing properties of that object. In our data base, we mark predicates with the postfix &quot;*P&quot;. Thus, the phrase &quot;An always happy girl&quot; would be represented in our data base as in figure 2:</Paragraph> </Section></Paper>