<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1013"> <Title>THE SYNTAX AND SEMANTICS OF USER-DEFINED MODIFIERS IN A TRANSPORTABLE NATURAL LANGUAGE PROCESSOR</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> THE SYNTAX AND SEMANTICS OF USER-DEFINED MODIFIERS IN A TRANSPORTABLE NATURAL LANGUAGE PROCESSOR </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> ABSTRACT </SectionTitle> <Paragraph position="0"> The Layered Domain Class system (LDC) is an experimental natural language processor being developed at Duke University which reached the prototype stage in May of 1983. Its primary goals are (1) to provide English-language retrieval capabilities for structured but unnormalized data files created by the user; (2) to allow very complex semantics, in terms of the information directly available from the physical data file; and (3) to enable users to customize the system to operate with new types of data. In this paper we shall discuss (a) the types of modifiers LDC provides for; (b) how information about the syntax and semantics of modifiers is obtained from users; and (c) how this information is used to process English inputs.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> I INTRODUCTION </SectionTitle> <Paragraph position="0"> The Layered Domain Class system (LDC) is an experimental natural language processor being developed at Duke University. In this paper we concentrate on the types of modifiers provided by LDC and the methods by which the system acquires information about the syntax and semantics of user-defined modifiers. A more complete description is available in [4,5], and further details on matters not discussed in this paper can be found in [1,2,6,8,9].</Paragraph> <Paragraph position="1"> The LDC system is made up of two primary components. 
The first is the knowledge acquisition component, whose job is to find out about the vocabulary and semantics of the language to be used for a new domain and then to inquire about the composition of the underlying input file. The second is the User-Phase Processor, which enables a user to obtain statistical reductions on his or her data by typed English inputs.</Paragraph> <Paragraph position="2"> The top-level design of the User-Phase Processor involves a linear sequence of modules for scanning the input and looking up each token in the dictionary; parsing the scanned input to determine its syntactic structure; translation of the parsed input into an appropriate formal query; and finally query processing.</Paragraph> <Paragraph position="3"/> <Paragraph position="4"> This research has been supported in part by the National Science Foundation, Grants MCS-81-16607 and IST-83-01994; in part by the National Library of Medicine, Grant LM-07003; and in part by the Air Force Office of Scientific Research, Grant 81-0221.</Paragraph> <Paragraph position="5"> The User-Phase portion of LDC resembles familiar natural language database query systems such as INTELLECT, JETS, LADDER, LUNAR, PHLIQA, PLANES, REL, RENDEZVOUS, TQA, and USL (see [10-23]), while the overall LDC system is similar in its objectives to more recent systems such as ASK, CONSUL, IRUS, and TEAM (see [24-31]).</Paragraph> <Paragraph position="6"> At the time of this writing, LDC has been completely customized for two fairly complex domains,</Paragraph> <Paragraph position="7"> from which examples are drawn in the remainder of the paper, and several simpler ones. The complex domains are a final grades domain, giving course grades for students in an academic department, and a building organization domain, containing information on the floors, wings, corridors, occupants, and so forth for one or more buildings. 
Among the simpler domains LDC has been customized for are files giving employee information and stock market quotations.</Paragraph> </Section> <Section position="4" start_page="0" end_page="53" type="metho"> <SectionTitle> II MODIFIER TYPES PROVIDED FOR </SectionTitle> <Paragraph position="0"> As shown in [4], LDC handles inputs about as complicated as &quot;students who were given a passing grade by an instructor Jim took a graduate course from&quot;. As suggested here, most of the syntactic and semantic sophistication of inputs to LDC is due to noun phrase modifiers, including a fairly broad coverage of relative clauses. For example, if LDC is told that &quot;students take courses from instructors&quot;, it will accept such relative clause forms as &quot;students who took a graduate course from Trivedi&quot;; &quot;courses Sarah took from Rogers&quot;; &quot;instructors Jim took a graduate course from&quot;; &quot;courses that were taken by Jim&quot;; and &quot;students who did not take a course from Rosenberg&quot;. We summarize the modifier types distinguished by LDC in Table 1, which is divided into four parts roughly corresponding to pre-nominal, nominal, post-nominal, and negating modifiers. We have included several modifier types, most of them anaphoric, which are processed syntactically, and for which methods of semantic processing are being implemented along the lines suggested in [7].</Paragraph> <Paragraph position="1"> Most of the names we give to modifier types are self-explanatory, but the reader will notice that we have chosen to categorize verbs, based upon their semantics, as trivial verbs, implied parameter verbs, and operational verbs. &quot;Trivial&quot; verbs, which involve no semantics to speak of, can be roughly paraphrased as &quot;be associated with&quot;. 
For example, students who take a certain course are precisely those students associated with the database records related to the course.</Paragraph> <Paragraph position="2"> &quot;Implied parameter&quot; verbs can be paraphrased as a longer &quot;trivial&quot; verb phrase by adding a parameter and requisite noise words for syntactic acceptability. For example, students who &quot;fail&quot; a course are those students who &quot;make a grade of F&quot; in the course. Finally, &quot;operational&quot; verbs require an operation to be performed on one or more of their noun phrase arguments, rather than simply asking for a comparison of their noun phrase referent(s) against values in specified fields of the physical data file. For example, the students who &quot;outscore&quot; Jim are precisely those students who &quot;make a grade higher than the grade of&quot; Jim. At present, prepositions are treated semantically as trivial verbs, so that &quot;students in AI&quot; is interpreted as &quot;students associated with records related to the AI course&quot;.</Paragraph> <Paragraph position="3"> Table 1 - Modifier Types Available in LDC. Negations (of many sorts): &quot;the non graduate students&quot;; &quot;offices not adjacent to X-238&quot;; &quot;instructors that did not teach M&quot;; etc. (yes, yes).</Paragraph> </Section> <Section position="5" start_page="53" end_page="53" type="metho"> <SectionTitle> III KNOWLEDGE ACQUISITION FOR MODIFIERS </SectionTitle> <Paragraph position="0"> The job of the knowledge acquisition module of LDC, called &quot;Prep&quot; in Figure 1, is to find out about (a) the vocabulary of the new domain and (b) the composition of the physical data file. This paper is concerned only with vocabulary acquisition, which occurs in three stages. In Stage 1, Prep asks the user to name each entity, or conceptual data item, of the domain. 
As each entity name is given, Prep asks for several simple kinds of information, as in [...] having the given entity as surface subject, as in</Paragraph> </Section> <Section position="6" start_page="53" end_page="53" type="metho"> <SectionTitle> ACQUIRING VERBS FOR STUDENT: A STUDENT CAN pass a course </SectionTitle> <Paragraph position="0"> fail a course; take a course from an instructor; make a grade from an instructor; make a grade in a course. In Stage 2, Prep learns the morphological variants of words not known to it, e.g. plurals for nouns, comparative and superlative forms for adjectives, and past tense and participle forms for verbs. For example, [...] verbs, and other modifier types, based upon the following principles.</Paragraph> <Paragraph position="1"> 1. Systems which attempt to acquire complex semantics from relatively untrained users had better restrict the class of the domains they seek to provide an interface to.</Paragraph> <Paragraph position="2"> For this reason, LDC restricts itself to a class of domains [1] in which the important relationships among domain entities involve hierarchical decompositions.</Paragraph> <Paragraph position="3"> 2. There need not be any correlation between the type of modifier being defined and the way in which its meaning relates to the underlying data file. For this reason, Prep acquires the meanings of all user-defined modifiers in the same manner, by providing such primitives as id, the identity function; val, which retrieves a specified field of a record; num, which returns the size of its argument, which is assumed to be a set; sum, which returns the sum of its list of inputs; avg, which returns the average of its list of inputs; and pct, which returns the percentage of its list of boolean arguments which are true. Other user-defined adjectives may also be used. 
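The primitives just listed can be sketched in a few lines of Python. This is only an illustrative sketch, not LDC's own implementation: the Python function names (with a trailing underscore where they would shadow a built-in), the record layout, and the field names are all assumptions made for the example.

```python
from typing import Any, Mapping, Sequence


def id_(x: Any) -> Any:
    """id: the identity function (underscore avoids shadowing Python's id)."""
    return x


def val(record: Mapping[str, Any], field: str) -> Any:
    """val: retrieve a specified field of a record."""
    return record[field]


def num(items: set) -> int:
    """num: size of the argument, which is assumed to be a set."""
    return len(items)


def sum_(values: Sequence[float]) -> float:
    """sum: sum of a list of inputs."""
    return sum(values)


def avg(values: Sequence[float]) -> float:
    """avg: average of a non-empty list of inputs."""
    return sum(values) / len(values)


def pct(flags: Sequence[bool]) -> float:
    """pct: percentage of a non-empty list of booleans that are true."""
    return 100.0 * sum(1 for flag in flags if flag) / len(flags)


# Hypothetical records; the field names are illustrative only.
records = [
    {"student": "Jim", "grade": "F"},
    {"student": "Sarah", "grade": "A"},
    {"student": "Jim", "grade": "B"},
    {"student": "Sarah", "grade": "F"},
]

# Compose primitives: share of records with a grade of F, and number of students.
failing_share = pct([val(r, "grade") == "F" for r in records])
distinct_students = num({val(r, "student") for r in records})
```

Here pct is applied to booleans computed through val, which mirrors the way a modifier meaning is built by composing primitives. On the sample records the failing share is 50.0 and there are two distinct students.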
For instance, a &quot;desirable instructor&quot; might be defined as an instructor who gave a good grade to more than half his students, where a &quot;good grade&quot; is defined as a grade of B or above. These two adjectives may be specified as shown below.</Paragraph> </Section> <Section position="7" start_page="53" end_page="53" type="metho"> <SectionTitle> ACQUIRING SEMANTICS FOR DESIRABLE INSTRUCTOR </SectionTitle> <Paragraph position="0"> As shown here, Prep requests three pieces of information for each adjective-entity pair, namely (1) the primary (highest-level) and target (lowest-level) entities needed to specify the desired adjective meaning; (2) a list of functions corresponding to the arcs on the path from the primary to the target nodes; and finally (3) a predicate to be applied to the numerical value obtained from the series of function calls just acquired.</Paragraph> </Section> <Section position="8" start_page="53" end_page="55" type="metho"> <SectionTitle> IV UTILIZATION OF THE INFORMATION ACQUIRED DURING PREPROCESSING </SectionTitle> <Paragraph position="0"> As shown in Figure 1, the English-language processor of LDC achieves domain independence by restricting itself to (a) a domain-independent,</Paragraph> <Paragraph position="1"> linguistically-motivated phrase-structure grammar [6] and (b) the domain-specific files produced by the knowledge acquisition module.</Paragraph> <Paragraph position="2"> The simplest file is the pattern file, which captures the morphology of domain-specific proper nouns, e.g. the entity type &quot;room&quot; may have values such as X-238 and A-22, or &quot;letter, dash, digits&quot;. 
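A pattern of the letter, dash, digits kind can be approximated with a regular expression. The sketch below is an assumption about how such a pattern file might be consulted, not a description of LDC's actual file format; the PATTERNS table and the entity_types_for helper are hypothetical names.

```python
import re

# Hypothetical pattern-file entry: entity type mapped to a regular expression.
# "letter, dash, digits" is read here as one uppercase letter, a hyphen,
# and one or more digits.
PATTERNS = {"room": re.compile(r"[A-Z]-[0-9]+")}


def entity_types_for(token: str) -> list[str]:
    """Return the entity types whose pattern matches the whole token."""
    return [name for name, pattern in PATTERNS.items() if pattern.fullmatch(token)]


print(entity_types_for("X-238"))      # ['room']
print(entity_types_for("A-22"))       # ['room']
print(entity_types_for("Rosenberg"))  # []
```

With such a table, a token like X-238 can be recognized as a room value without being listed in the dictionary.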
This pattern information frees us from having to store all possible field values in the dictionary, as some systems do, or to make reference to the physical data file when new data values are typed by the user, as other systems do.</Paragraph> <Paragraph position="3"> The domain-specific dictionary file contains some standard terms (articles, ordinals, etc.) and also both root words and inflections for terms acquired from the user. The sample dictionary entry (longest Superl long (nt meeting week)) says that &quot;longest&quot; is the superlative form of the adjective &quot;long&quot;, and may occur in noun phrases whose head noun refers to entities of type meeting or week.</Paragraph> <Paragraph position="4"> By having this information in the dictionary, the parser can perform &quot;local&quot; compatibility checks to assure the integrity of a noun phrase being built up, i.e. to assure that all words in the phrase can go together on non-syntactic grounds. This aids in disambiguation, yet avoids expensive interaction with a subsequent semantics module.</Paragraph> <Paragraph position="5"> An opportunity to perform &quot;non-local&quot; compatibility checking is provided by the compat file, which tells (a) the case structure of each verb, i.e. which prepositions may occur and which entity types may fill each noun phrase &quot;slot&quot;, and (b) which pairs of entity types may be linked by each preposition. 
The former information will have been acquired directly from the user, while the latter is predicted by heuristics based upon the sorts of conceptual relationships that can occur in the &quot;layered&quot; domains of interest [1].</Paragraph> <Paragraph position="6"> Finally, the macro file contains the meanings of modifiers, roughly in the form in which they were acquired using the specification language discussed in the previous section. Although this required us to formulate our own retrieval query language [3], having complex modifier meanings directly executable by the retrieval module enables us to avoid many of the problems typically arising in the translation from parse structures to formal retrieval queries. Furthermore, some modifier meanings can be derived by the system from the meanings of other modifiers, rather than separately acquired from the user. For example, if the meaning of the adjective &quot;large&quot; has been given by the user, the system automatically processes &quot;largest&quot; and &quot;larger than ...&quot; by appropriately interpreting the macro body for &quot;large&quot;.</Paragraph> <Paragraph position="7"> A partially unsolved problem in macro processing involves the resolution of scope ambiguities related to negation. Interestingly, most meaningful interpretations of phrases containing &quot;non&quot; or &quot;not&quot; can be obtained by inserting the retrieval module's Not command at an appropriate point in the macro body for the modifier in question. For example, &quot;students who were not failed by Rosenberg&quot; might or might not be intended to include students who did not take a course from Rosenberg. 
The retrieval query commands generated by the positive usage of &quot;fail&quot;, as in &quot;students that Rosenberg failed&quot;, would be the sequence &quot;instructor = Rosenberg; student -> fail&quot;, so the question is whether to introduce &quot;not&quot; at the phrase level, &quot;not [instructor = Rosenberg; student -> fail]&quot;, or instead at the verb level, &quot;instructor = Rosenberg; not [student -> fail]&quot;. Our current system takes the literal reading, and thus generates the first interpretation given. The example points out the close relationship between negation scope and the important problem of &quot;presupposition&quot;, in that the user may be interested only in students who had a chance to be failed.</Paragraph> </Section> </Paper>