<?xml version="1.0" standalone="yes"?> <Paper uid="A88-1007"> <Title>IMPROVED PORTABILITY AND PARSING THROUGH INTERACTIVE ACQUISITION OF SEMANTIC INFORMATION</Title> <Section position="4" start_page="50" end_page="51" type="metho"> <SectionTitle> 2. METHODOLOGY </SectionTitle> <Paragraph position="0"> The essential feature of our parser which facilitates the collecting of syntactic patterns is the INTERMEDIATE SYNTACTIC REPRESENTATION (ISR) produced by the syntactic analyser. The ISR is the result of regularizing the surface syntactic structure into a canonical form of operators and arguments. Since there are only a limited number of structures which can appear in an ISR, we have been able to write a program to analyze the ISR and examine the syntactic patterns as they are generated.</Paragraph> <Paragraph position="1"> A brief note about the implementation: Since the ISR is represented as a Prolog list, the program which analyzes it was written as a definite-clause grammar and has the flavor of a small parser. As a sample ISR, we present in Figure 1 the regularized representation of the obvious parse for the sentence The engineer repaired the broken sac (pretty-printed for clarity). At the top level, the ISR consists of the main verb (preceded by its tense operators), followed by its subject and object. The ISR of a noun phrase contains first the determiner (labelled TPOS), then the head noun (the label NVAR stands for &quot;noun or variant&quot;), and finally any nominal modifiers. Note that part of the regularization performed by the ISR is morphological, since the actual lexical items appearing in the ISR are represented by their root forms. 
Hence broken in the input sentence is regularized to break in the ISR, and repaired in the input sentence appears in the ISR simply as repair.</Paragraph> <Paragraph position="2"> SPQR is invoked by two restrictions which are called after the BNF grammar has assembled a complete NP (and constructed the ISR for that NP), and after it has assembled a complete sentence (and constructed its ISR). The program operates by presenting to the user a syntactic pattern (either a head-modifier pattern or a predicate-argument pattern) found in the ISR, and querying him/her about the acceptability of that pattern.</Paragraph> <Paragraph position="3"> For each of the basic types of patterns which the program currently generates, the chart in Table 1 shows that pattern's components, an example of that pattern, and a sentence in which the pattern occurs. When presented with a syntactic pattern such as those in the chart in Table 1, the user can respond to the query in one of two ways, depending on the semantic compatibility of the predicate and arguments (e.g., in the case of an SVO pattern) or of the head and modifiers (e.g., for an ADJ pattern) contained in the pattern. If the pattern describes a relationship that can be said to hold among domain entities (i.e., if the pattern occurs in the sublanguage), the user accepts the pattern, thereby classifying it as good. The analysis of the ISR and the parsing of the sentence are then allowed to continue. If, however, the pattern describes a relationship among domain entities that is not consistent with the user's domain knowledge or with his/her pragmatic knowledge (i.e., if the pattern cannot or does not occur in the sublanguage), the user rejects it, thereby classifying it as bad, and signalling an incorrect parse. 
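Purely as an illustration (the actual program is a Prolog definite-clause grammar over the ISR; the Python below and all of its names are a hypothetical rendering), pattern extraction from a toy ISR for The engineer repaired the broken sac can be sketched as:

```python
# Hypothetical sketch of SPQR-style pattern extraction; not the system's
# actual Prolog code. The ISR is a toy nested structure, words in root form.
ISR = ("repair",                           # main verb (tense operators omitted)
       ("np", "engineer", []),             # subject NP: head noun, no modifiers
       ("np", "sac", [("adj", "break")]))  # object NP: head noun + adjective

def extract_patterns(isr):
    """Yield (pattern_type, pattern) pairs found in the toy ISR."""
    verb, subj, obj = isr
    yield ("SVO", (subj[1], verb, obj[1]))      # subject, main verb, object
    for _, head, mods in (subj, obj):
        for kind, word in mods:
            if kind == "adj":
                yield ("ADJ", (word, head))     # adjective, head

print(list(extract_patterns(ISR)))
```

Each extracted pattern would then be shown to the user for the accept/reject judgment described above.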
This response causes the restriction which checks selection to fail, and as a result, the parse under construction is immediately failed, and the parser backtracks.</Paragraph> <Paragraph position="4"> As the user classifies these co-occurrence patterns into good patterns and bad patterns, they are stored in a pattern database which is consulted before any query to the user is made. Thus, once a pattern has been classified as good or bad, the user is not asked to classify it again. If a pattern previously classified as bad by the user is encountered in the course of analyzing the ISR, SPQR consults the database, recognizes that the pattern is bad, and automatically fails the parse being assembled. Similarly, if a pattern previously</Paragraph> </Section> <Section position="5" start_page="51" end_page="53" type="metho"> <SectionTitle> PATTERN COMPONENTS EXAMPLE </SectionTitle> <Paragraph position="0"> (1) SVO subject, main verb, object inspection reveal particle INSPECTION of lube oil filter REVEALED metal PARTICLES.</Paragraph> <Paragraph position="1"> (2) ADJ adjective, head* normal pressure Troubleshooting revealed NORMAL sac lube oil PRESSURE.</Paragraph> <Paragraph position="2"> (3) ADV head, adverb decrease rapidly Sac air pressure DECREASED RAPIDLY to 5.7 psi.</Paragraph> <Paragraph position="3"> (4) CONJ conjunct1, conjunction, conjunct2 pressure and temperature Troubleshooting revealed normal PRESSURE AND TEMPERATURE.</Paragraph> <Paragraph position="4"> (5) NOUN-NOUN noun modifier, head valve part VALVE PARTS excessively corroded.</Paragraph> <Paragraph position="5"> (6) PREP head, prep, object disengage after alarm DISENGAGED immediately AFTER ALARM.</Paragraph> <Paragraph position="6"> (7) PREDN noun, predicate nominal capability necessity Alarm CAPABILITY is a NECESSITY.</Paragraph> <Paragraph position="7"> *We use &quot;head&quot; throughout the chart to denote the head of a construction in which a modifier appears. 
The head can simply be thought of as that word which the modifier modifies. recorded as good is encountered, SPQR will recognize that the pattern is good simply by consulting the database, and allow the parsing to proceed.</Paragraph> <Paragraph position="8"> The selectional mechanism as described so far deals only with lexical patterns (i.e., patterns involving specific lexical items appearing in the lexicon). However, we have implemented a method of generalizing these patterns by using information taken from the domain isa (generalization/specialization) hierarchy to construct semantic class patterns from the lexical patterns. After deciding whether a given pattern is good or bad, the user is asked if the relation described by the pattern can be generalized. In presenting this second query, SPQR shows the user all the super-concepts of each word appearing in the pattern, and asks for the most general super-concept(s), if any, for which the relation holds. Let us take as an example the noun-noun pattern generated by the compound nominal oil pressure. While parsing a sentence containing this expression, the user would accept the noun-noun pattern [oil, pressure]. The program will then show the user in hierarchically ascending order all the generalizations for oil (fluid, physical_object, and root_concept), and all the generalizations for pressure (scalar_quantity, object_property, abstract_object, and again root_concept). The user can then identify which of those super-concepts of oil and pressure can form a semantically acceptable compound nominal. 
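The ascending walk through the isa hierarchy just described can be sketched as follows (an illustrative Python fragment, assuming the toy hierarchy given in the text; the system itself is in Prolog):

```python
# Toy isa hierarchy from the text: each word maps to its immediate
# superconcept. A hypothetical illustration, not the system's model.
ISA = {
    "oil": "fluid", "fluid": "physical_object",
    "physical_object": "root_concept",
    "pressure": "scalar_quantity", "scalar_quantity": "object_property",
    "object_property": "abstract_object", "abstract_object": "root_concept",
}

def superconcepts(word):
    """Superconcepts of word, in hierarchically ascending order."""
    chain = []
    while word in ISA:
        word = ISA[word]
        chain.append(word)
    return chain

print(superconcepts("oil"))       # ['fluid', 'physical_object', 'root_concept']
print(superconcepts("pressure"))
```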
In this case, the correct generalization would be [fluid, scalar_quantity], because * The fluids in the domain are oil, air, and water; the scalar quantities are pressure and temperature; and it is consistent with the domain to speak of the pressure and the temperature of oil, air, and water.</Paragraph> <Paragraph position="9"> * We cannot generalize higher than fluid, since it would be semantically anomalous to speak of &quot;physical_object pressure&quot; for every physical_object in the domain (e.g., one would not speak of connecting_pin pressure or gearbox pressure).</Paragraph> <Paragraph position="10"> * We cannot generalize higher than pressure, since shape is also an object_property, and it would be infelicitous to speak of oil shape.</Paragraph> <Paragraph position="11"> As with the lexical-level patterns, the user's generalizations are stored for reference in evaluating patterns generated by other sentences. The obvious advantage of storing not just lexical patterns but also semantic patterns is the broader coverage of the latter: Knowing that the semantic class pattern [fluid, pressure] is semantically acceptable provides much more information than knowing only that the lexical pattern [oil, pressure] is good.</Paragraph> <Paragraph position="12"> 3. SOME (SIMPLIFIED) EXAMPLES As we mentioned earlier, multiple syntactic analyses which can only be disambiguated by using semantic information abound in our corpuses because of the telegraphic and fragmentary nature of our texts. 
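The broader coverage of a stored class pattern can be made concrete: a lexical pattern is covered when each of its words is, or generalizes to, the corresponding class. A hypothetical Python sketch, again using a toy hierarchy rather than the system's actual model:

```python
# Toy hierarchy extended with a few more domain words (hypothetical).
ISA = {
    "oil": "fluid", "air": "fluid", "water": "fluid",
    "fluid": "physical_object", "gearbox": "physical_object",
    "physical_object": "root_concept",
    "pressure": "scalar_quantity", "temperature": "scalar_quantity",
    "scalar_quantity": "object_property",
    "object_property": "abstract_object", "abstract_object": "root_concept",
}

def generalizes_to(word, concept):
    """True if concept is word itself or one of its superconcepts."""
    while True:
        if word == concept:
            return True
        if word not in ISA:
            return False
        word = ISA[word]

def covered(lexical_pattern, class_pattern):
    """A stored class pattern covers a lexical pattern slot by slot."""
    return all(generalizes_to(w, c)
               for w, c in zip(lexical_pattern, class_pattern))

good = ("fluid", "scalar_quantity")            # user-accepted class pattern
print(covered(("air", "temperature"), good))   # True: no need to ask again
print(covered(("gearbox", "pressure"), good))  # False: gearbox is not a fluid
```

One accepted class pattern thus answers queries for every fluid/scalar-quantity compound, where a lexical pattern answers only one.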
This ambiguity has two principal causes: (1) A sentence which parses correctly as a fragment can often be parsed as a full assertion as well.</Paragraph> <Paragraph position="13"> (2) Determiners are often omitted from our sentences, thus making it difficult to establish NP boundaries.</Paragraph> <Paragraph position="14"> Since such syntactically degenerate sentences will generally contain fewer syntactic markers than full, non-telegraphic English sentences, they are characterized by correspondingly greater ambiguity. We now present an example of the use of selection to rule out a semantically anomalous assertion parse in favor of a correct fragment reading. Consider the sentence Loss of second installed sac. In the correct analysis, the sentence is parsed as a noun string fragment; however, another reading is available in which the sentence is analyzed as a full assertion, with loss of second as the subject, installed as main verb, and sac as direct object. A paraphrase of this parse might be The loss of a second installed the sac. But this analysis is semantically completely anomalous for several reasons, but most notably because it makes no sense to say that the loss of a second can cause a sac (or anything else) to be installed. (In this simplified explanation, we present only the SVO pattern. In actual parsing of this sentence, however, additional patterns would be generated from the NP level.) Since our parser tries assertion parses before fragment parses, the incorrect reading of this sentence is produced first. In generating the assertion parse, the parser encounters the SVO pattern [loss, install, sac], and queries the user as follows: -SVO- pattern : loss install sac This query asks if a loss can install a sac in this domain, or if a domain expert would ever speak of a loss installing a sac. 
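One way such a pattern could be glossed as the yes/no question just described (a hypothetical helper for illustration only, not the system's actual prompt, which is discussed further in section 5.1):

```python
def svo_query(subject, verb, obj):
    """Render an SVO pattern as a yes/no question about the domain.
    Hypothetical sketch; the real prompt shows the raw pattern."""
    return f"Can a {subject} {verb} a {obj}?"

print(svo_query("loss", "install", "sac"))  # Can a loss install a sac?
```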
Since it is nonsensical to speak of a loss installing a sac, the correct response to SPQR's query in this case is to reject the pattern, causing the assertion parse to fail after the module elicits the appropriate generalizations of the pattern.</Paragraph> <Paragraph position="15"> In order to generalize the pattern, the user is shown all the super-ordinates of loss and sac, and asked to generalize the anomalous SVO pattern [loss, install, sac]. The super-concepts of loss are failure, problem, event, abstract_object, and root_concept. The super-concepts of sac are unit, mechanical_device, system_component, physical_object, and root_concept. Since nothing that is an abstract object can install anything at all, the correct generalization would be [abstract_object, install, root_concept]. Since the user's response to the original prompt labelled the pattern as bad, the assertion parse under construction then fails, and the parser backtracks. An especially convoluted example of assertion-fragment ambiguity is found in the sentence Experienced frequent losses of pressure following clutch engage command. In the correct reading (which is again a fragment), the subject is elided, the main verb is experienced, and the direct object is the frequent losses of pressure (in this parse, following clutch engage command functions as a sentence adjunct, with clutch engage command as a compound nominal).</Paragraph> <Paragraph position="16"> However, in another reading generated by our parser, the subject is experienced frequent losses of pressure following clutch, the main verb is engage, and command is the direct object. 
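The effect of storing the generalized bad pattern can be sketched as follows (hypothetical Python over the super-concept chains given in the text): once [abstract_object, install, root_concept] is recorded as bad, any SVO whose slots generalize to those classes is rejected without querying the user again.

```python
# Toy isa chains for loss and sac, as listed in the text (hypothetical).
ISA = {
    "loss": "failure", "failure": "problem", "problem": "event",
    "event": "abstract_object", "abstract_object": "root_concept",
    "sac": "unit", "unit": "mechanical_device",
    "mechanical_device": "system_component",
    "system_component": "physical_object", "physical_object": "root_concept",
}
BAD_SVO = {("abstract_object", "install", "root_concept")}

def ancestors(word):
    """The word itself plus all of its superconcepts."""
    found = {word}
    while word in ISA:
        word = ISA[word]
        found.add(word)
    return found

def rejected(subj, verb, obj):
    """True if a stored bad class pattern covers this SVO pattern."""
    return any(s in ancestors(subj) and v == verb and o in ancestors(obj)
               for s, v, o in BAD_SVO)

print(rejected("loss", "install", "sac"))  # True: parse fails immediately
```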
This reading would fail selection at the SVO level (if not sooner) because the SVO pattern [loss, engage,</Paragraph> </Section> <Section position="6" start_page="53" end_page="53" type="metho"> <SectionTitle> W/OUT SPQR </SectionTitle> <Paragraph position="0"/> </Section> <Section position="7" start_page="53" end_page="53" type="metho"> <SectionTitle> TIMING RATIO TO CORRECT PARSE = 0.64 TIMING RATIO TO COMPLETION = 0.85 NEW CORRECT PARSES FOUND USING SPQR = 2 NEW CORRECT FIRST PARSES FOUND USING SPQR = 13 </SectionTitle> <Paragraph position="0"> *That is, which parse, on the average, was the correct one.</Paragraph> <Paragraph position="1"> †SPQR has not yet been optimized.</Paragraph> <Paragraph position="2"> command] is anomalous for two reasons: The subject of engage cannot be an abstract concept such as loss, and the object of engage must be a machine part.</Paragraph> </Section> <Section position="8" start_page="53" end_page="54" type="metho"> <SectionTitle> 4. EXPERIMENTAL RESULTS </SectionTitle> <Paragraph position="0"> The experimental results we present here are based on a sample of 31 sentences from one of our CASREP corpuses, each of which was parsed with and without invoking SPQR. We compare results obtained without using the selectional module to results obtained with the parser set to query the user about selectional patterns (starting from an empty pattern database). The chart in Table 2 summarizes the results for the 31 sentences.</Paragraph> <Paragraph position="1"> One of the statistics presented in Table 2 is the SEARCH FOCUS, which is a measure of the efficiency of the parser in either reaching the correct parse of a sentence, or generating all possible parses. It is equal to the ratio of the number of nodes attached to the parse tree in the course of parsing (and possibly detached upon backtracking)4 to the number of nodes in the completed, correct parse tree. 
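The ratio just defined can be illustrated with hypothetical numbers:

```python
def search_focus(nodes_attached, nodes_in_correct_tree):
    """SEARCH FOCUS: nodes attached while parsing (counting those later
    detached on backtracking) divided by nodes in the correct tree."""
    return nodes_attached / nodes_in_correct_tree

# Hypothetical example: 180 nodes attached, correct tree has 120 nodes,
# so a third of the attachments were undone by backtracking.
print(search_focus(180, 120))  # 1.5
```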
Thus a search focus of 1.0 in reaching the correct parse would indicate that for every (branching) grammar rule tried, the first option was the correct one, or, in other words, that the parser had never backtracked.</Paragraph> <Paragraph position="2"> The first line of the Table 2 chart deserves some explanation. One might wonder how a mechanism designed in part to rule out parses can actually produce a correct analysis for a sentence where none had been available without the module. The explanation is the COMMITTED DISJUNCTION mechanism we have implemented in our parser in order to reduce the (often spurious) ambiguity caused by allowing both full sentential and fragmentary readings. This pruning of the search space is most apparent when the parser is turned loose and set to generate all possible parses, as it was when we gathered the statistics summarized above. Recall that our parser tries full assertion parses before fragment parses. The effect of the COMMITTED DISJUNCTION mechanism is to commit the parser to produce only assertion parses (and no fragment parses) if an assertion parse is found. Fragment parses are tried only if no assertion parse is available. Thus no fragment reading will ever be generated for a sentence which can be analysed as both an assertion and a fragment. This has proved to be the correct behavior in a majority of the texts we have analysed. However, a fragment which can also be analysed as an assertion will never receive a correct parse unless all assertion parses can be blocked using selection. Thus it is possible for selection to make available a correct syntactic analysis where none would be available without selection.</Paragraph> </Section> <Section position="9" start_page="54" end_page="54" type="metho"> <SectionTitle> 5. 
FUTURE PLANS </SectionTitle> <Paragraph position="0"> Our ultimate goal is to integrate SPQR with the domain model and the semantic component that maps syntactic constituents into predicates and associated thematic roles. (Footnote 4: Another way to interpret this figure is that it represents the number of grammar rules tried.)</Paragraph> <Paragraph position="1"> At present, these components are developed independently. Our aim is to link these components in order to maintain consistency and facilitate updating the system. For example, if semantic rules exist to fill thematic roles of a given predicate, we should be able to derive a set of &quot;surface&quot; selectional patterns consistent with the underlying semantics.</Paragraph> <Paragraph position="2"> Similarly, given a set of selectional patterns, we should be able to suggest a (set of) semantic rule(s) consistent with the observed selection. In addition, if a word encountered in parsing is not represented in the domain model, it should be possible to suggest where the word should fit in the model, based on similarity to previously observed patterns. If, for example, in the CASREP domain, we encounter a sentence such as The widget broke, but widgets do not appear in our domain model, the system would check for any patterns of the form [X, break]; if it finds such a pattern, e.g., [machine_part, break], the system can then suggest that widget be classified as a machine part in the domain model. If the user concurs, widget would then automatically be entered into the model.</Paragraph> <Paragraph position="3"> In addition to the above work, which is already underway, we plan to improve the user interface, to measure the rate at which selectional patterns are acquired, and to investigate the use of selectional patterns in developing a weighting algorithm based on frequency of occurrence in the domain.</Paragraph> <Paragraph position="4"> 5.1. 
The User Interface In the current implementation, the questions which the program asks the user are phrased in terms of grammatical categories, and are thus tailored to users who know what is meant by such terms as &quot;SVO&quot; and &quot;noun-noun compounds&quot;. As a result, only linguists can be reasonably expected to make sense of the questions and provide meaningful answers. Our intended users, however, are not linguists, but rather domain experts who will know what can and cannot be said in the sublanguage, but who cannot be expected to reason in terms of grammatical categories. Deciding how to phrase questions designed to elicit the desired information is a difficult problem. Our first attempt will be to paraphrase the pattern. E.g., for the SVO pattern [loss, install, sac], the query to the user would be something like &quot;Can a loss install a sac?&quot;.</Paragraph> </Section> </Paper>