<?xml version="1.0" standalone="yes"?> <Paper uid="P87-1009"> <Title>Phrasal Analysis of Long Noun Sequences</Title> <Section position="3" start_page="59" end_page="60" type="metho"> <SectionTitle> 2. The PHRAN-SPAN System </SectionTitle> <Paragraph position="0"> PHRAN, a PHRasal ANalysis program (Arens, 1986; Wilensky and Arens, 1980), is an implementation of a knowledge-based approach to natural language understanding. The knowledge PHRAN has of the language is stored in the form of pattern-concept pairs (PCPs). The linguistic component of a pattern-concept pair is called a phrasal pattern and describes an utterance at one of various levels of abstraction. It may be a single word, or a literal string like</Paragraph> <Section position="1" start_page="59" end_page="60" type="sub_section"> <SectionTitle> Digital Equipment Corporation, </SectionTitle> <Paragraph position="0"> or it may be a general phrase such as (1) &lt;component&gt; &lt;send&gt; &lt;data&gt; to &lt;component&gt; which allows any object belonging to the semantic category component to appear as the first and last constituents, anything in the semantic category data as the third constituent, any form of the verb send as the second, while the lexical item to must appear as the fourth constituent.</Paragraph> <Paragraph position="1"> Associated with each phrasal pattern is a conceptual template, which describes the meaning of the phrasal pattern, usually with references to the constituents of the associated phrase. Each PCP encodes a single piece of knowledge about the language the database is describing.</Paragraph> <Paragraph position="2"> For the purpose of describing design specifications and requirements a declarative representation language was devised, called SRL (Specification and Requirements Language). In SRL the conceptual template associated with phrasal pattern (1) above is a form of unidirectional value transfer. 
In this specific case it denotes the transfer of the data described by the third constituent of the pattern by the controlling agent described by the first constituent to the component described by the fifth. For further details of the representation language used see (Granacki et al., 1987).</Paragraph> <Paragraph position="3"> PHRAN analyzes input by searching for phrasal patterns that match fragments of it and replacing such fragments with the conceptual template associated with the pattern. The result of matching a pattern may in turn be present as a constituent in a larger pattern. Finally, the conceptual template associated with a pattern that accounts for all the input is used to generate a structure denoting the meaning of the complete utterance.</Paragraph> <Paragraph position="4"> A slightly more involved version of the PCP discussed above is used by PHRAN-SPAN to analyze the sentence: The cpu transfers the code word from the controller to the peripheral device.</Paragraph> <Paragraph position="5"> 3. The Problem with Long Noun Sequences Long noun sequences pose considerable difficulty to a natural language analyzer. The problems will be described and treated in this section in terms of phrasal analysis, but they are not artifacts of this approach. A comparison with other approaches to such constructs, mentioned later in this paper, also makes this clear.</Paragraph> <Paragraph position="6"> The main difficulties with multiple noun sequences are: * Determination of their length. One must make sure that the first few nouns are not taken to constitute the first noun phrase, ignoring the words that follow. For example, upon reading bus request cycle we do not want the analyzer to conclude that the first noun phrase is simply bus, or bus request.</Paragraph> <Paragraph position="7"> * Interpretation of ambiguous noun/verbs. A large portion of the vocabulary used in digital system specification consists of words which are both nouns and verbs. 
Consequently the phrase interrupt vector transfer phase, for example, might be interpreted as a command to interrupt the vector transfer phase, or (unless we are careful about number agreement) as the claim that phase is transferred by interrupt vectors.</Paragraph> <Paragraph position="8"> In spoken language stress is sometimes used to &quot;adjective-ize&quot; nouns used as modifiers. For example, the spoken form would be &quot;arithmetic register transfer&quot; rather than &quot;arithmetic register transfer&quot;. Obviously, such a device is not available in our case, where specifications are typed.</Paragraph> <Paragraph position="9"> * Determination of enough about their meaning to permit further analysis of the input. Full understanding of such expressions requires more domain knowledge than one would wish to employ at this point in the analysis process (Cf. Finin (1980)). However, at least a minimal understanding of the semantics of the noun phrase is necessary for testing selectional restrictions of higher level phrasal patterns. This is required, in turn, in order to provide a correct representation of the meaning of the complete input.</Paragraph> <Paragraph position="10"> The phrasal approach utilizes the phrasal pattern as the primary means of recognizing expressions, and in particular noun sequences. In effect, a phrasal pattern is a sequence of restrictions that constituents must satisfy in order to match the pattern. The most common restrictions on a constituent in a PHRAN phrasal pattern, and the ones relevant in our case, are of the following three types: 1. The constituent must be a particular word; 2. It must belong to a particular semantic category; or, 3. It must belong to a particular syntactic category.</Paragraph> <Paragraph position="11"> In addition, simple lookahead restrictions may be attached to any constituent of the pattern. 
In the original version of PHRAN such restrictions were limited to demanding that the following word be of a certain syntactic category.</Paragraph> <Paragraph position="12"> Simple phrasal patterns are clearly not capable of solving the problem of recognizing multiple noun sequences. It is not possible to anticipate all such sequences and specify them literally, word for word, since they are often generated on the fly by the system specifier.</Paragraph> <Paragraph position="13"> For a similar reason phrasal patterns describing the sequence of semantic categories that the nouns belong to are, as a rule, inadequate.</Paragraph> <Paragraph position="14"> Finally, from the syntactic point of view all these constructions are just sequences of nouns. A pattern simply specifying such a sequence provides little of the information needed to decide which expression is present and what it might refer to.</Paragraph> </Section> </Section> <Section position="4" start_page="60" end_page="62" type="metho"> <SectionTitle> 4. A Heuristic Solution </SectionTitle> <Paragraph position="0"> PHRAN's inherent priority scheme was used to solve part of the problem. If a word can be used either as a noun or a verb, it is recognized first as a noun, all other things being equal. This simple approach was modified to be subject to the following rules: 1. If the current word is a noun, and the next word may be either a noun or a verb, test it for number agreement (as a verb). If the test is unsuccessful do not end the noun phrase.</Paragraph> <Paragraph position="1"> 2. If the current word is a noun, and the next word may be either a noun or a verb, test if the current word* is a possible active agent with respect to the next (as a verb). If not, do not end the noun phrase.</Paragraph> <Paragraph position="2"> 3. If the current word is a noun, and the next word may be either a noun or a verb, check the word after the next one. 
If it is (unambiguously) a verb, end the noun phrase with the next word. If it is (unambiguously) a noun, do not end the noun phrase. If the second word away may be either a noun or a verb, treat the utterance as potentially ambiguous, with a noun phrase ending either at the current word or with the next word.</Paragraph> <Paragraph position="3"> Once a complete noun phrase is detected a new token is created to represent its referent. While all nouns used in its construction are noted, it inherits the semantics of the last noun in the sequence. This information may be used in later stages of the analysis. Other programs which receive the analyzer's output will inspect the representation of the noun phrase again later to determine its meaning more precisely. * The current word may be the last in a sequence of nouns; we are again assuming that its meaning can be used to approximate the meaning of the noun sequence.</Paragraph> <Paragraph position="4"> The heuristic described above has been found to be sufficient to deal with all inputs our system has received up until now. It detects as ambiguous a sentence such as the following: The cpu signal interrupts transfer activity.</Paragraph> <Paragraph position="5"> When looking at the word cpu PHRAN-SPAN finds that Rule 1. can be used. Since number agreement is absent between cpu and signal (used as a verb), the noun phrase cannot be considered complete yet. When the word signal is processed, the system notes that interrupts may be either a (plural) noun or a verb. Number agreement is found, and it is also the case that a signal may act as an agent in an action of interruption, so rules 1. and 2. provide no information. Using Rule 3. we find that the following word, transfer, is an ambiguous noun/verb. Thus the result of the analysis to this point is indicated as ambiguous, possibly a. [the cpu signal] [interrupts] [transfer activity], or b. 
[the cpu signal interrupts] [transfer] [activity].</Paragraph> <Paragraph position="6"> The type of ambiguity detected by Rule 3.</Paragraph> <Paragraph position="7"> can often be eliminated by instructing the users of the specification system to use modals when possible. In the case of the example above, to force one of the two readings for the sentence, a user might type the cpu signal will interrupt transfer activity, or the cpu signal interrupts will transfer activity, as appropriate.</Paragraph> <Section position="1" start_page="61" end_page="61" type="sub_section"> <SectionTitle> 4.1. Requesting User Assistance </SectionTitle> <Paragraph position="0"> When Rule 3. detects an ambiguity, the system presents both alternatives to the user and asks for an indication of the intended one.</Paragraph> <Paragraph position="1"> PCPs encode in their phrasal pattern descriptions, among other things, selectional restrictions that at times allow the system to rule out some of the ambiguities detected by Rule 3. For example, it is conceivable that interrupts might not be acceptable as agents in a transfer. PHRAN-SPAN would thus be capable of eventually ruling out analysis b. above on its own.</Paragraph> <Paragraph position="2"> However, more often than not it is the case that both interpretations provided by Rule 3. are sensible. We decided that the risk of a wrong specification being produced required that in cases of potential ambiguity the system request immediate aid from the user. Therefore, when sentences like the one in the example above are typed and processed, PHRAN-SPAN will present both possible readings to the user and request that the intended one be pointed out before analysis proceeds.</Paragraph> </Section> <Section position="2" start_page="61" end_page="61" type="sub_section"> <SectionTitle> 4.2. 
Rule Implementation </SectionTitle> <Paragraph position="0"> The rules described above are implemented in several pattern-concept pairs and are incorporated into the standard PHRAN knowledge base of PCPs. For example, one of the PCPs used to detect the situation described in Rule 1. while taking into consideration Rule 3. is (in simplified form): {part of speech: noun phrase; semantics: inherit from (second noun); modifiers: (first noun)}</Paragraph> </Section> <Section position="3" start_page="61" end_page="62" type="sub_section"> <SectionTitle> 4.3. Current Status </SectionTitle> <Paragraph position="0"> The system currently processes specifications associated with all primitive concepts of the specification language, which are sufficient to describe behavior in the domain of digital systems.</Paragraph> <Paragraph position="1"> Pattern-concept pairs have been written for 25 basic verbs common in specifications and for over 100 nouns. This is in addition to several hundred PCPs supplied with the original PHRAN system.</Paragraph> <Paragraph position="2"> The system is coded in Franz LISP and runs on SUN/2 under UNIX 4.2 BSD. In interpreted mode a typical specification sentence will take 20 cpu seconds to process. No attempt has been made to optimize the code, compile it, or port it to a LISP processor. Any of these should result in an interface which could operate in near real-time.</Paragraph> </Section> </Section> </Paper>