<?xml version="1.0" standalone="yes"?>
<Paper uid="P84-1108">
  <Title>A Computational Analysis of Complex Noun Phrases in Navy Messages</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> Methods of text compression in Navy messages are not limited to sentence fragments and the omissions of function words such as the copula be. Text compression is also exhibited within &quot;grammatical&quot; sentences and is identified within noun phrases in Navy messages.</Paragraph>
    <Paragraph position="1"> Mechanisms of text compression include increased frequency of complex noun sequences and also increased usage of nominalizations. Semantic relationships among elements of a complex noun sequence can be used to derive a correct bracketing of syntactic constructions.</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
I INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> At the Navy Center for Applied Research in Artificial Intelligence, we have begun computer-analyzing and processing the compact text in Navy equipment failure messages, specifically equipment failure messages about electronics and data communications systems.</Paragraph>
    <Paragraph position="1"> These messages are required to be sent within 24 hours of the equipment casualty. Narrative remarks are restricted to a length of no more than 99 lines, and each line is restricted to a length of no more than 69 characters.</Paragraph>
    <Paragraph position="2"> Because hundreds of these messages are sent daily to update ship readiness data bases, automatic procedures are being implemented to handle them efficiently. Our task has been to process them for purposes of dissemination and summarization, and we have developed a prototype system for this purpose. To capture the information in the narrative, we have chosen to use natural language understanding techniques developed at the Linguistic String Project [Sager 1981].</Paragraph>
    <Paragraph position="3"> These messages, like medical reports [Marsh 1982] and technical manuals [Lehrberger 1982], exhibit properties of text compression, in part due to imposed time and length constraints. Some methods of compression result in sentences that are usually called ill-formed in normal English texts [Eastman 1981]. Although unusual in normal, full English texts, these are characteristic of messages. Recent work on these properties includes discussions of omissions of function words such as the copula be, which result in sentence fragments, and of omissions of articles in compact text [Marsh 1982, 1983; Bachenko 1983]. However, compact text also utilizes mechanisms of compression that are present in normal English but are used with greater frequency in messages and technical reports. Although the messages contain sentence fragments, they also contain many complete sentences.</Paragraph>
    <Paragraph position="4"> These sentences are long and complicated in spite of the telegraphic style often used. The internal structure of noun phrases in these constructions is often quite complex, and it is in these noun phrases that we find syntactic constructions characteristic of text compression. Similar properties have been noted in other report sub-languages [Lehrberger 1982; Levi 1978].</Paragraph>
    <Paragraph position="5"> When processing these messages it becomes important to recognize signs of text compression, since the function words that so often direct a parsing procedure and reduce the choice of possible constructions are frequently absent. Without these overt markers of phrase boundaries, straightforward parsing becomes difficult and structural ambiguity becomes a serious problem. For example, sentences (1)-(2) are superficially identical; however, in Navy messages, the first is a request for a part (an antenna) and the second is a sentence fragment specifying an antenna performing a specific function (a transmit antenna).</Paragraph>
    <Paragraph position="6">  (1) Request antenna shipped by fastest available means.</Paragraph>
    <Paragraph position="7"> (2) Transmit antenna shipped by fastest available means.</Paragraph>
    <Paragraph position="8">  The question arises of how to recognize and capture these distinctions. We have chosen to take a sublanguage, or domain specific, approach to achieving correct parses by specifying the types of possible combinations among elements of a construction in both structural and semantic terms.</Paragraph>
    <Paragraph position="9"> This paper discusses a method for recognizing instances of textual compression and identifies two types of textual compression that arise in standard and sub-language texts: complex noun sequences and nominalizations. These are both typically found in noun phrase constructions. We propose a set of semantic relations for complex noun sequences, within a sublanguage analysis, that permits the proper bracketing of modifier and host for correct interpretation of noun phrases.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="506" type="metho">
    <SectionTitle>
II TEXT COMPRESSION IN NOUN PHRASES
</SectionTitle>
    <Paragraph position="0"> We can recognize the sources of text compression by two means: (1) comparing a full grammar of the standard language to that of the domain in which we are working, and (2) comparing the distribution of constructions in two different sublanguages. The first comparison distinguishes those constructions that are peculiar to a sub-language [cf. Marsh 1982]. A comparison of a full grammar with two sublanguage grammars, the equipment failure messages discussed here and a set of patient medical histories, disclosed that the sublanguage grammars were substantially smaller than full English grammars, having fewer productions and reflecting a more limited range of modifiers and complements [Grishman 1984].</Paragraph>
    <Paragraph position="1"> The second comparison identifies the types of constructions that exhibit text compression. These are common even in full sentences. For example, we found that similar sets of modifiers were used in the two different sub-languages [Grishman 1984]. However, the equipment failure messages had significantly more left and right modifier constructions than the medical histories, even though the equipment failure messages had about one-half the number of sentences of the patient histories: 236 sentences were analyzed in the medical domain and 123 in the Navy domain. The statistics are presented in Tables 1 and 2.</Paragraph>
    <Paragraph position="2"> In particular, there were significantly more noun modifiers of nouns (Noun + Noun constructions) in the equipment failure messages than there were in the medical records, and more prepositional phrase modifiers of noun phrases. Further analysis suggested these constructions are symptomatic of two major mechanisms of text compression in Navy messages: complex noun sequences and nominalizations.</Paragraph>
    <Paragraph position="3"> Complex noun sequences. A major feature of noun phrases in this set of messages is the presence of many long sequences of left modifiers of nouns, as in (3).</Paragraph>
    <Paragraph position="4">  (3) (a) forward kingpost sliding padeye unit (b) coupler controller standby light (c) base plate insulator welds (d) recorder-reproducer tape transport (e) nbsv or ship-shore tty sat communications (f) fuze setter extend/retract cycle  Complex noun sequences like these can cause major problems in processing, since the proper bracketing requires an understanding of the semantic/syntactic relations between the components. Lehrberger [1982] identifies similar sequences (empilage) in technical manuals. As he notes, this results from having to give highly descriptive names to parts in terms of their function and relation to other parts.</Paragraph>
    <Paragraph position="5"> Modifiers of nouns include nouns and adjectives. In the sublanguage of Navy messages, unmarked verb modifiers of nouns also occur. This construction is not common in standard English or in the medical record sublanguage mentioned above. It is illustrated above in (2) and below in (4).</Paragraph>
    <Paragraph position="6"> (4) (a) receive sensitivity (b) operate mode (c) transmit antenna  Because the verbs are unmarked for tense or aspect, they can be mistaken by the parsing procedure for imperative or present tense verbs. Furthermore, in this domain the problem is compounded by the frequent use of sentence fragments consisting of a verb and its object, with no subject present, as in (1), repeated as (5) below.</Paragraph>
    <Paragraph position="7"> (5) Request antenna...</Paragraph>
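The contrast between (2) and (5) can be illustrated with a small sketch. It assumes a hand-built class of function verbs that may act as unmarked noun modifiers; the word list and the helper name leading_reading are hypothetical, and the sketch is not the parsing procedure used in the prototype system.

```python
# Hypothetical word class: verbs that may serve as unmarked noun modifiers.
# The members are taken from examples (2) and (4); the class itself is an assumption.
FUNCTION_VERBS = {"transmit", "receive", "operate"}


def leading_reading(sentence):
    """Guess how the first word of a message line relates to the second word."""
    first = sentence.lower().split()[0]
    if first in FUNCTION_VERBS:
        # A function verb directly before a noun is read as a modifier of that noun.
        return "modifier + head noun"
    # Any other leading verb is read as the verb of a subjectless fragment.
    return "verb + object"


print(leading_reading("Transmit antenna shipped by fastest available means."))
print(leading_reading("Request antenna shipped by fastest available means."))
```

A lookup of this kind only separates the two readings when the word classes are restricted to the sublanguage; in general English the same verbs would remain ambiguous.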
    <Paragraph position="8">  Complex noun sequences also commonly arise from the omission of prepositions from prepositional phrases. The resulting long sequences of nouns are not easily bracketed correctly. In this data set, the omission of prepositions is restricted to place and time sequences (6)-(7). In (6), prepositions marking time phrases have been omitted, and in (7) both time and place prepositions have been omitted.</Paragraph>
    <Paragraph position="9"> Nominalizations. The increased frequency of prepositional modifiers in the equipment failure messages was traced to the frequent use of nominalizations in Navy messages. Out of a preliminary set of 89 prepositional modifiers of nouns, 42 were identified as arguments to nominalized verbs (47%); the other 52% were attributive. Examples of argument prepositional phrases are given in (8), and of attributive ones in (9).</Paragraph>
    <Paragraph position="10"> (8) (a) assistance from MOTU 12 (b) failure of amplifier (c) cause of casualty (d) completion of assistance (9) (a) short circuit between amplifier and power supply (b) short in cable (c) receipt NLT 4 OCT 82 (d) burned spots on connector  In these texts, in which nominalization serves as an important mechanism of text compression, it therefore becomes important to distinguish prepositional phrases that serve as arguments of nominalizations from attributive ones.</Paragraph>
    <Paragraph position="11"> The syntax of complex modifier sequences in noun phrases and the identification of nominalizations, both characteristic of text compression, need to be consistently defined for a proper understanding of the text being processed. By utilizing the semantic patterns that are derived from a sublanguage analysis, it becomes possible to properly bracket complex noun phrases. This is the subject of the next section.</Paragraph>
  </Section>
  <Section position="4" start_page="506" end_page="507" type="metho">
    <SectionTitle>
III SEMANTIC PATTERNS IN
COMPLEX NOUN SEQUENCES
</SectionTitle>
    <Paragraph position="0"> Noun phrases in the equipment failure messages typically include numerous adjectival and noun modifiers on the head, and additional modifier types that are not so common in general English. The relationships expressed by this stacking are correspondingly complex. The sequences are highly descriptive, naming parts in terms of their function and relation to other parts, and also describing the status of parts and other objects in the sublanguage. Domain-specific information can be used to derive the proper bracketing, but it is first necessary to identify the modifier-host semantic patterns through a distributional analysis of the texts. The basis for sub-language work is that the semantic patterns are a restricted, limited set. The texts talk about a limited number of classes and objects and express a limited number of relationships among these objects. These objects and relationships are derived through distributional analysis, and can ultimately be used to direct the parsing procedure.</Paragraph>
    <Paragraph position="1"> Complex noun sequences. Semantic patterns in complex noun phrases fall into two types: part names and other noun phrases. Names for pieces of equipment often contain complex noun sequences, i.e. stacked nouns. The relationships among the modifiers in the part names may indicate one of several semantic relations. They may indicate the levels of components. For example, assembly/component relationships are expressed. In circuit diode, diode is a component of a circuit. In antenna coupler, coupler is a component part of an antenna. Part names may also describe the function of the piece of equipment. For example, in the phrase high frequency transmit antenna, transmit is the function of the antenna.</Paragraph>
    <Paragraph position="2"> The semantic relations among the modifiers of a part are strictly ordered, as shown in (10a); examples are provided in (10b).</Paragraph>
    <Paragraph position="3">  (10) (a) ID REPAIR SIGNAL FUNCTION PART.</Paragraph>
    <Paragraph position="4"> (b) CU-t~O07 antenna coupler; HF XMIT antenna; deflection amplifier; UYA-4 display system; primary HF receive antenna.  The component relations in part names are especially closely bound and are best regarded as a unit for processing. Thus antenna coupler in CU-~O07 antenna coupler can be considered a unit. We would not expect to find antenna CU-~O07 coupler or coupler CU-~007 antenna.</Paragraph>
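The strict ordering in (10a) lends itself to a mechanical check. The sketch below is a minimal illustration, assuming a tiny hand-built lexicon; the class assigned to each word (for instance, HF as SIGNAL) is an assumption made for the example, not the paper's actual word-class dictionary, and the names HYPOTHETICAL_LEXICON and follows_slot_order are invented here.

```python
# Slot order copied from (10a).
SLOT_ORDER = ["ID", "REPAIR", "SIGNAL", "FUNCTION", "PART"]

# Hypothetical lexicon: the class given to each word is an assumption.
HYPOTHETICAL_LEXICON = {
    "HF": "SIGNAL",
    "XMIT": "FUNCTION",
    "receive": "FUNCTION",
    "antenna": "PART",
    "coupler": "PART",
}


def slot_ranks(words):
    """Map each word to the position of its semantic class in SLOT_ORDER."""
    return [SLOT_ORDER.index(HYPOTHETICAL_LEXICON[word]) for word in words]


def follows_slot_order(phrase):
    """Return True when the classes appear in the left-to-right order of (10a)."""
    ranks = slot_ranks(phrase.split())
    # A phrase respects the ordering when its ranks are already sorted.
    return ranks == sorted(ranks)


print(follows_slot_order("HF XMIT antenna"))   # True
print(follows_slot_order("XMIT HF antenna"))   # False
```

Adjacent words of the same class, such as antenna coupler, pass the check, which is consistent with treating component sequences as a single unit.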
    <Paragraph position="5"> In other noun phrases, i.e. those that are not part names, the head nouns can have other semantic categories. For example, looking back at the noun sequences in (3), the head noun of a noun sequence can be an equipment part (unit, light), a process that is performed on electrical signals (cycle), or a part function (communications). In addition, it can be a repair action (alignment, repair), an assistance action (assistance), and so on. Only modifiers with appropriate semantic and syntactic category can be adjoined. For example, in the phrase fuze setter extend/retract cycle, semantic information is necessary to attain the correct bracketing. Since only function verbs can serve as noun modifiers, extend/retract can be analyzed as a modifier of cycle, a process word. Fuze setter, a part name, can be treated as a unit because noun sequences consisting of part names are generally local in nature. Fuze setter is prohibited from modifying extend/retract, since verb modifiers do not themselves take noun modifiers.</Paragraph>
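The bracketing argument for fuze setter extend/retract cycle can be sketched in a few lines. This is an illustrative simplification under stated assumptions: the class labels in HYPOTHETICAL_CLASSES are invented for the example, adjacent part-name words are merged into one unit, and every remaining modifier attaches to the constituent on its right, which keeps a verb modifier from taking a noun modifier of its own. It is not the procedure implemented in the prototype system.

```python
# Hypothetical semantic classes for the words of example (3f).
HYPOTHETICAL_CLASSES = {
    "fuze": "PART",
    "setter": "PART",
    "extend/retract": "FUNCTION_VERB",
    "cycle": "PROCESS",
}


def group_part_names(words):
    """Merge each run of adjacent PART words into a single tuple unit."""
    units, run = [], []
    for word in words + [None]:  # None is a sentinel that flushes the last run
        if word is not None and HYPOTHETICAL_CLASSES[word] == "PART":
            run.append(word)
            continue
        if run:
            units.append(tuple(run) if len(run) > 1 else run[0])
            run = []
        if word is not None:
            units.append(word)
    return units


def bracket(phrase):
    """Attach each modifier unit to the constituent built so far on its right."""
    units = group_part_names(phrase.split())
    tree = units[-1]  # the head noun (or head unit) is the rightmost unit
    for modifier in reversed(units[:-1]):
        tree = (modifier, tree)
    return tree


print(bracket("fuze setter extend/retract cycle"))
# (('fuze', 'setter'), ('extend/retract', 'cycle'))
```

In the result, extend/retract modifies cycle and the part name fuze setter modifies the whole of extend/retract cycle, matching the analysis described above.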
    <Paragraph position="6"> Other problems, such as the omissions of prepositions resulting in long noun sequences (cf. (6) and (7) above), can also be treated in this manner. By identifying the semantic class of the noun in the object of the prepositionless prepositional phrase and the class of its host, the occurrence of these prepositionless phrases can be restricted. The date and place strings can then be properly treated as modifier constructions instead of as head nouns.</Paragraph>
  </Section>
  <Section position="5" start_page="507" end_page="507" type="metho">
    <SectionTitle>
IV CONCLUSION
</SectionTitle>
    <Paragraph position="0"> Methods of text compression are not limited to omissions of lexical items. They also include mechanisms for maximizing the amount of information that can be expressed within a limited time and space. These mechanisms include increased frequency of complex noun sequences and also increased usage of nominalizations.</Paragraph>
    <Paragraph position="1"> We would expect to find similar methods of text compression in other types of scientific material and message traffic. The semantic relationships among the elements of a noun phrase permit the proper bracketing of complex noun sequences. These relationships are largely domain specific, although some patterns may be generalizable across domains [Marsh 1984].</Paragraph>
    <Paragraph position="2"> The approach taken here for Navy messages, which uses sublanguage selectional patterns for disambiguation, was developed, designed, and implemented initially at the New York University Linguistic String Project for medical record processing [Friedman 1984; Grishman 1983; Hirschman 1982]. It was implemented with the capability for transfer to other domains. We anticipate using a similar mechanism, based partially on the analysis presented here, on Navy messages in the near future.</Paragraph>
  </Section>
</Paper>