<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-1034">
  <Title>ANALYZING TELEGRAPHIC MESSAGES</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ANALYZING TELEGRAPHIC MESSAGES
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Ralph Grishman
John Sterling
New York University
</SectionTitle>
      <Paragraph position="0"> Most people have little difficulty reading telegraphic-style messages such as</Paragraph>
    </Section>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SHIPMENT GOLD BULLION ARRIVING STAGECOACH JAN. 7 3 PM
</SectionTitle>
    <Paragraph position="0"> even though much material that "standard English" would require, such as articles, prepositions, and verbs, has been omitted. Our concern in this paper is how to process such messages by computer.</Paragraph>
    <Paragraph position="1"> Even though people don't send many telegrams anymore, this problem is still of importance because many military messages are written in this telegraphic style:</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="206" type="metho">
    <SectionTitle>
2 FLARES SIGHTED 230704Z6 SOUTH APPROX 5 MI SPA ESTABLISHED
</SectionTitle>
    <Paragraph position="0"> (here 230704Z6 is the time, and SPA is the Submarine Probability Area).</Paragraph>
    <Section position="1" start_page="0" end_page="206" type="sub_section">
      <SectionTitle>
Alternative Strategies
</SectionTitle>
      <Paragraph position="0"> The particular class of messages which we have studied is a set of Navy tactical messages called RAINFORM (ship) sighting messages \[8\]. Several other researchers have previously constructed systems to analyze these messages. In the NOMAD system \[1\] the knowledge was principally realized as procedures associated with individual words. This made it difficult to extend the system, as Granger has noted \[1\]. Some of the shortcomings of the internal knowledge representation were remedied in a later system named VOX \[5\] which used a conceptual grammar, mixing syntactic and semantic constraints. However, the power of the grammar was still quite limited when compared to grammars traditionally used in computational linguistics applications.</Paragraph>
      <Paragraph position="1"> In the development of our system, in contrast, we have taken as our starting point a relatively broad coverage grammar of standard English. More generally, it has been our goal to use, to the extent possible, system components which would be appropriate to a general-purpose English language analyzer. We see several benefits to such an approach:
* Using general-purpose components minimizes the labor in porting the system to a new domain.</Paragraph>
      <Paragraph position="2"> * Using a standard English grammar makes it easier to analyze the complex constructions (involving subordinating and coordinating conjunctions, for example) which occur with some frequency in these messages.</Paragraph>
      <Paragraph position="3"> * Starting from a standard grammar clarifies the ways in which these messages differ from standard English.</Paragraph>
      <Paragraph position="4"> This approach is in keeping with earlier work at NYU, on medical records and equipment failure reports \[4,3\], and more recent work at UNISYS, primarily on equipment failure reports \[6,2\].</Paragraph>
      <Paragraph position="5"> In the next section, we briefly describe the overall structure of the message understanding system. In the two sections which follow, we focus on the two core problems of analyzing such telegraphic text: first, the problem of analyzing the structure of the text ("parsing"); second, the problem of recovering the arguments which are omitted in the telegraphic text.</Paragraph>
      <Paragraph position="6"> System structure
The text processing system is organized as a pipeline consisting of the following modules:
1. A parser using an augmented context-free grammar consisting of context-free rules plus procedural restrictions. The grammar is modeled after the Linguistic String Project English Grammar \[7\]; the parser is based on a chart parsing algorithm.</Paragraph>
      <Paragraph position="7"> 2. A syntactic regularizer whose primary function is to convert all clauses into a standard operator-argument form. The regularizer is organized as a set of Montague-style translation rules associated with the individual productions of the parsing grammar.</Paragraph>
      <Paragraph position="8"> 3. A semantic analyzer which checks semantic class requirements for arguments of verbs, and which translates clauses and nominalizations into domain predicates.</Paragraph>
      <Paragraph position="9"> 4. Simplification rules, which perform certain simplifications on the output of the semantic analyzer (for example, conduct an attack → attack).</Paragraph>
      <Paragraph position="10"> 5. Reference resolution, which resolves anaphoric references.</Paragraph>
      <Paragraph position="11"> 6. Discourse analysis, which identifies implicit relations between events in the text.
The control structure is not strictly sequential. In particular, the parser, regularizer, and the checking functions of the semantic analyzer are run in parallel. Also, reference resolution and discourse analysis may be interleaved using a priority-based scheme (discussed below).</Paragraph>
      <Paragraph position="12"> The entire system has been run successfully on 25 messages drawn from the set of RAINFORM sighting messages in \[8\]. These messages are, on average, roughly 25 words long.</Paragraph>
      <Paragraph position="13"> Analyzing sentence structure
As noted above, we began our work on message analysis with a relatively broad coverage grammar of standard English. Furthermore, we generally followed the approach of Sager and Marsh \[4,3\] in treating the deviations not as instances of ill-formedness but rather as constructions specific to such telegraphic sublanguages. In our analysis of the RAINFORMs, we found two types of omissions. The first, which had been previously characterized by Sager and Marsh (in their analysis of medical reports and equipment failure messages), involved the omission of top-level sentence elements, such as sentence subjects ("\[We\] conducted attack at close range.") and the verb "be" ("Results \[are\] unknown at this time."). The second class can be generally characterized as function words which mark particular cases and types of complements. These include prepositions such as "of" and "at" ("Hydrophone effects \[at\] bearing \[of\] 173°T \[were\] classified \[as\] surface combatant ..."), "as", and "to" in infinitival strings ("Intend \[to\] make sweep of area ..."). Modifying the grammar to allow for these omissions was quite straightforward: several definitions were added for sentence fragments, and prepositions, "as", and "to" were allowed to be empty. What made the task less than trivial was controlling these omissions. Adding the definitions for sentence fragments alone (following Sager and Marsh) increased syntactic ambiguity, but a sequential analysis (first syntactic analysis, then semantic filtering) was still feasible. However, when the grammar was extended to include function word omission and run-on sentences, the degree of syntactic ambiguity became much greater. If you consider that, in the grammar, each noun can be a sentence fragment or a prepositional phrase (with a deleted preposition), and add the fact that run-on sentences with no punctuation are frequent: Sighted periscope an asroc \[anti-submarine rocket\] fired proceeded on to station visual contact lost, constellation helo hovering in vicinity.</Paragraph>
      <Paragraph position="14">  you can imagine the explosion in parses which would occur. Such telegraphic input is understandable, however, only because of the strong semantic clues which are available. We take advantage of these semantic constraints by applying basic semantic checks on the semantic classes of arguments and modifiers each time a noun phrase or a clause is completed during parsing.</Paragraph>
      <Paragraph position="15"> In addition, we associate a score with each partial and complete parse, and use a best-first search for parsing. Roughly speaking, we associate a lower score with analyses which imply the existence of a larger number of omitted elements. The scoring mechanism serves to focus the search and thus greatly reduce the parsing time. In addition, it provides a means for preferring one analysis over another in some cases of syntactic ambiguity. For example, the "sentence" Two cats drinking milk two cats eating fish.</Paragraph>
      <Paragraph position="16"> would get two analyses: as a run-on sentence, "Two cats \[are\] drinking milk \[.\] Two cats \[are\] eating fish.", and as a single sentence with the main verb "be" missing, "Two cats \[who are\] drinking milk \[are\] two cats \[who are\] eating fish." We have experimented with several scoring schemes; our current scheme exacts a constant penalty for each omitted preposition, "to", and "as", and for each clause (including reduced relative clauses) and sentence fragment in the analysis. This scheme produces the correct analysis for the example just above.</Paragraph>
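The constant-penalty scheme can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the paper does not give the penalty value, the class and function names are invented, and the tallies for the two readings of the "two cats" example reflect one plausible way of counting (each fragment counted once, each reduced relative counted as a clause), not the authors' actual bookkeeping.

```python
from dataclasses import dataclass

PENALTY = 1  # assumed value; the paper says only that the penalty is constant


@dataclass(frozen=True)
class Analysis:
    """Tally of the penalised elements in one candidate parse."""
    label: str
    omitted_function_words: int  # zeroed prepositions, "to" and "as"
    clauses: int                 # clauses, including reduced relative clauses
    fragments: int               # sentence fragments

    def score(self) -> int:
        # Each penalised element costs the same constant amount.
        units = self.omitted_function_words + self.clauses + self.fragments
        return -PENALTY * units


def best(analyses):
    """Return the analysis with the highest (least negative) score."""
    return max(analyses, key=lambda a: a.score())


# Hypothetical tallies for "Two cats drinking milk two cats eating fish."
run_on = Analysis("run-on", omitted_function_words=0, clauses=0, fragments=2)
single = Analysis("single sentence", omitted_function_words=0, clauses=2, fragments=1)

print(best([run_on, single]).label)
```

Under these assumed tallies the run-on reading carries two penalty units and the single-sentence reading three, so the run-on reading is preferred. In a best-first parser the same score would also order the agenda of partial parses, which this sketch does not model.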
      <Paragraph position="17"> One further modification is required to handle zeroed prepositions. The semantic checks mentioned earlier operate from a set of case frames, one or more for each verb. Each case frame specifies a list of arguments and modifiers, and for each argument or modifier the case marker (such as subject or object or a list of prepositions) and the semantic class of the argument/modifier. An omitted preposition is marked in the analysis by the symbol prep and the semantic checking routine has been modified to accept prep in place of a particular preposition (but not to match positional markers such as subject or object).</Paragraph>
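The modified check can be sketched in Python as follows. This is a sketch under stated assumptions rather than the system's implementation: the names Slot, marker_matches, slot_accepts and FIRE_FRAME are hypothetical, the frame for "fire" is invented, and semantic classes are compared by simple equality, ignoring the class hierarchy.

```python
from dataclasses import dataclass

POSITIONAL_MARKERS = frozenset({"subject", "object"})
ZEROED_PREP = "prep"  # symbol the parser leaves for an omitted preposition


@dataclass(frozen=True)
class Slot:
    """One argument or modifier of a case frame."""
    markers: frozenset   # case markers, e.g. {"subject"} or {"on", "at"}
    semantic_class: str  # class required of the filler


def marker_matches(slot: Slot, marker: str) -> bool:
    if marker == ZEROED_PREP:
        # A zeroed preposition may stand in for any listed preposition,
        # but never for a positional marker such as subject or object.
        return any(m not in POSITIONAL_MARKERS for m in slot.markers)
    return marker in slot.markers


def slot_accepts(slot: Slot, marker: str, semantic_class: str) -> bool:
    """Check both the case marker and the semantic class of a filler."""
    return marker_matches(slot, marker) and semantic_class == slot.semantic_class


# Hypothetical frame for "fire": [subject: ship] fire [object: weapon] [on/at: ship]
FIRE_FRAME = {
    "agent": Slot(frozenset({"subject"}), "ship"),
    "weapon": Slot(frozenset({"object"}), "weapon"),
    "target": Slot(frozenset({"on", "at"}), "ship"),
}
```

With this frame, a ship-class noun phrase introduced by a zeroed preposition can fill the target slot, while the same prep symbol is rejected for the object slot, so the semantic class still does most of the filtering.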
      <Paragraph position="18"> Recovering omitted and anaphoric arguments
The second major task in analyzing the telegraphic messages is recovering the missing arguments. In the case frames, certain arguments are marked as essential; if they are omitted from the text, reference resolution attempts to fill them in. It does so using essentially the same mechanism employed for anaphora resolution. This commonality of mechanism has been previously noted by UNISYS \[6,2\].</Paragraph>
      <Paragraph position="19"> The basic anaphora resolution mechanism is quite simple, and is based on a hierarchy of semantic classes for the objects and events in the domain. If an argument is omitted, the case frame indicates the semantic class of the argument which was expected. If an argument is present and corresponds to a semantic class more specific than that required by the case frame, we take the semantic class of the argument. Reference resolution searches for the most recently mentioned entity or event of the same semantic class. For example, in analyzing Fired 2 missiles on Barsuk. Results of attack unknown.</Paragraph>
      <Paragraph position="20"> we would recognize firing as a type of attack and thus link attack in the second sentence to the event related by the first sentence.</Paragraph>
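A minimal Python sketch of this search follows. The hierarchy fragment, the function names and the mention list are hypothetical, and the sketch reads "same semantic class" as "the same class or one of its subclasses", which is what the firing/attack example suggests.

```python
# Hypothetical fragment of the class hierarchy: child -> parent.
PARENT = {"firing": "attack", "attack": "event", "sighting": "event"}


def is_a(semantic_class, ancestor):
    """True if semantic_class equals ancestor or lies below it."""
    current = semantic_class
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def resolve(required_class, mentions):
    """Return the most recent mention compatible with required_class.

    mentions is a list of (identifier, semantic_class) pairs, oldest first.
    """
    for identifier, semantic_class in reversed(mentions):
        if is_a(semantic_class, required_class):
            return identifier
    return None


# "Fired 2 missiles on Barsuk. Results of attack unknown."
mentions = [("event-1", "firing")]
print(resolve("attack", mentions))
```

Here the firing event of the first sentence is returned as the antecedent of "attack", because firing lies below attack in the assumed hierarchy.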
      <Paragraph position="21"> This mechanism is in fact too simple. Component (part/whole) relationships are sometimes needed in order to link anaphor and antecedent. Thus, to resolve My attacks in the message Exchange missile fire with Kynda .... My attacks successful.</Paragraph>
      <Paragraph position="22"> we must recognize that exchange involves two activities, my firing at Kynda and Kynda's firing at me. We can then resolve My attacks with the first of these activities and thus determine that it was my attacks on Kynda which were successful.</Paragraph>
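The use of component relationships can be sketched in Python as below. The event kinds, the decomposition rule for "exchange-fire" and all function names are assumptions made for illustration; the paper does not describe how component relationships are represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str
    agent: str
    target: str


ATTACK_KINDS = frozenset({"attack", "firing"})  # assumption: firing is a kind of attack


def components(event):
    """Expand an event into its component activities (hypothetical rule)."""
    if event.kind == "exchange-fire":
        return [
            Event("firing", event.agent, event.target),
            Event("firing", event.target, event.agent),
        ]
    return []


def resolve_attack(agent, history):
    """Find the most recent attack by agent, looking inside composite events."""
    for event in reversed(history):
        for candidate in [event, *components(event)]:
            if candidate.kind in ATTACK_KINDS and candidate.agent == agent:
                return candidate
    return None


# "Exchange missile fire with Kynda .... My attacks successful."
history = [Event("exchange-fire", agent="us", target="Kynda")]
print(resolve_attack("us", history))
```

The exchange is not itself an attack, but one of its two components is a firing by "us" at Kynda, and that component is what "My attacks" resolves to.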
      <Paragraph position="23"> Most of the anaphoric references in these messages can be correctly resolved using this combination of type and component relationships. In some cases, however, we need to make use of richer contextual information, about the relationship of the events in the message to one another. For example, in  Three missiles fired at Kobchic. One missile hit.</Paragraph>
      <Paragraph position="24"> reference resolution first uses the general rule that the omitted subject in a sentence fragment is "us" (the ship sending the message), in effect expanding the first sentence to "Three missiles fired \[by us\] at Kobchic." It is then faced with the problem, in the second sentence, of whom the missiles hit, us or Kobchic, since both antecedents are salient at this point. To resolve this problem we use a set of discourse coherence rules, which capture the cause/effect and precondition/action relationships between the events in the domain. Reference resolution generates the alternate readings, and then discourse analysis scores a reading which matches the coherence rules higher than one which does not. In this case we have a rule that relates firing at a ship with hitting that ship, so the system prefers the analysis where Kobchic was hit.</Paragraph>
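The interplay of alternate readings and coherence rules can be sketched in Python. The event dictionaries, the single rule and the function names are hypothetical, and scoring is simplified to counting rule matches between consecutive events; the actual system interleaves reference resolution and discourse analysis with a priority-based scheme that this sketch does not model.

```python
def fire_then_hit(earlier, later):
    """Coherence rule: firing at a ship is followed by hitting that ship."""
    return (
        earlier["type"] == "fire"
        and later["type"] == "hit"
        and earlier["target"] == later["target"]
    )


COHERENCE_RULES = [fire_then_hit]  # hypothetical rule set


def coherence_score(events):
    """Count the rule matches between consecutive events of a reading."""
    pairs = zip(events, events[1:])
    return sum(1 for earlier, later in pairs for rule in COHERENCE_RULES
               if rule(earlier, later))


def pick_reading(readings):
    """Return the reading with the most inter-sentence relations."""
    return max(readings, key=coherence_score)


# "Three missiles fired at Kobchic. One missile hit."
fired = {"type": "fire", "agent": "us", "target": "Kobchic"}
readings = [
    [fired, {"type": "hit", "target": "us"}],
    [fired, {"type": "hit", "target": "Kobchic"}],
]
print(pick_reading(readings)[1]["target"])
```

Only the reading in which Kobchic is hit satisfies the rule, so it scores higher and is selected.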
      <Paragraph position="25"> Both component information and contextual relationships are needed to process Visual sighting of periscope followed by attack ....</Paragraph>
      <Paragraph position="26"> First we fill in "us" as the implicit subject of "sighting". There is no antecedent for attack, so we proceed to fill in the essential arguments of attack. The object of attack must be a ship. The two salient entities at this point are "us" and the periscope. Reference resolution finds a link, through the part-whole hierarchy, between periscope and submarine, a type of ship, so it creates a submarine entity. It then proposes "us" and this submarine as the possible objects of attack. In this domain, we are hunting for enemy ships, so sighting a vessel is typically followed by attacking it. We have included a coherence rule to that effect, so that the "attack on sub" reading is preferred. In other environments, we might flag this passage as ambiguous.
Summary
We have shown how highly telegraphic messages can be analyzed through straightforward extensions of the mechanisms employed for the syntactic and semantic analysis of standard English text.</Paragraph>
      <Paragraph position="27"> We have extended previous work on the grammatical analysis of telegraphic messages by allowing for the omission of function words as well as major sentence constituents. This substantially increases syntactic ambiguity, but we have found that this ambiguity can be controlled by applying semantic constraints during parsing and by using a "best-first" parser in which lower scores are associated with analyses which assume omitted function words.</Paragraph>
      <Paragraph position="28"> To recover missing arguments from telegraphic text, we have adopted a strategy in which such omitted arguments are treated as anaphoric elements. In order to resolve anaphoric ambiguities, we have extended the anaphora resolution procedure to take account of the implicit causal and enablement relations in the text. We generate alternative resolutions of anaphoric reference and then select the text analysis with the highest "coherence": the analysis for which we can identify the greater number of intersentential relations.
Acknowledgements
This research was supported by the Defense Advanced Research Projects Agency under contract N00014-85-K-0163 from the Office of Naval Research. Most of the modifications to the parser required for these messages were programmed and tested by Mahesh Chitrao.</Paragraph>
    </Section>
  </Section>
</Paper>