<?xml version="1.0" standalone="yes"?>
<Paper uid="J83-3005">
  <Title>The NOMAD System: Expectation-Based Detection and Correction of Errors during Understanding of Syntactically and Semantically Ill-Formed Text 1</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> The NOMAD system takes unedited English input in a constrained domain, and works interactively with the user to encode the message into database-readable form. The unedited texts in this domain are Naval ship-to-shore messages, written in 'telegraphic' English, often leaving out nouns and verbs, crucial punctuation (such as periods), and making use of ad hoc abbreviations of words. In addition to these problems of surface-text processing, these texts can contain problems of interpretation - that is, which of several objects is being referred to, or which possible goal inference is implied. These semantic processing problems are not easily detectable or solvable based on the surface text alone but rather require a data base of 1 This research was supported in part by the Naval Ocean Systems Center under contracts N-00123-81-C-1078 and N6600183-C-0255, and by the National Science Foundation under grant IST-81-20685.</Paragraph>
    <Paragraph position="1"> knowledge about the domain of discourse, in this case ship movements.</Paragraph>
    <Paragraph position="2"> Here are examples of each of these two types of problems. First, one with a number of surface-text errors:  (1) 'Locked on open fired destroyed'  Example (1) is missing crucial punctuation (no boundaries separating the three clauses from each other), is missing subjects and objects for all three verb phrases, and has a tense mismatch in the middle phrase ('open fired'). (This is an actual message in the corpus provided to us by the Navy, not a constructed example.) NOMAD's output from this example is: We aimed at an unknown object.</Paragraph>
    <Paragraph position="3"> We fired at the object.</Paragraph>
    <Paragraph position="4"> The object was destroyed.</Paragraph>
    <Paragraph position="5"> A second message has, in addition to some surface problems, a goal-based interpretation problem: Copyright 1984 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided that the copies are not made for direct commercial advantage and the Journal reference and this copyright notice are included on the first page. To copy otherwise, or to republish, requires a fee and/or specific permission. 0362-613X/83/030188-09503.00 188 American Journal of Computational Linguistics, Volume 9, Numbers 3-4, July-December 1983 Richard H. Granger The NOMAD System (2) 'Returned bombs to Kashin.' In addition to the surface problem of a missing subject, this example is apparently missing mention of some previous event, implied by the use of 'returned'; and it describes an ambiguous event, that is, either the peaceable delivery of bombs to the Kashin ship (a type of enemy ship) or a battle action of firing bombs in retaliation. Since the input is ambiguous without the previous message, NOMAD returns a number of alternative possible outputs to the user, marking one as &amp;quot;preferred&amp;quot;: (Preferred Interpretation): We fired some bombs at a Kashin ship.</Paragraph>
    <Paragraph position="6"> (Inferred): The Kashin ship fired at us previously.</Paragraph>
    <Paragraph position="7"> (Alternate Interpretation): We delivered some bombs to a Kashin ship.</Paragraph>
    <Paragraph position="8"> (Inferred): The Kashin ship had delivered some bombs to us previously.</Paragraph>
    <Paragraph position="9"> NOMAD is interactive: it produces multiple interpretations when necessary, and lets the message-sender choose among these alternatives. A typical scenario is: * the user (message-sender) will enter a 'telegraphic' message; * NOMAD will produce two different possible interpretations of the message in corrected English, and present them to the user; * the user will then choose one of the interpretations; and * a database-readable version of the correctlyinterpreted message is then forwarded from the ship to a central data base.</Paragraph>
    <Paragraph position="10"> Many of the approaches to understanding ill-formed input focus on syntactic errors separately from semantic errors (for example, Hayes and Mouradian 1981 and Kwasny and Sondheimer 1981). Both of these efforts essentially attempt to increase the flexibility of an ATN syntactic parser: the first by using 'parse suspension and continuation', relaxing constraints on consistency and permitting matches out of their correct order, and the second by relaxing the constraints required to traverse an ATN arc, and then providing 'deviance notes' specifying the differences between what was expected and what was actually seen. These efforts attempt to correct the surface form of the input, that is, to perform a transformation from an ill-formed English text to a well-formed English text. Their goals are not to produce a meaning representation of the input, and hence cannot be said to 'understand' the input. This also leads to the inability of these systems to generate alternative interpretations of text; once these systems have guessed at a parse, they cannot back up and re-parse in response to information from a user.</Paragraph>
    <Paragraph position="11"> The approach taken by Hayes and Carbonell (1981) is closer to that described in this paper, in that they do build meaning representations. However, there are still shortcomings; in particular, their systems cannot understand texts in which a missing or unknown word is the one that would have built the main semantic case frame. As will be seen below, NOMAD builds on the FOUL-UP system (Granger 1977) to handle such cases (which are frequent in our domain). Furthermore, like the systems described above, their systems cannot re-interpret a text when its initial interpretation turns out to be incorrect.</Paragraph>
    <Paragraph position="12"> We propose an integrated system of syntactic and semantic processing, in which world knowledge and syntactic knowledge are both applied during text proccessing to provide a number of possible interpretations of a text. Our focus is on interpretations: the goal of the system is to give rise to an unambiguous meaning representation. If surface-text problems occur during processing but an unambiguous interpretation can be provided and confirmed by the user, then the surface-text problems are ignored. It is only when interpretation problems arise that any noted surface-text problems will be consulted to see if they might have been the source of the interpretation problem. That is, we are attempting to attack the overall problem of processing text, of which the processing of ill-formed text is a necessary subpart. Our approach implies that the processing of ill-formed text 'falls out' of normal text processing, via the application of generalized error-correction processes that operate equally on syntax, semantics, and pragmatics, and are not designed specifically for the processing of ill-formed surface text. NOMAD builds on previous work on conceptual analysis (Riesbeck and Schank 1976, Birnbaum and Selfridge 1979), and on error detection and correction during conceptual analysis (Granger 1977, 1980, 1982a). Selfridge and Engelberg (1984), Lebowitz (1984), and Dyer (1983) have also recently taken approaches that are similar to the one proposed here, attempting to fully exploit the power of integrated understanding. NOMAD incorporates and integrates error detection and correction algorithms based on both syntactic and pragmatic error types, and is therefore capable of correctly processing a wide range of ill-formed texts within the knowledge domain of Navy messages. NOMAD has actually been installed and is being used for message processing by the Naval Ocean  context The FOUL-UP program (Figuring Out Unknown Lexemes in the Understanding Process; Granger 1977) was the first program that could figure out meanings of unknown words encountered during text understanding. FOUL-UP was an attempt to model the corresponding human ability commonly known at &amp;quot;figuring out a word from context&amp;quot;. FOUL-UP worked with the SAM system (Cullingford 1977), using the expectations generated by scripts (Schank and Abelson 1977) to restrict the possible meanings of a word, based on what object or action would have occurred in that position according to the script for the story.</Paragraph>
    <Paragraph position="13"> For instance, consider the following excerpt from a newspaper report of a car accident: (1) Friday, a car swerved off Route 69. The vehicle struck an embankment.</Paragraph>
    <Paragraph position="14"> The word &amp;quot;embankment&amp;quot; was unknown to the SAM system, but it had encoded predictions about certain attributes of the expected conceptual object of the PROPEL action (the object that the vehicle struck); namely, that it would be a physical object, and would function as an &amp;quot;obstruction&amp;quot; in the vehicle-accident script. (In addition, the conceptual analyzer (ELI Riesbeck and Schank 1976) had the expectation that the word in that sentence position would be a noun.) Hence, when the unknown word was encountered, FOUL-UP would make use of those expected attributes to construct a memory entry for the word &amp;quot;embankment&amp;quot;, indicating that it was a noun, a physical object, and an &amp;quot;obstruction&amp;quot; in vehicle-accident situations. It would then create a dictionary definition that the system would use from then on whenever the word was encountered in this context.</Paragraph>
    <Paragraph position="15"> 2.2. Syntactic (surface) and semantic (interpreation) text errors But even if the SAM system had known the word &amp;quot;embankment&amp;quot;, it would not have been able to handle a less edited version of the story, such as this 'telegraphic' message, which might have been sent in by an on-the-scene reporter: (2) Vehcle ace Rt69; car strck embankment; drivr dead one psngr inj; ser dmg to car full rpt frthcmng.</Paragraph>
    <Paragraph position="16"> While human readers would have little difficulty understanding this text, no existing computer programs could do so.</Paragraph>
    <Paragraph position="17"> The scope of this problem is wide; examples of texts that present &amp;quot;scruffy&amp;quot; difficulties to readers are completely unedited texts, such as messages composed in a hurry, with little or no re-writing, rough drafts, memos, transcripts of conversations, etc. Such texts may contain these problems, among others: missing words, ad hoc abbreviations of words, poor syntax, confusing order of presentation of ideas, misspellings, lack of punctuation. Even edited texts such as newspaper stories often contain misspellings, words unknown to the reader, and ambiguities; and even apparently very simple texts may contain alternative possible interpretations, which can cause a reader to construct erroneous initial inferences that must later be corrected (see Granger 1980, 1981a, 1981b).</Paragraph>
    <Paragraph position="18"> The following sections describe the NOMAD system, which incorporates FOUL-UP's abilities as well as significantly extended abilities to use syntactic and semantic expectations to resolve these difficulties, in the domain of Navy messages. NOMAD's processing is divided into two major categories:  (1) blame assignment, that is, the detection of an error and the attribution of that error to some source; and (2) error correction, the remedy for the source of the error.</Paragraph>
    <Paragraph position="19"> 3. How NOMAD Recognizes and Corrects</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Errors
3.1. Introduction
</SectionTitle>
      <Paragraph position="0"> NOMAD incorporates ideas from, and builds on, earlier work on conceptual analysis (for example, Reisbeck and Schank 1976, Birnbaum and Selfridge 1979), situation and intention inference (for example, Cullingford 1977, Wilensky 1978), and English generation (for example, Goldman 1973, McGuire 1980). What differentiates NOMAD significantly from its predecessors are its error recognition and error correction abilities, which enable it to read texts more complex than those that can be handled by other text understanding systems. null NOMAD operates by attempting to process texts left to right, with each word capable of suggesting new expectations (for example, a verb will follow, the previous noun group should serve as actor of the current act, etc.), and applying those suggested expectations to new inputs. When expectations are met, they result in additions to the ongoing meaning representation of the text; when they are not met, they result in 'surfacetext alerts', which are collected for potential later corrective processing.</Paragraph>
      <Paragraph position="1"> There are two types of 'errors' in NOMAD: surface-text errors and interpretation errors. Surface-text errors are potential problems that can be readily detected at surface-text processing time, including, for example, unknown words and any surface expectation violations, whether syntactic or semantic. For instance, a syntactic expectation failure such as 'no noun 190 American Journal of Computational Linguistics, Volume 9, Numbers 3-4, July-December 1983 Richard H. Granger The NOMAD System group appearing where one was expected' is a surface alert, but so is a semantic/pragmatic expectation failure such as 'target noun group was expected to describe an animate actor, but described an inanimate object instead'. Each of these is equally an expectation failure, and no difference need be drawn at this stage of processing between syntactic or semantic types. It will be seen later that, depending on the type of surface alert, different suggestions will be made as to where to look to 'assign blame' for the problem, and how to attempt to correct it.</Paragraph>
      <Paragraph position="2"> Interpretation failures, on the other hand, are defined as those that cannot be easily ascribable to the failure of some particular pre-defined surface expectation; these arise after some conceptual analysis has been successfully performed and the resulting representation fails to match pragmatic checks such as goal-based or script-based knowledge of the situation being described.</Paragraph>
      <Paragraph position="3"> Following is a list of nine categories of problems we have identified that occur often in scruffy unedited texts, five surface-text problems and four interpretation problems. Each problem is illustrated by a brief example from the domain of Navy messages. It will be seen that these errors often occur in pairs, with surface-text problems sometimes giving rise to interpretation problems. Note that while these problems are often referred to in this paper as 'errors' in fact some are not actual 'errors', strictly speaking, but are rather potential problem indicators that NOMAD recognizes, which may give rise to subsequent interpretation  problems.</Paragraph>
      <Paragraph position="4"> Surface-text problems 1. Unknown words.</Paragraph>
      <Paragraph position="5"> Enemy &amp;quot;scudded&amp;quot; bombs at us. - the verb is unknown to the system.</Paragraph>
      <Paragraph position="6"> 2. Missing subject, object, etc. of sentences.</Paragraph>
      <Paragraph position="7"> Sighted enemy ship. Fired. - the actor who fired is not explicitly stated.</Paragraph>
      <Paragraph position="8"> 3. Missing sentence and clause boundaries.</Paragraph>
      <Paragraph position="9"> Locked on opened fire. - two actions, aiming and firing.</Paragraph>
      <Paragraph position="10"> 4. Ambiguous word usage.</Paragraph>
      <Paragraph position="11"> Returned bombs to Kashin. - &amp;quot;returned&amp;quot; in the sense of retaliation after a previous attack, or &amp;quot;returned&amp;quot; in the sense of &amp;quot;peaceably delivered to&amp;quot;? 5. Lack of tense agreement.</Paragraph>
      <Paragraph position="12"> Open fired. - the intended tense of 'open' is transferred to 'fire'.</Paragraph>
      <Paragraph position="13"> Interpretation problems 1. Causality violation.</Paragraph>
      <Paragraph position="14"> Ship sighted overhead. - ships can't fly; probable message-sending error.</Paragraph>
      <Paragraph position="15"> 2. Goal violation.</Paragraph>
      <Paragraph position="16"> Returned bombs to Kashin. - one of two ambiguous interpretations of 'returned' (peaceably delivered) gives rise to apparent goal violation (delivering weapons to enemy).</Paragraph>
      <Paragraph position="17"> 3. User confirmation failure.</Paragraph>
      <Paragraph position="18"> NOMAD's failure is not confirmed by user. (Note that this is considered by NOMAD to be an interpretation problem even thought it may be due to the user's idiosyncrasies, as opposed to violation of some known semantic rule - the effect is the same.) 4. Object or event referenced out of known event  sequence.</Paragraph>
      <Paragraph position="19"> Midway lost contact on Kashin. - no previous contact mentioned; this often arises when typical known situations are mentioned in other than stereotypical (scripty) order.</Paragraph>
      <Paragraph position="20"> When these problems arise in a message, NOMAD must first recognize what the problem(s) is(are) (which is often difficult to do), and then attempt to correct the error(s). The following section outlines the overall processing algorithms NOMAD uses to process these errors.</Paragraph>
      <Paragraph position="21">  3.2. NOMAD's error-detection algorithm NOMAD's algorithm for detection and solution of errors follows a four-step process: 1. Set 'alert' flags wherever potential surface-text problems are detected.</Paragraph>
      <Paragraph position="22"> 2. Do only partial processing of surface text if necessary due to missing or ambiguous information (that is, do as much normal processing as possible in the face of missing information).</Paragraph>
      <Paragraph position="23"> 3. Check for interpretation problems (causal, goal, sequencing (script), or user confirmation errors) after surface sentence processing.</Paragraph>
      <Paragraph position="24"> 4. Try solutions based on surface 'alert' flag catego null ries.</Paragraph>
      <Paragraph position="25"> To illustrate this process, consider an ambiguous text, 'contact gained on kashin'. During the processing of this text, some surface-text alerts arise (for example, 'contact' can be either a noun or a verb' if it's a verb, then there's either a missing subject or an expected passive subject coming, etc.), and an interpretation ambiguity: the text can be interpreted as  meaning either (a) We established visual or radar contact with a kashin ship.</Paragraph>
      <Paragraph position="26"> (b) Our contact (that is, a ship in contact with us)  increased its speed in a chase after a kashin ship. In the case of 'contact gained on kashin', NOMAD's blame assignment algorithm moves through the above steps as follows:  1. (a) Set both 'ambiguous-word-sense' and 'ambiguous-part-of-speech' alerts for the word 'contact': it might be either a noun (that is, the ship that is currently our contact) or a verb (to establish radar or visual contact).</Paragraph>
      <Paragraph position="27"> American Journal of Computational Linguistics, Volume 9, Numbers 3-4, July-December 1983 191 Richard H. Granger The NOMAD System (b) Set 'ambiguous-word-sense' alert for word 'gained': it might mean either 'established' as in 'gained (established) radar contact', or 'advanced' as in 'gained (advanced) on enemy during chase'. 2. Product alternate interpretations based on alternate assumptions about word senses: 'established radar or visual contact with kashin', and 'our contact ship advanced on kashin'.</Paragraph>
      <Paragraph position="28"> 3. (a) Look for possible causality or goal violations: none found.</Paragraph>
      <Paragraph position="29"> (b) Ask user for confirmation: user confirms one interpretation but not the other.</Paragraph>
      <Paragraph position="30"> 4. Solution: Select interpretation confirmed by user.  Consider another example, 'Returned bombs to Kashin'. As noted above, one of two ambiguous interpretations of 'returned' in this text (that is, the '(peaceably) delivered' interpretation) gives rise to an apparent goal violation (delivering weapons to enemy). In the case of 'returned bombs to kashin', the blame assignment algorithm acts as follows:  1. (a) Set 'ambiguous-word-sense' alert for the word 'returned': it might have either of two categories of meaning, corresponding to 're-do a previously-done action (as in 'return the favor', 'return a transmission') or 're-deliver a previously-delivered object (as in 'return a (borrowed) book').</Paragraph>
      <Paragraph position="31"> (b) Set 'ambiguous-word-sense' alert for word 'bombs': it might mean either the verb 'to bomb', present tense, or the plural noun. The former interpretation (that bomb is a verb) also gives rise to a 'missing-clause-boundary' surface alert, since then the 'returned' and 'bombs' verbs would be next to each other.</Paragraph>
      <Paragraph position="32"> 2. Produce alternate interpretations based on alternate assumptions about word senses: 'Delivered object (bombs) to kashin' (after they had delivered some to us) or 'fired on kashin' (after they had fired on us). (The error-ridden alternate interpretations that arise from the verb sense of 'bombs' are also generated.) 3. (a) Look for possible causality or goal violations: With the 'delivery' interpretation, a potential violation of one of NOMAD's known goals is found: Actors of class (enemies) transferring possession of objects of class (weapons) to recipients of class (friends), and vice versa.</Paragraph>
      <Paragraph position="33"> (b) Order the interpretations in order of preference, based on both surface-class and interpretation-class errors; the goal-violation case above is not preferred, and the 'bombs-as-verb' case is not preferred, while the 'firing back at kashin' interpretation is preferred.</Paragraph>
      <Paragraph position="34"> (e) Present preferred interpretation to user; confirmed. (If this had failed, then unpreferred interpretations would have been presented.) 4. Solution: Select confirmed interpretation.</Paragraph>
      <Paragraph position="35"> 4. Blame Assignment in NOMAD  As evidenced in the above examples, there is no simple relationship between types of errors in the interpretation of the input, and possible solutions to those errors. This is primarily because the source of an interpretation error is difficult to identify. In general, interpretation problems can arise from any of a number of surface-text problems, including: 1. words with multiple word senses Returned bombs to Kashin. - see above discussion; 2. missing clause boundaries Challenged ship refused to heave to. - can be interpreted in any of the following ways: (a) We challenged a ship. They refused to heave to. (b) We challenged a ship. We refused to heave to. (c) The challenged ship refused to heave to.</Paragraph>
      <Paragraph position="36"> 3. elliptical or telegraphic sentence construction Contact gained on Kashin. - can be interpreted as: (a) We established visual or radar contact with a kashin ship. (b) Our contact (that is, a ship in contact with us) increased its speed in a chase after a kashin ship).</Paragraph>
      <Paragraph position="37"> As mentioned earlier, NOMAD's goal is to produce correct, unambiguous interpretations of input texts. Its ability to handle ill-formed surface text arises from a need to be able to find surface-text problems that give rise to interpretation problems; it attends to surface-text problems not because they are useful in their own right but only because they may be useful later in solving an interpretation problem. NOMAD collects both surface-text problems and interpretation problems as it processes a text, and for each interpretation problem, it attempts to find a corresponding surface problem that gave rise to it. Once it has an interpretation problem - surface problem pair, it suggests a solution for the overall problem based on the characteristics of both the surface problem and the interpretation problem. In cases where only a surface problem exists and no interpretation problem has arisen, the surface problem is simply ignored as being irrelevant to the true understanding goal of producing a correct, unambiguous interpretation. In cases where an interpretation problem exists but no surface-text problem can be linked to it, NOMAD suggests possible solutions to the interpretation problem that do not depend on surface problems.</Paragraph>
      <Paragraph position="38"> The 'blame assignment chart' below illustrates some of NOMAD's heuristics for finding surface-text alerts that might correspond to a given interpretation problem. null NOMAD's blame assignment algorithm is at the center of its ability to handle syntactically and semantically ill-formed text. Blame assignment in NOMAD is capable of dealing with problems at both the surface-text level and the interpretation level, especially where interpretation problems arise indirectly from surface- null level decisions; in general, there is no simple relationship among surface-text problems, interpretation problems, and potential solutions for these problems. 4,1. Recognizing and correcting surface errors For each of the five categories of surface problems handled by the system, NOMAD's method of recognizing and correcting the problem is briefly described here, along with actual English input and output from NOMAD.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML