<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-1049">
  <Title>ANALYZING EXPLICITLY-STRUCTURED DISCOURSE IN A LIMITED DOMAIN: TROUBLE AND FAILURE REPORTS*</Title>
  <Section position="4" start_page="266" end_page="266" type="metho">
    <SectionTitle>
ETHER REGULATOR IN UPPER TRANSLOCK WHEN INTERPORT SWITCH DEPRESSED. C. DETERIORATION
DUE TO AGE AND WEAR. D. REPLACED INTERPORT SWITCH WITH A NEW ONE FROM SUPPLY. E. NONE.
</SectionTitle>
    <Paragraph position="0"> TFRs are stored in a historical database. Although the formatted data can be mapped to specific fields of database records, which can then be accessed by a query language, the free-text portions are stored as undigested blocks of text. Currently, keyword search is the only method by which the text can be accessed.</Paragraph>
    <Paragraph position="1"> Problems with keyword search as a method of information retrieval are well known (note 3), and this is an area in which NLP techniques can be applied, with the potential benefit of increasing both the efficiency and the accuracy of information retrieval.</Paragraph>
    <Paragraph position="2"> As part of an internally and DARPA-funded R&amp;D project, we applied PUNDIT ([Grishman 86a], [Dahl 87], [Dahl 86]) to the analysis of TFRs. Previous applications of PUNDIT to the analysis of the Remarks field of Navy messages had required only a superficial level of discourse processing above the paragraph. The richer discourse structure of TFRs, however, required a more sophisticated approach, including a discourse interpretation module and a segment-based approach to focusing. Although the discourse structure of TFRs raised a number of issues, the fact that this structure is explicit and constant across discourses greatly facilitated the analysis of TFR discourse, to which we now turn.</Paragraph>
  </Section>
  <Section position="5" start_page="266" end_page="266" type="metho">
    <SectionTitle>
TFRS AS DISCOURSE
</SectionTitle>
    <Paragraph position="0"> The perspective of a sentence-based grammar might lead us to ignore the formatted lines of a TFR, to consider as discourse only the textual portions, and to interpret each element of the latter as a full or a 'fragmentary' sentence (cf. [Linebarger 88]). On this approach, we would be prepared to analyze the following TFR extract as discourse:</Paragraph>
  </Section>
  <Section position="6" start_page="266" end_page="269" type="metho">
    <SectionTitle>
WHEN ATTEMPTING TO ERASE 2 METERS ON THE EVENT RECORDING STRIP, THE STRIP WOULD
CONTINOUSLY RUN. INVESTIGATION REVEALED THAT &quot;NOYB&quot; WAS BEING GENERATED. AGE AND USE.
REPLACED WITH NEW ITEM AND RETURNED OLD TO SUPPLY. NONE.
</SectionTitle>
    <Paragraph position="0"> However, it is immediately apparent that this approach would be incorrect: the discourse is incoherent.</Paragraph>
    <Paragraph position="1"> Two distinct problems may be identified. First, the sentences after the first two bear no apparent relation to the preceding discourse. Second, one or more discourse entities appear to be missing: age and use - of what? Who (or what) replaced what with a new item? None - of what? The source of the incoherence is twofold: we are missing the initial context established by the interpretation of the formatted lines of the TFR, and we have ignored the basic unit of TFR discourse: the REQUEST-RESPONSE PAIR. As it turns out, each element of the formatted lines (henceforth the header) has a positional interpretation, and each of the labels A-E maps to a noun phrase label. Each label can be interpreted as a request for information. Now reconsider the TFR above in this light:  A. First indication of trouble: WHEN ATTEMPTING TO ERASE 2 METERS ON THE EVENT RECORDING STRIP, THE STRIP WOULD CONTINOUSLY RUN.</Paragraph>
    <Paragraph position="2"> B. Part failure: INVESTIGATION REVEALED THAT &quot;NOYB&quot; WAS BEING GENERATED.</Paragraph>
    <Paragraph position="3"> C. Probable cause: AGE AND USE.</Paragraph>
    <Paragraph position="4"> D. Action taken: REPLACED WITH NEW ITEM AND RETURNED OLD TO SUPPLY.</Paragraph>
    <Paragraph position="5"> E. Remarks: NONE.</Paragraph>
    <Paragraph position="6">  The discourse is now coherent. As can be seen, responses are interpreted relative to their labels, not to each other. The previously missing discourse entities can now be recovered: the referent of NONE is evoked by the label Remarks (i.e., no remarks); what was replaced is the failed part (identified by the part number); it is the speaker (JONES) who replaced it; and the implicit argument of AGE AND USE is that same failed part. These results underline the need to consider the entire TFR as discourse, and to provide an account of the request-response pair as the basic unit of TFR discourse. In the following sections, we sketch such an account and then turn to the evidence for higher-level structure.</Paragraph>
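As a minimal sketch of treating the labelled body of a TFR as a sequence of request-response pairs, the Python below splits a raw body on its A-E markers and attaches each response to the request expressed by its label. The label texts come from the example above; the function name `split_pairs` and the marker pattern are illustrative assumptions rather than PUNDIT's input handling, and a response that itself contained a token such as " B. " would be mis-split.

```python
import re

# Label texts for segments A-E, as in the relabelled example above.
LABELS = {
    "A": "First indication of trouble",
    "B": "Part failure",
    "C": "Probable cause",
    "D": "Action taken",
    "E": "Remarks",
}

# Hypothetical marker: a capital A-E plus a period and whitespace,
# at the start of the text or after whitespace.
MARKER = re.compile(r"(?:^|\s)([A-E])\.\s+")


def split_pairs(body):
    """Return the body as a list of (label text, response) pairs."""
    parts = MARKER.split(body)
    # With one capture group, re.split yields [prefix, key1, text1, key2, text2, ...].
    return [(LABELS[key], text.strip()) for key, text in zip(parts[1::2], parts[2::2])]


body = ("A. WHEN ATTEMPTING TO ERASE 2 METERS ON THE EVENT RECORDING STRIP, "
        "THE STRIP WOULD CONTINOUSLY RUN. B. INVESTIGATION REVEALED THAT "
        "\"NOYB\" WAS BEING GENERATED. C. AGE AND USE. "
        "D. REPLACED WITH NEW ITEM AND RETURNED OLD TO SUPPLY. E. NONE.")

for label, response in split_pairs(body):
    print(f"{label}: {response}")
```

Each response is keyed to its label, not to the preceding response, which mirrors the observation that responses are interpreted relative to their labels.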
    <Paragraph position="7"> The Request-Response Pair. Between the request and the response there exists a special type of cohesive relation ([Schiffrin 87]), similar to that which binds question-answer pairs. In fact, we claim that at the level of discourse interpretation, the request and response form a discontinuous predicate-argument structure (note 4). This view of the request-response pair arises from the need to account for the interpretation of pairs such as Probable cause: BROKEN WIRE, from which we are somehow able to conclude: The respondent believes that a broken wire caused the failure.</Paragraph>
    <Paragraph position="8"> Very briefly, we suggest that the mechanisms required to achieve this result are essentially those required (at the level of sentence grammar) for the interpretation of specificational copular sentences (note 5): lambda-abstraction, function application, and lambda-reduction. First, we take the heads of NP labels to be relational nouns with internal argument structure. For both (1a) and (1b) below, we derive the representation in (2) by lambda-abstracting on the free variable. Function application and lambda-reduction then yield the representation in (3), which is (non-coincidentally) also the representation of 'A broken wire caused the failure'. (1a) The cause of failure was a broken wire.</Paragraph>
    <Paragraph position="9"> (1b) Cause of failure: broken wire
(2) $[\lambda x\,[\mathit{cause}(x, \mathit{failure})]](\mathit{wire})$
(3) $\mathit{cause}(\mathit{wire}, \mathit{failure})$
Each label in the TFR marks the start of a request-response pair. But does this unit correspond to a discourse segment, and if so, what is the higher-level structure of the TFR? We studied patterns of reference in TFRs and found evidence for both explicit and implicit structure, as described below. The Role of the Message Header. The message header identifies the author of the report, the date on which it was sent, the date on which the problem occurred, the equipment, and the failed part. The dates are crucial to the temporal analysis of the message (which we do not discuss here). Our analysis of the TFR corpus shows the remaining entities (speaker, equipment, failed part) to be highly salient in the discourse: they are available for pronominal reference in segments A-E without having to be reintroduced by a full NP.</Paragraph>
    <Paragraph position="10"> Note 4: Specifically, we take the NP label to express an OPEN PROPOSITION ([Prince 86]), which can be viewed as an informationally incomplete predication; the response provides its argument.</Paragraph>
    <Paragraph position="11"> In addition, these entities fill implicit argument positions in the agentless passive, in possible intransitive uses of certain verbs (replace, return), and in some relational nouns (e.g. age, wear). These facts lead us to assign these three entities the distinguished status of global foci: entities that are always salient in the discourse context at the beginning of each new discourse segment.</Paragraph>
    <Paragraph position="12"> Sections A-E. To determine whether each of these sections (First indication of trouble, Part failure, Probable cause, Action taken, Remarks) constitutes a discourse segment, we studied patterns of pronominal reference in the responses. The results were striking. Of 804 occurrences of referential pronouns (707 of which were zero-subjects; see note 6), we found that only zero-subjects, I, we, and this refer beyond the boundary of the current request-response pair. Ninety-five percent of the zero-subjects, and all occurrences of I, refer to the speaker. The remaining 5% of zero-subjects are divided between reference to one of the global foci and segment-internal reference, with a slight bias towards the latter. The pronouns it, he, they, these, and those were found to refer only locally (no occurrences of that were found). With the exception of this and the indexicals, then, pronominal reference is sensitive to the boundary of the request-response pair, and we conclude that each such pair is indeed a discourse segment.</Paragraph>
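For scale, the counts reported above work out as follows; the derived figures are approximate, since the percentages are rounded.

```latex
\[ \tfrac{707}{804} \approx 0.879 \quad (\text{share of zero-subjects}), \qquad 804 - 707 = 97 \quad (\text{overt pronouns}) \]
\[ 0.95 \times 707 \approx 672 \quad (\text{zero-subjects referring to the speaker}), \qquad 0.05 \times 707 \approx 35 \quad (\text{remaining zero-subjects}) \]
```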
    <Paragraph position="13"> In the demonstrative this, however, we found unexpected evidence for additional implicit structure: when occurring in segment E (Remarks), this can refer to the failure, or problem, described in segments A-D. Now, [Webber 88] argues that demonstrative reference of this type is sensitive to the right frontier of the discourse tree: that is, 'the set of nodes comprising the most recent closed segment and all currently open segments' (Webber 1988:114). If, as we had assumed, segments A-E are sisters, then segment D (Action taken) is the most recently closed segment, and no segments are open other than the current segment, E. But none of the occurrences of this in segment E refer to segment D. To make sense of the data, we were led to conclude that segments A-D form an unlabelled, implicit segment: the failure. The Remarks segment is then the sister of this implicit segment; after segment D is closed, this higher segment is closed as well, and thus lies on the right frontier when E is opened. From these observations we posit the following structure for the TFR:</Paragraph>
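The right-frontier argument can be checked mechanically. The sketch below encodes the two candidate structures for segments A-E as nested tuples, opens segments in order, and reports Webber's right frontier at the moment E is opened: the most recently closed segment plus all currently open segments. The encoding, the segment name "failure", and the simplification that a segment closes right after its last child are assumptions for illustration; the header is omitted because the text does not say where it attaches.

```python
def right_frontier(tree, target):
    """Return (most recently closed segment, open segments) when target is opened.

    tree is (name, [children]); segments are opened in preorder and
    closed immediately after their last child has closed.
    """
    open_stack = []
    last_closed = None

    def walk(node):
        nonlocal last_closed
        name, children = node
        open_stack.append(name)
        if name == target:
            return True              # stop: target has just been opened
        for child in children:
            if walk(child):
                return True
        open_stack.pop()             # node is now closed
        last_closed = name
        return False

    walk(tree)
    return last_closed, list(open_stack)


def leaf(name):
    return (name, [])


flat = ("TFR", [leaf(s) for s in "ABCDE"])
nested = ("TFR", [("failure", [leaf(s) for s in "ABCD"]), leaf("E")])

print(right_frontier(flat, "E"))     # ('D', ['TFR', 'E'])
print(right_frontier(nested, "E"))   # ('failure', ['TFR', 'E'])
```

Only under the nested structure is the segment covering A-D as a whole on the right frontier when E is opened, which is what the use of this in Remarks requires.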
    <Paragraph position="15"> The TFR application uses the PUNDIT natural-language processing system to analyze TFRs. The results of the analysis are passed to a database module, which maps PUNDIT's representations to pre-defined records in a Prolog relational database. This database can then be queried using a natural-language query facility (QFE). Here, we discuss only the analysis part of the application.</Paragraph>
    <Paragraph position="16"> In terms of user interaction, the TFR data-collection program superficially resembles traditional data-processing approaches to forms automation: the system prompts for each item on the form, and the user's response to each prompt is validated. If the response is judged invalid, an error message is issued and the user is reprompted.</Paragraph>
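The prompt-validate-reprompt cycle is the familiar forms-automation loop. A minimal sketch follows; the `validate` predicate, the error message text, the six-digit date check, and the scripted answers standing in for a user at the keyboard are all hypothetical.

```python
from typing import Callable, Iterator


def collect_item(prompt: str,
                 validate: Callable[[str], bool],
                 answers: Iterator[str]) -> str:
    """Prompt for one form item until the response passes validation.

    answers stands in for the user's input (an assumption that makes the loop testable).
    """
    while True:
        print(prompt)
        response = next(answers)
        if validate(response):
            return response
        print("Invalid response, please try again.")   # error message, then reprompt


def is_date(text: str) -> bool:
    """Hypothetical check: a date item must be six digits (YYMMDD)."""
    return len(text) == 6 and text.isdigit()


scripted = iter(["yesterday", "890312"])
print(collect_item("Date of failure:", is_date, scripted))
```

In the TFR program this surface loop is only the outer shell; the interpretation of each validated response is handed to the discourse manager.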
    <Paragraph position="17"> Note 6: As in INSTALLED NEW ITEM, RETURNED OLD TO SUPPLY.</Paragraph>
    <Paragraph position="18">  Under the covers, however, the approach is quite different: the data-collection program is in fact a discourse manager, controlling and interpreting a dialogue between itself and the user. As the dialogue proceeds, it maintains a model of the discourse, calls PUNDIT's syntactic and semantic/pragmatic components to analyze the user's responses, and then interprets each response in the context of the prompt to derive new propositions. In addition, it manages the availability of discourse entities, moving entities in and out of focus as the discourse proceeds from one segment to the next.</Paragraph>
  </Section>
  <Section position="7" start_page="269" end_page="270" type="metho">
    <SectionTitle>
IMPLEMENTATION
</SectionTitle>
    <Paragraph position="0"> The TFR Discourse Manager is implemented as a single top-level control module, written in Prolog, which uses PUNDIT as a resource. Its highest-level goals are to collect pre-defined information from the user and send the resulting information state to a database update module.</Paragraph>
    <Paragraph position="1"> At the level of user interaction, the module's goals are to process the request-response units corresponding to the header items and the segments A-E. In the header segment, the Discourse Manager prompts for each of the header items (speaker, date, part number, etc.), and calls PUNDIT to analyze the responses.</Paragraph>
    <Paragraph position="2"> The responses give rise to discourse entities, whose representations are added to the DISCOURSE LIST for subsequent full-NP reference. The three global foci (speaker, failed part, and equipment) are stored in a distinguished location in the discourse model.</Paragraph>
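As a rough illustration of the discourse model built in the header segment, the sketch below keeps a discourse list for full-NP reference and a separate slot for the three global foci, and uses the foci to fill implicit arguments, as in the agentless REPLACED WITH NEW ITEM. The class, field, and role names, the placeholder entities, and the tuple-shaped propositions are assumptions, not PUNDIT's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class DiscourseModel:
    discourse_list: list = field(default_factory=list)   # entities for later full-NP reference
    global_foci: dict = field(default_factory=dict)      # speaker, failed part, equipment

    def add_header_item(self, role, entity):
        """Record a header response; the three global foci are also stored separately."""
        self.discourse_list.append(entity)
        if role in ("speaker", "failed_part", "equipment"):
            self.global_foci[role] = entity

    def fill_implicit(self, predicate, agent=None, theme=None):
        """Fill missing arguments from the global foci (e.g. agentless 'replace')."""
        agent = agent or self.global_foci.get("speaker")
        theme = theme or self.global_foci.get("failed_part")
        return (predicate, agent, theme)


model = DiscourseModel()
model.add_header_item("speaker", "JONES")
model.add_header_item("equipment", "the equipment")       # placeholder entity
model.add_header_item("failed_part", "the failed part")   # placeholder for the part-number entity
print(model.fill_implicit("replace"))   # ('replace', 'JONES', 'the failed part')
```

Keeping the foci apart from the ordinary discourse list reflects their distinguished status: they are salient at the start of every segment rather than only after recent mention.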
    <Paragraph position="3"> For each of the remaining segments (A-E), the processing is described below.</Paragraph>
    <Paragraph position="4">  Before the system can interpret the user's response to a prompt, it must first 'understand' what it is about to ask. This step, while intuitive, is actually required in order to create the context for interpreting the response. We look up the meaning of the prompt (stored as a lambda expression), create a discourse entity, and place it at the head of the focus list. This makes the prompt the most salient entity in the context when the response is processed, and allows for both pronominal and implicit reference, e.g. Probable cause: UNKNOWN. Having done this, we issue the prompt and collect the user's response.</Paragraph>
    <Paragraph position="5"> Analyze the Response. Two levels of interpretation are provided. First, PUNDIT is called to analyze the response; next, the response entity is bound to a variable in the representation of the prompt, to derive a new proposition.</Paragraph>
    <Paragraph position="6"> Two types of call to PUNDIT are required, in order to handle both NP responses (BROKEN WIRE) and sentential or paragraph responses (BELIEVE PROBLEM TO HAVE BEEN CAUSED BY FAILURE OF UPPER WIDGET). If the response can be analyzed by PUNDIT's syntactic component as an NP, then a side door into PUNDIT's semantic and pragmatic analysis is used to provide a semantic interpretation and create a discourse entity.</Paragraph>
    <Paragraph position="7"> If the response cannot be analyzed as an NP, then the normal entrance points for syntactic and semantic/pragmatic analysis are used. This results in the creation of one or more situation entities, which are grouped together to form a higher-level response entity.</Paragraph>
    <Paragraph position="8"> Finally, the response entity is bound to the variable in the representation of the prompt, and lambda reduction is applied. The resulting representation is added to the discourse list, where it becomes available for subsequent full-NP reference (e.g. The failure..., The cause...).</Paragraph>
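The per-segment steps described above (interpret the prompt and put it in focus, analyze the response by one of two routes, then bind and reduce) can be summarized as a small control loop. This is a Python sketch, not the Prolog module: `PROMPT_MEANINGS`, the word-count test in `looks_like_np`, and the dictionary-shaped entities are hypothetical stand-ins for the stored lambda expressions, PUNDIT's NP side door, and its full syntactic and semantic/pragmatic analysis.

```python
# Hypothetical prompt meanings: each prompt's lambda expression as a one-place function.
PROMPT_MEANINGS = {
    "Probable cause": lambda x: ("cause", x, "failure"),
    "Remarks": lambda x: ("remark", x),
}


def looks_like_np(response):
    """Crude stand-in for PUNDIT's syntactic component accepting the response as an NP."""
    words = response.rstrip(".").split()
    return len(words) in (1, 2, 3)


def analyze(response):
    """Return a response entity via the NP side door or the normal sentential route."""
    if looks_like_np(response):
        return {"type": "np", "text": response.rstrip(".")}
    # One situation entity per sentence, grouped into a higher-level response entity.
    situations = [s.strip() for s in response.split(".") if s.strip()]
    return {"type": "response", "situations": situations}


def process_segment(label, response, focus_list, discourse_list):
    meaning = PROMPT_MEANINGS[label]              # look up the prompt's meaning
    focus_list.insert(0, ("prompt", label))       # prompt entity becomes most salient
    entity = analyze(response)
    argument = entity["text"] if entity["type"] == "np" else entity
    proposition = meaning(argument)               # bind the variable and reduce
    discourse_list.append(proposition)            # available for later full-NP reference
    return proposition


focus, discourse = [], []
print(process_segment("Probable cause", "BROKEN WIRE", focus, discourse))
# ('cause', 'BROKEN WIRE', 'failure')
```

Placing the prompt entity at the head of the focus list before the response is analyzed is what would let a bare response such as UNKNOWN be read as the unknown cause.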
  </Section>
</Paper>