<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0214">
  <Title>Discourse Annotation in the Monroe Corpus</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Aims of Monroe Project
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Parser Development
</SectionTitle>
      <Paragraph position="0"> One of the aims of the Monroe Project was to develop a wide coverage grammar for spoken dialogue. Since parsing is just an initial stage of natural language understanding, the project was focused not just on obtaining syntactic trees alone (as is done in many other parsed corpora, for example, Penn TreeBank (Marcus et al., 1993) or Tiger (Brants and Plaehn, 2000)). Instead, we aimed to develop a parser and grammar for the production of syntactic parses and semantic representations useful in discourse processing.</Paragraph>
      <Paragraph position="1"> The parser produces a domain-independent semantic representation with information necessary for referential and discourse processing, in particular, domain-independent representations of determiners and quantifiers (to be resolved by our reference module), domain-independent representations for discourse adverbials, and tense, aspect and modality information. This necessitated the development of a domain-independent logical form syntax and a domain-independent ontology as a source of semantic types for our representations (Dzikovska et al., 2004). In subsequent sections we discuss how the parser-generated representations are used as a basis for discourse annotation.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Reference Resolution Development
</SectionTitle>
      <Paragraph position="0"> In spoken dialogue, choice of referring expression is influential and influenced by the main entities being discussed and the intentions of the speaker. If an entity is mentioned frequently, and thus is very important to the current topic, it is usually pronominalized. Psycholinguistic studies show that salient terms are usually evoked as pronouns because of the lighter inference load they place on the listener. Because pronouns occur frequently in discourse, it is very important to know what they resolve to, so the entire sentence can be processed correctly. A corpus annotated for reference relations allows one to compare the performance of different reference algorithms. null</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Discourse Segmentation
</SectionTitle>
      <Paragraph position="0"> Another research area that can benefit from a discourse-annotated corpus is discourse structure.</Paragraph>
      <Paragraph position="1"> There has been plenty of theoretical work such as (Grosz and Sidner, 1986), (Moser and Moore, 1996) which shows that just as sentences can be decomposed into smaller constituents, a discourse can be decomposed into smaller units called discourse segments. Though there are many different ways to segment discourse, the common themes are that some sequences are more closely related than others (discourse segments) and that a discourse can be organized as a tree, with the leaves being the individual utterances and the interior nodes being discourse segments. The embeddedness of a segment effects which previous segments, and thus their entities, are accessible. As a discourse progresses, segments close and unless they are close to the root of the tree (have a low embedding) may not be accessible. null Discourse segmentation has implications for spoken dialogue systems. Properly detecting discourse structure can lead to improved reference resolution accuracy since competing antecedents in inaccessible clauses may be removed from consideration. Discourse segmentation is often closely related to plan and intention recognition, so recognizing one can lead to better detection of the other. Finally, segmentation reduces the size of the history or context maintained by a spoken dialogue system, thus decreasing the search space for referents.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Monroe Corpus Construction
</SectionTitle>
    <Paragraph position="0"> The Monroe domain is a series of task-oriented dialogs between human participants (Stent, 2001) designed to encourage collaborative problem-solving and mixed-initiative interaction. It is a simulated rescue operation domain in which a controller receives emergency calls and is assisted by a system or another person in formulating a plan to handle emergencies ranging from requests for medical assistance to civil disorder to snow storms. Available resources include maps, repair crews, plows, ambulances, helicopters and police.</Paragraph>
    <Paragraph position="1"> Each dialog consisted of the execution of one task which lasted about ten minutes. The two participants were told to construct a plan as if they were in an emergency control center. Each session was recorded to audio and video, then broken up into utterances under the guidelines of (Heeman and Allen, 1994). Finally, the segmented audio files were transcribed by hand. The entire Monroe corpus consists of 20 dialogs. The annotation work we report here is based on 5 dialogs totaling 1756 utterances 1.</Paragraph>
    <Paragraph position="2"> Discourse annotation of the Monroe Corpus consisted of three phases: first, a semi-automated annotation loop that resulted in parser-generated syntactic and semantic analyses for each sentence. Second, the corpus was manually annotated for reference information for pronouns and coreferential information for definite noun phrases. Finally, discourse segmentation was conducted manually. In the following sections we discuss each of the three phases in more detail.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Building the Parsed Corpus
</SectionTitle>
      <Paragraph position="0"> To build the annotated corpus, we needed to first have a parsed corpus as a source of discourse entities. We built a suite of tools to rapidly develop parsed corpora (Swift et al., 2004). These are Java GUI for annotating speech repairs, a LISP tool to parse annotated corpora and merge in changes, and a Java tool interface to manually check the automatically generated parser analyses (the CorpusTool). Our goal in building the parsed corpus is to obtain the output suitable for further annotation for reference and discourse information. In particular, the parser achieves the following: a0 Identifies the referring expressions. These are definite noun phrases, but also verb phrases and propositions which can be referred to by deictic pronouns such as that. All entities are assigned a unique variable name which can be used to identify the referent later.</Paragraph>
      <Paragraph position="1"> a0 Identifies implicit entities. These are implicit subjects of imperatives, and also some implicit arguments of relational nouns (e.g., the implied object in the phrase the weight) and of adverbials (e.g., the implied reference time in That happened before).</Paragraph>
      <Paragraph position="2"> a0 Identifies speech acts. These are based on the syntactic form of the utterance only, but they provide an initial analysis which can later be extended in annotation.</Paragraph>
      <Paragraph position="3"> Examples of the logical form representation for the sentence So the heart attack person can't go</Paragraph>
      <Paragraph position="5"/>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
:INPUT (THE HEART ATTACK PERSON))
</SectionTitle>
    <Paragraph position="0"> there (dialog s2, utterance 173) is shown in Figures 1 and 2. Figure 1 shows the full term for the noun phrase the heart attack person. It contains the term identifier :VAR V3283471, the logical form (:LF), the set of semantic features associated with the term (:SEM), and the list of words associated with the term (:INPUT). The semantic features are the domain-independent semantic properties of words encoded in our lexicon. We use them to express selectional restrictions (Dzikovska, 2004) and we are currently investigating their use in reference resolution. For discourse annotation, we primarily rely on the logical forms.</Paragraph>
    <Paragraph position="1"> The abbreviated logical form for the sentence is shown in Figure 2. It contains the speech act for the utterance, SA TELL, in the first term. There is a domain-independent term for the discourse adverbial So2, and the term for the main event, (LF::Move GO), which contains the tense and modal information in the :TMA field. The phrase the heart attack person is represented by two terms linked together with the :ASSOC-WITH relationship, to be resolved during discourse processing. Finally, there is a term for the adverbial modifier there, which also results in the implicit pronoun (the 2So is identified as a conjunct because it is a connective, and its meaning cannot be identified more specifically by the parser without pragmatic reasoning last term in the representation) denoting a place to which the movement is directed. The terms provide the basic building blocks to be used in the discourse annotation, and their unique identifiers are used as reference indices, as discussed in the next section.</Paragraph>
    <Paragraph position="2"> The corpus-building process consists of three stages: initial annotation, parsing and handchecking. The initial annotation prepares the sentences as suitable inputs to the TRIPS parser. It is necessary because handling speech repairs and utterance segmentation is a difficult task, which our parser cannot do automatically at this point. Therefore, we start with segmenting the discourse turns into utterances and marking the speech repairs using our tool. We also mark incomplete and ungrammatical utterances which cannot be successfully interpreted. null Once the corpus is annotated for repairs, we use our automated LISP testing tool to parse the entire corpus. Our parser skips over the repairs we marked, and ignores incomplete and ungrammatical utterances. Then, it marks utterances &amp;quot;AUTO-GOOD&amp;quot; and &amp;quot;AUTO-BAD&amp;quot; as a guideline for annotators. As a first approximation, the utterances where there is a parse covering the entire utterance are marked as &amp;quot;AUTO-GOOD&amp;quot; and those where there is not are marked as &amp;quot;AUTO-BAD&amp;quot;. Then these results are hand-checked by human annotators using our CorpusTool to inspect the analyses and either mark them as &amp;quot;GOOD&amp;quot;, or mark the incorrect parses as &amp;quot;BAD&amp;quot;, and add a reason code explaining the problem with the parse. Note that we use a strict criterion for accuracy so only utterances that have both a correct syntactic structure and a correct logical form can be marked as &amp;quot;GOOD&amp;quot;. The CorpusTool allows annotators to view the syntactic and semantic representations at different levels of granularity. The top-level LF tree shown in Figure 3 allows a number of crucial aspects of the representation to be checked quickly. Note that the entity identifiers are color-coded, which is a great help for checking variable mappings. If everything shown in the top-level representation is correct, the full LF with all terms expanded can be viewed. Similarly, levels of the parse tree can be hidden or expanded as needed.</Paragraph>
    <Paragraph position="3"> After the initial checking stage, we analyze the utterances marked &amp;quot;BAD&amp;quot; and make changes in the grammar and lexicon to address the BAD utterances whenever possible. Occasionally, when the problems are due to ambiguity, the parser is able to parse the utterance, but the interpretation it selects is not the correct one among possible alternatives. In this case, we manually select the correct parse and add it to the gold-standard corpus.</Paragraph>
    <Paragraph position="4"> Once the changes have been made, we re-parse the corpus. Our parsing tool determines automatically which parses have been changed and marks them to be re-checked by the human annotators.</Paragraph>
    <Paragraph position="5"> The CorpusTool has the functionality to quickly locate the utterances marked as changed for rechecking. This allows us to quickly conduct several iterations of re-checking and re-parsing, bringing the coverage in the completed corpus high enough so that it may now be annotated for reference information. The hand-checking scheme was found to be quite reliable, with a kappa of 0.79. Currently, 85% of the grammatical sentences are marked as GOOD in the gold-standard coverage of the 5 dialogs in the Monroe corpus.</Paragraph>
    <Paragraph position="6"> Several iterations of the check and re-parse cycle were needed to achieve parsing accuracy suitable for discourse annotation. Once the suitable accuracy level has been reached, the reference annotation process starts.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Adding Reference Information
</SectionTitle>
      <Paragraph position="0"> As in the parser development phase, we built a Java tool for annotating the parsed corpora for reference.</Paragraph>
      <Paragraph position="1"> First, the relevant terms were extracted from the LF representation of the semantic parse. These included all verbs, noun phrases, implicit pronouns, etc. Next, the sentences were manually marked for reference using the tool (PronounTool).</Paragraph>
      <Paragraph position="2"> There are many different ways to mark how entities refer. Our annotation scheme is based on the GNOME project scheme (Poesio, 2000) which annotates referential links between entities as well as their respective discourse and salience information.</Paragraph>
      <Paragraph position="3"> The main difference in our approach is that we do not annotate discourse units and certain semantic features, and most of the basic syntactic and semantic features are produced automatically for us in the parsing phase.</Paragraph>
      <Paragraph position="4"> We use standoff annotation to separate our coreference annotation from the syntactic and semantic parse annotations. The standoff file for pronouns consists of two fields for each pronoun to handle the reference information: relation, which specifies how the entities are related; and refers-to, which specifies the id of the term the referential entity in question points to.</Paragraph>
      <Paragraph position="5"> The focus for our work has been on coreferential pronouns and noun phrases, although we also annotated the classes of all other pronouns. Typically, the non-coreferential pronouns are difficult to annotate reliably since there are a myriad of different categories for bridging relations and for specifying  Because our focus was on coreferential entities, we had our annotators annotate only the main relation type for the non-coreferential pronouns since these could be done more reliably. The relations we used are listed below: Identity both entities refer to the same object (coreference) null Dummy non-referential pronouns (expletive or pleonastic) null Indexicals expressions that refer to the discourse speakers or temporal relations (ie. I, you, us, now) Action pronouns which refer to an action or event Demonstrative pronouns that refer to an utterance or series of utterances Functional pronouns that are indirectly related to another entity, most commonly bridging and one anaphora Set plural pronouns that refer to a collection of mentioned entities Hard pronouns that are too difficult to annotate Entities in identity, action and functional relations had refers-to fields that pointed to the id of a specific term (or terms if the entity was a plural composed of other entities). Dummy had no refers-to set since they were not included in the evaluation. Demonstrative pronouns had refers-to fields pointing to either utterance numbers or a list of utterance numbers in the case of a discourse segment. Finally, there were some pronouns for which it was too difficult to decide what they referred to, if anything. These typically were found in incomplete sentences without a verb to provide semantic information.</Paragraph>
      <Paragraph position="6"> After the annotation phase, a post-processing phase identifies all the noun phrases that refer to the same entity, and generates a unique chain-id for this entity. This is similar to the a0a2a1 a3a5a4 field in the GNOME scheme. The advantage of doing this processing is that it is possible for a referring expression to refer to a past instantiation that was not the last mentioned instantiation, which is usually what is annotated. As a result, it is necessary to mark all coreferential instantiations with the same identification tag.</Paragraph>
      <Paragraph position="7"> Figure 5 shows a snapshot of the PronounTool in use for the pronoun there in the second utterance of our example. The top pane has buttons to skip to the next or previous utterance with a pronoun or noun phrase. The lower pane has the list of extracted entities for easy viewing. The &amp;quot;Relation&amp;quot; box is a drop down menu consisting of the relations listed above.</Paragraph>
      <Paragraph position="8"> In this case, the identity relation has been selected for there. The next step is to select an entity from the context that the pronoun refers to. By clicking on the &amp;quot;Refers To&amp;quot; box, a context window pops up with all the entities organized in order of appearance in the discourse. The user selects the entity and clicks &amp;quot;Select&amp;quot; and the antecedent id is added to the refers-to field.</Paragraph>
      <Paragraph position="9"> Our aim with this part of the project (still in a preliminary stage) is to investigate whether a shallow discourse segmentation (which is generated automatically) is enough to aid in pronominal reference resolution. Previous work has focused on using complex nested tree structures to model discourse and dialogue. While this method may be the best way to go ultimately, empirical work has shown that it has been difficult to put into practice. There are many different schemes to choose from, for example Rhetorical Structure Theory (Mann and Thompson, 1986) or the stack model (Grosz and Sidner, 1986) and manually annotating with these schemes has variable reliability. Finally, annotating these schemes requires real-world knowledge, reasoning, and knowledge of salience and semantics, all of which make automatic segmentation difficult.</Paragraph>
      <Paragraph position="10"> However, past studies such as Tetreault and Allen (2003) show that for reference resolution, a highlystructured tree may be too constraining, so a shallower approach may be acceptable for studying the effect of discourse segmentation on resolution.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Discourse Segmentation
</SectionTitle>
      <Paragraph position="0"> Our preliminary segmentation scheme is as follows.</Paragraph>
      <Paragraph position="1"> In a collaborative domain, participants work on a task until completion. During the conversation, the participants raise questions, supply answers, give orders or suggestions and acknowledge each other's information and beliefs. In our corpus, these speech acts and discourse cues such as so and then are tagged automatically for reliable annotation. We use this information to decide when to begin and end a discourse segment.</Paragraph>
      <Paragraph position="2"> Roberts (1996) suggests that questions are good indicators of the start of a discourse segment be- null cause they open up a topic under discussion. An answer followed by a series of acknowledgments usually signal a segment close. Currently we annotate these segments manually by maintaining a &amp;quot;holdout&amp;quot; file for each dialog which contains a list of all the segments and their start, end and type information. null For example, given the discourse as shown in Figure 6, the discourse segments would be Figure 7. The starts of both segments are adjacent to sentences that are questions.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML