
<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1612">
  <Title>Learning Information Status of Discourse Entities</Title>
  <Section position="4" start_page="94" end_page="96" type="intro">
    <SectionTitle>
2 Data
</SectionTitle>
    <Paragraph position="0"> For our experiments we annotated a portion of the transcribed Switchboard corpus (Godfrey et al., 1992), consisting of 147 dialogues (Nissim et al., 2004).1 Inthefollowing sectionweprovideabrief description of the annotation categories.</Paragraph>
    <Section position="1" start_page="94" end_page="94" type="sub_section">
      <SectionTitle>
2.1 Annotation
</SectionTitle>
      <Paragraph position="0"> Our annotation of information status mainly builds on (Prince, 1992), and employs a distinction into old, mediated, and new entities similar to the work of (Strube, 1998; Eckert and Strube, 2001).</Paragraph>
      <Paragraph position="1"> All noun phrases (NPs) were extracted as markable entities using pre-existing parse information (Carletta et al., 2004). An entity was annotated as new if it has not been previously referred to and is yet unknown to the hearer. The tag mediated was instead used whenever an entity that is newly mentioned in the dialogue can be inferred by the hearer thanks to prior or general context.2 Typical examples of mediated entities are generally known objects (such as &amp;quot;the sun&amp;quot;, or &amp;quot;the Pope&amp;quot; (L&amp;quot;obner, 1985)), and bridging anaphors (Clark, 1975; Vieira and Poesio, 2000), where an entity is related to a previously introduced one. Wheneveranentitywasneithernewnormediated, itwas considered as old.</Paragraph>
      <Paragraph position="2">  ferrables.</Paragraph>
      <Paragraph position="3"> In order to account for the complexity of the notion of information status, the annotation also includes a sub-type classification for old and mediated entities that provides a finer-grained distinction with information on why a given entity is mediated (e.g., set-relation, bridging) or old (e.g., coreference, generic pronouns). In order to test the feasibility of automatically assigning information status to discourse entities, we took a modular approach and only considered the coarser-grained distinctions for this first study. Information about the finer-grained subtypes will be used in future work.</Paragraph>
      <Paragraph position="4"> In addition to the main categories, we used two more annotation classes: a tag non-applicable, used for entities that were wrongly extracted in the automatic selection of markables (e.g. &amp;quot;course&amp;quot; in &amp;quot;of course&amp;quot;), for idiomatic occurrences, and expletive uses of &amp;quot;it&amp;quot;; and a tag not-understood to be applied whenever an annotator did not fully understand the text. Instances annotated with these two tags, as well as all traces, which were left unannotated, were excluded from all our experiments.</Paragraph>
      <Paragraph position="5"> Inter-annotator agreement was measured using the kappa (K) statistics (Cohen, 1960; Carletta, 1996) on 1,502 instances (three Switchboard dialogues) marked by two annotators who followed specific written guidelines. Given that the task involves a fair amount of subjective judgement, agreement was remarkably high. Over the three dialogues, the annotation yielded K = .845 for the old/med/new classification (K = .788 when including the finer-grained subtype distinction).</Paragraph>
      <Paragraph position="6"> Specifically, &amp;quot;old&amp;quot; proved to be the easiest to distinguish, with K = .902; for &amp;quot;med&amp;quot; and &amp;quot;new&amp;quot; agreement was measured at K = .800 and K = .794, respectively. A value of K &gt; .76 is usually considered good agreement. Further details on the annotation process and corpus description are provided in (Nissim et al., 2004)</Paragraph>
    </Section>
    <Section position="2" start_page="94" end_page="96" type="sub_section">
      <SectionTitle>
2.2 Setup
</SectionTitle>
      <Paragraph position="0"> We split the 147 dialogues into a training, a development and an evaluation set. The training set contains40,865NPsdistributedover94dialogues, the development set consists of 23 dialogues for a total of 10,565 NPs, and the evaluation set comprises 30 dialogues with 12,624 NPs. Instances were randomised, so that occurrences of NPs from the same dialogue were possibly split across the different sets.</Paragraph>
      <Paragraph position="1">  Table 1 reports the distribution of classes for the training, development and evaluation sets. The distributions are similar, with a majority of old entities, followed by mediated entities, and lastly by  The target classes for our classification experiments are the annotation tags: old, mediated, and new. As baseline, we could take a simple &amp;quot;mostfrequent-class&amp;quot; assignment that would classify all entities as old, thus yielding an accuracy of 47.9% on the evaluation set (see Table 1). Although the &amp;quot;all-old&amp;quot; assumption makes a reasonable baseline,  itwouldnotprovideaparticularlyinterestingsolution from a practical perspective, since a dialogue should also contain not-old information. Thus, rather than adopting this simple strategy, we developed a more sophisticated baseline working on a set of hand-crafted rules.</Paragraph>
      <Paragraph position="2"> This hand-crafted algorithm is based on rather straightforward, intuitive rules, partially reflecting the instructions specified in the annotation guidelines. As shown in Figure 1, the top split is the NP type: whether the instance to classify is a pronoun, a proper noun, or a common noun. The other information that the algorithm uses is about complete or partial string overlapping with respect to the dialogue's context. For common nouns we also consider the kind of determiner (definite, indefinite, demonstrative, possessive, or bare).</Paragraph>
      <Paragraph position="3"> In order to obtain the NP type information, we exploited the pre-existing morpho-syntactic tree-bank annotation of Switchboard. Whenever the extraction failed, we assigned a type &amp;quot;other&amp;quot; and always backed-off these cases to old (the most frequent class in training data). Values for the other features were obtained by simple pattern matching and NP extraction.</Paragraph>
      <Paragraph position="4"> Evaluation measures The algorithm's performance is evaluated with respect to its general accuracy (Acc): the number of correctly classified instances over all assignments. Moreover, for each  case NP is a pronoun status := old case NP is a proper noun if first occurrence then status := med else status := old endif case NP is a common noun if identical string already mentioned then status := old else if partial string already mentioned then status := med else if determiner is def/dem/poss then status := med else status := new endif endif endif otherwise status := old  the assignment of information status to NPs. class (c), we report precision (P), recall (R), and f-</Paragraph>
      <Paragraph position="6"> The overall accuracy of the rule-based algorithm is 65.8%. Table 2 shows the results for each targetclassinboththedevelopmentandevaluation sets. We discuss results on the latter.</Paragraph>
      <Paragraph position="7"> Although a very high proportion of old entities is correctly retrieved (93.5%), this is done with relatively low precision (66.7%). Moreover, both precision and recall for the other classes are disappointing. Unsurprisingly, the rules that apply to common nouns (the most ambiguous with respect to information status) generate a large num- null ber of false positives. The rule that predicts an old entity in case of a full previous mention, for example, has a precision of only 39.8%. Better, but not yet satisfactory, is the precision of the rule that predicts a mediated entity for a common noun that has a previous partial mention (64.7%). The worst performing rule is the one that assigns the most frequent class (old) to entities of syntactic type&amp;quot;other&amp;quot;, withaprecisionof35.4%. Togivean idea of the correlation between NP type and information status, in Table 3 we report the distribution observed in the evaluation set.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>