<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1031">
  <Title>Towards Finding and Fixing Fragments: Using ML to Identify Non-Sentential Utterances and their Antecedents in Multi-Party Dialogue</Title>
  <Section position="4" start_page="247" end_page="247" type="metho">
    <SectionTitle>
2 The Tasks
</SectionTitle>
    <Paragraph position="0"> As we said in the introduction, the main task we want to tackle is to align (certain kinds of) NSUs and their antecedents. Now, what characterises this kind of NSU, and what are their antecedents? In the examples from the introduction, the NSUs can be resolved simply by looking at the previous utterance, which provides the material that is elided in them. In reality, however, the situation is not that simple, for three reasons: First, it is of course not always the previous utterance that provides this material (as illustrated by (2), where utterance 7 is resolved by utterance 1); in our data the average distance in fact is 2.5 utterances (see below).</Paragraph>
    <Paragraph position="1">  (2) 1 B: [. . . ] What else should be done ? 2 C: More intelligence .</Paragraph>
    <Paragraph position="2"> 3 More good intelligence .</Paragraph>
    <Paragraph position="3"> 4 Right .</Paragraph>
    <Paragraph position="4"> 5 D: Intelligent intelligence .</Paragraph>
    <Paragraph position="5"> 6 B: Better application of face and voice recognition .</Paragraph>
    <Paragraph position="6"> 7 C: More [. . . ] intermingling of the  agencies , you know .</Paragraph>
    <Paragraph position="7"> [ from NSI 20011115 ] Second, it's not even necessarily a single utterance that does this--it might very well be a span of utterances, or something that has to be inferred from such spans (parallel to the situation with pronouns, as discussed empirically e.g. in (Strube and Müller, 2003)). (3) shows an example where a new topic is broached by using an NSU. It is possible to analyse this as an answer to the question under discussion "what shall we organise for the party?", as (Fernández et al., 2004a) would do; a question, however, which is only implicitly posed by the previous discourse, and hence this is an example of an NSU that does not have an overt antecedent.</Paragraph>
    <Paragraph position="8">  (3) [after discussing a number of different topics]</Paragraph>
    <Paragraph position="10"> Lastly, not all NSUs should be analysed as being the result of ellipsis: backchannels for example (like the "Right" in utterance 4 in (2) above) seem to directly fulfil their discourse function without any need for reconstruction.2 To keep matters simple, we concentrate in this paper on NSUs of a certain kind, namely those that a) do not predominantly have a discourse-management function (like for example backchannels), but rather convey messages (i.e., propositions, questions or requests)--this is what distinguishes fragments from other NSUs--and b) have individual utterances as antecedents. In the terminology of (Schlangen and Lascarides, 2003), fragments of the latter type are resolution-via-identity-fragments, where the elided information can be identified in the context and need not be inferred (as opposed to resolution-via-inference-fragments). Choosing only this special kind of NSU raises the question of whether this sub-group is distinguished from the general group of fragments by criteria that can be learnt; we will return to this below when we analyse the errors made by the classifier.</Paragraph>
    <Paragraph position="11"> We have defined two approaches to this task. One is to split the task into two sub-tasks: identifying fragments in a corpus, and identifying antecedents for fragments. These steps are naturally performed sequentially to handle our main task, but they also allow the fragment classification decision to come from another source--a language model used in an automatic speech recognition system, for example--and to use only the antecedent-classifier. The other approach is to do both at the same time, i.e. to classify pairs of utterances into those that combine a fragment and its antecedent and those that don't. We report the results of our experiments with these tasks below, after describing the data we used.</Paragraph>
  </Section>
  <Section position="5" start_page="247" end_page="249" type="metho">
    <SectionTitle>
3 Corpus, Features, and Data Creation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="247" end_page="248" type="sub_section">
      <SectionTitle>
3.1 Corpus
</SectionTitle>
      <Paragraph position="0"> As material we have used six transcripts from the "NIST Meeting Room Pilot Corpus" (Garofolo et al., 2004), a corpus of recordings and transcriptions of multi-party meetings.3 Those six transcripts consist of 5,999 utterances, among which we identified 307 fragment-antecedent pairs.4,5 At 5.1%, this is a lower rate than that reported for NSUs in other corpora (see above); but note that, as explained above, we are only looking at a sub-class of all NSUs here.</Paragraph>
      <Paragraph position="1"> 2The boundaries are fuzzy here, however, as backchannels can also be fragmental repetitions of previous material, and sometimes it is not clear how to classify a given utterance. A similar problem of classifying fragments is discussed in (Schlangen, 2003) and we will not go further into this here.
3We have chosen a multi-party setting because we are ultimately interested in automatic summarisation of meetings. In this paper, however, we view our task as a "stand-alone task". Some of the problems resulting from the presence of many speakers are discussed below.
Table 1 (a denotes antecedent, b fragment):
  average distance a - b (utterances): 2.5
  a declarative: 159 (52%)   a interrogative: 140 (46%)   a unclassified: 8 (2%)
  b declarative: 235 (76%)   b interrogative: (23%)       b unclassified: 2 (0.7%)
  a being last in their turn: 142 (46%)
  b being first in their turn: 159 (52%)</Paragraph>
      <Paragraph position="2"> For these pairs we also annotated some more attributes, which are summarised in Table 1. Note that the average distance is slightly higher than that reported in (Schlangen and Lascarides, 2003) for (2-party) dialogue (1.8); this is presumably due to the presence of more speakers who are able to reply to an utterance. Finally, we automatically annotated all utterances with part-of-speech tags, using TreeTagger (Schmid, 1994), which we trained on the Switchboard corpus of spoken language (Godfrey et al., 1992), because it contains, just like our corpus, speech disfluencies.6 We now describe the creation of the data we used for training. We first describe the data-sets for the different tasks, and then the features used to represent the events that are to be classified.</Paragraph>
    </Section>
    <Section position="2" start_page="248" end_page="249" type="sub_section">
      <SectionTitle>
3.2 Data Sets
</SectionTitle>
      <Paragraph position="0"> Data creation for the fragment-identification task (henceforth simply fragment-task) was straightforward: for each utterance, a number of features was derived automatically (see next section) and the correct class (fragment / other) was added. (Note that none of the manually annotated attributes were used.) This resulted in a file with 5,999 data points for classification. Given that there were 307 fragments, this means that in this data-set the ratio of positives (fragments) to negatives (non-fragments) for the classifier is 1:20. To address this imbalance, we also ran the experiments with balanced data-sets with a ratio of 1:5.</Paragraph>
      <Paragraph position="1"> 6corplex/TreeTagger/DecisionTreeTagger.html</Paragraph>
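      <Paragraph> To make the balancing step concrete, the following is a minimal sketch of how such a 1:5 data-set could be produced by down-sampling the negatives (Python; the field name is_fragment and the random down-sampling strategy are our assumptions, since the text does not specify how the balancing was done):
import random

def balance(data, label_key="is_fragment", neg_per_pos=5, seed=1):
    # Down-sample negatives so that the ratio positives:negatives is 1:neg_per_pos.
    # `data` is a list of dicts, each holding the feature values of one utterance
    # plus a boolean under `label_key` marking whether it is a fragment.
    rng = random.Random(seed)
    positives = [d for d in data if d[label_key]]
    negatives = [d for d in data if not d[label_key]]
    kept = rng.sample(negatives, min(len(negatives), neg_per_pos * len(positives)))
    balanced = positives + kept
    rng.shuffle(balanced)
    return balanced

# E.g. with 307 fragments among 5,999 utterances (roughly 1:20), the balanced
# set keeps all 307 positives plus 5 * 307 sampled negatives.
      </Paragraph>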
      <Paragraph position="2"> The other tasks, antecedent-identification (antecedent-task) and antecedent-fragment-identification (combined-task), required the creation of data-sets containing pairs. For this we created an "accessibility window" going back from each utterance. Specifically, we included for each utterance a) all previous utterances of the same speaker from the same turn; and b) the last three utterances of every speaker, but only until one speaker took the turn again and up to a maximum of 6 previous utterances. To illustrate this method with example (2): it would form pairs with utterance 7 as fragment-candidate and each of utterances 6-2, but not 1, because that violates condition b) (it is the second turn of speaker B).</Paragraph>
      <Paragraph position="3"> In the case of (2), this exclusion would be a wrong decision, since 1 is in fact the antecedent for 7. In general, however, this dynamic method proved good at capturing as many antecedents as possible while keeping the number of data points manageable. It captured 269 antecedent-fragment pairs, which had an average distance of 1.84 utterances. The remaining 38 pairs which it missed had an average distance of 7.27 utterances, which means that to capture those we would have had to widen the window considerably. E.g., considering the previous 8 utterances would capture an additional 25 pairs, but at the cost of doubling the number of data points. We hence chose the approach described here, aware that it introduces a certain bias.</Paragraph>
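      <Paragraph> The window construction just described can be sketched as follows (Python; utterance records with speaker and turn fields are an assumption about the data layout, not the actual code used). Applied to example (2), it pairs utterance 7 with utterances 6-2 and stops before utterance 1, as discussed above.
def candidate_antecedents(utts, i, max_back=6, per_speaker=3):
    # Collect antecedent candidates for utts[i], walking backwards through the dialogue.
    # Each utterance is a dict with at least 'speaker' and 'turn' identifiers.
    # Rule a): all earlier utterances of the same speaker within the same turn.
    # Rule b): the last `per_speaker` utterances of every speaker, stopping as soon
    # as some speaker takes the turn a second time, and never looking further back
    # than `max_back` utterances.
    frag = utts[i]
    candidates = []
    turns_seen = {}    # speaker -> set of that speaker's turn ids encountered so far
    collected = {}     # speaker -> number of utterances already collected via rule b)
    for j in range(i - 1, max(i - 1 - max_back, -1), -1):
        a = utts[j]
        spk, turn = a["speaker"], a["turn"]
        if spk == frag["speaker"] and turn == frag["turn"]:
            candidates.append(a)          # rule a): same speaker, same turn
            continue
        seen = turns_seen.setdefault(spk, set())
        if seen and turn not in seen:
            break                         # rule b): this speaker took the turn again
        seen.add(turn)
        if collected.get(spk, 0) < per_speaker:
            candidates.append(a)
            collected[spk] = collected.get(spk, 0) + 1
    return candidates
      </Paragraph>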
      <Paragraph position="4"> As we have said, we are trying to link utterances, one a fragment, the other its antecedent. The notion of utterance is however less well-defined than one might expect, and the segmentation of continuous speech into utterances is a veritable research problem on its own (see e.g. (Traum and Heeman, 1997)). Often it is arguable whether a prepositional phrase, for example, should be analysed as an adjunct (and hence as not being an utterance on its own) or as a fragment. In our experiments, we have followed the decisions made by the transcribers of the original corpus, since they had information (e.g. about pauses) which was not available to us.
Table 2: Features
Structural features:
  dis   distance a - b, in utterances
  sspk  same speaker yes/no
  nspk  number of speaker changes (= # turns)
  iqu   number of intervening questions
  alt   a last utterance in its turn?
  bft   b first utterance in its turn?
Lexical / utterance-based features:
  bvb   (tensed) verb present in b?
  bds   disfluency present in b?
  aqm   a contains question mark
  awh   a contains wh word
  bpr   ratio of polar particles (yes, no, maybe, etc.) / other in b
  apr   ratio of polar particles in a
  lal   length of a
  lbe   length of b
  nra   ratio nouns / non-nouns in a
  nrb   ratio nouns / non-nouns in b
  rab   ratio of nouns in b that also occur in a
  rap   ratio of words in b that also occur in a
  god   Google similarity (see text)</Paragraph>
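      <Paragraph> To make a few of the lexical features concrete, here is a small sketch of how bvb, lal, lbe, rab and rap could be computed from the POS-tagged utterances (Python; the Penn-style tag prefixes and the tokenisation into (word, tag) pairs are our assumptions, not the actual implementation):
def lexical_features(a_tokens, b_tokens):
    # a_tokens / b_tokens: lists of (word, pos) pairs for antecedent candidate a
    # and fragment candidate b; nouns are assumed to be tagged NN*, verbs VB*.
    a_words = [w.lower() for w, _ in a_tokens]
    b_words = [w.lower() for w, _ in b_tokens]
    a_nouns = {w.lower() for w, pos in a_tokens if pos.startswith("NN")}
    b_nouns = [w.lower() for w, pos in b_tokens if pos.startswith("NN")]
    return {
        "lal": len(a_words),                                       # length of a
        "lbe": len(b_words),                                       # length of b
        "bvb": any(pos.startswith("VB") for _, pos in b_tokens),   # verb present in b?
        # ratio of nouns in b that also occur in a
        "rab": (sum(1 for n in b_nouns if n in a_nouns) / len(b_nouns)) if b_nouns else 0.0,
        # ratio of words in b that also occur in a
        "rap": (sum(1 for w in b_words if w in a_words) / len(b_words)) if b_words else 0.0,
    }

# For a check-question such as "Peter ?" following "I saw Peter .", rab is 1.0.
      </Paragraph>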
      <Paragraph position="5"> For the antecedent-task, we include only pairs where b (the second utterance in the pair) is a fragment--since the task is to identify an antecedent for already identified fragments. This results in a data-set with 1318 data points (i.e., we created on average 4 pairs per fragment). This data-set is sufficiently balanced between positives and negatives, and so we did not create another version of it. The data for the combined-task, however, is much bigger, as it contains pairs for all utterances. It consists of 26,340 pairs, i.e. a ratio of roughly 1:90. For this reason we also used balanced data-sets for training, where the ratio was adjusted to 1:25.</Paragraph>
    </Section>
    <Section position="3" start_page="249" end_page="249" type="sub_section">
      <SectionTitle>
3.3 Features
</SectionTitle>
      <Paragraph position="0"> Table 2 lists the features we have used to represent the utterances. (In this table, and in this section, we denote the candidate for being a fragment with b and the candidate for being b's antecedent with a.) We have defined a number of structural features, which give information about the (discourse-)structural relation between a and b. The rationale behind choosing them should be clear; iqu, for example, indicates in a weak way whether there might have been a topic change, and a high nspk should presumably make an antecedent relation between a and b less likely.</Paragraph>
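      <Paragraph> A corresponding sketch for some of the structural features (Python; the utterance fields speaker, turn and is_question are assumptions about the data layout, and nspk is approximated here by the number of distinct turns in the span from a to b):
def structural_features(utts, i_a, i_b):
    # Structural features for a candidate pair a = utts[i_a], b = utts[i_b], i_a before i_b.
    between = utts[i_a + 1:i_b]
    return {
        "dis": i_b - i_a,                                        # distance a - b, in utterances
        "sspk": utts[i_a]["speaker"] == utts[i_b]["speaker"],    # same speaker?
        "nspk": len({u["turn"] for u in utts[i_a:i_b + 1]}),     # number of turns in the span
        "iqu": sum(1 for u in between if u["is_question"]),      # intervening questions
    }
      </Paragraph>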
      <Paragraph position="1"> We have also used some lexical or utterance-based features, which describe lexical properties of the individual utterances and lexical relations between them which could be relevant for the tasks.</Paragraph>
      <Paragraph position="2"> For example, the presence of a verb in b is presumably predictive of whether it is a fragment or not, as is its length. To capture a possible semantic relationship between the utterances, we defined two features. The more direct one, rab, looks at verbatim re-occurrences of nouns from a in b, which occur for example in check-questions as in (4) below.</Paragraph>
      <Paragraph position="3">  (4) A: I saw Peter.</Paragraph>
      <Paragraph position="4"> B: Peter? (= Who is this Peter you saw?)  Less direct semantic relations are intended to be captured by god, the second semantic feature we use.7 It is computed as follows: for each pair (x,y) of nouns from a and b, Google is called (via the Google API) with a query for x, for y, and for x and y together. The similarity then is the average ratio of pair vs. individual term:</Paragraph>
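      <Paragraph> A formalisation consistent with this description would be the following (a reconstruction, not necessarily the exact formula used; N_a and N_b denote the sets of nouns in a and b, and hits(.) the number of hits Google reports for a query):
\[
god(a,b) \;=\; \frac{1}{|N_a|\,|N_b|} \sum_{x \in N_a} \sum_{y \in N_b} \frac{1}{2}\left(\frac{hits(x \wedge y)}{hits(x)} + \frac{hits(x \wedge y)}{hits(y)}\right)
\]
      </Paragraph>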
      <Paragraph position="6"> We now describe the experiments we performed and their results.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="249" end_page="250" type="metho">
    <SectionTitle>
4 Experiments and Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="249" end_page="250" type="sub_section">
      <SectionTitle>
4.1 Experimental Setup
</SectionTitle>
      <Paragraph position="0"> For the learning experiments, we used three classifiers on all data-sets for the three tasks. The first differs from the other classifiers we have used in that it can make use of "set-valued" features, e.g. strings; we have run this learner both with only the features listed above and with the utterances (and POS-tags) as an additional feature.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="250" end_page="250" type="metho">
    <SectionTitle>
* TIMBL (Tilburg Memory-Based Learner),
</SectionTitle>
    <Paragraph position="0"> (Daelemans et al., 2003), which implements a memory-based learning algorithm (IB1) that predicts the class of a test data point by looking at its distance to all examples from the training data, using some distance metric. In our experiments, we have used the weighted-overlap method, which assigns weights to all features.</Paragraph>
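    <Paragraph> The following is a minimal sketch of this prediction scheme (1-nearest-neighbour with a weighted overlap metric; the weights are simply given as input here, whereas TiMBL derives them itself, e.g. by information gain):
def weighted_overlap_distance(x, y, weights):
    # Distance between two feature vectors: the summed weight of all mismatching features.
    return sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)

def ib1_predict(test_point, train_points, train_labels, weights):
    # Predict the class of the closest training example under the weighted-overlap metric.
    distances = [weighted_overlap_distance(test_point, p, weights) for p in train_points]
    nearest = min(range(len(train_points)), key=lambda i: distances[i])
    return train_labels[nearest]
    </Paragraph>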
    <Paragraph position="2"> The third classifier is based on maximum entropy modelling (Berger et al., 1996).</Paragraph>
    <Paragraph position="3"> In our experiments, we used L-BFGS parameter estimation. We also implemented a naïve Bayes classifier and ran it on the fragment-task, with a data-set consisting only of the strings and POS-tags.</Paragraph>
    <Paragraph position="4"> To determine the contribution of all features, we used an iterative process similar to the one described in (Kohavi and John, 1997; Strube and Müller, 2003): we start with training a model using a baseline set of features, and then add each remaining feature individually, recording the gain (w.r.t. the f-measure, f(0.5) to be precise) and choosing the best-performing feature, incrementally until no further gain is recorded. All individual training and evaluation steps are performed using 8-fold cross-validation (given the small number of positive instances, more folds would have made the number of instances in the test set too small).</Paragraph>
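    <Paragraph> In outline, this greedy forward selection can be sketched as follows (Python; train_and_eval is a placeholder standing for one 8-fold cross-validated training run that returns f(0.5), not an interface of any of the learners named above):
def forward_select(all_features, baseline, train_and_eval):
    # Greedy forward feature selection: start from the baseline set, repeatedly add the
    # single remaining feature that improves f(0.5) most, and stop when nothing improves.
    selected = list(baseline)
    best_score = train_and_eval(selected)
    remaining = [f for f in all_features if f not in selected]
    improved = True
    while improved and remaining:
        improved = False
        scores = {f: train_and_eval(selected + [f]) for f in remaining}
        best_feature = max(scores, key=scores.get)
        if scores[best_feature] > best_score:
            selected.append(best_feature)
            remaining.remove(best_feature)
            best_score = scores[best_feature]
            improved = True
    return selected, best_score
    </Paragraph>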
    <Paragraph position="5"> The baselines were as follows: for the fragment-task, we used bvb and lbe as baseline, i.e. we let the classifier know the length of the candidate and whether the candidate contains a verb or not. For the antecedent-task we tested a very simple baseline, consisting of only one feature, the distance between a and b (dis). The baseline for the combined-task, finally, was a combination of those two baselines, i.e. bvb+lbe+dis. The full feature-set for the fragment-task was lbe, bvb, bpr, nrb, bft, bds (since for this task there was no a to compute features of); for the two other tasks it was the complete set shown in Table 2.</Paragraph>
  </Section>
</Paper>