<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0613"> <Title>Probabilistic Head-Driven Parsing for Discourse Structure</Title> <Section position="4" start_page="96" end_page="157" type="metho"> <SectionTitle> 2 Segmented Discourse Representation Theory </SectionTitle> <Paragraph position="0"> SDRT extends prior work in dynamic semantics (e.g., van Eijk and Kamp (1997)) via logical forms that feature rhetorical relations. The logical forms consist of speech act discourse referents which label content (either of a clause or of text segments). Rhetorical relations such as Explanation relate these referents. The resulting structures are called segmented discourse representation structures or SDRSs. An SDRS for the dialogue in Figure 1 is given in Figure 2; we have used the numbers of the elementary utterances from Redwoods as the speech act discourse referents but have omitted their labelled logical forms. Note that utterances 151 and 152, which do not contribute to the truth conditions of the dialogue, are absent; we return to this shortly. There are several things to note about this SDRS.</Paragraph> <Paragraph position="1"> First, SDRT's dynamic semantics of rhetorical relations imposes constraints on the contents of its arguments. For example, Plan-Elab(150,h1) (standing for Plan-Elaboration) means that h1 provides information from which the speaker of 150 can elaborate a plan to achieve their communicative goal (to meet for two hours in the next couple of weeks). The relation Plan-Elab contrasts with Plan-Correction, which would relate the utterances in dialogue (1): (1) a. A: Can we meet at the weekend? b. B: I'm afraid I'm busy then.</Paragraph> <Paragraph position="2"> Plan-Correction holds when the content of the second utterance in the relation indicates that its communicative goals conflict with those of the first one. In this case, A indicates he wants to meet next weekend, and B indicates that he does not (note that then resolves to the weekend). Utterances (1ab) would also be related with IQAP (Indirect Question Answer Pair): this means that (1b) provides sufficient information for the questioner A to infer a direct answer to his question (Asher and Lascarides, 2003).</Paragraph> <Paragraph position="3"> The relation Elaboration(153,h2) in Figure 2 means that the segment 154 to 155 resolves to a proposition which elaborates part of the content of the proposition 153. Therefore the twenty sixth in 154 resolves to the twenty sixth of July; any other interpretation contradicts the truth conditional consequences of Elaboration. Alternation(154,155) has truth conditions similar to (dynamic) disjunction. Continuation(156,157) means that 156 and 157 have a common topic (here, this amounts to a proposition about when CAE is unavailable to meet).</Paragraph>
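<Paragraph> To make the shape of such an SDRS concrete, the fragment below encodes the relation conditions mentioned above as a small Python structure. This is only an illustrative sketch under our own naming, not the authors' representation, and it covers just the relations named in the text; the labelled logical forms are still omitted.

from dataclasses import dataclass

@dataclass(frozen=True)
class RelationCondition:
    relation: str   # e.g. "Plan-Elab", "Elaboration", "Alternation"
    first: str      # left argument: a speech act discourse referent
    second: str     # right argument: a discourse referent or segment label

# Relation conditions of the Figure 2 SDRS that are mentioned in the text.
figure2_conditions = [
    RelationCondition("Plan-Elab", "150", "h1"),
    RelationCondition("Elaboration", "153", "h2"),
    RelationCondition("Alternation", "154", "155"),
    RelationCondition("Continuation", "156", "157"),
]
</Paragraph>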
<Paragraph position="4"> The second thing to note about Figure 2 is how one rhetorical relation can outscope another: this creates a hierarchical segmentation of the discourse. For example, the second argument to the Elaboration relation is the label h2 of the Alternation segment relating 154 to 155. Due to the semantics of Elaboration and Alternation, this ensures that the dialogue entails that one of 154 or 155 is true, but it does not entail 154, nor 155.</Paragraph> <Paragraph position="5"> Finally, observe that SDRT allows for a situation where an utterance connects to more than one subsequent utterance, as shown here with [...]</Paragraph> <Paragraph position="7"> In fact, SDRT also allows two utterances to be related by multiple relations (see (1)) and it allows an utterance to rhetorically connect to multiple utterances in the context. These three features of SDRT capture the fact that an utterance can make more than one illocutionary contribution to the discourse. An example of the latter kind of structure is given in (2): (2) a. A: Shall we meet on Wednesday? b. A: How about one pm? c. B: Would one thirty be OK with you? The SDRS for this dialogue would feature the relations Plan-Correction(2b,2c), IQAP(2b,2c) and Q-Elab(2a,2c). Q-Elab, or Question-Elaboration, always takes a question as its second argument; any answers to the question must elaborate a plan to achieve the communicative goal underlying the first argument to the relation. From a logical perspective, recognising Plan-Correction(2b,2c) and [...]</Paragraph> <Paragraph position="9"> [Figure 3: the dialogue from Figure 1 in tree form.]</Paragraph> </Section> <Section position="5" start_page="157" end_page="157" type="metho"> <SectionTitle> 3 Augmenting the Redwoods treebank with discourse structures </SectionTitle> <Paragraph position="0"> Our starting point is to create training material for probabilistic discourse parsers. For this, we have augmented dialogues from the Redwoods Treebank (Oepen et al., 2002) with their analyses within a fragment of SDRT (Baldridge and Lascarides, 2005). This is a very different effort from that being pursued for the Penn Discourse Treebank (Miltsakaki et al., 2004), which uses discourse connectives rather than abstract rhetorical relations like those in SDRT in order to provide theory neutral annotations.</Paragraph> <Paragraph position="1"> Our goal is instead to leverage the power of the semantics provided by SDRT's relations, and in particular to do so for dialogue as opposed to monologue. Because the SDRS-representation scheme, as shown in Figure 2, uses graph structures that do not conform to tree constraints, it cannot be combined directly with statistical techniques from sentential parsing. We have therefore designed a headed tree encoding of SDRSs, which can be straightforwardly modeled with standard parsing techniques and from which SDRSs can be recovered.</Paragraph> <Paragraph position="2"> For instance, the tree for the dialogue in Figure 1 is given in Figure 3. The SDRS in Figure 2 is recovered automatically from it. In this tree, utterances are leaves which are immediately dominated by their tag, indicating either the sentence mood (indicative, interrogative or imperative) or that it is irrelevant, a pause or a pleasantry (e.g., hello), annotated as pls.</Paragraph> <Paragraph position="3"> Each non-terminal node has a unique head daughter: this is either a Segment node, Pass node, or a leaf utterance tagged with its sentence mood. Non-terminal nodes may in addition have any number of daughter irr, pause and pls nodes, and an additional daughter labelled with a rhetorical relation.</Paragraph>
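<Paragraph> As a concrete picture of this encoding, the sketch below is our own illustrative code, not the authors' implementation. It shows one way such headed nodes could be represented, instantiated for the Alternation segment h2 that groups utterances 154 and 155; the ind mood tags on those two leaves are an assumption made for the example.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class DNode:
    label: str                      # "Segment", "Pass", or a leaf tag such as "ind"
    utterance: Optional[str] = None # discourse referent for a leaf, e.g. "154"
    head: Optional["DNode"] = None  # the unique head daughter (None for leaves)
    rel_daughters: List[Tuple[str, "DNode"]] = field(default_factory=list)
                                    # (relation or ignorable tag, daughter) pairs
    seg_label: Optional[str] = None # automatically added segment label, e.g. "h2"

# The Alternation segment from Figure 3: head daughter 154, relation daughter 155.
h2 = DNode("Segment", seg_label="h2",
           head=DNode("ind", utterance="154"),
           rel_daughters=[("Alternation", DNode("ind", utterance="155"))])
</Paragraph>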
<Paragraph position="4"> The notion of headedness has no status in the semantics of SDRSs themselves. The heads of these discourse trees are not like verbal heads with sub-categorization requirements in syntax; here, they are nothing more than the left argument of a rhetorical relation, like 154 in Alternation(154,155). Nonetheless, defining one of the arguments of rhetorical relations as a head serves two main purposes. First, it enables a fully deterministic algorithm for recovering SDRSs from these trees. Second, it is also crucial for creating probabilistic head-driven parsing models for discourse structure.</Paragraph> <Paragraph position="5"> Segment and Pass are non-rhetorical node types.</Paragraph> <Paragraph position="6"> The former explicitly groups multiple utterances.</Paragraph> <Paragraph position="7"> The latter allows its head daughter to enter into relations with segments higher in the tree. This allows us to represent situations where an utterance attaches to more than one subsequent utterance, such as 153 in Figure 1. Annotators manually annotate the rhetorical relation, Segment and Pass nodes and determine their daughters. They also tag the individual utterances with one of the three sentence moods or irr, pause or pls. The labels for segments (e.g., h0 and h1 in Figure 3) are added automatically. Nonveridical relations such as Alternation also introduce segment labels on their parents; e.g., h2 in Figure 3.</Paragraph> <Paragraph position="8"> The SDRS is automatically recovered from this tree representation as follows. First, each relation node generates a rhetorical connection in the SDRS: its first argument is the discourse referent of its parent's head daughter, and the second is the discourse referent of the node itself (which unless stated otherwise is its head daughter's discourse referent). For example, the structure in Figure 3 yields Request-Elab(149,150), Alternation(154,155) and Elaboration(153,h2). The labels for the relations in the SDRS which determine segmentation must also be recovered. This is easily done: any node which has a segment label introduces an outscopes relation between that and the discourse referents of the node's daughters. This produces, for example, outscopes(h0,149), outscopes(h1,153) and outscopes(h2,154). It is straightforward to determine the labels of all the rhetorical relations from these conditions. Utterances such as 151 and 152, which are attached with pause and irr to indicate that they have no overall truth conditional effect on the dialogue, are ignored when constructing the SDRS, so SDRT does not assign these terms any semantics.</Paragraph> <Paragraph position="9"> Overall, this algorithm generates the SDRS in Figure 2 from the tree in Figure 3.</Paragraph>
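<Paragraph> The recovery procedure just described can be stated as a short tree walk. The sketch below reuses the illustrative DNode encoding from the earlier fragment; it is our own reconstruction of the described steps, not the authors' code, and returns the relation and outscopes conditions from which the SDRS is then assembled.

IGNORABLE = {"irr", "pause", "pls"}

def referent(node):
    # A node's discourse referent: its segment label if it has one,
    # otherwise (recursively) its head daughter's referent, else its utterance.
    if node.seg_label is not None:
        return node.seg_label
    if node.head is not None:
        return referent(node.head)
    return node.utterance

def recover(node, relations=None, outscopes=None):
    # Walk a headed discourse tree, collecting relation and outscopes conditions.
    if relations is None:
        relations, outscopes = [], []
    content = [(tag, d) for tag, d in node.rel_daughters if tag not in IGNORABLE]
    for tag, daughter in content:
        # relation between the parent's head daughter and this relation daughter
        relations.append((tag, referent(node.head), referent(daughter)))
    if node.seg_label is not None:
        # a segment label outscopes the referents of the node's daughters
        for daughter in [node.head] + [d for _, d in content]:
            outscopes.append((node.seg_label, referent(daughter)))
    for child in [node.head] + [d for _, d in content]:
        if child is not None and child.head is not None:  # recurse into non-leaves
            recover(child, relations, outscopes)
    return relations, outscopes

Run over a full encoding of the Figure 3 tree, such a walk would produce conditions like ("Request-Elab", "149", "150"), ("Elaboration", "153", "h2") and ("Alternation", "154", "155"), together with outscopes pairs like ("h0", "149") and ("h2", "154"), matching the examples above; pause and irr daughters are simply skipped.
</Paragraph>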
<Paragraph position="10"> Thus far, 70 dialogues have been annotated and reviewed to create our gold-standard corpus. On average, these dialogues have 237.5 words, 31.5 utterances, and 8.9 speaker turns. In all, there are 30 different rhetorical relations in the inventory for this annotation task, and 6 types of tags for the utterances themselves: ind, int, imp, pause, irr and pls.</Paragraph> <Paragraph position="11"> Finally, we annotated all 6,000 utterances in the Verbmobil portion of Redwoods with the following: whether the time mentioned (if there is one) is a good time to meet (e.g., I'm free then or Shall we meet at 2pm?) or a bad time to meet (e.g., I'm busy then or Let's avoid meeting at the weekend). These are used as features in our model of discourse structure (see Section 5). We use these so as to minimise using directly detailed features from the utterances themselves (e.g., the fact that the utterance contains the word free or busy, or that it contains a negation), which would lead to sparse data problems given the size of our training corpus. We ultimately aim to learn good-time and bad-time from sentence-level features extracted from the 6,000 Redwoods analyses, but we leave this to future work.</Paragraph> </Section> <Section position="6" start_page="157" end_page="157" type="metho"> <SectionTitle> 4 Generative parsing models </SectionTitle> <Paragraph position="0"> There is a significant body of work on probabilistic parsing, especially that dealing with the English sentences found in the annotated Penn Treebank. One of the most important developments in this work is that of Collins (2003). Collins created several lexicalised head-driven generative parsing models that incorporate varying levels of structural information, such as distance features, the complement/adjunct distinction, subcategorization and gaps. These models are attractive for constructing our discourse trees, which contain heads that establish non-local dependencies in a manner similar to that in syntactic parsing. Also, the co-dependent tasks of determining segmentation and choosing the rhetorical connections are both heavily influenced by the content of the utterances/segments which are being considered, and lexicalisation allows the model to probabilistically relate such utterances/segments very directly. Probabilistic Context Free Grammars (PCFGs) determine the conditional probability of a right-hand side of a rule given the left-hand side, P(RHS | LHS). Collins instead decomposes the calculation of such probabilities by first generating a head and then generating its left and right modifiers independently. In a supervised setting, doing this gathers a much larger set of rules from a set of labelled data than a standard PCFG, which learns only rules that are directly observed.1 The decomposition of a rule begins by noting that rules in a lexicalised PCFG have the form:</Paragraph> <Paragraph position="1"> P(h) \rightarrow L_n(l_n) \ldots L_1(l_1)\, H(h)\, R_1(r_1) \ldots R_m(r_m)</Paragraph> <Paragraph position="2"> where h is the head word, H(h) is the label of the head constituent, P(h) is its parent, and L_i(l_i) and R_i(r_i) are the n left and m right modifiers, respectively. It is also necessary to include STOP symbols L_{n+1} and R_{m+1} on either side to allow the Markov process to properly model the sequences of modifiers. By assuming these modifiers are generated independently of each other but are dependent on the head and its parent, the probability of such expansions can be calculated as follows (where P_h, P_l and P_r are the probabilities for the head, left-modifiers and right-modifiers respectively):</Paragraph> <Paragraph position="3"> P_h(H \mid P, h) \times \prod_{i=1}^{n+1} P_l(L_i(l_i) \mid P, H, h) \times \prod_{i=1}^{m+1} P_r(R_i(r_i) \mid P, H, h)</Paragraph> <Paragraph position="4"> This provides the simplest of models. More conditioning information can of course be added from any structure which has already been generated. For example, Collins' model 1 adds a distance feature that indicates whether the head and modifier it is generating are adjacent and whether a verb is in the string between the head and the modifier.</Paragraph>
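<Paragraph> The decomposition above is easy to state procedurally. The sketch below is our own illustration, not Collins' or the authors' code; it computes the probability of one expansion from user-supplied conditional distributions P_h, P_l and P_r, generating the left and right modifier sequences independently of each other and terminating each with STOP.

STOP = ("STOP", None)   # dummy (label, word) modifier ending each sequence

def expansion_prob(parent, head_label, head_word,
                   left_mods, right_mods, P_h, P_l, P_r):
    # left_mods / right_mods: (label, word) pairs, ordered outward from the head.
    # P_h, P_l, P_r: conditional probability estimates, e.g. relative-frequency
    # counts smoothed with back-off as described at the end of this section.
    prob = P_h(head_label, parent, head_word)
    for mod in list(left_mods) + [STOP]:
        prob *= P_l(mod, parent, head_label, head_word)
    for mod in list(right_mods) + [STOP]:
        prob *= P_r(mod, parent, head_label, head_word)
    return prob

Richer conditioning information, such as Collins' distance feature or the discourse features introduced below, amounts to adding further arguments to the tuples that P_l and P_r condition on.
</Paragraph>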
<Paragraph position="5"> In Section 3, we outlined how SDRSs can be represented as headed trees. This allows us to create parsing models for discourse that are directly inspired by those described in the previous section. These models are well suited for our discourse parsing task. They are lexicalised, so there is a clear place in the discourse model for incorporating features from utterances: simply replace lexical heads with whole utterances, and exploit features from those utterances in discourse parsing in the same manner as lexical features are used in sentential parsing.</Paragraph> <Paragraph position="6"> Discourse trees contain a much wider variety of kinds of information than syntactic trees. The leaves of these trees are sentences with full syntactic and semantic analyses, rather than words. Furthermore, each dialogue has two speakers, and speaker style can change dramatically from dialogue to dialogue.</Paragraph> <Paragraph position="7"> Nonetheless, the task is also more constrained in that there are fewer overall constituent labels, there are only a few labels which can act as heads, and trees are essentially binary branching apart from constituents containing ignorable utterances.</Paragraph> <Paragraph position="8"> The basic features we use are very similar to those for the syntactic parsing model given in the previous section. The feature P is the parent label that is the starting point for generating the head and its modifiers. H is the label of the head constituent. The tag t is also used, except that rather than being a part-of-speech, it is either a sentence mood label (ind, int, or imp) or an ignorable label (irr, pls, or pause). The word feature w in our model is the first discourse cue phrase present in the utterance.2 In the absence of a cue phrase, w is the empty string. The distance feature Δ is true if the modifier being generated is adjacent to the head and false otherwise. To incorporate a larger context into the conditioning information, we also utilize a feature HCR, which encodes the child relation of a node's head.</Paragraph> <Paragraph position="9"> We have two features that are particular to dialogue. The first, ST, indicates whether the head utterance of a segment starts a turn or not. The other, TC, encodes the number of turn changes within a segment with one of the values 0, 1, or ≥ 2.</Paragraph> <Paragraph position="10"> Finally, we use the good/bad-time annotations discussed in Section 3 for a feature TM indicating one of the following values for the head utterance of a segment: good time, bad time, neither, or both. [Figure 4: The features active for determining the head and modifier probabilities in each of the four models. Head features: P, t, w, HCR, ST, TC, TM; modifier features: P, t, w, H, Δ, HCR, ST, TC, TM. In the original table, Model 1 has 8 of the 16 feature columns active, Model 2 has 12, Model 3 has 14, and Model 4 has all 16.]</Paragraph> <Paragraph position="11"> With these features, we create the four models given in Figure 4. As example feature values, consider the Segment node labelled h1 in Figure 3. Here, the features have as values: P=Segment, H=Pass, t=ind (the tag of utterance 153), w=Actually (see 153 in Figure 1), HCR=Elaboration, ST=false, TC=0, and TM=good time.</Paragraph> <Paragraph position="12"> As is standard, linear interpolation with back-off levels of decreasing specificity is used for smoothing. Weights for the levels are determined as in Collins (2003).</Paragraph> </Section> </Paper>