<?xml version="1.0" standalone="yes"?> <Paper uid="N03-1030"> <Title>Sentence Level Discourse Parsing using Syntactic and Lexical Information</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The Corpus </SectionTitle> <Paragraph position="0"> For the experiments described in this paper, we use a publicly available corpus (RST-DT, 2002) that contains 385 Wall Street Journal articles from the Penn Treebank. The corpus comes conveniently partitioned into a Training set of 347 articles (6132 sentences) and a Test set of 38 articles (991 sentences). Each document in the corpus is paired with a discourse structure (tree) that was manually built in the style of Rhetorical Structure Theory (Mann and Thompson, 1988). (See (Carlson et al., 2003) for details concerning the corpus and the annotation process.) Out of the 385 articles in the corpus, 53 have been independently annotated by two human annotators. We used this doubly-annotated subset to compute human agreement on the task of discourse structure derivation. In our experiments we used as discourse structures only the discourse sub-trees spanning over individual sentences.</Paragraph> <Paragraph position="1"> Because the discourse structures had been built on top of sentences already associated with syntactic trees from the Penn Treebank, we were able to create a composite corpus which allowed us to perform an empirically driven syntax-discourse relationship study. This composite corpus was created by associating each sentence $s$ in the discourse corpus with its corresponding Penn Treebank syntactic parse tree $SyntacticTree(s)$ and its corresponding sentence-level discourse tree $DiscourseTree(s)$. Although human annotators were free to build their discourse structures without enforcing the existence of well-formed discourse sub-trees for each sentence, in about 95% of the cases in the (RST-DT, 2002) corpus, there exists a discourse sub-tree $DiscourseTree(s)$ associated with each sentence $s$. The remaining 5% of the sentences cannot be used in our approach, as no well-formed discourse tree can be associated with these sentences.</Paragraph> <Paragraph position="2"> Therefore, our Training section consists of a set of 5809 triples of the form $\langle s, SyntacticTree(s), DiscourseTree(s) \rangle$, which are used to train the parameters of the statistical models. Our Test section consists of a set of 946 triples of a similar form, which are used to evaluate the performance of our discourse parser.</Paragraph> <Paragraph position="3"> The (RST-DT, 2002) corpus uses 110 different rhetorical relations. We found it useful to also compact these relations into classes, as described by Carlson et al. (2003), and operate with the resulting 18 labels as well (seen as coarser granularity rhetorical relations). Operating with different levels of granularity allows one to get deeper insight into the difficulties of assigning the appropriate rhetorical relation, if any, to two adjacent text spans.</Paragraph> </Section>
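To make the shape of this composite corpus concrete, the following minimal Python sketch shows one way the training and test triples could be represented in code; the class names, field names, and the trivial split helper are our own illustrations, not part of the RST-DT distribution.

```python
from dataclasses import dataclass, field

# Illustrative container for one composite-corpus entry: a sentence paired
# with its Penn Treebank syntactic parse and its sentence-level RST sub-tree.
# Tree is a minimal stand-in for whatever tree type a real loader would use.
@dataclass
class Tree:
    label: str                       # e.g., "VP(says)" or "ATTRIBUTION-SN"
    children: list = field(default_factory=list)  # empty for leaves

@dataclass
class Triple:
    sentence: str        # the raw sentence s
    syntactic_tree: Tree  # SyntacticTree(s), from the Penn Treebank
    discourse_tree: Tree  # DiscourseTree(s), the RST sub-tree over s

def split_corpus(triples: list) -> tuple:
    """Split into the Training (5809 triples) and Test (946 triples)
    sections; only the shapes are illustrated here, not file parsing."""
    return triples[:5809], triples[5809:5809 + 946]
```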
<Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 The Discourse Segmenter </SectionTitle> <Paragraph position="0"> We break down the problem of building sentence-level discourse trees into two sub-problems: discourse segmentation and discourse parsing. Discourse segmentation is covered by this section, while discourse parsing is covered by Section 4.</Paragraph> <Paragraph position="1"> Discourse segmentation is the process in which a given text is broken into non-overlapping segments called elementary discourse units (edus). In the present work, elementary discourse units are taken to be clauses or clause-like units that are unequivocally the NUCLEUS or SATELLITE of a rhetorical relation that holds between two adjacent spans of text (see (Carlson et al., 2003) for details). Our approach to discourse segmentation breaks the problem further into two sub-problems: sentence segmentation and sentence-level discourse segmentation. The problem of sentence segmentation has been studied extensively, and tools such as those described by Palmer and Hearst (1997) and Ratnaparkhi (1998) can handle it well. In this section, we present a discourse segmentation algorithm that deals with segmenting sentences into elementary discourse units.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 The Discourse Segmentation Model </SectionTitle> <Paragraph position="0"> The discourse segmenter proposed here takes as input a sentence and outputs its elementary discourse unit boundaries. Our statistical approach to sentence segmentation uses two components: a statistical model, which assigns a probability to the insertion of a discourse boundary after each word in a sentence, and a segmenter, which uses the probabilities computed by the model for inserting discourse boundaries. We first focus on the statistical model.</Paragraph> <Paragraph position="1"> A good model of discourse segmentation needs to account both for local interactions at the word level and for global interactions at more abstract levels. Consider, for example, the syntactic tree in Figure 2. According to our hypothesis, the discourse boundary inserted between the words says and it is best explained not by the words alone, but by the lexicalized syntactic structure [VP(says) [VBZ(says) SBAR(will)]], signaled by the boxed nodes in Figure 2. Hence, we hypothesize that the discourse boundary in our example is best explained by the global interaction between the verb (the act of saying) and its clausal complement (what is being said).</Paragraph> <Paragraph position="3"> [Figure 3 (caption fragment): lexicalized syntactic trees with the same syntactic context exhibit different behavior with respect to discourse boundaries, depending on the lexical heads involved.] Given a sentence $s = w_1 w_2 \ldots w_i \ldots w_n$, we first find the syntactic parse tree $t$ of $s$. We used in our experiments both syntactic parse trees obtained using Charniak's parser (2000) and syntactic parse trees from the Penn Treebank. Our statistical model assigns a segmenting probability $P(b_i \mid w_i, t)$ for each word $w_i$, where $b_i \in \{$boundary, no-boundary$\}$.
Because our model is concerned with discourse segmentation at the sentence level, we define $P(\text{boundary} \mid w_n, t) = 1$, i.e., the sentence boundary is always a discourse boundary as well.</Paragraph> <Paragraph position="4"> Our model uses both lexical and syntactic features for determining the probability of inserting discourse boundaries. We apply canonical lexical head projection rules (Magerman, 1995) in order to lexicalize syntactic trees. For each word $w$, the upper-most node with lexical head $w$ which has a right sibling node determines the features on the basis of which we decide whether to insert a discourse boundary. We denote this node $N_w$, and the features we use are node $N_w$, its parent $N_p$, and the siblings of $N_w$. In the example in Figure 2, we determine whether to insert a discourse boundary after the word says using as features node $N_p = VP(says)$ and its children $N_w = VBZ(says)$ and $N_r = SBAR(will)$. We use our corpus to estimate the likelihood of inserting a discourse boundary between word $w$ and the next word using formula (1),</Paragraph> <Paragraph position="5"> $$P(\text{boundary} \mid w, t) = \frac{count(N_p \rightarrow \ldots N_w\,N_r \ldots, \text{boundary})}{count(N_p \rightarrow \ldots N_w\,N_r \ldots)} \qquad (1)$$ </Paragraph> <Paragraph position="6"> where the numerator represents all the counts of the rule $N_p \rightarrow \ldots N_w\,N_r \ldots$ for which a discourse boundary has been inserted after word $w$, and the denominator represents all the counts of the rule.</Paragraph> <Paragraph position="7"> Because we want to account for boundaries that are motivated lexically as well, the counts used in formula (1) are defined over lexicalized rules. Without lexicalization, the syntactic context alone is too general and fails to distinguish genuine cases of discourse boundaries from incorrect ones. As can be seen in Figure 3, the same syntactic context may indicate a discourse boundary when the lexical heads passed and without are present, but it may not indicate a boundary when the lexical heads priced and at are present.</Paragraph> <Paragraph position="8"> The discourse segmentation model uses the corpus presented in Section 2 in order to estimate probabilities for inserting discourse boundaries using equation (1). We also use a simple interpolation method for smoothing lexicalized rules to accommodate data sparseness.</Paragraph> <Paragraph position="9"> Once we have the segmenting probabilities given by the statistical model, a straightforward algorithm is used to implement the segmenter. Given a syntactic tree $t$, the algorithm inserts a boundary after each word $w$ for which $P(\text{boundary} \mid w, t) > 0.5$.</Paragraph> </Section> </Section>
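As a concrete illustration of this segmentation procedure, here is a minimal Python sketch of how boundary probabilities could be estimated from lexicalized-rule counts as in formula (1) and then thresholded at 0.5. The helper names and data structures are our own, and we omit the interpolation-based smoothing the model actually uses.

```python
from collections import defaultdict

# rule_counts[rule]     = total occurrences of the lexicalized rule
# boundary_counts[rule] = occurrences followed by a discourse boundary
rule_counts = defaultdict(int)
boundary_counts = defaultdict(int)

def observe(rule: str, has_boundary: bool) -> None:
    """Collect counts from the training corpus for formula (1),
    e.g., observe("VP(says) -> VBZ(says) SBAR(will)", True)."""
    rule_counts[rule] += 1
    if has_boundary:
        boundary_counts[rule] += 1

def p_boundary(rule: str) -> float:
    """Maximum-likelihood estimate of P(boundary | w, t); the paper
    additionally interpolates with less lexicalized rules for
    smoothing, which this sketch leaves out."""
    total = rule_counts[rule]
    return boundary_counts[rule] / total if total else 0.0

def segment(words: list, rules: list) -> list:
    """Return the word indices after which an edu boundary is inserted.
    rules[i] is the lexicalized rule N_p -> ... N_w N_r ... extracted
    for word i; the last word is always followed by a boundary."""
    cuts = [i for i, r in enumerate(rules[:-1]) if p_boundary(r) > 0.5]
    return cuts + [len(words) - 1]
```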
<Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 The Discourse Parser </SectionTitle> <Paragraph position="0"> In the setting presented here, the input to the discourse parser is a Discourse Segmented Lexicalized Syntactic Tree (i.e., a lexicalized syntactic parse tree in which the discourse boundaries have been identified), henceforth called a DS-LST. An example of a DS-LST is the tree in Figure 2. The output of the discourse parser is a discourse parse tree, such as the one presented in Figure 1.</Paragraph> <Paragraph position="1"> As in other statistical approaches, we identify two components that perform the discourse parsing task. The first component is the parsing model, which assigns a probability to every potential candidate parse tree. Formally, given a discourse tree $DT$ and a set of parameters $\Theta$, the parsing model estimates the conditional probability $P(DT \mid \Theta)$. The most likely parse is then given by formula (2).</Paragraph> <Paragraph position="2"> $$DT_{best} = \arg\max_{DT} P(DT \mid \Theta) \qquad (2)$$ </Paragraph> <Paragraph position="3"> The second component is called the discourse parser, and it is an algorithm for finding $DT_{best}$. We first focus on the parsing model.</Paragraph> <Paragraph position="4"> A discourse parse tree can be formally represented as a set of tuples. The discourse tree in Figure 1, for example, can be formally written as the set of tuples $\{$ATTRIBUTION-SN$[1,1,3]$, ENABLEMENT-NS$[2,2,3]\}$. A tuple is of the form $R[i, m, j]$, and denotes a discourse relation $R$ that holds between the discourse span that contains edus $i$ through $m$, and the discourse span that contains edus $m+1$ through $j$. Each relation $R$ also signals explicitly the nuclearity assignment, which can be NUCLEUS-SATELLITE (NS), SATELLITE-NUCLEUS (SN), or NUCLEUS-NUCLEUS (NN). This notation assumes that all relations $R$ are binary relations. The assumption is justified empirically: 99% of the nodes of the discourse trees in our corpus are binary nodes. Using only binary relations makes our discourse model easier to build and reason with.</Paragraph> <Paragraph position="5"> In what follows we make use of two functions: function $rel$ applied to a tuple $R[i, m, j]$ yields the discourse relation $R$; function $ds$ applied to a tuple $R[i, m, j]$ yields the structure $[i, m, j]$. Given a set of adequate parameters $\Theta$, our discourse model estimates the goodness of a discourse parse tree $DT$ using formula (3).</Paragraph> <Paragraph position="6"> $$P(DT \mid \Theta) = \prod_{t \in DT} P_s(ds(t) \mid \Theta) \cdot P_r(rel(t) \mid \Theta) \qquad (3)$$ </Paragraph> <Paragraph position="7"> For each tuple $t \in DT$, the probability $P_s$ estimates the goodness of the structure of $t$. We expect these probabilities to prefer the hierarchical structure (1, (2, 3)) over ((1, 2), 3) for the discourse tree in Figure 1. For each tuple $t \in DT$, the probability $P_r$ estimates the goodness of the discourse relation of $t$. We expect these probabilities to prefer the rhetorical relation ATTRIBUTION-SN over CONTRAST-NN for the relation between span 1 and span $[2, 3]$ in the discourse tree in Figure 1. The overall probability of a discourse tree is obtained by multiplying the structural probabilities $P_s$ and the relational probabilities $P_r$ for all the tuples in the discourse tree.</Paragraph> <Paragraph position="8"> Our discourse model uses as $\Theta$ the information present in the input DS-LST. However, given such a tree $ST$ as input, one cannot estimate probabilities such as $P(DT \mid ST)$ without running into a severe sparseness problem. To overcome this, we map the input DS-LST into a more abstract representation that contains only the salient features of the DS-LST. This mapping leads to the notion of a dominance set over a discourse segmented lexicalized syntactic tree. In what follows, we define this notion and show that it provides adequate parameterization for the discourse parsing problem.</Paragraph>
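A minimal Python sketch of this tuple representation and of the product in formula (3) follows. The probability tables are stand-ins populated with the numbers from the worked example in Section 4.3, and all names are illustrative; the real model conditions these probabilities on projections of the dominance set (Section 4.2) rather than looking them up directly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tuple_:
    relation: str  # rel(t), e.g., "ATTRIBUTION-SN"
    i: int         # ds(t) = [i, m, j]: edus i..m vs. edus m+1..j
    m: int
    j: int

# Stand-in probability tables, filled with the scores quoted in Section 4.3.
P_s = {(1, 1, 3): 0.37, (2, 2, 3): 0.47}
P_r = {"ATTRIBUTION-SN": 0.009, "ENABLEMENT-NS": 0.88}

def tree_probability(dt: set) -> float:
    """Formula (3): P(DT) = product over tuples t of P_s(ds(t)) * P_r(rel(t))."""
    p = 1.0
    for t in dt:
        p *= P_s[(t.i, t.m, t.j)] * P_r[t.relation]
    return p

dt = {Tuple_("ATTRIBUTION-SN", 1, 1, 3), Tuple_("ENABLEMENT-NS", 2, 2, 3)}
print(tree_probability(dt))  # ~0.001, cf. the worked example in Section 4.3
```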
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 The Dominance Set of a DS-LST </SectionTitle> <Paragraph position="0"> The dominance set of a DS-LST contains feature representations of a discourse segmented lexicalized syntactic tree. Each feature is a representation of the syntactic and lexical information that is found at the point where two edus are joined together in a DS-LST. Our hypothesis is that such attachment points in the structure of a DS-LST (the boxed nodes in the tree in Figure 4) carry the most indicative information with respect to the potential discourse tree we want to build. A set representation of the attachment points of a DS-LST is called the dominance set of a DS-LST.</Paragraph> <Paragraph position="1"> For each edu $e$ we identify a word $w$ in $e$ as the head word of edu $e$ and denote it $h$. $h$ is defined as the word with the highest occurrence as a lexical head in the lexicalized tree among all the words in $e$. The node in which $h$ occurs highest is called the head node of edu $e$ and is denoted $N_h$. The edu which has as head node the root of the DS-LST is called the exception edu. In our example, the head word for edu 2 is $h = will$, and its head node is $N_h = SBAR(will)$; the head word for edu 3 is $h = to$, and its head node is $N_h = S(to)$. The exception edu is edu 1.</Paragraph> <Paragraph position="2"> For each edu $e$ which is not the exception edu, there exists a node which is the parent of the head node of $e$, and the lexical head of this node is guaranteed to belong to a different edu than $e$; call that edu $f$. We call this node the attachment node of $e$ and denote it $N_a$. In our example, the attachment node of edu 2 is $N_a = VP(says)$, and its lexical head says belongs to edu 1; the attachment node of edu 3 is $N_a = VP(use)$, and its lexical head use belongs to edu 2. We write formally that two edus $e$ and $f$ are linked through a head node $N_h$ and an attachment node $N_a$ as $(e, N_h) \leftarrow (f, N_a)$. The dominance set of a DS-LST is given by all the edu pairs linked through a head node and an attachment node in the DS-LST. Each element in the dominance set represents a dominance relationship between the edus involved. Figure 4 shows the dominance set $D$ for our example DS-LST. We say that edu 2 is dominated by edu 1 (written $2 \leftarrow 1$ for short), and edu 3 is dominated by edu 2 ($3 \leftarrow 2$).</Paragraph> </Section>
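The following Python sketch illustrates one way to read a dominance set off a lexicalized tree whose nodes carry the edu identifier of their lexical head. The tree encoding, the helper names, and the depth-based choice of the highest candidate node per edu are our own simplifications of the definitions above, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str  # syntactic category, e.g., "SBAR"
    head: str   # lexical head word, e.g., "will"
    edu: int    # edu id of the lexical head
    children: list = field(default_factory=list)

def dominance_set(root: Node) -> set:
    """Collect (e, N_h) <- (f, N_a) pairs: whenever a node's lexical head
    belongs to a different edu than its parent's head, the node is a
    candidate head node of its edu and the parent is the attachment node.
    The highest (shallowest) such candidate per edu is its head node."""
    out = {}
    def walk(parent, node, depth):
        if parent is not None and node.edu != parent.edu:
            if node.edu not in out or depth < out[node.edu][0]:
                out[node.edu] = (depth, node, parent)
        for child in node.children:
            walk(node, child, depth + 1)
    walk(None, root, 0)
    return {((e, f"{n.label}({n.head})"), (p.edu, f"{p.label}({p.head})"))
            for e, (_, n, p) in out.items()}
```

On the tree of Figure 4, this would yield {((2, 'SBAR(will)'), (1, 'VP(says)')), ((3, 'S(to)'), (2, 'VP(use)'))}, i.e., the dominance relationships 2 ← 1 and 3 ← 2; the exception edu 1 produces no pair, since its head node is the root.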
<Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 The Discourse Model </SectionTitle> <Paragraph position="0"> Our discourse parsing model uses the dominance set $D$ of a DS-LST as the conditioning parameter $\Theta$ in equation (3). The discourse parsing model we propose uses the dominance set $D$ to compute the probability of a discourse parse tree $DT$ according to formula (4).</Paragraph> <Paragraph position="1"> $$P(DT \mid D) = \prod_{t \in DT} P_s(ds(t) \mid filter_s(t, D)) \cdot P_r(rel(t) \mid filter_r(t, D)) \qquad (4)$$ </Paragraph> <Paragraph position="2"> Different projections of $D$ are used to accurately estimate the structure probabilities $P_s$ and the relation probabilities $P_r$ associated with a tuple in a discourse tree. The projection functions $filter_s$ and $filter_r$ ensure that, for each tuple $t \in DT$, only the information in $D$ relevant to $t$ is conditioned upon. In the case of $P_s$ (the probability of the structure $[i, m, j]$), we filter out the lexical heads and keep only the syntactic labels; also, we filter out all the elements of $D$ which do not have at least one edu inside the span of $t$. In our running example, for instance, for $t = $ ENABLEMENT-NS$[2, 2, 3]$, $filter_s(t, D) = \{(2, SBAR) \leftarrow (1, VP), (3, S) \leftarrow (2, VP)\}$. The span of $t$ is $[2, 3]$, and set $D$ has two elements involving edus from it, namely the dominance relationships $2 \leftarrow 1$ and $3 \leftarrow 2$. To decide the appropriate structure, $filter_s$ keeps them both; this is because a different dominance relationship between edus 1 and 2, namely $1 \leftarrow 2$, would most likely influence the structure probability of $t$.</Paragraph> <Paragraph position="5"> In the case of $P_r$ (the probability of the relation $R$), we keep both the lexical heads and the syntactic labels, but filter out the edu identifiers (clearly, the relation between two spans does not depend on the positions of the spans involved); also, we filter out all the elements of $D$ whose dominance relationship does not hold across the two sub-spans of $t$. In our running example, for $t = $ ENABLEMENT-NS$[2, 2, 3]$, $filter_r(t, D) = \{S(to) \leftarrow VP(use)\}$. The two sub-spans of $t$ are $[2, 2]$ and $[3, 3]$, and only the dominance relationship $3 \leftarrow 2$ holds across these spans; the other dominance relationship in $D$, $2 \leftarrow 1$, does not influence the choice for the relation label of $t$.</Paragraph> <Paragraph position="6"> The conditional probabilities involved in equation (4) are estimated from the training corpus using maximum likelihood estimation. A simple interpolation method is used for smoothing to accommodate data sparseness. The counts for the dominance sets are also smoothed using symbolic names for the edu identifiers and accounting only for the distance between them.</Paragraph> </Section>
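A small Python sketch of the two projection functions follows, under the same illustrative encoding of dominance-set elements as ((edu, "LABEL(head)"), (edu, "LABEL(head)")) pairs used in the earlier sketches, and reusing the Tuple_ class; the representation and names are ours.

```python
def filter_s(t, D):
    """Projection for the structure probability P_s: drop lexical heads,
    keep syntactic labels and edu ids; keep only elements of D with at
    least one edu inside the span [t.i, t.j]."""
    span = range(t.i, t.j + 1)
    return {((e, lab_h.split("(")[0]), (f, lab_a.split("(")[0]))
            for (e, lab_h), (f, lab_a) in D
            if e in span or f in span}

def filter_r(t, D):
    """Projection for the relation probability P_r: drop edu ids, keep
    lexicalized labels; keep only elements whose dominance relationship
    holds across the sub-spans [t.i, t.m] and [t.m+1, t.j]."""
    left, right = range(t.i, t.m + 1), range(t.m + 1, t.j + 1)
    return {(lab_h, lab_a)
            for (e, lab_h), (f, lab_a) in D
            if (e in left and f in right) or (e in right and f in left)}
```

On the running example, with D = {((2, 'SBAR(will)'), (1, 'VP(says)')), ((3, 'S(to)'), (2, 'VP(use)'))} and t = ENABLEMENT-NS[2,2,3], filter_s yields {(2, SBAR) ← (1, VP), (3, S) ← (2, VP)} and filter_r yields {S(to) ← VP(use)}, matching the projections described above.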
<Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 The Discourse Parser </SectionTitle> <Paragraph position="0"> Our discourse parser implements a classical bottom-up algorithm. The parser searches through the space of all legal discourse parse trees and uses a dynamic programming algorithm. If two constituents are derived for the same discourse span, the constituent for which the model assigns a lower probability can be safely discarded. Figure 5 shows a discourse structure created in a bottom-up manner for the DS-LST in Figure 2. Tuple ENABLEMENT-NS[2,2,3] has a score of 0.40, obtained as the product between the structure probability $P_s$ of 0.47 and the relation probability $P_r$ of 0.88. Tuple ATTRIBUTION-SN[1,1,3] has a score of 0.37 for the structure, and a score of 0.009 for the relation. The final score for the entire discourse structure is 0.001. All probabilities used were estimated from our training corpus. According to our discourse model, the discourse structure in Figure 5 is the most likely among all the legal discourse structures for our example sentence.</Paragraph>
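To make the bottom-up search concrete, here is a minimal CKY-style sketch over edu spans, reusing the illustrative Tuple_ class from the earlier sketches. Keeping only the best-scoring constituent per span implements the dynamic programming step described above; the score callback and relation list are our simplifications, not the paper's implementation.

```python
RELATIONS = ["ATTRIBUTION-SN", "ENABLEMENT-NS"]  # in reality, all labels seen in training

def parse(n: int, score):
    """Best discourse tree over edus 1..n. best[(i, j)] holds a
    (probability, frozenset-of-tuples) pair for span [i, j];
    score(rel, i, m, j) should return
    P_s(ds(t) | filter_s(t, D)) * P_r(rel | filter_r(t, D))."""
    best = {(i, i): (1.0, frozenset()) for i in range(1, n + 1)}
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            for m in range(i, j):  # split point between the two sub-spans
                pl, tl = best[(i, m)]
                pr, tr = best[(m + 1, j)]
                for rel in RELATIONS:
                    p = score(rel, i, m, j) * pl * pr
                    # dynamic programming: keep only the best constituent per span
                    if (i, j) not in best or p > best[(i, j)][0]:
                        best[(i, j)] = (p, tl | tr | {Tuple_(rel, i, m, j)})
    return best[(1, n)]
```

With a score function that reproduces the worked example (0.47 x 0.88 for ENABLEMENT-NS[2,2,3], 0.37 x 0.009 for ATTRIBUTION-SN[1,1,3], and near-zero elsewhere), parse(3, score) returns the tree of Figure 5 with probability of roughly 0.001.
</Section> </Section> </Paper>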