<?xml version="1.0" standalone="yes"?> <Paper uid="J00-3005"> <Title>Marcu Rhetorical Parsing of Unrestricted Texts</Title> <Section position="2" start_page="396" end_page="398" type="abstr"> <SectionTitle> 2. Foundation </SectionTitle> <Paragraph position="0"> The hypothesis that underlies this work is that connectives, cohesion, shallow processing, and a well-constrained mathematical model of valid rhetorical structure trees (RS-trees) can be used to implement algorithms that determine * the elementary units of a text, i.e., the units that constitute the leaves of the RS-tree of that text; * the rhetorical relations that hold between elementary units and between spans of text; * the relative importance (nucleus or satellite) and the size of the spans subsumed by these rhetorical relations.</Paragraph> <Paragraph position="1"> In what follows, I examine each facet of this hypothesis intuitively and explain how it contributes to the derivation of a rhetorical parsing algorithm, i.e., an algorithm that takes as input free, unrestricted text and that determines its valid RS-trees. For each facet, I consider first the arguments that support the hypothesis and then discuss potential difficulties.</Paragraph> <Section position="1" start_page="396" end_page="397" type="sub_section"> <SectionTitle> 2.1 Determining the Elementary Units Using Connectives and Shallow Processing </SectionTitle> <Paragraph position="0"> 2.1.1 Pro Arguments. Recent developments in the linguistics of punctuation (Nunberg 1990; Briscoe 1996; Pascual and Virbel 1996; Say and Akman 1996; Shiuan and Ann 1996) have emphasized the role that punctuation can have in solving a variety of natural language processing tasks ranging from syntactic parsing to information packaging. (Footnote 1: In this paper, I use the terms connective and cue phrase interchangeably. And I use the term discourse marker to refer to a connective that has a discourse function, i.e., a connective that signals a rhetorical relation that holds between two text spans.)</Paragraph> <Paragraph position="1"> (Computational Linguistics Volume 26, Number 3) For example, if a sentence consists of three arguments separated by semicolons, it is likely that one can determine the boundaries of these arguments without relying on sophisticated forms of syntactic analysis. Shallow processing is sufficient to recognize the occurrences of the semicolons and to break the sentence into three elementary units.</Paragraph> <Paragraph position="2"> In a corpus study (described in Section 3), I noticed that in most of the cases in which a connective such as Although occurred at the beginning of a sentence, it marked the left boundary of an elementary unit whose right boundary was given by the first subsequent occurrence of a comma. Hence, it is likely that by using only shallow techniques and knowledge about connectives, one can determine, for example, that the elementary units of sentence (2) are those enclosed within square brackets.</Paragraph> <Paragraph position="3"> (2) [Although Brooklyn College does not yet have a junior-year-abroad program,] [a good number of students spend summers in Europe.] 2.1.2 Difficulties. Obviously, by relying only on orthography, connectives, and shallow processing it is unlikely that one will be capable of correctly determining all elementary units of an RS-tree. It may very well be the case that knowledge about how Although is used in texts can be exploited to determine the elementary units of texts, but not all connectives are used as consistently as Although is. Just consider, for instance, the highly ambiguous connective and. In some cases, and plays a sentential, syntactic role, while in others, it plays a discourse role, i.e., it signals a rhetorical relation that holds between two textual units. 
For example, in sentence (3), the first and is sentential, i.e., it makes a semantic contribution to the interpretation of the complex noun phrase &quot;John and Mary&quot;, while the second and has a discourse function, i.e., it signals a rhetorical relation of SEQUENCE that holds between the units enclosed within square brackets.</Paragraph> <Paragraph position="4"> (3) [John and Mary went to the theatre] [and saw a nice play.] If a system is to use connectives to determine elementary unit boundaries, it needs to figure out that a boundary is required before the second occurrence of and (the occurrence that has a discourse function), but not before the first occurrence. It seems clear that shallow processing is insufficient to properly solve this problem. It remains an open question, however, to what degree shallow processing and knowledge about connectives can be successfully used to determine the elementary units of texts. Our results (see Section 4) show that, using only such lean knowledge resources, elementary unit boundaries can be determined with approximately 80% accuracy.</Paragraph> </Section> <Section position="2" start_page="397" end_page="398" type="sub_section"> <SectionTitle> 2.2 Determining Rhetorical Relations Using Connectives </SectionTitle> <Paragraph position="0"> Linguistic and psycholinguistic research has shown that connectives are consistently used by humans both as cohesive ties between adjacent clauses and sentences (Halliday and Hasan 1976) and as &quot;macroconnectors&quot; that signal relations that hold between large textual units. For example, in stories, connectives such as so, but, and and mark boundaries between story parts (Kintsch 1977). In naturally occurring conversations, so marks the terminal point of a main discourse unit and a potential transition in a participant's turn, whereas and coordinates idea units and continues a speaker's action (Schiffrin 1987). 
In narratives,</Paragraph> </Section> <Section position="3" start_page="398" end_page="398" type="sub_section"> <SectionTitle> Marcu Rhetorical Parsing of Unrestricted Texts </SectionTitle> <Paragraph position="0"> connectives signal structural relations between elements and are crucial for the understanding of the stories (Segal and Duchan 1997). In general, cue phrases are used consistently by both speakers and writers to highlight the most important shifts in their narratives, mark intermediate breaks, and signal areas of topical continuity (Bestgen and Costermans 1997; Schneuwly 1997). Therefore, it is likely that connectives can be used to determine rhetorical relations that hold both between elementary units and between large spans of text.</Paragraph> <Paragraph position="1"> The number of discourse markers in a typical text--approximately one marker for every two clauses (Redeker 1990)--is sufficiently large to enable the derivation of rich rhetorical structures for texts. More importantly, the absence of markers correlates with a preference of readers to interpret the unmarked textual units as continuations of the topics of the units that precede them (Segal, Duchan, and Scott 1991).</Paragraph> <Paragraph position="2"> Hence, when there is no connective between two sentences, for example, it is likely that the second sentence elaborates on the first.</Paragraph> <Paragraph position="3"> 2.2.2 Difficulties. The above arguments tell us that connectives are used often and that they signal relations that hold both between elementary units and between large spans of text. Hence, previous research tells us only that connectives are potentially useful in determining the rhetorical structure of texts. Unfortunately, they cannot be used straightforwardly because they are ambiguous.</Paragraph> <Paragraph position="4"> * In some cases, connectives have a sentential function, while in other cases, they have a discourse function. 
Unless we can determine when a connective has a discourse function, we cannot use connectives to hypothesize rhetorical relations.</Paragraph> <Paragraph position="5"> * Connectives do not explicitly signal the size of the textual spans that they relate.</Paragraph> <Paragraph position="6"> * Connectives can signal more than one rhetorical relation. That is, there is no one-to-one mapping between the use of connectives and the rhetorical relations that they signal.</Paragraph> <Paragraph position="7"> I address each of these three problems in turn.</Paragraph> <Paragraph position="8"> Sentential and Discourse Uses of Connectives. Empirical studies on the disambiguation of cue phrases (Hirschberg and Litman 1993) have shown that just by considering the orthographic environment in which they occur, one can distinguish between sentential and discourse uses in about 80% of cases and that these results can be improved with machine learning techniques (Litman 1996) or genetic algorithms (Siegel and McKeown 1994). I have taken Hirschberg and Litman's research one step further and designed a comprehensive corpus analysis of cue phrases that enabled me to design algorithms that improved their results and coverage. The corpus analysis is discussed in Section 3. The algorithm that determines elementary unit boundaries and identifies discourse uses of cue phrases is discussed in Section 4.</Paragraph> </Section> </Section> </Paper>