<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1013"> <Title>The Rhetorical Parsing of Natural Language Texts</Title> <Section position="4" start_page="97" end_page="248" type="metho"> <SectionTitle> 3 A corpus analysis of discourse markers </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="97" end_page="97" type="sub_section"> <SectionTitle> 3.1 Materials </SectionTitle> <Paragraph position="0"> We used previous work on cue phrases (Halliday and Hasan, 1976; Grosz and Sidner, 1986; Martin, 1992; Hirschberg and Litman, 1993; Knott, 1995; Fraser, 1996) to create an initial set of more than 450 potential discourse markers. For each potential discourse marker, we then used an automatic procedure that extracted from the Brown corpus a set of text fragments. Each text fragment contained a &quot;window&quot; of approximately 200 words and an emphasized occurrence of a marker. On average, we randomly selected approximately 19 text fragments per marker, with fewer texts for the markers that occur only rarely in the corpus and up to 60 text fragments for markers such as and, which we considered to be highly ambiguous. Overall, we randomly selected more than 7900 texts.</Paragraph> <Paragraph position="1"> All the text fragments associated with a potential cue phrase were paired with a set of slots in which an analyst described the following: 1. The orthographic environment that characterizes the usage of the potential discourse marker. This included occurrences of periods, commas, colons, semicolons, etc. 2. The type of usage: Sentential, Discourse, or Both. 3. The position of the marker in the textual unit to which it belonged: Beginning, Medial, or End. 4. The right boundary of the textual unit associated with the marker. 5. The relative position of the textual unit that the unit containing the marker was connected to: Before or After. 6. The rhetorical relations that the cue phrase signaled. 7. 
The textual types of the units connected by the discourse marker: from Clause to Multiple_Paragraph. 8. The rhetorical status of each textual unit involved in the relation: Nucleus or Satellite. The algorithms described in this paper rely on the results derived from the analysis of 1600 of the 7900 text fragments.</Paragraph> </Section> <Section position="2" start_page="97" end_page="97" type="sub_section"> <SectionTitle> 3.2 Procedure </SectionTitle> <Paragraph position="0"> After the slots for each text fragment were filled, the results were automatically exported into a relational database. The database was then examined semi-automatically with the purpose of deriving procedures that a shallow analyzer could use to identify discourse usages of cue phrases, break sentences into clauses, and hypothesize rhetorical relations between textual units.</Paragraph> <Paragraph position="1"> For each discourse usage of a cue phrase, we derived the following: * A regular expression that contains an unambiguous cue phrase instantiation and its orthographic environment. A cue phrase is assigned a regular expression if, in the corpus, it has a discourse usage in most of its occurrences and if a shallow analyzer can detect it and the boundaries of the textual units that it connects. For example, the regular expression &quot;\[,\] although&quot; identifies such a discourse usage.</Paragraph> <Paragraph position="2"> * A procedure that can be used by a shallow analyzer to determine the boundaries of the textual unit to which the cue phrase belongs. 
For example, the procedure associated with &quot;\[,\] although&quot; instructs the analyzer that the textual unit that pertains to this cue phrase starts at the marker and ends at the end of the sentence or at a position to be determined by the procedure associated with the subsequent discourse marker that occurs in that sentence.</Paragraph> <Paragraph position="3"> * A procedure that can be used by a shallow analyzer to hypothesize the sizes of the textual units that the cue phrase relates and the rhetorical relations that may hold between these units. For example, the procedure associated with &quot;\[,\] although&quot; will hypothesize that there exists a CONCESSION between the clause to which it belongs and the clause(s) that went before in the same sentence. For most markers this procedure makes disjunctive hypotheses of the kind shown in (2) above.</Paragraph> </Section> <Section position="3" start_page="97" end_page="248" type="sub_section"> <SectionTitle> 3.3 Results </SectionTitle> <Paragraph position="0"> At the time of writing, we have identified 1253 occurrences of cue phrases that exhibit discourse usages and associated with each of them procedures that instruct a shallow analyzer how the surrounding text should be broken into textual units. This information is used by an algorithm that concurrently identifies discourse usages of cue phrases and determines the clauses that a text is made of. The algorithm examines a text sentence by sentence and determines a set of potential discourse markers that occur in each sentence. It then applies, left to right, the procedures that are associated with each potential marker.</Paragraph> <Paragraph position="1"> These procedures have the following possible effects: * They can cause an immediate breaking of the current sentence into clauses. For example, when an &quot;\[,\] although&quot; marker is found, a new clause, whose right boundary is just before the occurrence of the marker, is created. 
The algorithm is then recursively applied on the text that is found between the occurrence of &quot;\[,\] although&quot; and the end of the sentence.</Paragraph> <Paragraph position="2"> * They can cause the setting of a flag. For example, when an &quot;Although &quot; marker is found, a flag is set to instruct the analyzer to break the current sentence at the first occurrence of a comma.</Paragraph> <Paragraph position="3"> * They can cause a cue phrase to be identified as having a discourse usage. For example, when the cue phrase &quot;Although&quot; is identified, it is also assigned a discourse usage. The decision of whether a cue phrase is considered to have a discourse usage is sometimes based on the context in which that phrase occurs, i.e., it depends on the occurrence of other cue phrases. For example, an &quot;and&quot; will not be assigned a discourse usage in most of the cases; however, when it occurs in conjunction with &quot;although&quot;, i.e., &quot;and although&quot;, it will be assigned such a role.</Paragraph> <Paragraph position="4"> The most important criterion for using a cue phrase in the marker identification procedure is that the cue phrase (together with its orthographic neighborhood) is used as a discourse marker in at least 90% of the examples that were extracted from the corpus. The enforcement of this criterion reduces, on the one hand, the recall of the discourse markers that can be detected but, on the other hand, significantly increases the precision. We made this choice deliberately because, during the corpus analysis, we noticed that most of the markers that connect large textual units can be identified by a shallow analyzer. In fact, the discourse marker that is responsible for most of our algorithm recall failures is and. Since a shallow analyzer cannot identify with sufficient precision whether an occurrence of and has a discourse or a sentential usage, most of its occurrences are ignored. 
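The left-to-right application of marker procedures described above can be sketched as follows. The two table entries, the action names, and the segment function are illustrative assumptions for this sketch; the paper's actual table contains on the order of 1253 entries with richer procedures.

```python
import re

# Hypothetical fragment of the marker table.  Each entry pairs a regular
# expression with the action its procedure triggers.
MARKERS = [
    (re.compile(r", although\b"), "break_before"),   # "[,] although"
    (re.compile(r"^Although\b"), "set_comma_flag"),  # break at the next comma
]

def segment(sentence):
    """Apply the marker procedures left to right, collecting clause breaks."""
    breaks, comma_flag = [], False
    for pattern, action in MARKERS:
        for m in pattern.finditer(sentence):
            if action == "break_before":
                breaks.append(m.start())   # clause ends just before the marker
            elif action == "set_comma_flag":
                comma_flag = True          # defer the break to the first comma
    if comma_flag and "," in sentence:
        breaks.append(sentence.index(",") + 1)
    # Cut the sentence at the recorded break points.
    clauses, prev = [], 0
    for cut in sorted(set(breaks)):
        clauses.append(sentence[prev:cut].strip())
        prev = cut
    clauses.append(sentence[prev:].strip())
    return [c for c in clauses if c]

print(segment("Although the atmosphere holds water, most weather involves dust."))
```

In this sketch the sentence-initial &quot;Although&quot; sets the comma flag, so the sentence is broken into two clauses at the first comma.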
It is true that, in this way, the discourse structures that we build lose some potential finer granularity, but fortunately, from a rhetorical analysis perspective, the loss has insignificant global repercussions: the vast majority of the relations that we miss due to recall failures of and are JOINT and SEQUENCE relations that hold between adjacent clauses.</Paragraph> <Paragraph position="5"> Evaluation. To evaluate our algorithm, we randomly selected three texts, each belonging to a different genre: 1. an expository text of 5036 words from Scientific American; 2. a magazine article of 1588 words from Time; 3. a narration of 583 words from the Brown Corpus.</Paragraph> <Paragraph position="6"> Three independent judges, graduate students in computational linguistics, broke the texts into clauses. The judges were given no instructions about the criteria that they had to apply in order to determine the clause boundaries; rather, they were supposed to rely on their intuition and preferred definition of clause. The locations in texts that were labelled as clause boundaries by at least two of the three judges were considered to be &quot;valid clause boundaries&quot;. We used the valid clause boundaries assigned by judges as indicators of discourse usages of cue phrases and we determined manually the cue phrases that signalled a discourse relation. For example, if an &quot;and&quot; was used in a sentence and if the judges agreed that a clause boundary existed just before the &quot;and&quot;, we assigned that &quot;and&quot; a discourse usage. Otherwise, we assigned it a sentential usage. Hence, we manually determined all discourse usages of cue phrases and all discourse boundaries between elementary units.</Paragraph> <Paragraph position="7"> We then applied our marker and clause identification algorithm on the same texts. Our algorithm found 80.8% of the discourse markers with a precision of 89.5% (see table 1), a result that outperforms Hirschberg and Litman's (1993). The same algorithm correctly identified 81.3% of the clause boundaries, with a precision of 90.3% (see table 2). We are not aware of any surface-form-based algorithms that achieve similar results.</Paragraph> <Paragraph position="8"> [Figure 2: The rhetorical parsing algorithm. INPUT: a text T. 1. Determine the set D of all discourse markers and the set U_T of elementary textual units in T. 2. Hypothesize a set of relations R between the elements of U_T. 3. Use a constraint-satisfaction procedure to determine all the discourse trees of T. 4. Assign a weight to each of the discourse trees and determine the tree(s) with maximal weight.]</Paragraph> </Section> </Section> <Section position="5" start_page="248" end_page="248" type="metho"> <SectionTitle> 4 Building up discourse trees </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="248" end_page="248" type="sub_section"> <SectionTitle> 4.1 The rhetorical parsing algorithm </SectionTitle> <Paragraph position="0"> The rhetorical parsing algorithm is outlined in figure 2.</Paragraph> <Paragraph position="1"> In the first step, the marker and clause identification algorithm is applied. Once the textual units are determined, the rhetorical parser uses the procedures derived from the corpus analysis to hypothesize rhetorical relations between the textual units. A constraint-satisfaction procedure similar to that described in (Marcu, 1996) then determines all the valid discourse trees (see (Marcu, 1997) for details). The rhetorical parsing algorithm has been fully implemented in C++.</Paragraph> <Paragraph position="2"> Discourse is ambiguous the same way sentences are: more than one discourse structure is usually produced for a text. In our experiments, we noticed, at least for English, that the &quot;best&quot; discourse trees are usually those that are skewed to the right. 
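As a rough illustration of what a preference for right-skewed trees can look like, the following sketch scores a binary tree by the length of its right spine. The paper does not spell out its weight function, so this measure is an assumed stand-in, not the parser's actual formula.

```python
def right_spine_weight(tree):
    """Weight = number of internal nodes on the chain of right branches
    from the root.  A leaf is any non-tuple; an internal node is a
    (left_subtree, right_subtree) pair."""
    weight, node = 0, tree
    while isinstance(node, tuple):
        weight += 1
        node = node[1]   # keep descending into the right child
    return weight

# A fully right-skewed tree over units 1..4 ...
skewed = (1, (2, (3, 4)))
# ... versus a balanced tree over the same units.
balanced = ((1, 2), (3, 4))
print(right_spine_weight(skewed), right_spine_weight(balanced))
```

Under this measure the right-skewed tree scores 3 and the balanced tree scores 2, so the maximal-weight selection prefers the former.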
We believe that the explanation of this observation is that text processing is, essentially, a left-to-right process. Usually, people write texts so that the most important ideas go first, both at the paragraph and at the text level.1 The more text writers add, the more they elaborate on the text that went before: as a consequence, incremental discourse building consists mostly of expansion of the right branches. In order to deal with the ambiguity of discourse, the rhetorical parser computes a weight for each valid discourse tree and retains only those that are maximal. The weight function reflects how skewed to the right a tree is.</Paragraph> </Section> <Section position="2" start_page="248" end_page="248" type="sub_section"> <SectionTitle> 4.2 The rhetorical parser in operation </SectionTitle> <Paragraph position="0"> Consider the following text from the November 1996 issue of Scientific American (3). The words in italics denote the discourse markers, the square brackets denote the boundaries of elementary textual units, and the curly brackets denote the boundaries of parenthetical textual units that were determined by the rhetorical parser (see Marcu (1997) for details); the numbers associated with the square brackets are identification labels.</Paragraph> <Paragraph position="1"> [Footnote 1: In fact, journalists are trained to employ this &quot;pyramid&quot; approach to writing consciously (Cumming and McKercher, 1994).]</Paragraph> <Paragraph position="2"> (3) \[With its distant orbit {-- 50 percent farther from the sun than Earth --} and slim atmospheric blanket, 1\] \[Mars experiences frigid weather conditions. 
2\] \[Surface temperatures typically average about -60 degrees Celsius (-76 degrees Fahrenheit) at the equator and can dip to -123 degrees C near the poles. 3\] \[Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion. 4\] \[but any liquid water formed in this way would evaporate almost instantly 5\] \[because of the low atmospheric pressure. 6\] \[Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop, 7\] \[most Martian weather involves blowing dust or carbon dioxide. 8\] \[Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole, and a few meters of this dry-ice snow accumulate as previously frozen carbon dioxide evaporates from the opposite polar cap. 9\] \[Yet even on the summer pole, {where the sun remains in the sky all day long,} temperatures never warm enough to melt frozen water. 10\] Since parenthetical information is related only to the elementary unit that it belongs to, we do not assign it an elementary textual unit status. Such an assignment would also create problems at the formal level, because discourse structures could then no longer be represented as binary trees.</Paragraph> <Paragraph position="3"> On the basis of the data derived from the corpus analysis, the algorithm hypothesizes the following set of relations between the textual units: (4) rhet_rel(JUSTIFICATION, 1, 2) V rhet_rel(CONDITION, 1, 2); rhet_rel(ELABORATION, 3, \[1, 2\]) V rhet_rel(ELABORATION, \[3, 6\], \[1, 2\]); rhet_rel(ELABORATION, \[4, 6\], 3) V rhet_rel(ELABORATION, \[4, 6\], \[1, 3\]); rhet_rel(CONTRAST, 4, 5); rhet_rel(EVIDENCE, 6, 5); rhet_rel(ELABORATION, \[7, 10\], \[1, 6\]); rhet_rel(CONCESSION, 7, 8); rhet_rel(EXAMPLE, 9, \[7, 8\]) V rhet_rel(EXAMPLE, \[9, 10\], \[7, 8\]); rhet_rel(ANTITHESIS, 9, 10) V rhet_rel(ANTITHESIS, \[7, 9\], 10) The algorithm then determines all the valid discourse trees that can be built for elementary units 1 to 10, given the constraints in (4). 
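The disjunctive hypotheses in (4) can be represented directly as lists of alternatives. The encoding below (a relation name plus the two spans it relates) is an illustrative assumption, and the tree-validity test performed by the constraint-satisfaction step is omitted; the sketch only enumerates the candidate combinations of disjuncts.

```python
from itertools import product

# One hypothesis = a list of alternative relations; spans are
# (first_unit, last_unit) pairs, with single units written as (k, k).
R = lambda name, span_a, span_b: (name, span_a, span_b)
HYPOTHESES = [
    [R("JUSTIFICATION", (1, 1), (2, 2)), R("CONDITION", (1, 1), (2, 2))],
    [R("ELABORATION", (3, 3), (1, 2)), R("ELABORATION", (3, 6), (1, 2))],
    [R("ELABORATION", (4, 6), (3, 3)), R("ELABORATION", (4, 6), (1, 3))],
    [R("CONTRAST", (4, 4), (5, 5))],
    [R("EVIDENCE", (6, 6), (5, 5))],
    [R("ELABORATION", (7, 10), (1, 6))],
    [R("CONCESSION", (7, 7), (8, 8))],
    [R("EXAMPLE", (9, 9), (7, 8)), R("EXAMPLE", (9, 10), (7, 8))],
    [R("ANTITHESIS", (9, 9), (10, 10)), R("ANTITHESIS", (7, 9), (10, 10))],
]

# The constraint-satisfaction step would test each combination of
# disjuncts for tree validity; here we only count the combinations.
combinations = list(product(*HYPOTHESES))
print(len(combinations))
```

With five two-way disjunctions, there are 32 candidate relation sets; only the subset that yields valid binary discourse trees survives the constraint-satisfaction step.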
In this case, the algorithm constructs 8 different trees. The trees are ordered according to their weights. The &quot;best&quot; tree for text (3) has weight 3 and is fully represented in figure 3. The PostScript file corresponding to figure 3 was automatically generated by a back-end algorithm that uses &quot;dot&quot;, a preprocessor for drawing directed graphs.</Paragraph> <Paragraph position="4"> [Figure 3: The discourse tree of maximal weight for text (3).]</Paragraph> <Paragraph position="8"> The convention that we use is that nuclei are surrounded by solid boxes and satellites by dotted boxes; the links between a node and the subordinate nucleus or nuclei are represented by solid arrows, and the links between a node and the subordinate satellites by dotted lines. The occurrences of parenthetical information are marked in the text by a -P- and a unique subordinate satellite that contains the parenthetical information. 
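The drawing convention just described can be reproduced with a small generator for &quot;dot&quot; input. The tree encoding and node naming below are assumptions made for illustration; this is not the paper's back-end, only a sketch of the stated convention (solid boxes and arrows for nuclei, dotted ones for satellites).

```python
def tree_to_dot(tree):
    """Emit a Graphviz "dot" description of a discourse tree.
    A node is either a leaf string or a pair
    ("relation", [(child, "nucleus" or "satellite"), ...])."""
    lines, counter = ["digraph discourse {"], [0]

    def emit(node, status):
        name = "n%d" % counter[0]
        counter[0] += 1
        style = "solid" if status == "nucleus" else "dotted"
        label = node if isinstance(node, str) else node[0]
        lines.append('%s [shape=box, style=%s, label="%s"];' % (name, style, label))
        if not isinstance(node, str):
            for child, child_status in node[1]:
                child_name = emit(child, child_status)
                edge_style = "solid" if child_status == "nucleus" else "dotted"
                lines.append("%s -> %s [style=%s];" % (name, child_name, edge_style))
        return name

    emit(tree, "nucleus")
    lines.append("}")
    return "\n".join(lines)

tree = ("Concession", [("unit 7", "satellite"), ("unit 8", "nucleus")])
print(tree_to_dot(tree))
```

Feeding the resulting text to the dot tool renders the satellite of the CONCESSION in a dotted box and its nucleus in a solid box, as in figure 3.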
</Paragraph> </Section> <Section position="3" start_page="248" end_page="248" type="sub_section"> <SectionTitle> 4.3 Discussion and evaluation </SectionTitle> <Paragraph position="0"> We believe that there are two ways to evaluate the correctness of the discourse trees that an automatic process builds. One way is to compare the automatically derived trees with trees that have been built manually. Another way is to evaluate the impact that the discourse trees that we derive automatically have on the accuracy of other natural language processing tasks, such as anaphora resolution, intention recognition, or text summarization. In this paper, we describe evaluations that follow both these avenues.</Paragraph> <Paragraph position="1"> Unfortunately, the linguistic community has not yet built a corpus of discourse trees against which our rhetorical parser can be evaluated as systematically as traditional parsers are. To circumvent this problem, two analysts manually built the discourse trees for five texts that ranged from 161 to 725 words. Although there were some differences with respect to the names of the relations that the analysts used, the agreement with respect to the status assigned to various units (nuclei and satellites) and the overall shapes of the trees was significant.</Paragraph> <Paragraph position="2"> In order to measure this agreement, we associated an importance score to each textual unit in a tree and computed the Spearman correlation coefficients between the importance scores derived from the discourse trees built by each analyst.2 The Spearman correlation coefficient between the ranks assigned for each textual unit on the basis of the discourse trees built by the two analysts was very high: 0.798, at the p < 0.0001 level of significance. 
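The Spearman coefficient used for this comparison can be computed as the Pearson correlation of ranks. The following self-contained sketch (assigning average ranks to ties) illustrates the calculation on made-up importance scores; the scores themselves are not from the paper.

```python
def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks,
    with average ranks assigned to tied values."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0.0] * len(values)
        i = 0
        while i != len(values):
            j = i
            while j + 1 != len(values) and values[order[j + 1]] == values[order[i]]:
                j += 1
            avg = (i + j) / 2.0 + 1.0   # average rank for the tie group
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical importance scores two analysts might assign to five units.
print(round(spearman([5, 4, 3, 2, 1], [5, 3, 4, 2, 1]), 3))
```

Swapping the ranks of only two of the five units, as in this example, already lowers the coefficient from 1.0 to 0.9.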
The differences between the two analysts came mainly from their interpretations of two of the texts: the discourse trees of one analyst mirrored the paragraph structure of the texts, while the discourse trees of the other mirrored a logical organization of the text, which that analyst believed to be important.</Paragraph> <Paragraph position="3"> The Spearman correlation coefficients with respect to the importance of textual units between the discourse trees built by our program and those built by each analyst were 0.480, p < 0.0001 and 0.449, p < 0.0001. These lower correlation values were due to the differences in the overall shape of the trees and to the fact that the granularity of the discourse trees built by the program was not as fine as that of the trees built by the analysts. Besides directly comparing the trees built by the program with those built by analysts, we also evaluated the impact that our trees could have on the task of summarizing text. A summarization program that uses the rhetorical parser described here recalled 66% of the sentences considered important by 13 judges in the same five texts, with a precision of 68%. In contrast, a random procedure recalled, on average, only 38.4% of the sentences considered important by the judges, with a precision of 38.4%. And the Microsoft Office 97 summarizer recalled 41% of the important sentences with a precision of 39%.</Paragraph> <Paragraph position="4"> We discuss at length the experiments from which the data presented above was derived in (Marcu, 1997).</Paragraph> <Paragraph position="5"> The rhetorical parser presented in this paper uses only the structural constraints that were enumerated in section 2. Co-relational constraints, focus, theme, anaphoric links, and other syntactic, semantic, and pragmatic factors do not yet play a role in our system, but we nevertheless expect them to reduce the number of valid discourse trees that can be associated with a text. 
We also expect that other robust methods for determining coherence relations between textual units, such as those described by Harabagiu and Moldovan (1995), will improve the accuracy of the routines that hypothesize the rhetorical relations that hold between adjacent units.</Paragraph> <Paragraph position="6"> [Footnote 2: ... are independent of each other, against the alternative hypothesis that the rank of a variable is correlated with the rank of another variable. The value of the statistic ranges from -1, indicating that high ranks of one variable occur with low ranks of the other variable, through 0, indicating no correlation between the variables, to +1, indicating that high ranks of one variable occur with high ranks of the other variable.]</Paragraph> <Paragraph position="7"> We are not aware of the existence of any other rhetorical parser for English. However, Sumita et al. (1992) report on a discourse analyzer for Japanese. Even if one ignores some computational &quot;bonuses&quot; that can be easily exploited by a Japanese discourse analyzer (such as co-reference and topic identification), there are still some key differences between Sumita's work and ours. Particularly important is the fact that the theoretical foundations of Sumita et al.'s analyzer do not seem to be able to accommodate the ambiguity of discourse markers: in their system, discourse markers are considered unambiguous with respect to the relations that they signal. In contrast, our system uses a mathematical model in which this ambiguity is acknowledged and appropriately treated. Also, the discourse trees that we build are very constrained structures (see section 2): as a consequence, we do not overgenerate invalid trees as Sumita et al. do. Furthermore, we use only surface-based methods for determining the markers and textual units and use clauses as the minimal units of the discourse trees. In contrast, Sumita et al. 
use deep syntactic and semantic processing techniques for determining the markers and the textual units and use sentences as minimal units in the discourse structures that they build. A detailed comparison of our work with Sumita et al.'s and others' work is given in (Marcu, 1997).</Paragraph> </Section> </Section> </Paper>