File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/w05-0305_metho.xml
Size: 17,472 bytes
Last Modified: 2025-10-06 14:09:53
<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0305"> <Title>Attribution and the (Non-)Alignment of Syntactic and Discourse Arguments of Connectives</Title> <Section position="4" start_page="0" end_page="29" type="metho"> <SectionTitle> 2 Overview of the PDTB </SectionTitle> <Paragraph position="0"> The PDTB builds on the DLTAG approach to discourse structure (Webber and Joshi, 1998; Webber et al., 1999; Webber et al., 2003) in which connectives are discourse-level predicates which project predicate-argument structure on a par with verbs at the sentence level. Initial work on the PDTB has been described in Miltsakaki et al. (2004a), Miltsakaki et al. (2004b), Prasad et al. (2004).</Paragraph> <Paragraph position="1"> The key contribution of the PDTB design framework is its bottom-up approach to discourse structure: Instead of appealing to an abstract (and arbitrary) set of discourse relations whose identification may confound multiple sources of discourse meaning, we start with the annotation of discourse connectives and their arguments, thus exposing a clearly defined level of discourse representation.</Paragraph> <Paragraph position="2"> The PDTB annotates as explicit discourse connectives all subordinating conjunctions, coordinating conjunctions and discourse adverbials. These predicates establish relations between two abstract objects such as events, states and propositions (Asher, 1993).</Paragraph> <Paragraph position="3"> We use Conn to denote the connective, and Arg1 and Arg2 to denote the textual spans from which the abstract object arguments are computed.</Paragraph> <Paragraph position="4"> In (1), the subordinating conjunction since establishes a temporal relation between the event of the earthquake hitting and a state where no music is played by a certain woman. 
In all the examples in this paper, as in (1), Arg1 is italicized, Arg2 is in boldface, and Conn is underlined.</Paragraph> <Paragraph position="5"> (1) She hasn't played any music since the earthquake hit.</Paragraph> <Paragraph position="6"> What counts as a legal argument? Since we take discourse relations to hold between abstract objects, we require that an argument contain at least one clause-level predication (usually a verb - tensed or untensed), though it may span as much as a sequence of clauses or sentences. The two exceptions are nominal phrases that express an event or a state, and discourse deictics that denote an abstract object. For example, discourse adverbials like as a result are distinguished from clausal adverbials like strangely, which require only a single abstract object (Forbes, 2003).</Paragraph> <Paragraph position="7"> Each connective has exactly two arguments. The argument that appears in the clause syntactically associated with the connective, we call Arg2. The other argument is called Arg1. Both Arg1 and Arg2 can be in the same sentence, as is the case for subordinating conjunctions (e.g., because). The linear order of the arguments will be Arg2 Arg1 if the subordinate clause appears sentence initially; Arg1 Arg2 if the subordinate clause appears sentence finally; and undefined if it appears sentence medially. For an adverbial connective like however, Arg1 is in the prior discourse. Hence, the linear order of its arguments will be Arg1 Arg2.</Paragraph> <Paragraph position="8"> Because our annotation is on the same corpus as the PTB, annotators may select as arguments textual spans that omit content that can be recovered from syntax. In (2), for example, the relative clause is selected as Arg1 of even though, and its subject can be recovered from its syntactic analysis in the PTB. 
In (3), the subject of the infinitival clause in Arg1 is similarly available.</Paragraph> <Paragraph position="9"> (2) Workers described &quot;clouds of blue dust&quot; that hung over parts of the factory even though exhaust fans ventilated the air.</Paragraph> <Paragraph position="10"> (3) The average maturity for funds open only to institutions, considered by some to be a stronger indicator because those managers watch the market closely, reached a high point for the year - 33 days.</Paragraph> <Paragraph position="11"> The PDTB also annotates implicit connectives between adjacent sentences where no explicit connective occurs. For example, in (4), the two sentences are contrasted in a way similar to having an explicit connective like but occurring between them. Annotators are asked to provide, when possible, an explicit connective that best describes the relation, and in this case in contrast was chosen.</Paragraph> <Paragraph position="12"> (4) The $6 billion that some 40 companies are looking to raise in the year ending March 21 compares with only $2.7 billion raised on the capital market in the previous year. IMPLICIT - in contrast In fiscal 1984, before Mr. Gandhi came into power, only $810 million was raised.</Paragraph> <Paragraph position="13"> When complete, the PDTB will contain approximately 35K annotations: 15K annotations of the 100 explicit connectives identified in the corpus and 20K annotations of implicit connectives.</Paragraph> </Section> <Section position="5" start_page="29" end_page="31" type="metho"> <SectionTitle> 3 Annotation of attribution </SectionTitle> <Paragraph position="0"> Wiebe and her colleagues have pointed out the importance of ascribing beliefs and assertions expressed in text to the agent(s) holding or making them (Riloff and Wiebe, 2003; Wiebe et al., 2004; Wiebe et al., 2005). They have also gone a considerable way towards specifying how such subjective material should be annotated (Wiebe, 2002). 
Since we take discourse connectives to convey semantic predicate-argument relations between abstract objects, one can distinguish a variety of cases depending on the attribution of the discourse relation or its arguments; that is, whether the relation or arguments are ascribed to the author of the text or someone other than the author.</Paragraph> <Paragraph position="1"> The annotation guidelines for the PDTB are available at http://www.cis.upenn.edu/~pdtb.</Paragraph> <Paragraph position="2"> Case 1: The relation and both arguments are attributed to the same source. In (5), the concessive relation between Arg1 and Arg2, anchored on the connective even though, is attributed to the speaker Dick Mayer, because he is quoted as having said it. Even where a connective and its arguments are not included in a single quotation, the attribution can still be marked explicitly as shown in (6), where only Arg2 is quoted directly but both Arg1 and Arg2 can be attributed to Mr. Prideaux. Attribution to some speaker can also be marked in reported speech as shown in the annotation of so that in (7).</Paragraph> <Paragraph position="3"> (5) &quot;Now, Philip Morris Kraft General Foods' parent company is committed to the coffee business and to increased advertising for Maxwell House,&quot; says Dick Mayer, president of the General Foods USA division.</Paragraph> <Paragraph position="4"> &quot;Even though brand loyalty is rather strong for coffee, we need advertising to maintain and strengthen it.&quot; (6) B.A.T isn't predicting a postponement because the units &quot;are quality businesses and we are encouraged by the breadth of inquiries,&quot; said Mr. 
Prideaux.</Paragraph> <Paragraph position="5"> (7) Like other large Valley companies, Intel also noted that it has factories in several parts of the nation, so that a breakdown at one location shouldn't leave customers in a total pinch.</Paragraph> <Paragraph position="6"> Wherever there is a clear indication that a relation is attributed to someone other than the author of the text, we annotate the relation with the feature value SA for &quot;speaker attribution&quot;, which is the case for (5), (6), and (7). The arguments in these examples are given the feature value IN to indicate that they &quot;inherit&quot; the attribution of the relation. If the relation and its arguments are attributed to the writer, they are given the feature values WA and IN respectively. Relations are attributed to the writer of the text by default. Such cases include many instances of relations whose attribution is ambiguous between the writer and some other speaker. In (8), for example, we cannot tell if the relation anchored on although is attributed to the spokeswoman or the author of the text. As a default, we always take it to be attributed to the writer.</Paragraph> <Paragraph position="7"> Case 2: One or both arguments have a different attribution value from the relation. While the default value for the attribution of an argument is the attribution of its relation, it can differ as in (8). 
Here, as indicated above, the relation is attributed to the writer (annotated WA) by default, but Arg2 is attributed to Delmed (annotated SA, for some speaker other than the writer, and other than the one establishing the relation).</Paragraph> <Paragraph position="8"> (8) The current distribution arrangement ends in March 1990, although Delmed said it will continue to provide some supplies of the peritoneal dialysis products to National Medical, the spokeswoman said.</Paragraph> <Paragraph position="9"> Annotating the corpus with attribution is necessary because in many cases the text containing the source of attribution is located in a different sentence. Such is the case for (5), where the relation conveyed by even though and its arguments are attributed to Dick Mayer.</Paragraph> <Paragraph position="10"> We are also adding attribution values to the annotation of the implicit connectives. Implicit connectives express relations that are inferred by the reader. In such cases, the author intends for the reader to infer a discourse relation. As with explicit connectives, we have found it useful to distinguish implicit relations intended by the writer of the article from those intended by some other author or speaker. To give an example, the implicit relation in (9) is attributed to the writer. However, in (10) both Arg1 and Arg2 have been expressed by the speaker whose speech is being quoted. In this case, the implicit relation is attributed to the speaker.</Paragraph> <Paragraph position="11"> (9) Investors in stock funds didn't panic the weekend after mid-October's 190-point market plunge. IMPLICIT-instead Most of those who left stock funds simply switched into money market funds.</Paragraph> <Paragraph position="12"> (10) &quot;People say they swim, and that may mean they've been to the beach this year,&quot; Fitness and Sports. 
&quot;It's hard to know if people are responding truthfully.</Paragraph> <Paragraph position="13"> IMPLICIT-because People are too embarrassed to say they haven't done anything.&quot; The annotation of attribution is currently underway. The final version of the PDTB will include annotations of attribution for all the annotated connectives and their arguments.</Paragraph> <Paragraph position="14"> Note that in the Rhetorical Structure Theory (RST) annotation scheme (Carlson et al., 2003), attribution is treated as a discourse relation. We, on the other hand, do not treat attribution as a discourse relation. In the PDTB, discourse relations (associated with an explicit or implicit connective) hold between two abstract objects, such as events, states, etc. Attribution relates a proposition to an entity, not to another proposition, event, etc. This is an important difference between the two frameworks. One consequence of this difference is briefly discussed in Footnote 4 in the next section.</Paragraph> </Section> <Section position="6" start_page="31" end_page="32" type="metho"> <SectionTitle> 4 Arguments of Subordinating Conjunctions in the PTB </SectionTitle> <Paragraph position="0"> A natural question that arises with the annotation of arguments of subordinating conjunctions (SUBCONJS) in the PDTB is to what extent they can be detected directly from the syntactic annotation in the PTB. In the simplest case, Arg2 of a SUBCONJ is its complement in the syntactic representation. 
This is indeed the case for (11), where since is analyzed as a preposition in the PTB taking an S complement which is Arg2 in the PDTB, as shown in Figure 1.</Paragraph> <Paragraph position="1"> (11) Since the budget measures cash flow, a new $1 direct loan is treated as a $1 expenditure.</Paragraph> <Paragraph position="2"> Furthermore, in (11), since together with its complement (Arg2) is analyzed as an SBAR which modifies the clause a new $1 direct loan is treated as a $1 expenditure, and this clause is Arg1 in the PDTB.</Paragraph> <Paragraph position="3"> Can the arguments always be detected in this way? In this section, we present statistics showing that this is not the case and an analysis that shows that this lack of congruence between the PDTB and the PTB is not just a matter of annotator disagreement. Consider example (12), where the PTB requires annotators to include the verb of attribution said and its subject Delmed in the complement of although. But although as a discourse connective denies the expectation that the supply of dialysis products will be discontinued when the distribution arrangement ends. It does not convey the expectation that Delmed will not say such things. On the other hand, in (13), the contrast established by while is between the opinions of two entities, i.e., advocates and their opponents.</Paragraph> <Paragraph position="4"> This distinction is hard to capture in an RST-based parsing framework (Marcu, 2000). 
According to the RST-based annotation scheme (Carlson et al., 2003) 'although Delmed said' and 'while opponents argued' are elementary discourse units (12) The current distribution arrangement ends in March 1990, although Delmed said it will continue to provide some supplies of the peritoneal dialysis products to National Medical, the spokeswoman said.</Paragraph> <Paragraph position="5"> (13) Advocates said the 90-cent-an-hour rise, to $4.25 an hour by April 1991, is too small for the working poor, while opponents argued that the increase will still hurt small business and cost many thousands of jobs.</Paragraph> <Paragraph position="6"> In Section 5, we will identify additional cases. What we will then argue is that it will be insufficient to train an algorithm for identifying discourse arguments simply on the basis of syntactically analysed text.</Paragraph> <Paragraph position="7"> We now present preliminary measurements of these and other mismatches between the two corpora for SUBCONJS. To do this we describe a procedural algorithm which builds on the idea presented at the start of this section. The statistics are preliminary in that only the annotations of a single annotator were considered, and we have not attempted to exclude cases in which annotators disagree.</Paragraph> <Paragraph position="8"> We consider only those SUBCONJS for which both arguments are located in the same sentence as the connective (which is the case for approximately 99% of the annotated instances). The syntactic configurations of such relations pattern in the way shown in Figure 1. Note that it is not necessary for any of Conn, Arg1, or Arg2 to have a single node in the parse tree that dominates it exactly. In Figure 1 we do obtain a single node for Conn and Arg2, but for Arg1, it is the set of nodes {NP, VP} that dominate it exactly. 
Connectives like so that and even if are not dominated by a single node, and cases where the annotator has decided that a (parenthetical) clausal element is not minimally necessary to the interpretation of Arg2 will necessitate choosing multiple nodes that dominate Arg2 exactly.</Paragraph> <Paragraph position="9"> Given the node(s) in the parse tree that dominate Conn ({IN} in Figure 1), the algorithm we present tries to find node(s) in the parse tree that dominate Arg1 and Arg2 exactly using the operation of tree subtraction (Sections 4.1 and 4.2). We then discuss its execution on (11) in Section 4.3.</Paragraph> <Paragraph position="10"> annotated in the same way: as satellites of the relation Attribution. RST does not recognize that satellite segments, such as the ones given above, sometimes participate in a higher RST relation along with their nuclei and sometimes not.</Paragraph> <Section position="1" start_page="32" end_page="32" type="sub_section"> <SectionTitle> 4.1 Tree subtraction </SectionTitle> <Paragraph position="0"> We will now define the operation of tree subtraction, the graphical intuition for which is given in Figure 2. Let T be the set of nodes in the tree.</Paragraph> <Paragraph position="1"> Definition 4.1. 
The ancestors of any node t ∈ T, We denote this by x - Y = Z.</Paragraph> <Paragraph position="2"> The nodes z ∈ Z are the highest descendants of x, which do not dominate any node y ∈ Y and are not dominated by any node in Y.</Paragraph> </Section> <Section position="2" start_page="32" end_page="32" type="sub_section"> <SectionTitle> 4.2 Algorithm to detect the arguments </SectionTitle> <Paragraph position="0"> This ensures the inclusion of complementizers and subordinating conjunctions associated with the clause in Arg1.</Paragraph> <Paragraph position="1"> The convention adopted by the PDTB was to include such elements in the clause with which they were associated.</Paragraph> </Section> </Section> </Paper>