<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0305">
  <Title>Annotating Attribution in the Penn Discourse TreeBank</Title>
  <Section position="4" start_page="0" end_page="31" type="metho">
    <SectionTitle>
2 The Penn Discourse TreeBank (PDTB)
</SectionTitle>
    <Paragraph position="0"> The PDTB contains annotations of discourse relations and their arguments on the Wall Street Journal corpus (Marcus et al., 1993). Following the approach towards discourse structure in (Webber et al., 2003), the PDTB takes a lexicalized approach towards the annotation of discourse relations, treating discourse connectives as the anchors of the relations, and thus as discourse-level predicates taking two abstract objects (AOs) as their arguments. For example, in (1), the subordinating conjunction since is a discourse connective that anchors a TEMPORAL relation between the event of the earthquake hitting and a state where no music is played by a certain woman. (The 4-digit number in parentheses at the end of examples gives the WSJ file number of the example.)  (1) She hasn't played any music since the earthquake hit. (0766)  There are primarily two types of connectives in the PDTB: &quot;Explicit&quot; and &quot;Implicit&quot;. Explicit connectives are identified from four grammatical classes: subordinating conjunctions (e.g., because, when, only because, particularly since), subordinators (e.g., in order that), coordinating conjunctions (e.g., and, or), and discourse adverbials (e.g., however, otherwise). In the examples in this paper, Explicit connectives are underlined. For sentences not related by an Explicit connective, annotators attempt to infer a discourse relation between them by inserting connectives (called &quot;Implicit&quot; connectives) that best convey the inferred relations. For example, in (2), the inferred CAUSAL relation between the two sentences was annotated with because as the Implicit connective. Implicit connectives together with their sense classification are shown here in small caps.</Paragraph>
    <Paragraph position="1"> (2) Also unlike Mr. Ruder, Mr. Breeden appears to be in a position to get somewhere with his agenda.</Paragraph>
    <Paragraph position="2"> Implicit=BECAUSE (CAUSE) As a former White House aide who worked closely with Congress, he is savvy in the ways of Washington. (0955) Cases where a suitable Implicit connective could not be annotated between adjacent sentences are annotated as one of (a) &quot;EntRel&quot;, where the second sentence only serves to provide some further description of an entity in the first sentence (Example 3); (b) &quot;NoRel&quot;, where no discourse relation or entity-based relation can be inferred; and  (c) &quot;AltLex&quot;, where the insertion of an Implicit connective leads to redundancy, due to the relation being alternatively lexicalized by some &quot;nonconnective&quot; expression (Example 4).</Paragraph>
    <Paragraph position="3"> (3) C.B. Rogers Jr. was named chief executive officer of this business information concern. Implicit=EntRel Mr. Rogers, 60 years old, succeeds J.V. White, 64, who will remain chairman and chairman of the executive committee (0929).</Paragraph>
    <Paragraph position="4"> (4) One in 1981 raised to $2,000 a year from $1,500  the amount a person could put, tax-deductible, into the tax-deferred accounts and widened coverage to people under employer retirement plans. Implicit=AltLex (consequence) [This caused] an explosion of IRA promotions by brokers, banks, mutual funds and others. (0933) Arguments of connectives are simply labelled Arg2, for the argument appearing in the clause syntactically bound to the connective, and Arg1, for the other argument. In the examples here, Arg1 appears in italics, while Arg2 appears in bold.</Paragraph>
    <Paragraph position="5"> The basic unit for the realization of an AO argument of a connective is the clause, tensed or untensed, but it can also be associated with multiple clauses, within or across sentences. Nominalizations and discourse deictics (this, that), which can also be interpreted as AOs, can serve as the argument of a connective too.</Paragraph>
    <Paragraph position="6"> The current version of the PDTB also contains attribution annotations on discourse relations and their arguments. These annotations, however, used the earlier core scheme which is subsumed in the extended scheme described in this paper.</Paragraph>
    <Paragraph position="7"> The first release of the Penn Discourse TreeBank, PDTB-1.0 (reported in PDTB-Group (2006)), is freely available from http://www.seas.upenn.edu/~pdtb.</Paragraph>
    <Paragraph position="8"> PDTB-1.0 contains 100 distinct types of Explicit connectives, with a total of 18505 tokens, annotated across the entire WSJ corpus (25 sections). Implicit relations have been annotated in three sections (Sections 08, 09, and 10) for the first release, totalling 2003 tokens (1496 Implicit connectives, 19 AltLex relations, 435 EntRel tokens, and 53 NoRel tokens). The corpus also includes a broadly defined sense classification for the implicit relations, and attribution annotation with the earlier core scheme. Subsequent releases of the PDTB will include Implicit relations annotated across the entire corpus, attribution annotation using the extended scheme proposed here, and fine-grained sense classification for both Explicit and Implicit connectives.</Paragraph>
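As a quick consistency check on the Implicit-relation counts reported above, the four categories can be summed and expressed as shares of the total. This is only a sketch: the dictionary keys are illustrative labels chosen here, not official PDTB field names.

```python
# Token counts for Implicit relations in PDTB-1.0, as reported in the text.
# The dictionary keys are illustrative labels, not official PDTB field names.
implicit_counts = {
    "Implicit": 1496,
    "AltLex": 19,
    "EntRel": 435,
    "NoRel": 53,
}

# The four categories should add up to the reported total of 2003 tokens.
total = sum(implicit_counts.values())
shares = {label: count / total for label, count in implicit_counts.items()}

print(total)  # 2003
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

The sum matches the reported total of 2003 tokens for Sections 08, 09, and 10.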
  </Section>
  <Section position="5" start_page="31" end_page="36" type="metho">
    <SectionTitle>
3 Annotation of Attribution
</SectionTitle>
    <Paragraph position="0"> Recent work (Wiebe et al., 2005; Prasad et al., 2005; Riloff et al., 2005; Stoyanov et al., 2005) has shown the importance of recognizing and representing the source and factuality of information in certain NLP applications. Information extraction systems, for example, would perform better by prioritizing the presentation of factual information, and multi-perspective question answering systems would benefit from presenting information from different perspectives.</Paragraph>
    <Paragraph position="1"> Most of the annotation approaches tackling these issues, however, are aimed at performing classifications at either the document level (Pang et al., 2002; Turney, 2002), or the sentence or word level (Wiebe et al., 2004; Yu and Hatzivassiloglou, 2003). In addition, these approaches focus primarily on sentiment classification, and use the same for getting at the classification of facts vs. opinions. In contrast to these approaches, the focus here is on marking attribution on more analytic semantic units, namely the Abstract Objects (AOs) associated with predicate-argument discourse relations annotated in the PDTB, with the aim of providing a compositional classification of the factuality of AOs. The scheme isolates four key properties of attribution, to be annotated as features: (1) source, which distinguishes between different types of agents (Section 3.1); (2) type, which encodes the nature of the relationship between agents and AOs, reflecting the degree of factuality of the AO (Section 3.2); (3) scopal polarity, which is marked when surface-negated attribution reverses the polarity of the attributed AO (Section 3.3); and (4) determinacy, which indicates the presence of contexts due to which the entailment of attribution gets cancelled (Section 3.4). In addition, to further facilitate the task of identifying attribution, the scheme also aims to annotate the text span complex signaling attribution (Section 3.5). Results from annotations using the earlier attribution scheme (PDTB-Group, 2006) show that a significant proportion (34%) of the annotated discourse relations have some non-Writer agent as the source for either the relation or one or both arguments. This illustrates the simplest case of the ambiguity inherent for the factuality of AOs, and shows the potential use of the PDTB annotations towards the automatic classification of factuality.</Paragraph>
    <Paragraph position="2"> The annotations also show that there are a variety of configurations in which the components of the relations are attributed to different sources, suggesting that recognition of attributions may be a complex task for which an annotated corpus may be useful. For example, in some cases, a relation together with its arguments is attributed to the writer or some other agent, whereas in other cases, while the relation is attributed to the writer, one or both of its arguments is attributed to different agent(s). For Explicit connectives, there were 6 unique configurations, for configurations containing more than 50 tokens, and 5 unique configurations for Implicit connectives.</Paragraph>
    <Section position="1" start_page="32" end_page="33" type="sub_section">
      <SectionTitle>
3.1 Source
</SectionTitle>
      <Paragraph position="0"> The source feature distinguishes between (a) the writer of the text (&quot;Wr&quot;), (b) some specific agent introduced in the text (&quot;Ot&quot; for other), and (c) some generic source, i.e., some arbitrary (&quot;Arb&quot;) individual(s) indicated via a non-specific reference in the text. The latter two capture further differences in the degree of factuality of AOs with non-writer sources. For example, an &quot;Arb&quot; source for some information conveys a higher degree of factuality than an &quot;Ot&quot; source, since it can be taken to be a &quot;generally accepted&quot; view.</Paragraph>
      <Paragraph position="1"> Since arguments can get their attribution through the relation between them, they can be annotated with a fourth value &quot;Inh&quot;, to indicate that their source value is inherited from the relation.</Paragraph>
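The source values above, including inheritance from the relation, can be sketched as a small data structure. This is a minimal illustration, not part of the PDTB tooling: the names Source and resolve_sources are hypothetical, and rejecting Inh on the relation itself is an assumption drawn from the statement that only arguments inherit.

```python
from enum import Enum


class Source(Enum):
    WR = "Wr"    # the writer of the text
    OT = "Ot"    # a specific agent introduced in the text
    ARB = "Arb"  # a generic, non-specific source
    INH = "Inh"  # arguments only: source is inherited from the relation


def resolve_sources(rel, arg1, arg2):
    """Return the effective (REL, Arg1, Arg2) sources, replacing Inh on an
    argument with the source of the relation (hypothetical helper)."""
    if rel is Source.INH:
        # Assumption: the relation has nothing to inherit from.
        raise ValueError("only arguments can take the value Inh")

    def effective(arg):
        return rel if arg is Source.INH else arg

    return rel, effective(arg1), effective(arg2)


# Relation attributed to the writer, Arg1 to another agent, Arg2 inherited.
resolved = resolve_sources(Source.WR, Source.OT, Source.INH)
print([s.value for s in resolved])  # ['Wr', 'Ot', 'Wr']
```

Keeping Inh as an explicit value, rather than copying the relation's source onto the argument at annotation time, preserves the distinction between an argument that carries its own attribution and one that does not.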
      <Paragraph position="2"> Given this scheme for source, there are broadly two possibilities. In the first case, a relation and both its arguments are attributed to the same source, either the writer, as in (5), or some other agent (here, Bill Biedermann), as in (6). (Attribution feature values assigned to examples are shown below each example; REL stands for the discourse relation denoted by the connective; Attribution text spans are shown boxed.)  (5) Since the British auto maker became a takeover target last month, its ADRs have jumped about  As Example (5) shows, text spans for implicit Writer attributions (corresponding to implicit communicative acts such as I write, or I say), are not marked and are taken to imply Writer attribution by default (see also Section 3.5).</Paragraph>
      <Paragraph position="3"> In the second case, one or both arguments have a different source from the relation. In (7), for example, the relation and Arg2 are attributed to the writer, whereas Arg1 is attributed to another agent (here, Mr. Green). On the other hand, in (8) and (9), the relation and Arg1 are attributed to the writer, whereas Arg2 is attributed to another agent.</Paragraph>
      <Paragraph position="4">  (7) When Mr. Green won a $240,000 verdict in a land condemnation case against the state in June 1983, he says Judge O'Kicki unexpectedly awarded him an additional $100,000. (0267) [Source] REL=Wr, Arg1=Ot, Arg2=Inh (8) Factory orders and construction outlays were largely flat in December while purchasing agents said manufacturing shrank further in October. (0178) [Source] REL=Wr, Arg1=Inh, Arg2=Ot (9) There, on one of his first shopping trips, Mr. Paul picked up several paintings at stunning prices. ... Afterward, Mr. Paul is said by Mr. Guterman to have phoned Mr. Guterman, the New York developer selling the collection, and gloated. (2113)</Paragraph>
      <Paragraph position="6"> Example (10) shows a generic source indicated by an agentless passivized attribution on Arg2 of the relation. Note that passivized attributions can also be associated with a specific source when the agent is explicit, as shown in (9). &quot;Arb&quot; sources are also identified by the occurrences of adverbs like reportedly, allegedly, etc.</Paragraph>
      <Paragraph position="7"> (10) Although index arbitrage is said to add liquidity to markets, John Bachmann, ... says too much liq-  We conclude this section by noting that &quot;Ot&quot; is used to refer to any specific individual as the source. That is, no further annotation is provided to indicate who the &quot;Ot&quot; agent in the text is. Furthermore, as shown in Examples (11-12), multiple &quot;Ot&quot; sources within the same relation do not indicate whether or not they refer to the same or different agents. However, we assume that the text span annotations for attribution, together with an independent mechanism for named entity recognition and anaphora resolution, can be employed to identify and disambiguate the appropriate references.</Paragraph>
      <Paragraph position="8">  (11) Suppression of the book, Judge Oakes observed, would operate as a prior restraint and thus involve the First Amendment. Moreover, and here Judge Oakes went to the heart of the question, &quot;Responsible biographers and historians constantly use primary sources, letters, diaries, and memoranda. (0944) [Source] REL=Wr, Arg1=Ot, Arg2=Ot (12) The judge was considered imperious, abrasive and ambitious, those who practiced before him say.</Paragraph>
      <Paragraph position="9"> Yet, despite the judge's imperial bearing, no one ever had reason to suspect possible wrongdoing,</Paragraph>
    </Section>
    <Section position="2" start_page="33" end_page="34" type="sub_section">
      <SectionTitle>
3.2 Type
</SectionTitle>
      <Paragraph position="0"> The type feature signifies the nature of the relation between the agent and the AO, leading to different inferences about the degree of factuality of the AO. In order to capture the factuality of the AOs, we start by making a three-way distinction of AOs into propositions, facts and eventualities (Asher, 1993). This initial distinction allows for a more semantic, compositional approach to the annotation and recognition of factuality. We define the attribution relations for each AO type as follows: (a) Propositions involve attribution to an agent of his/her (varying degrees of) commitment towards the truth of a proposition; (b) Facts involve attribution to an agent of an evaluation towards or knowledge of a proposition whose truth is taken for granted (i.e., a presupposed proposition); and (c) Eventualities involve attribution to an agent of an intention/attitude towards an eventuality. In the case of propositions, a further distinction is made to capture the difference in the degree of the agent's commitment towards the truth of the proposition, by distinguishing between &quot;assertions&quot; and &quot;beliefs&quot;. Thus, the scheme for the annotation of type ultimately uses a four-way distinction for AOs, namely between assertions, beliefs, facts, and eventualities. Initial determination of the degree of factuality involves determination of the type of the AO.</Paragraph>
      <Paragraph position="1"> AO types can be identified by well-defined semantic classes of verbs/phrases anchoring the attribution. We consider each of these in turn.</Paragraph>
      <Paragraph position="2"> Assertions are identified by &quot;assertive predicates&quot; or &quot;verbs of communication&quot; (Levin, 1993) such as say, mention, claim, argue, explain, etc.</Paragraph>
      <Paragraph position="3"> They take the value &quot;Comm&quot; (for verbs of Communication). In Example (13), the Ot attribution on Arg1 takes the value &quot;Comm&quot; for type. Implicit writer attributions, as in the relation of (13), also take (the default) &quot;Comm&quot;. Note that when an argument's attribution source is not inherited (as in Arg1 in this example) it also takes its own independent value for type. This example thus conveys that there are two different attributions expressed within the discourse relation, one for the relation and the other for one of its arguments, and that both involve assertion of propositions.</Paragraph>
      <Paragraph position="4">  (13) When Mr. Green won a $240,000 verdict in a land condemnation case against the state in June 1983,  In the absence of an independent occurrence of attribution on an argument, as in Arg2 of Example (13), the &quot;Null&quot; value is used for the type on the argument, meaning that it needs to be derived by independent (here, undefined) considerations under the scope of the relation. Note that unlike the &quot;Inh&quot; value of the source feature, &quot;Null&quot; does not indicate inheritance. In a subordinate clause, for example, while the relation denoted by the subordinating conjunction may be asserted, the clause content itself may be presupposed, as seems to be the case for the relation and Arg2 of (13). However, we found these differences difficult to determine at times, and consequently leave this undefined in the current scheme.</Paragraph>
      <Paragraph position="5"> Beliefs are identified by &quot;propositional attitude verbs&quot; (Hintikka, 1971) such as believe, think, expect, suppose, imagine, etc. They take the value &quot;PAtt&quot; (for Propositional Attitude). An example of a belief attribution is given in (14).</Paragraph>
      <Paragraph position="6"> (14) Mr. Marcus believes spot steel prices will continue to fall through early 1990 and then reverse themselves. (0336)  Karttunen, 1971) such as regret, forget, remember, know, see, hear, etc. They take the value &quot;Ftv&quot; (for Factive) for type (Example 15). In the current scheme, this class does not distinguish between the true factives and semi-factives, the former involving an attitude/evaluation towards a fact, and the latter involving knowledge of a fact.</Paragraph>
      <Paragraph position="7"> (15) The other side, he argues, knows Giuliani has always been pro-choice, even though he has personal</Paragraph>
      <Paragraph position="9"> Lastly, eventualities are identified by a class of verbs which denote three kinds of relations between agents and eventualities (Sag and Pollard, 1991). The first kind is anchored by verbs of influence like persuade, permit, order, and involves one agent influencing another agent to perform (or not perform) an action. The second kind is anchored by verbs of commitment like promise, agree, try, intend, refuse, decline, and involves an agent committing to perform (or not perform) an action. Finally, the third kind is anchored by verbs of orientation like want, expect, wish, yearn, and involves desire, expectation, or some similar mental orientation towards some state(s) of affairs. These sub-distinctions are not encoded in the annotation, but we have used the definitions as a guide for identifying these predicates. All three kinds are collectively referred to and annotated as verbs of control. Type for these classes takes the value &quot;Ctrl&quot; (for Control). Note that the syntactic term control is used because these verbs denote uniform structural control properties, but the primary basis for their definition is nevertheless semantic.</Paragraph>
      <Paragraph position="10"> An example of the control attribution relation anchored by a verb of influence is given in (16).</Paragraph>
      <Paragraph position="11"> (16) Eward and Whittington had planned to leave the bank earlier, but Mr. Craven had persuaded them to remain until the bank was in a healthy position.</Paragraph>
      <Paragraph position="12">  Note that while our use of the term source applies literally to agents responsible for the truth of a proposition, we continue to use the same term for the agents for facts and eventualities. Thus, for facts, the source represents the bearers of attitudes/knowledge, and for considered eventualities, the source represents the bearers of intentions/attitudes.</Paragraph>
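The verb classes in this section can be sketched as a small lookup from an anchoring verb to candidate values of the type feature. This is an illustration under stated assumptions: the lexicon holds only the example verbs listed in the text, and the names TYPE_LEXICON and candidate_types are hypothetical. Since expect is listed both as a propositional attitude verb and as a verb of orientation, the lookup returns a list of candidates rather than a single value.

```python
# Example anchor verbs listed in the text for each value of the type feature.
# The lexicon is deliberately incomplete: it holds only the verbs named above.
TYPE_LEXICON = {
    "Comm": {"say", "mention", "claim", "argue", "explain"},
    "PAtt": {"believe", "think", "expect", "suppose", "imagine"},
    "Ftv": {"regret", "forget", "remember", "know", "see", "hear"},
    "Ctrl": {
        "persuade", "permit", "order",                            # influence
        "promise", "agree", "try", "intend", "refuse", "decline", # commitment
        "want", "expect", "wish", "yearn",                        # orientation
    },
}


def candidate_types(verb_lemma):
    """Return the sorted type values whose example list contains the lemma."""
    return sorted(t for t, verbs in TYPE_LEXICON.items() if verb_lemma in verbs)


print(candidate_types("say"))     # ['Comm']
print(candidate_types("expect"))  # ['Ctrl', 'PAtt']
print(candidate_types("gloat"))   # []
```

A verb with more than one candidate value would need its context to decide between them, and a verb outside the example lists simply yields no candidates.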
    </Section>
    <Section position="3" start_page="34" end_page="35" type="sub_section">
      <SectionTitle>
3.3 Scopal Polarity
</SectionTitle>
      <Paragraph position="0"> The scopal polarity feature is annotated on relations and their arguments to primarily identify cases when verbs of attribution are negated on the surface - syntactically (e.g., didn't say, don't think) or lexically (e.g., denied), but when the negation in fact reverses the polarity of the attributed relation or argument content (Horn, 1978). Example (17) illustrates such a case. The 'but' clause entails an interpretation such as &quot;I think it's not a main consideration&quot;, for which the negation must take narrow scope over the embedded clause rather than the higher clause. In particular, the interpretation of the CONTRAST relation denoted by but requires that Arg2 should be interpreted under the scope of negation.</Paragraph>
      <Paragraph position="1">  (17) &quot;Having the dividend increases is a supportive element in the market outlook, but I don't think it's a main consideration,&quot; he says. (0090)  tions on attribution verbs, an argument of a connective is marked &quot;Neg&quot; for scopal polarity when the interpretation of the connective requires the surface negation to take semantic scope over the lower argument. Thus, in Example (17), scopal polarity is marked as &quot;Neg&quot; for Arg2.</Paragraph>
      <Paragraph position="2"> When the neg-lowered interpretations are not present, scopal polarity is marked as the default &quot;Null&quot; (such as for the relation and Arg1 of Example 17).</Paragraph>
      <Paragraph position="3"> It is also possible for the surface negation of attribution to be interpreted as taking scope over the relation, rather than an argument. We have not observed this in the corpus yet, so we describe this case with the constructed example in (18). What the example shows is that in addition to entailing (18b) - in which case it would be annotated parallel to Example (17) above - (18a) can also entail (18c), such that the negation is interpreted as taking semantic scope over the &quot;relation&quot; (Lasnik, 1975), rather than one of the arguments. As the scopal polarity annotations for (18c) show, lowering of the surface negation to the relation is marked as &quot;Neg&quot; for the scopal polarity of the relation. (18) a. John doesn't think Mary will get cured because she took the medication.</Paragraph>
      <Paragraph position="4"> b. John thinks that because Mary took the medication, she will not get cured.</Paragraph>
      <Paragraph position="5">  c. John thinks that Mary will get cured not because she took the medication (but because she has started practising yoga).  We note that scopal polarity does not capture the appearance of (opaque) internal negation that may appear on arguments or relations themselves.</Paragraph>
      <Paragraph position="6"> For example, a modified connective such as not because does not take &quot;Neg&quot; as the value for scopal polarity, but rather &quot;Null&quot;. This is consistent with our goal of marking scopal polarity only for lowered negation, i.e., when surface negation from the attribution is lowered to either the relation or argument for interpretation.</Paragraph>
    </Section>
    <Section position="4" start_page="35" end_page="35" type="sub_section">
      <SectionTitle>
3.4 Determinacy
</SectionTitle>
      <Paragraph position="0"> The determinacy feature captures the fact that the entailment of the attribution relation can be made indeterminate in context, for example when it appears syntactically embedded in negated or conditional contexts. The annotation attempts to capture such indeterminacy with the value &quot;Indet&quot;. Determinate contexts are simply marked as the default &quot;Null&quot;. For example, the annotation in (19) conveys the idea that the belief or opinion about the effect of higher salaries on teachers' performance is not really attributed to anyone, but is rather only being conjectured as a possibility.</Paragraph>
      <Paragraph position="1"> (19) It is silly libel on our teachers to think they would educate our children better if only they got a few thousand dollars a year more. (1286)</Paragraph>
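The four features described in Sections 3.1 to 3.4 can be gathered into one record per relation or argument. The sketch below is illustrative only: the class name Attribution, the field names, and the ALLOWED table are hypothetical, and the sample values are not the paper's annotation of any specific example. The value sets and the &quot;Null&quot; defaults follow the text.

```python
from dataclasses import dataclass

# Value sets as described in Sections 3.1 to 3.4 (field names are hypothetical).
ALLOWED = {
    "source": {"Wr", "Ot", "Arb", "Inh"},
    "type": {"Comm", "PAtt", "Ftv", "Ctrl", "Null"},
    "polarity": {"Neg", "Null"},
    "determinacy": {"Indet", "Null"},
}


@dataclass(frozen=True)
class Attribution:
    source: str
    type: str
    polarity: str = "Null"     # default: no lowered negation
    determinacy: str = "Null"  # default: determinate context

    def __post_init__(self):
        # Reject any value outside the sets listed in the scheme.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")


# Illustrative values only, not the paper's annotation of a specific example.
arg = Attribution(source="Ot", type="PAtt", polarity="Neg")
print(arg)
```

Leaving polarity and determinacy at &quot;Null&quot; unless stated mirrors the scheme, where both features are only marked in the non-default case.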
    </Section>
    <Section position="5" start_page="35" end_page="36" type="sub_section">
      <SectionTitle>
3.5 Attribution Spans
</SectionTitle>
      <Paragraph position="0"> In addition to annotating the properties of attribution in terms of the features discussed above, we also propose to annotate the text span associated with the attribution. The text span is annotated as a single (possibly discontinuous) complex reflecting three of the annotated features, namely source, type and scopal polarity. The attribution span also includes all non-clausal modifiers of the elements contained in the span, for example, adverbs and appositive NPs. Connectives, however, are excluded from the span, even though they function as modifiers. Example (20) shows a discontinuous annotation of the attribution, where the parenthetical he argues is excluded from the attribution phrase the other side knows, corresponding to the factive attribution.</Paragraph>
      <Paragraph position="1"> (20) The other side, he argues, knows Giuliani has always been pro-choice, even though he has personal reservations. (0041)  Inclusion of the fourth feature, determinacy, in the span is not &quot;required&quot; in the current scheme because the entailment-cancelling contexts can be very complex. For example, in Example (19), the conditional interpretation leading to the indeterminacy of the relation and its arguments is due to the syntactic construction type of the entire sentence. It is not clear how to annotate the indeterminacy induced by such contexts. In the example, therefore, the attribution span only includes the anchor for the type of the attribution. Spans for implicit writer attributions are left unmarked since there is no corresponding text that can be selected. The absence of a span annotation is simply taken to reflect writer attribution, together with the &quot;Wr&quot; value on the source feature. Recognizing attributions is not trivial since they are often left unexpressed in the sentence in which the AO is realized, and have to be inferred from the prior discourse. For example, in (21), the relation together with its arguments in the third sentence are attributed to Larry Shapiro, but this attribution is implicit and must be inferred from the first sentence. (21) &quot;There are certain cult wines that can command these higher prices,&quot; says Larry Shapiro of Marty's, ... &quot;What's different is that it is happening with young wines just coming out. We're seeing it partly because older vintages are growing more scarce.&quot; (0071)</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="36" end_page="36" type="metho">
    <SectionTitle>
REL Arg1 Arg2
</SectionTitle>
    <Paragraph position="0"> [Source] Ot Inh Inh The spans for such implicit &quot;Ot&quot; attributions mark the text that provides the inference of the implicit attribution, which is just the closest occurrence of the explicit attribution phrase in the prior text.</Paragraph>
    <Paragraph position="1"> The final aspect of the span annotation is that we also annotate non-clausal phrases as the anchors of attribution, such as prepositional phrases like according to X, and adverbs like reportedly, allegedly, supposedly. One such example is shown in (22).</Paragraph>
    <Paragraph position="2"> (22) No foreign companies bid on the Hiroshima project, according to the bureau. But the Japanese practice of deep discounting often is cited by Americans as a classic barrier to entry in Japan's market. (0501)</Paragraph>
  </Section>
</Paper>