File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/04/w04-2704_metho.xml
Size: 25,986 bytes
Last Modified: 2025-10-06 14:09:22
<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2704"> <Title>Proposition Bank II: Delving Deeper</Title> <Section position="4" start_page="1" end_page="1" type="metho"> <SectionTitle> 2 PropBank I </SectionTitle> <Paragraph position="0"> PropBank (Kingsbury & Palmer, 2002) is an annotation of the Wall Street Journal portion of the Penn Treebank II (Marcus, 1994) with `predicate-argument' structures, using sense tags for highly polysemous words and semantic role labels for each argument. An important goal is to provide consistent semantic role labels across different syntactic realizations of the same verb, as in the The window] broke. PropBank can provide frequency counts for (statistical) analysis or generation components in a machine translation system, but provides only a shallow semantic analysis in that the annotation is close to the syntactic structure and each verb is its own predicate.</Paragraph> <Paragraph position="1"> In PropBank, semantic roles are defined on a verb-byverb basis. An individual verb's semantic arguments are simply numbered, beginning with 0. Polysemous verbs have several Framesets, corresponding to a relatively coarse notion of word senses, with a separate set of numbered roles, a roleset, defined for each Frameset.</Paragraph> <Paragraph position="2"> For instance, leave has both a DEPART Frameset ([ in my will].) While most Framesets have three or four numbered roles, as many as six can appear, in particular for certain verbs of motion. Verbs can take any of a set of general, adjunct-like arguments (ARGMs), such as LOC (location), TMP (time), DIS (discourse connectives), PRP (purpose) or DIR (direction). Negations (NEG) and modals (MOD) are also marked.</Paragraph> <Paragraph position="3"> The same annotation philosophy has been extended to the Penn Chinese Proposition Bank (Xue and Palmer, 2003). The Chinese PropBank annotation is performed on a smaller (250k words) and yet growing corpus annotated with syntactic structures (Xue et al 2004). The same syntactic alternations that form the basis for the English PropBank annotation also exist in robust quantities in Chinese, even though it may not be the case that the same exact verbs (meaning verbs that are close translations of one another) have the exact same range of syntactic realization for Chinese and English. For example, in (1), &quot;xin-nian/New Year zhao-daihui/reception&quot; plays the same role in (a) and (b), which is the event or activity held, even though it occurs in different syntactic positions. Assigning the same argument label, Arg1, to both instances, captures this regularity. It is worth noting that the predicate &quot;juxing/hold&quot; does not have passive morphology in (1a), despite of what its English translation suggests. Like the English PropBank, the adjunct-like elements receive more general labels like TMP or LOC, as also illustrated in (1). The tag set for Chinese and English PropBanks are to a large extent similar and more details can be found in (Xue and Palmer, 2003).</Paragraph> <Paragraph position="4"> For polysemous verbs that take different sets of semantic roles, we also distinguish different Framesets. (2) and (3) illustrate the different Framesets of &quot;tongguo/pass&quot;, which correspond loosely with major senses of the verb. The Frameset in (2) roughly means &quot;pass by voting&quot; while the Frameset illustrated by (3) means &quot;pass through&quot;. The different Framesets are generally reflected in the different alternation patterns, which can serve as a cue for statistical systems performing Frameset disambiguation. (2) is similar to the causative/inchoative alternation (Levin, 1993). In contrast, (3) shows object drop.</Paragraph> <Paragraph position="5"> (2) a. [ARG0 guo-hui/Congress] zui-jin/recently tongguo/pass le/ASP [ARG1 zhou-ji/interstate yin-hangfa/banking law] &quot;The U.S. Congress recently passed the inter-state banking law.&quot; b. [ARG1 zhou-ji/interstate yin-hang-fa/banking law] zui-jin/recently tong-guo/pass le/ASP &quot;The inter-state banking law passed recently.&quot; (3) a. [ARG0 huo-che/train] zheng-zai/now tong- null guo/pass [ARG1 sui-dao/tunnel] &quot;The train is passing through the tunne.&quot; b. [ARG0 huo-che/train] zheng-zai/now tongguo/pass. null &quot;The train is passing.&quot; There are also some notable differences between Chinese PropBank and English PropBank. In general, the verbs in the Chinese PropBank are less polysemous, with the vast majority of the verbs having just one Frameset. On the other hand, the Chinese PropBank has more verbs (including static verbs which are generally translated into adjectives in English) normalized by the corpus size.</Paragraph> <Paragraph position="6"> 3 Adding Event Variables to PropBank Event variables provide a rich analytical tool for analyzing verb meaning. Positing that there is an event variable allows for a straightforward representation of the logical form of adverbial modifiers, the capturing of pronominal reference to events, and the representation of nouns that refer to events. For example, event variables make it possible to have direct reference to an event with a noun phrase, as in (4a) destruction, and to refer back to an event with a pronoun (as illustrated in (4b) That): (4) a. The destruction of Pompeii happened in the 1 st century.</Paragraph> <Paragraph position="7"> b. Brutus stabbed Caesar. That was a pivotal event in history.</Paragraph> <Paragraph position="8"> PropBank I annotations can be translated straightforwardly into logical representations with event variables, as illustrated in (5), with relations being defined as predicates of events, and Args and ArgMs representing relations between event variables and corresponding phrases.</Paragraph> <Paragraph position="9"> (5) a. Mr. Bush met him privately, in the White House, on Thursday.</Paragraph> <Paragraph position="10"> b. PropBank annotation As the representation in (5c) shows, we adopt Neo-davidsonian analysis of events, which follows Parsons (1990) in treating arguments on a par with modifiers in the event structure. An alternative analysis is the original Davidsonian analysis of events (Davidson 1967), where the arguments of the verb are analyzed as its logical arguments.</Paragraph> <Paragraph position="11"> Our choice of a Neodavidsonian representation is motivated by its predictions with respect to obligatoriness of arguments. Under the Davidsonian approach, arguments are logical arguments of the verb and thus must be implied by the meaning of the sentence, either explicitly or implicitly (i.e. existentially quantified). On the other hand, it has been a crucial assumption in PropBank that not all roles must necessarily be present in each sentence. For example, the Frameset for the verb serve, shown in (6a) has three roles: Arg0, Arg1, and Arg2. Actual usages of the verb, on the other hand, do not require the presence of all three roles. For example, the sentence in (6b), as its PropBank annotation in (6c) shows, does not include Arg1.</Paragraph> <Paragraph position="12"> b. Each new trading roadblock is likely to be beaten by institutions seeking better ways *trace* to serve their high-volume clients.</Paragraph> <Paragraph position="13"> c. Arg0: *trace* -> institutions REL: serve Arg2: their high-volume clients As the representations in (7) illustrate, only the Neo-davidsonian representation gives the correct interpretation of this sentence.</Paragraph> <Paragraph position="14"> (7) Davidsonian representation: [?]e [?]z serve(e, institutions, z, their high-volume clients) Neodavidsonian representation: [?]e serve(e)&Arg0(e, institutions)&Arg2(e, their high-volume clients) Assuming a Neodavidsonian representation, we can analyze all Args and certain types of modifiers as predicates of events. The types of ArgMs that can be analyzed as predicates of event variables are shown below: * MNR: to manage businesses profitably * TMP: to run the company for 23 years * LOC: to use the notes on the test * DIR: to jump up * CAU: because of ...</Paragraph> <Paragraph position="15"> * PRP: in order to ...</Paragraph> <Paragraph position="16"> Whereas for the most part, translating these adverbials into modifiers of event variables does not require manual annotation, certain constructions need human revision. For example, in the sentence in (8a) the temporal ArgM 'for the past five years' does not modify the event variable e introduced by the verb manage, as our automatic translation would predict. The revised analysis of this sentence, given in (8b), follows Krifka 1989, who proposed that negated sentences refer to maximal events - events that have everything that happened during their running time as a part. Annotation of this sentence would thus require us to introduce an additional event variable, the maximal event e', which has a duration 'for the past five years' and has no event of unions managing wage increases as part.</Paragraph> <Paragraph position="17"> (8) a. For the past five years, unions have not managed to win wage increases.</Paragraph> <Paragraph position="18"> b. [?]e' TMP(e', 'for the past five years') & [?]!e(e<e' & managing(e) & Arg0(e, unions) & Arg1(e, 'win wage increases')) Further annotation involves linking empty categories in PropBank to event variables in cases of control, as illustrated in (9), where event variables can be viewed as the appropriate antecedents for PRO, marked as '*' below: (9) The car collided with a lorry, * killing both drivers. And, finally, we will consider tagging variables according to the aspectual class of the eventuality they denote, such as states or events. Events, such as John built a house, involve some kind of change and usually imply that some condition, which obtains when the event begins, is terminated by the event. States, on the other hand, do not involve any change and hold for varying amounts of time. It does not make sense to ask how long a state took (as opposed to events), and whether the state is culminated or finished.</Paragraph> <Paragraph position="19"> This distinction between states and events plays an important role for the temporal analysis of discourse, as the following examples (from Kamp and Reyle 1993) illustrate: (10) a. A man entered the White Hart. Bill served him a beer.</Paragraph> <Paragraph position="20"> b. I arrived at the Olivers' cottage on Friday night. It was not a propitious beginning to my visit. She was ill and he in a foul mood.</Paragraph> <Paragraph position="21"> If a non-initial sentence denotes an event, then it is typically understood as following the event described by the preceding sentence. For example, in (10a), the event of Bill serving a beer is understood as taking place after the event of 'a man entering the White Hart' was completed. On the other hand, states are interpreted as temporally overlapping with the time of the preceding sentence, as illustrated in (10b). The sentences she was ill and he was in a foul mood seem to describe a state of affairs obtaining at the time of the speaker's arrival. As this example illustrates, there are different types of temporal relations between eventualities (as we will call both events and states) and adverbials that modify them, such as temporal overlap and temporal containment. Furthermore, these relations crucially depend on the aspectual properties of the sentence. Translation of PB annotations to logical representations with eventuality variables and tagging these variables according to their aspectual type would thus make it possible to provide an analysis of temporal relations. This analysis should also be compatible with a higher level of annotation of temporal structure (e.g. Ferro et al, 2001).</Paragraph> </Section> <Section position="5" start_page="1" end_page="2" type="metho"> <SectionTitle> 4 Annotation of Nominal Coreference </SectionTitle> <Paragraph position="0"> Our approach to coreference annotation is based on the recognition of the different types of relationships that might be called &quot;coreference&quot;. The most straightforward case is that of two semantically definite NPs that refer to identical entities, as in (11). Anaphoric relations (very broadly defined) are those in which one NP (or possessive adjective) has no referential value of its own but depends on an antecedent for its interpretation.</Paragraph> <Paragraph position="1"> In some cases this can be relatively simple, as in (12), in which the pronoun He takes John Smith as its antecedent. However, in some cases, as in (13), the antecedent may not even be a referring expression, or can, as in (14), refer to an entity that may or may not exist, with the non-existent a car being the antecedent of it. The anaphor does not have to be an NP, as in (15), in which the possessive their, which takes many companies as its antecedent, is an adjective.</Paragraph> <Paragraph position="2"> (11) John Smith of Company X arrived yesterday. Mr.</Paragraph> <Paragraph position="3"> Smith said that...&quot; (12) John Smith of Company X arrived yesterday. He said that...&quot; (13) No team spoke about its system.</Paragraph> <Paragraph position="4"> (14) I want to buy a car. I need it to go to work.</Paragraph> <Paragraph position="5"> (15) Many companies raised their payouts by more than 10%.</Paragraph> <Paragraph position="6"> Another level of complexity is raised by NPs that are not anaphors, in that they have their own reference (perhaps abstract or nonexistent), but are not in an identity relationship with an antecedent, but rather describe a property of that antecedent. Typical cases of this are predicate nominals, as in (16), or appositives, as in (17), and other cases as in (18).</Paragraph> <Paragraph position="7"> (16) Larry is a university lecturer.</Paragraph> <Paragraph position="8"> (17) Larry, the chair of his department, became president. null (18) The stock price fell from $4.02 to $3.85 As has been discussed (e.g., van Deemter & Kibble, 2001), such cases have fundamentally different properties than either the identity relationships of (11) or the anaphoric relationships of (12)-(15).</Paragraph> <Paragraph position="9"> Annotation of nominal co-reference is being done in two passes. The first pass involves annotation of true co-reference between semantically definite NPs`. The issue here is to consider what the semantically definite nouns are. Initially, they are defined as proper nouns (named entities), either as NPs (America) or prenominal adjectives (American politicians).</Paragraph> <Paragraph position="10"> (19) The last time the S&P 500 yield dropped below 3% was in the summer of 1987... There have been only seven other times when the yield on the S&P 500 dropped....</Paragraph> <Paragraph position="11"> It is reasonable to expand this to definite descriptions, so that in (19), the S&P 500 yield and the yield on the S&P 500 are marked as coreferring. However, some definite NPs can refer to clauses, not NPs, such as The pattern in (20), and we will not do such cases of clausal antecedents on the first pass.</Paragraph> <Paragraph position="12"> (20) The index fell 40% in 1975 and jumped 80% in 1976. The pattern is an unusual one.</Paragraph> <Paragraph position="13"> Anaphoric relations are being done on a &quot;need-toannotate&quot; basis. For each anaphoric NP or possessive adjective, the annotator needs to determine its antecedent. As discussed, this is a different type of relation than identity, and this distinction will be noted in the annotation. The issue here is what we consider an anaphoric element to be. We consider all cases of pronouns, possessives, reflexives, and NPs with that/those to be potential cases of anaphors (again, broadly defined). However, as with definite NPs, we only mark those that have an NP antecedent, and not clausal antecedents. For example, in (21), it refers to the current 3.3% reading, and so would be marked as being in an antecedent-anaphor relation. In (22), it refers to having the dividend increases, which is not an NP, and so would not be marked as being in an anaphor relation in the first pass. Similar considerations apply to potential anaphors like those NP, that NP, etc.</Paragraph> <Paragraph position="14"> (21) ...the current 3.3% reading isn't as troublesome as it might have been.</Paragraph> <Paragraph position="15"> (22) Having the dividend increases is a supportive element in the market outlook, but I don't think it's a main consideration&quot;.</Paragraph> <Paragraph position="16"> Note that placing the burden on the anaphors to determine what gets marked as being in an anaphor-antecedent leaves it open as to what the antecedent might be, other than the requirement just mentioned of it being an NP. Not only might it be non-referring NPs as in (13) or (14), it could even be a generic, as in (23), in which books is the antecedent for they.</Paragraph> <Paragraph position="17"> (23) I like books. They make me smile.</Paragraph> <Paragraph position="18"> The second pass will tackle the more difficult issues: 1. Descriptive NPs, as in (16)-(18). While the information provided by these cases would be extremely valuable for information extraction and other systems, there are some uncertain issues here, mostly focusing on how such descriptors describe the antecedent at different moments in time and/or space. The crucial question is therefore what to take the descriptor to be.</Paragraph> <Paragraph position="19"> (24) Henry Higgins might become the president of Dreamy Detergents.</Paragraph> <Paragraph position="20"> For example, in (18), it can't be just $4.02 and $3.85, since this does not include information about *when* the stock price had such values. The same issue arises for (17). As van Deemter & Kibble point out, such cases can interact with issues of modality in uncertain ways, as illustrated in (24). Just saying that in (24) the president of Dreamy Detergents is in the same type of relationship with Henry Higgins as a university lecturer is with Larry in (16) would be very misleading.</Paragraph> <Paragraph position="21"> 2. Clausal antecedents - Here we will handle cases of it and other anaphor elements and definite NPs referring to non-NPs as antecedents, as in (21). This will most likely be done by referring to the eventuality variable associated with the antecedent.</Paragraph> <Paragraph position="22"> 5 Linking to the Penn Discourse Treebank (PDTB) The Penn Discourse Treebank (PDTB) is currently being built by the PDTB team at the University of Pennsylvania, providing the next appropriate level of annotation: the annotation of the predicate argument structure of connectives (Miltsakaki et al 2004a/b). The PDTB project is based on the idea that discourse connectives can be thought of as predicates with their associated argument structure. This perspective of discourse is based on a series of papers extending lexicalized tree-adjoining grammar (LTAG) to discourse (DLTAG), beginning with Webber and Joshi (1998).</Paragraph> <Paragraph position="23"> This level of annotation is quite complex for a variety of reasons, such as the lack of available literature describing discourse connectives and frequent occurrences of empty (lexically null) connectives between two sentences that cannot be ignored. Also, unlike the predicates at the sentence level, some of the discourse connectives, especially discourse adverbials, take their arguments anaphorically and not structurally, requiring an intimate association with event variable representation.</Paragraph> <Paragraph position="24"> The long-range goal of the PDTB project is to develop a large scale and reliably annotated corpus that will encode coherence relations associated with discourse connectives, including their argument structure and anaphoric links, thus exposing a clearly defined level of discourse structure and supporting the extraction of a range of inferences associated with discourse connectives. This annotation will reference the Penn Treebank (PTB) annotations as well as PropBank.</Paragraph> <Paragraph position="25"> In PDTB, a variety of connectives are considered, such as subordinate and coordinate conjunctions, adverbial connectives and implicit connectives amounting to a total of approximately 20,000 annotations; 10,000 im- null The PDTB annotations are deliberately kept independent of DLTAG framework for two reasons: (1) to make the annotated corpus widely useful to researchers working in different frameworks and (2) to make the annotation task easier, thereby increasing interannotator reliability. plicit connectives and 10,000 annotations of the 250 explicit connectives identified in the corpus (for details see (Miltsakaki et al 2004a and Miltsakaki et al 2004b). Current annotations in PDTB are performed by four annotators. Individual annotation proceeds one connective at a time. This way, the annotators quickly gain experience with that connective and develop a better understanding of its predicate-argument characteristics. For the annotation of implicit connectives, the annotators are required to provide an explicit connective that best expressed the inferred relation.</Paragraph> <Paragraph position="26"> The PDTB is expected to be released by November 2005. The final version of the corpus will also contain characterizations of the semantic roles associated with the arguments of each type of connective as well as links to PropBank.</Paragraph> </Section> <Section position="6" start_page="2" end_page="2" type="metho"> <SectionTitle> 6. Annotation of Word Senses </SectionTitle> <Paragraph position="0"> The critical question with respect to sense tagging involves the choice of senses. In other words, which sense inventory, and which level of granularity with respect to that sense inventory? The PropBank Frames Files for the verbs include coarse-grained sense distinctions based primarily on usages of a verb that have different numbers of predicate-arguments. These are termed Framesets - referring to the set of roles for each one and the corresponding set of syntactic frames. We are currently sense-tagging the annotated predicates for lemmas with multiple Framesets, which can be done quickly and accurately with an inter-annotator agreement of over 90%. The distinctions made by the Framesets are very coarse, and each one would map to several standard dictionary entries for the lemma in question. More fine-grained sense distinctions could be useful for Automatic Content Extraction, yet it remains to be determined exactly which distinctions are necessary and what methodology should be followed to provide additional word sense annotation.</Paragraph> <Paragraph position="1"> Palmer et al (2004b) present an hierarchical approach to verb senses, where different levels of sense distinctions, from PropBank Framesets to WordNet senses, form a continuum of granularity. At the intermediate level of sense hierarchy we are considering manual groupings of the SENSEVAL-2 verb senses (Palmer, et.al., 2004a), developed in a separate project. Given a large disagreement rate between annotators (average inter-annotator agreement rate for Senseval-2 verbs was only 71%), verbs were grouped by two or more people into sets of closely related senses, with grouping differences being reconciled, and the sense groups were used for coarse-grained scoring of the systems. These groupings of WordNet senses were shown to reconcile a substantial portion of the manual and automatic tagging disagreements, showing that many of these disagreements are fairly subtle. Using the groups as a more coarse-grained set of sense distinctions improved ITA and system scores by almost 10%, to 82% and 69%, respectively (Palmer, et. al. 2004a).</Paragraph> <Paragraph position="2"> We have been investigating whether or not the groups can provide an intermediate level of hierarchy in between the PropBank Framesets and the WN 1.7 senses.</Paragraph> <Paragraph position="3"> Based on our existing WN 1.7 tags and Frameset tags of the Senseval2 verbs in the Penn Treebank, 95% of the verb instances map directly from sense groups to Framesets, with each Frameset typically corresponding to two or more sense groups. Using the PropBank coarse-grained senses as a starting place, and WordNet sense tagging for over 1000 verbs produced automatically through mapping VerbNet to PropBank (Kipper, et. al., 2004), we have the makings of a large scale tagging experiment on the Penn Treebank. This will enable investigations into the applicability of clearly defined criteria for sense distinctions at varying levels of granularity, and produce a large, 1M word corpus of sense-tagged text for training WSD systems The hierarchical approach to verb senses, as utilized by most standard dictionaries as well as Hector (Atkins, '93), and as applied to SENSEVAL-2, presents obvious advantages for the problem of Word Sense Disambiguation. The human annotation task is simplified, since there are fewer choices at each level and clearer distinctions between them. The automated systems can combine training data from closely related senses to overcome the sparse data problem, and both humans and systems can back off to a more coarse-grained choice when fine-grained choices prove too difficult.</Paragraph> <Paragraph position="4"> Conclusion This paper has presented an overview of the second phase of PropBank Annotation, PropBank II, which is being applied to English and Chinese. It includes (Neodavidsonian) eventuality variables, nominal references, an hierarchical approach to sense tagging, and connections to the Penn Discourse Treebank (PDTB), a project for annotating discourse connectives and their arguments. null</Paragraph> </Section> class="xml-element"></Paper>