<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1707"> <Title>Annotating the Propositions in the Penn Chinese Treebank</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Annotation Model </SectionTitle> <Paragraph position="0"> In this section we describe a model that annotates the predicate-argument structure of Chinese predicates. This model captures the lexical regularities by assuming that different instances of a predicate, usually a verb, have the same predicate-argument structure if they have the same sense. Defining sense has been one of the thorniest issues in natural language research (Ide and Véronis, 1998), and the term &quot;sense&quot; has been used to mean different things, ranging from part-of-speech and homophones, which are easier to define, to slippery fine-grained semantic distinctions that are hard to make consistently. Determining the &quot;right&quot; level of sense distinction for natural language applications is ultimately an empirical issue, with the best level of sense distinction being the coarsest level that is still sufficient for the natural language application in question. Without targeting any one particular application, our strategy is to use the structural regularities demonstrated in Section 1 to define sense. Finer sense distinctions without clear structural indications are avoided. All instances of a predicate that realize the same set of semantic roles are assumed to have one sense, with the understanding that not all of the semantic roles for this verb sense have to be realized in a given verb instance, and that the same semantic role may be realized in different syntactic positions. All the possible syntactic realizations of the same set of semantic roles for a verb sense are then alternations of one another. This state of affairs has been characterized as diathesis alternation and used to establish cross-predicate generalizations and classifications (Levin, 1993). 
It has been hypothesized and demonstrated that verbs sharing the same diathesis alternation patterns also have similar meaning postulates. It is equally plausible to assume then that verb instances having different diathesis alternation patterns also have different semantic properties and thus different senses.</Paragraph> <Paragraph position="1"> Using diathesis alternation patterns as a diagnostic test, we can identify the different senses for a verb. Alternating syntactic frames for a particular verb sense realizing the same set of semantic roles (we call this set a roleset) form a frameset and share similar semantic properties. It is easy to see that each frameset, a set of syntactic frames for a verb, corresponds to one roleset and vice versa. From now on, we use the term frameset instead of sense for clarity. Each frameset consists of one or more syntactic frames and each syntactic frame realizes one or more semantic roles. One frame differs from another in the number and type of arguments its predicate actually takes, and one frameset differs from another in the total number and type of arguments its predicate CAN take. This is illustrated graphically in Figure 1.</Paragraph> <Paragraph position="2"> Annotating the predicate-argument structure involves mapping the frameset identification information for a predicate to an actual predicate instance in the corpus and assigning the semantic roles to its arguments based on the syntactic frame of that predicate instance. It is hoped that since framesets are defined through diathesis alternation of syntactic frames, the distinctions made are still structural in nature and thus are machine-learnable and can be consistently annotated by human annotators.</Paragraph> <Paragraph position="3"> So far our discussion has focused on semantic arguments,</Paragraph> <Paragraph position="5"> which play a central role in determining the syntactic frames and framesets. There are other elements in a proposition: semantic adjuncts. 
Unlike semantic arguments, semantic adjuncts do not play a role in defining the syntactic frames or framesets because they occur with a wide variety of predicates and as a result are not as discriminative as semantic arguments. On the other hand, since they can co-occur with a wide variety of predicates, they are more generalizable and classifiable than semantic arguments. In the next section, we will describe a representation scheme that captures this dichotomy.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Representing arguments and adjuncts </SectionTitle> <Paragraph position="0"> Since the number and type of semantic arguments for a predicate are unique and thus define the semantic roles for a predicate, we label the arguments for a predicate with a contiguous sequence of integers, in the form of argN, where N is an integer between 0 and 5. Generally, a predicate has fewer than 6 arguments. Since semantic adjuncts are not subcategorized for by the predicate, we use one label argM for all semantic adjuncts. ArgN identifies the arguments while argM identifies all adjuncts. An argN uniquely identifies an argument of a predicate even if it occupies different syntactic positions in different predicate instances. Missing arguments of a predicate instance can be inferred by noting the missing argument labels.</Paragraph> <Paragraph position="1"> We also use secondary tags to generalize and classify the semantic arguments and adjuncts when possible. For example, an adjunct receives a TMP tag if it is a temporal adjunct. The secondary tags are reserved for semantic adjuncts, predicates that serve as arguments, as well as certain arguments for phrasal verbs. 
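The labeling scheme described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the annotation tooling used for the treebank: the names RoleLabel, parse_label and missing_arguments are hypothetical, and the hyphenated surface form (for example argM-ADV) is an assumption, since the text says only that a secondary tag accompanies the label.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed surface form: "arg" plus a digit 0-5 for arguments, "argM" for
# adjuncts, optionally followed by a hyphen and a secondary tag.
LABEL_PATTERN = re.compile(r"^arg([0-5]|M)(?:-([A-Za-z]+))?$")


@dataclass(frozen=True)
class RoleLabel:
    core: str                        # "0".."5" for arguments, "M" for adjuncts
    secondary: Optional[str] = None  # e.g. "ADV", "psr", "crd"

    @property
    def is_adjunct(self) -> bool:
        return self.core == "M"


def parse_label(text: str) -> RoleLabel:
    """Parse a label string such as "arg1" or "argM-ADV"."""
    match = LABEL_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid role label: {text!r}")
    return RoleLabel(core=match.group(1), secondary=match.group(2))


def missing_arguments(roleset, realized):
    """Return the argument numbers of a roleset that an instance leaves unrealized."""
    seen = {label.core for label in realized if not label.is_adjunct}
    return sorted(set(roleset) - seen)
```

With a hypothetical roleset of three arguments, an instance labeled only with arg0 and argM-ADV would be reported as missing arguments 1 and 2, which mirrors the remark that missing arguments can be inferred from the missing labels.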
The 18 secondary tags and their descriptions are presented in Table 1.</Paragraph> <Paragraph position="2"> Table 1 (fragment): 11 functional tags for semantic adjuncts. ADV: adverbial, default tag.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Complications </SectionTitle> <Paragraph position="0"> In this section we discuss several complications in annotating the predicate-argument structure as described in Section 2. Specifically, we discuss the phenomenon of &quot;split arguments&quot; and the annotation of nominalized verbs (or deverbal nouns).</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Split Arguments </SectionTitle> <Paragraph position="0"> What can be characterized as &quot;split arguments&quot; are cases where a constituent that occurs as one argument in one sentence can also be realized as multiple arguments (generally two) for the same predicate in another sentence, without causing changes in the meaning of the sentences. This phenomenon surfaces in several different constructions. One such construction involves &quot;possessor raising&quot;, where the possessor (in a broad sense) raises to a higher position. Examples 1a and 1b illustrate this. In 1a, the possessor originates from the subject position and raises to the topic position [1], while in 1b, the possessor originates from the object position and raises to the subject position. [1] The topic is higher than the subject and plays an important role in the sentence (Li and Thompson, 1976).</Paragraph> <Paragraph position="1"> The exact syntactic analysis is not important here; what is important is that one argument in one sentence becomes two in another. The challenge is then to capture this regularity when annotating the predicate-argument structure of the verb.</Paragraph> <Paragraph position="2"> Another such construction involves co-ordinated noun phrases. 
In 2a, for example, the co-ordinated structure as a whole is an argument to the verb &quot; /sign&quot;. In contrast, in 2b, one piece of the argument, &quot; /China&quot; is realized as a noun phrase introduced by a preposition. There is no apparent difference in meaning for the two sentences.</Paragraph> <Paragraph position="3"> (Fragment of example 2: arg1: /border /trade /agreement.) There are two ways to capture this type of regularity. One way is to treat each piece as a separate argument. The problem is that for coordinated noun phrases, there can be arbitrarily many coordinated constituents. So we adopt the alternative approach of representing the entire constituent as one argument. When the pieces are separate constituents, they will receive the same argument label, with different secondary tags indicating they are parts of a larger constituent. For example, in 1, when possessor raising occurs, the possessor and possessee receive the same argument label with different secondary tags psr and pse. In 2b, both &quot; /China&quot; and &quot; /Burma&quot; receive the label arg0, and the secondary label crd indicates each one is a part of the coordinated constituent.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Nominalizations </SectionTitle> <Paragraph position="0"> Another complication involves nominalizations (or deverbal nouns) and their co-occurrence with light and not-so-light verbs. A nominalized verb, while serving as an argument to another predicate (generally a verb), also has its own predicate-argument structure. 
For example, in 3, the predicate-argument structure for &quot; /doubt&quot; should be &quot; ( , )&quot;, where all the arguments of &quot; /doubt&quot; are embedded in the NP headed by &quot; /doubt&quot;.</Paragraph> <Paragraph position="1"> The complication arises when the nominalized verb is a complement to another verb, as in 4, where the subject &quot; /reader&quot; is an argument to both the verb &quot; /produce&quot; and the nominalized verb &quot; /doubt&quot;. More interestingly, the other argument &quot; /this /CL /news&quot; is realized as an adjunct to the verb (introduced by a preposition) even though it bears no apparent thematic relationship to it.</Paragraph> <Paragraph position="2"> It might be tempting to treat the verb &quot; /develop&quot; as a &quot;light verb&quot; that does not have its own predicate-argument structure, but this is questionable because &quot; /doubt&quot; can also take a noun that is not a nominalized verb: &quot; /I /towards /she /develop /LE /feeling&quot;. In addition, there is no apparent difference in meaning for &quot; /develop&quot; between this sentence and 4, so there is little basis to say these are two different senses of this verb. So we annotate the predicate-argument structure of both the verb &quot; ( , )&quot; and the</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Implementation </SectionTitle> <Paragraph position="0"> To implement the annotation model presented in Section 2, we create a lexical database. Each entry is a predicate listed with its framesets. The set of possible semantic roles for each frameset is also listed with a mnemonic explanation. This explanation is not part of the formal annotation; it is there to help human annotators understand the different semantic roles of this frameset. 
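One way to picture such a lexical database entry is the following Python sketch. It is a hypothetical illustration rather than the actual database format: the class names, the field names and the helper unlicensed_labels are assumptions, and the sample entry borrows the roles this paper names for the English frameset sign.01 (signer, document, signature) with an assumed numbering.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Frameset:
    frameset_id: str       # identifier in the style of "sign.01"
    roles: Dict[str, str]  # argument label mapped to its mnemonic explanation
    example: str = ""      # annotated example, kept as free text in this sketch


@dataclass
class LexiconEntry:
    predicate: str
    framesets: List[Frameset] = field(default_factory=list)

    def get(self, frameset_id: str) -> Frameset:
        """Look up one frameset of this predicate by its identifier."""
        for frameset in self.framesets:
            if frameset.frameset_id == frameset_id:
                return frameset
        raise KeyError(frameset_id)


def unlicensed_labels(frameset: Frameset, labels: List[str]) -> List[str]:
    """Return argument labels on an instance that the frameset does not list.

    Adjunct labels (those starting with "argM") are never flagged, because
    adjuncts play no part in defining a frameset.
    """
    return [
        label for label in labels
        if not label.startswith("argM") and label not in frameset.roles
    ]


# Hypothetical entry: the role numbering is an assumption made for illustration.
entry = LexiconEntry(
    predicate="sign",
    framesets=[
        Frameset("sign.01", {"arg0": "signer", "arg1": "document", "arg2": "signature"}),
    ],
)
```

A check such as unlicensed_labels(entry.get("sign.01"), ["arg0", "arg3"]) would flag arg3, which is one simple way an entry of this kind could support consistent annotation.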
An annotated example is also provided to help the human annotator.</Paragraph> <Paragraph position="1"> As illustrated in Example 5, the verb &quot; /pass&quot; has three framesets, and each frameset corresponds to a different meaning. The different meanings can be diagnosed with diathesis alternations. For example, when &quot; /pass&quot; means &quot;pass through&quot;, it allows a dropped object. That is, the object does not have to be syntactically realized. When it means &quot;pass by vote&quot;, it also has an intransitive use. However, in this case, the verb demonstrates the &quot;subject of the intransitive / object of the transitive&quot; alternation. That is, the subject in the intransitive use refers to the same entity as the object in the transitive use.</Paragraph> <Paragraph position="2"> When the verb means &quot;pass an exam, test, inspection&quot;, there is also the transitive/intransitive alternation. Only in this case, the object of the transitive counterpart is now part of the subject in the intransitive use. This is the argument-split problem discussed in the last section. The three framesets, representing three senses, are illustrated in 5.</Paragraph> <Paragraph position="3"> The human annotator can use the information specified in this entry to annotate all instances of &quot; /pass&quot; in a corpus. When annotating a predicate instance, the annotator first determines the syntactic frame of the predicate instance, and then determines which frameset this frame instantiates. The frameset identification is then attached to this predicate instance. This can be broadly construed as &quot;sense tagging&quot;, except that this type of sense tagging is coarser, and the &quot;senses&quot; are based on structural distinctions rather than just semantic nuances. A distinction is made only when the semantic distinctions also coincide with some structural distinctions. 
The expectation is that this type of sense tagging is much more amenable to automatic machine-learning approaches. The annotation does not stop here. The annotator will go on to identify the arguments and adjuncts for this predicate instance. For the arguments, the annotator will determine which semantic role each argument realizes, based on the set of possible roles for this frameset, and attach the appropriate semantic role label (argN) to it. For adjuncts, the annotator will determine the type of adjunct this is and attach a secondary tag to argM.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Applications </SectionTitle> <Paragraph position="0"> A resource annotated with predicate-argument structure can be used for a variety of natural language applications. For example, this level of abstraction is useful for Information Extraction. The argument role labels can be easily mapped to an Information Extraction template, where each role is mapped to a piece of information that an IE system is interested in. Such a mapping will not be as straightforward if it is between surface syntactic entities such as the subject and IE templates.</Paragraph> <Paragraph position="1"> This level of abstraction can also provide a platform where lexical transfer can take place. It opens up the possibility of linking a frameset of a predicate in one language with that of another, rather than using bilingual (or multilingual) dictionaries where one word is translated into one or more words in a different language. This type of lexical transfer has several advantages. One is that the transfer is made more precise, in the sense that there will be more cases where one-to-one mapping is possible. Even in cases where one-to-one mapping is still not possible, the identification of the framesets of a predicate will narrow down the possible lexical choices. 
For example, sign.02 in the English Proposition Bank (Kingsbury and Palmer, 2002) will be linked to &quot; .01/enter into an agreement&quot;. This type of linking rules out &quot; &quot; as a possible translation for sign.02, even though it is a translation for other framesets of the word sign.</Paragraph> <Paragraph position="2"> The transfer will also be more precise in another sense, that is, the predicate-argument structure of a word instance will be preserved during the transfer process. Knowing the arguments of a predicate instance can further constrain the lexical choices and rule out translation candidates whose predicate-argument structures are incompatible. For example, if the realized arguments of &quot;sign.01&quot; of the English Proposition Bank in a given sentence are the signer, the document, and the signature, among the translation candidates &quot; , &quot; (&quot; .01/enter into an agreement&quot; is ruled out as a possibility for this frameset), only &quot; &quot; is possible, because &quot; &quot; can only take two arguments, namely, the signer and the document.</Paragraph> <Paragraph position="3"> 6. /he /at /this /CL /document /LC /sign /LE /self /DE /name &quot;He signed his name on this document.&quot; One might argue that the syntactic subcategorization frame obtained from the syntactic parse tree can also constrain the lexical choices. For example, knowing that &quot;sign&quot; has a subject, an object and a prepositional phrase should be enough to rule out &quot; &quot; as a possible translation. This argument breaks down when there are lexical divergences.</Paragraph> <Paragraph position="4"> The &quot;document&quot; argument of &quot; &quot; can only be realized as a prepositional phrase in Chinese while in English it can only be realized as the direct object of &quot;sign&quot;. 
If the syntactic subcategorization frame is used to constrain the lexical choices for &quot;sign&quot;, &quot; &quot; will be incorrectly ruled out as a possible translation. There will be no such problem if the more abstract predicate-argument structure is used for this purpose. Even when the document is realized as a prepositional phrase, it is still the same argument. Of course, &quot; /sign&quot; is also a possible translation. So compared with the surface syntactic frames, the predicate-argument structure constrains the lexical choices without incorrectly ruling out legitimate translation candidates. This is understandable because the predicate-argument structure abstracts away from the syntactic idiosyncrasies of the different languages and thus is more transferable across languages.</Paragraph> <Paragraph position="5"> 7. /he /at /this /CL /document /LC /sign /he /sign /this /CL /document &quot;He signed this document.&quot; Annotating the predicate-argument structure as described in previous sections will not reduce the lexical choices to one-to-one mappings in all cases.</Paragraph> <Paragraph position="6"> For example, &quot; &quot; can be translated into &quot;standardize&quot; or &quot;unite&quot;, even though there is only one frameset for both finer senses of this verb. It is conceivable that one might want to posit two framesets, one for each finer sense of this verb. This is essentially a trade-off: either one conducts deep analysis of the source language, resolves all sense ambiguities on the source side and has a more straightforward mapping, or one takes the one-to-many mappings and selects the correct translation on the target language side. 
We hope that the annotation of the predicate-argument structure provides just the right level of abstraction and that the resource described here, with each predicate annotated with its arguments and adjuncts in context, enables the automatic acquisition of the predicate-argument structure.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 Summary </SectionTitle> <Paragraph position="0"> In this paper, we described an approach to annotating the propositions in the Penn Chinese Treebank. We described how diathesis alternation patterns can be used to make coarse sense distinctions for Chinese verbs as a necessary step in annotating the predicate-argument structure of predicates. We also described the representation scheme we use to label the semantic arguments and adjuncts of the predicates. We discussed several complications for this type of annotation and described our solutions. We then discussed how a lexical database with predicate-argument structure information can be used to ensure consistent annotation. Finally, we discussed possible applications for this resource.</Paragraph> </Section> </Paper>