<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3213">
<Title>Unsupervised Semantic Role Labelling</Title>
<Section position="3" start_page="0" end_page="0" type="metho">
<SectionTitle> 2 Determining Slots and Role Sets </SectionTitle>
<Paragraph position="0"> Previous work has divided the semantic role labelling task into the identification of the arguments to be labelled, and the tagging of each argument with a role (Gildea and Jurafsky, 2002; Fleischman et al., 2003). Our algorithm addresses both of these steps. In addition, the unsupervised nature of the approach highlights an intermediate step of determining the set of possible roles for each argument. Because we need to constrain the role set as much as possible, and cannot draw on extensive training data, this latter step takes on greater significance in our work.</Paragraph>
<Paragraph position="1"> We first describe the lexicon that specifies the syntactic arguments and possible roles for the verbs, and then discuss our process of argument and role set identification.</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 2.1 The Verb Lexicon </SectionTitle>
<Paragraph position="0"> In semantic role labelling, a lexicon is used which lists the possible roles for each syntactic argument of each predicate. Supervised approaches to this task have thus far used the predicate lexicon of FrameNet or the verb lexicon of PropBank, since each has an associated labelled corpus for training. We instead make use of VerbNet (Kipper et al., 2000), a manually developed hierarchical verb lexicon based on the verb classification of Levin (1993). For each of 191 verb classes, including around 3000 verbs in total, VerbNet specifies the syntactic frames along with the semantic role assigned to each slot of a frame. Throughout the paper we use the term &quot;frame&quot; to refer to a syntactic frame--the set of syntactic arguments of a verb--possibly labelled with roles, as exemplified in the VerbNet entry in Table 1.</Paragraph>
<Paragraph position="1"> While FrameNet uses semantic roles specific to a particular situation (such as Speaker, Message), and PropBank uses roles specific to a verb (such as Arg0, Arg1, Arg2), VerbNet uses an intermediate level of thematic roles (such as Agent, Theme, Recipient). These general thematic roles are commonly assumed in linguistic theory, and have some advantages in terms of capturing commonalities of argument relations across a wide range of predicates. It is worth noting that although there are fewer of these thematic roles than the more situation-specific roles of FrameNet, the role labelling task is not necessarily easier: there may be more data per role, but possibly less discriminating data, since each role applies to more general relations. (Indeed, in comparing the use of FrameNet roles to general thematic roles, Gildea and Jurafsky (2002) found very little difference in performance.)</Paragraph>
</Section>
<Section position="2" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 2.2 Frame Matching </SectionTitle>
<Paragraph position="0"> We devise a frame matching procedure that uses the verb lexicon to determine, for each instance of a verb, the argument slots and their possible thematic roles. The potential argument slots are subject, object, indirect object, and PP-object, where the latter is specialized by the individual preposition.1 Given chunked sentences containing our verbs, the frame matcher uses VerbNet both to restrict the list of candidate roles for each slot, and to eliminate some of the PP slots that are likely not arguments.</Paragraph>
<Paragraph position="1"> To initialize the candidate roles precisely, we only choose roles from frames in the verb's lexical entry (cf. Table 1) that are the best syntactic matches with the chunker output. We align the slots of each frame with the chunked slots, and compute the portion %Frame of frame slots that can be mapped to a chunked slot, and the portion %Chunks of chunked slots that can be mapped to the frame. The score for each frame is computed as %Frame × %Chunks, and only frames having the highest score contribute candidate roles to the chunked slots. An example scoring is shown in Table 2.</Paragraph>
<Paragraph position="2"> 1As in VerbNet, we assume that when a verb takes a PP argument, the slot receiving the thematic role from the verb is the NP object of the preposition. Also, VerbNet has few verbs that take sentence complements, and for now we do not consider them.</Paragraph>
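<Paragraph position="3"> As a concrete illustration, the frame matching score can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes a simplified representation in which each lexicon frame is a dict with a set of slot names ("slots") and a slot-to-role mapping ("roles"); the actual chunker output and VerbNet entries are richer than this.

def frame_match_score(frame_slots, chunked_slots):
    # %Frame: portion of frame slots that map to a chunked slot;
    # %Chunks: portion of chunked slots that map to the frame.
    frame, chunks = set(frame_slots), set(chunked_slots)
    if not frame or not chunks:
        return 0.0
    matched = frame.intersection(chunks)
    return (len(matched) / len(frame)) * (len(matched) / len(chunks))

def init_candidate_roles(frames, chunked_slots):
    # Only the frames with the highest score contribute candidate roles.
    scored = [(frame_match_score(f["slots"], chunked_slots), f) for f in frames]
    best = max(score for score, _ in scored)
    candidates = {slot: set() for slot in chunked_slots}
    for score, f in scored:
        if score == best:
            for slot, role in f["roles"].items():
                if slot in candidates:
                    candidates[slot].add(role)
    return candidates

Slots whose candidate set comes out as a single role form the unambiguous seed data for the bootstrapping algorithm of Section 4; empty sets correspond to slots the frame matcher treats as likely adjuncts.</Paragraph>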
<Paragraph position="4"> This frame matching step is very restrictive and greatly reduces potential role ambiguity. Many syntactic slots receive only a single candidate role, providing the initial unambiguous data for our bootstrapping algorithm. Some slots receive no candidate roles, which is an error for argument slots but which is correct for adjuncts. The reduction of candidate roles in general is very helpful in lightening the load on the probability model, but note that it may also cause the correct role to be omitted. In future work, we plan to explore other possible methods of selecting roles from the frames, such as choosing candidates from all frames, or setting a threshold value on the matching score.</Paragraph>
</Section>
</Section>
<Section position="4" start_page="0" end_page="0" type="metho">
<SectionTitle> 3 The Probability Model </SectionTitle>
<Paragraph position="0"> Once slots are initialized as above, our algorithm uses an iteratively updated probability model for role labelling. The probability model predicts the role for a slot given certain conditioning information. We use a backoff approach with three levels of specificity of probabilities. If a candidate role fails to meet the threshold of evidence (counts contributing to that probability) for a given level, we back off to the next level. For any given slot, we use the most specific level that reaches the evidence threshold for any of the candidates. We only use information at a single level to compare candidates for a single slot.</Paragraph>
<Paragraph position="1"> We assume the probability of a role for a slot is independent of other slots; we do not ensure a consistent role assignment across an instance of a verb.</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 3.1 The Backoff Levels </SectionTitle>
<Paragraph position="0"> Our most specific probability uses the exact combination of verb, slot, and noun filling that slot, yielding P(r|v,s,n).2 This combination of slot plus head word provides information similar to that captured by the features of Gildea and Jurafsky (2002) or Thompson et al. (2003): the head of the argument and its syntactic relation to the verb.</Paragraph>
<Paragraph position="1"> 2We use only the head noun of potential arguments, not the full NP, in our probability model.</Paragraph>
<Paragraph position="2"> For our first backoff level, we introduce a novel way to generalize over the verb, slot, and noun information of P(r|v,s,n). Here we use a linear interpolation of three probabilities, each of which (1) drops one source of conditioning information from the most specific probability, and (2) generalizes a second source of conditioning information to a class-based conditioning event. Specifically, we use the following probability formula:

    λ1 P1(r|v,sc) + λ2 P2(r|v,nc) + λ3 P3(r|vc,s)

where sc is the slot class, nc is the noun class, vc is the verb class, and the individual probabilities are (currently) equally weighted (i.e., all λi have a value of 1/3).</Paragraph>
<Paragraph position="3"> Note that all three component probabilities make use of the verb or its class information. In P1, the noun component is dropped, and the slot is generalized to the appropriate slot class. In P2, the slot component is dropped, and the noun is generalized to the appropriate noun class. Although it may seem counterintuitive to drop the slot, this helps us capture generalizations over &quot;alternations,&quot; in which the same semantic argument may appear in different syntactic slots (as in The ice melted and The sun melted the ice). In P3, again the noun component is dropped, but in this case the verb is generalized to the appropriate verb class. Each type of class is described in the following subsection.</Paragraph>
<Paragraph position="4"> The last backoff level simply uses the probability of the role given the slot class, P(r|sc). The backoff model is summarized in Figure 1. We use maximum likelihood estimates (MLE) for each of the probability formulas.</Paragraph>
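<Paragraph position="5"> The backoff decision can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes precomputed MLE tables mapping each conjunction of conditioning events to a pair of (role distribution, evidence count), caller-supplied class-lookup functions, and, for simplicity, a single noun class per noun (the paper apportions credit across multiple WordNet classes).

LAMBDAS = (1/3, 1/3, 1/3)  # equal interpolation weights, as in the paper

def role_distribution(v, s, n, tables, classes, threshold):
    sc = classes["slot"](s)   # slot class
    nc = classes["noun"](n)   # noun class (simplified to one class here)
    vc = classes["verb"](v)   # VerbNet class compatible with the frame
    # Level 1: exact combination of verb, slot, and noun.
    probs, count = tables["vsn"].get((v, s, n), ({}, 0))
    if count >= threshold:
        return probs
    # Level 2: linear interpolation of the three generalized
    # probabilities P1(r|v,sc), P2(r|v,nc), and P3(r|vc,s).
    parts = [tables["v_sc"].get((v, sc), ({}, 0)),
             tables["v_nc"].get((v, nc), ({}, 0)),
             tables["vc_s"].get((vc, s), ({}, 0))]
    if any(count >= threshold for _, count in parts):
        roles = set().union(*(p.keys() for p, _ in parts))
        return {r: sum(lam * p.get(r, 0.0)
                       for lam, (p, _) in zip(LAMBDAS, parts))
                for r in roles}
    # Level 3: role given the slot class alone.
    probs, count = tables["sc"].get(sc, ({}, 0))
    return probs if count >= threshold else {}
</Paragraph>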
</Section>
<Section position="2" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 3.2 Classes of Information </SectionTitle>
<Paragraph position="0"> For slots, true generalization to a class only occurs for the prepositional slots, all of which are mapped to a single PP slot class. All other slots--subject, object, and indirect object--each form their own singleton slot class. Thus, P1 differs from P(r|v,s,n) by dropping the noun, and by treating all prepositional slots as the same slot. This formula allows us to generalize over a slot regardless of the particular noun, and preposition if there is one, used in the instance.</Paragraph>
<Paragraph position="1"> Classes of nouns in the model are given by the WordNet hierarchy. Determining the appropriate level of generalization for a noun is an open problem (e.g., Clark and Weir, 2002). Currently, we use a cut through WordNet including all the top categories, except for the category &quot;entity&quot;; the latter, because of its generality, is replaced in the cut by its immediate children (Schulte im Walde, 2003). Given a noun argument, all of its ancestors that appear in this cut are used as the class(es) for the noun. (Credit for a noun is apportioned equally across multiple classes.) Unknown words are placed in a separate category.</Paragraph>
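<Paragraph position="2"> The noun-class lookup can be approximated with NLTK's WordNet interface, as in the sketch below. Note the assumptions: the original work predates WordNet 3.0, in which every noun descends from entity.n.01, so the cut here is simply the immediate children of entity; the exact cut used in the paper differs.

from nltk.corpus import wordnet as wn

# The cut through WordNet: the overly general "entity" is replaced
# by its immediate children (an approximation of the paper's cut).
CUT = set(wn.synset("entity.n.01").hyponyms())

def noun_classes(noun):
    # All ancestors of any sense of the noun that appear in the cut;
    # credit is later apportioned equally across these classes.
    classes = set()
    for synset in wn.synsets(noun, pos=wn.NOUN):
        ancestors = set(synset.closure(lambda s: s.hypernyms()))
        ancestors.add(synset)
        classes.update(ancestors.intersection(CUT))
    return classes if classes else {"UNKNOWN"}  # separate unknown category
</Paragraph>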
<Paragraph position="3"> This yields a noun classification system that is very coarse and does not distinguish between senses, but which is simple and computationally feasible. P2 thus captures consistent relations between a verb and a class of nouns, regardless of the slot in which the noun occurs.</Paragraph>
<Paragraph position="4"> Verb classes have been shown to be very important in capturing generalizations across verb behaviour in computational systems (e.g., Palmer, 2000; Merlo and Stevenson, 2001). In semantic role labelling using VerbNet, they are particularly relevant since the classes are based on a commonality of role-labelled syntactic frames (Kipper et al., 2000). The class of a verb in our model is its VerbNet class that is compatible with the current frame. When multiple classes are compatible, we apportion the counts uniformly among them. For probability P3, then, we generalize over all verbs in a class of the target verb, giving us much more extensive data over relevant role assignments to a particular slot.</Paragraph>
</Section>
</Section>
<Section position="5" start_page="0" end_page="0" type="metho">
<SectionTitle> 4 The Bootstrapping Algorithm </SectionTitle>
<Paragraph position="0"> We have described the frame matcher that produces a set of slots with candidate role lists (some unambiguous), and our backoff probability model. All that remains is to specify the parameters that guide the iterative use of the probability model to assign roles.</Paragraph>
<Paragraph position="1"> The evidence count for each of the conditional probabilities refers to the number of times we have observed the conjunction of its conditioning events. For example, for P(r|v,s,n), this is the number of times the particular combination of verb, slot, and noun has been observed. For a probability to be used, its evidence count must reach a given threshold.</Paragraph>
<Paragraph position="2"> The &quot;goodness&quot; of a role assignment is determined by taking the log of the ratio between the probabilities of the top two candidates for a slot, when the evidence for both meets the count threshold (e.g., Hindle and Rooth, 1993). A role is only assigned if the log likelihood ratio is defined and meets a threshold; in this case, the candidate role with the highest probability is assigned to the slot. (Note that in the current implementation, we do not allow re-labelling: an assigned label is fixed.) In the algorithm, the log ratio threshold is initially set high and gradually reduced until it reaches 0. In the case of remaining ties, we assign the role for which P(r|sc) is highest.</Paragraph>
<Paragraph position="3"> Because our evidence count and log ratio restrictions may not be met even when we have a very good candidate for a slot, we reduce the evidence count threshold to the minimum value of 1 when the log ratio threshold reaches 1.3 By this point, we assume competitor candidates have been given sufficient opportunity to amass the relevant counts.</Paragraph>
<Paragraph position="4"> 3A role may still fail to be assigned at this point--this occurs when only one of multiple candidates has evidence.</Paragraph>
<Paragraph position="5"> Algorithm 1 shows the bootstrapping algorithm.
1: Determine the slots to be labelled, along with their candidate lists of roles.
2: Let A be the set of annotated slots; A = ∅. Let U be the set of unannotated slots, initially all slots. Let N be the set of newly annotated slots; N = ∅.
3: Add to N each slot whose role assignment is unambiguous--whose candidate list has one element. Set U to U − N and set A to A ∪ N (where − and ∪ remove/add elements of the second set from/to the first).
4: repeat
5:   repeat
6:     (Re)compute the probability model, using counts over the items in A.
7:     Add to N all slots in U for which:
       - at least two candidates meet the evidence count threshold for a given probability level (see Figure 1); and
       - the log ratio between the two highest probability candidates meets the log ratio threshold.
8:     For each slot in N, assign the highest probability role. Set U to U − N and set A to A ∪ N.
9:   until N = ∅
10:  Decrement the log ratio threshold; adjust the evidence count threshold if the log ratio threshold is 1.
11: until log ratio threshold = 0
12: Resolve ties and terminate.</Paragraph>
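<Paragraph position="6"> The loop structure of Algorithm 1 can be rendered in Python roughly as below. This is a sketch under stated assumptions, not the authors' code: train_model is a hypothetical caller-supplied helper that fits the backoff model of Section 3 from the currently annotated slots and returns an object with a distribution method; tie resolution via P(r|sc) is omitted.

from math import log

def bootstrap(slots, train_model, evidence_threshold,
              ratio_threshold=8.0, decrement=0.5):
    # slots: dict mapping each slot id to its set of candidate roles.
    annotated = {s: next(iter(c)) for s, c in slots.items() if len(c) == 1}
    unannotated = {s: c for s, c in slots.items() if len(c) > 1}
    while ratio_threshold > 0:
        changed = True
        while changed:                        # repeat ... until N = empty
            changed = False
            model = train_model(annotated, evidence_threshold)
            for slot, cands in list(unannotated.items()):
                probs = model.distribution(slot, cands)  # backoff levels
                ranked = sorted(probs.items(), key=lambda kv: -kv[1])
                if len(ranked) >= 2 and ranked[1][1] > 0:
                    ratio = log(ranked[0][1] / ranked[1][1])
                    if ratio >= ratio_threshold:
                        annotated[slot] = ranked[0][0]   # label is fixed
                        del unannotated[slot]
                        changed = True
        ratio_threshold -= decrement          # gradually reduce threshold
        if ratio_threshold == 1:
            evidence_threshold = 1            # minimum value from here on
    return annotated  # remaining ties resolved separately via P(r|sc)
</Paragraph>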
</Section>
<Section position="6" start_page="0" end_page="0" type="metho">
<SectionTitle> 5 Materials and Methods </SectionTitle>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 5.1 Verbs, Verb Classes and Roles </SectionTitle>
<Paragraph position="0"> For the initial set of experiments, we chose 54 target verbs from three top-level VerbNet classes: preparing-26.3, transfer mesg-37.1, and contribute-13.2. We looked for classes that contained a large number of medium to high frequency verbs displaying a variety of interesting properties, such as having ambiguous (or unambiguous) semantic roles given certain syntactic constructions, or having ambiguous semantic role assignments that could (or alternatively, could not) be distinguished by knowledge of verb class.</Paragraph>
<Paragraph position="1"> From the set of target verbs, we derived an extended verb set that comprises all of the original target verbs as well as any verb that shares a class with one of those target verbs. This gives us a set of 1159 verbs to observe in total, and increases the likelihood that some verb class information is available for each of the possible classes of the target verbs. Observing the entire extended set also provides more data for our probability estimators that do not use verb class information.</Paragraph>
<Paragraph position="2"> We have made several changes to the semantic roles as given by VerbNet. First, selectional restrictions such as [+Animate] are removed, since our coarse model of noun classes does not allow us to reliably determine whether such restrictions are met. Second, a few semantic distinctions that are made in VerbNet appeared to be too fine-grained to capture, so we map these to a more coarse-grained subset of the VerbNet roles. For instance, the role Actor is merged with Agent, and Patient with Theme. We are left with a set of 16 roles: Agent, Amount, Attribute, Beneficiary, Cause, Destination, Experiencer, Instrument, Location, Material, Predicate, Recipient, Source, Stimulus, Theme, Time. Of these, 13 actually occur in our target verb classes.</Paragraph>
</Section>
<Section position="2" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 5.2 The Corpus and Preprocessing </SectionTitle>
<Paragraph position="0"> Our corpus consists of a random selection of 20% of the sentences in the British National Corpus (BNC Reference Guide, 2000). This corpus is processed by the chunker of Abney (1991), from whose output we can identify the probable head words of verb arguments with some degree of error. For instance, distant subjects are often not found, and PPs identified as arguments are often adjuncts. To reduce the number of adjuncts, we ignore dates and any PPs that are not known to (possibly) introduce an argument to one of the verbs in our extended set.</Paragraph>
</Section>
<Section position="3" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 5.3 Validation and Test Data </SectionTitle>
<Paragraph position="0"> We extracted two sets of sentences: a validation set consisting of 5 random examples of each target verb, and a test set consisting of 10 random examples of each target verb. The data sets were chunked as above, and the role for each potential argument slot was labelled by two human annotators, choosing from the simplified role set allowed by each verb according to VerbNet. A slot could also be labelled as an adjunct, or as &quot;bad&quot; (incorrectly chunked). Agreement between the two annotators was high, yielding a kappa statistic of 0.83. After performing the labelling task individually, the annotators reconciled their responses (in consultation with a third annotator) to yield the set of human judgements used for evaluation.</Paragraph>
</Section>
<Section position="4" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 5.4 Setting the Bootstrapping Parameters </SectionTitle>
<Paragraph position="0"> In our development experiments, we tried an evidence count threshold of either the mean or the median over all counts of a particular conjunction of conditioning events. (For example, for P(r|v,s,n), this is the mean or median count across all combinations of verb, slot, and noun.) The more lenient median setting worked slightly better on the validation set, and was retained for our test experiments.</Paragraph>
<Paragraph position="1"> We also experimented with initial starting values of 2, 3, and 8 for the log likelihood ratio threshold. An initial setting of 8 showed an improvement in performance, as lower values enabled too many early role assignments, so we used the value of 8 in our test experiments. In all experiments, a decrement of 0.5 was used to gradually reduce the log likelihood ratio threshold.</Paragraph>
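<Paragraph position="2"> For concreteness, the mean and median settings of the evidence count threshold can be computed as in the short sketch below; the layout of the count table is our own assumption.

from statistics import mean, median

def evidence_count_threshold(counts, use_median=True):
    # counts: dict mapping a conjunction of conditioning events, e.g. a
    # (verb, slot, noun) tuple for P(r|v,s,n), to its observed frequency.
    values = list(counts.values())
    return median(values) if use_median else mean(values)
</Paragraph>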
</Section>
</Section>
</Paper>