A Joint Model for Semantic Role Labeling

2 Local Models

Our local model labels nodes in a parse tree independently. We decompose the distribution over labels (all argument labels plus NONE) into a product of a binary distribution over ARG versus NONE and a distribution over argument labels given that a node is an ARG. This can be seen as chaining an identification model and a classification model: the identification model classifies each phrase as either an argument or a non-argument, and the classification model labels each potential argument with a specific argument label. The two models use the same features; a minimal sketch of this chained decomposition is given at the end of Section 2.1.

Previous research (Gildea and Jurafsky, 2002; Pradhan et al., 2004; Carreras and Màrquez, 2004) has identified many useful features for local identification and classification. Below we list the features and hand-picked conjunctions of features used in our local models. The ones denoted with asterisks (*) were not present in (Toutanova et al., 2005). Although most of these features have been described in previous work, some features, described in the next section, are - to our knowledge - novel.

2.1 Additional Local Features

We found that a large source of errors for A0 and A1 stemmed from cases such as those illustrated in Figure 1, where arguments were dislocated by raising or controlling verbs. Here, the predicate, expected, does not have a subject in the typical position - indicated by the empty NP - since the auxiliary "is" has raised the subject to its current position. In order to capture this class of examples, we use a binary feature, Missing Subject, indicating whether the predicate is "missing" its subject, and use this feature in conjunction with the Path feature, so that we learn typical paths to raised subjects conditioned on the absence of the subject in its typical position.

In the particular case of Figure 1, there is another instance of an argument being quite far from its predicate: since expect is a raising verb, widen's subject is not in its typical position either, and we should expect to find it in the same positions as expected's subject. This suggests it may be useful to use the path relative to expected to find arguments for widen. In general, to identify certain arguments of predicates embedded in auxiliary and infinitival VPs, we expect it to be helpful to take the path from the maximal extended projection of the predicate - the highest VP in the chain of VPs dominating the predicate. We introduce a new path feature, Projected Path, which takes the path from the maximal extended projection to an argument node. This feature applies only when the argument is not dominated by the maximal projection; arguments dominated by it, such as direct objects, do not need it. These features also handle other cases of discontinuous and non-local dependencies, such as those arising from control verbs. For a local model, these new features and their conjunctions improved F1-measure from 73.80 to 74.52 on the development set. Notably, the F1-measure of A0 increased from 81.02 to 83.08.
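To make the Missing Subject and Projected Path features concrete, here is a minimal sketch over a bare-bones constituency-tree type. All names and interfaces below (Node, maximal_projection, tree_path, and so on) are our own illustration under stated assumptions, not the authors' implementation; in particular, a real system would also account for empty categories when testing for a missing subject.

```python
# Sketch of the Missing Subject and Projected Path features over a
# minimal constituency tree with a category label, a parent pointer,
# and an ordered child list.  Illustrative only.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)              # identity-based equality for tree nodes
class Node:
    label: str                    # syntactic category, e.g. "VP", "NP", "VBD"
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

def maximal_projection(predicate: Node) -> Node:
    """The highest VP in the chain of VPs dominating the predicate."""
    top, node = predicate, predicate.parent
    while node is not None and node.label == "VP":
        top, node = node, node.parent
    return top

def dominates(ancestor: Node, node: Node) -> bool:
    """True if `ancestor` dominates `node` in the tree."""
    while node is not None:
        if node is ancestor:
            return True
        node = node.parent
    return False

def missing_subject(predicate: Node) -> bool:
    """True when no NP precedes the predicate's VP chain among its
    sisters, i.e. the typical subject position is empty."""
    vp = maximal_projection(predicate)
    if vp.parent is None:
        return True
    sisters = vp.parent.children
    return not any(s.label == "NP" for s in sisters[: sisters.index(vp)])

def tree_path(start: Node, end: Node) -> str:
    """Up/down category path between two nodes (the standard Path feature)."""
    def ancestors(n: Node) -> List[Node]:
        out = []
        while n is not None:
            out.append(n)
            n = n.parent
        return out
    up, down = ancestors(start), ancestors(end)
    common = next(n for n in up if n in down)        # lowest common ancestor
    ups = [n.label for n in up[: up.index(common) + 1]]
    downs = [n.label for n in down[: down.index(common)]]
    return "^".join(ups) + ("v" + "v".join(reversed(downs)) if downs else "")

def projected_path(predicate: Node, arg: Node) -> Optional[str]:
    """Path from the maximal extended projection to the argument node;
    applies only when the projection does not dominate the argument."""
    proj = maximal_projection(predicate)
    if dominates(proj, arg):
        return None               # e.g. direct objects: feature does not apply
    return tree_path(proj, arg)
```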
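And here is the chained identification/classification decomposition referenced at the start of this section, again as a minimal sketch. The two model objects are assumed to expose probability-lookup methods; those interfaces are our assumption rather than a detail from the paper.

```python
# Sketch of the chained local decomposition:
#   P(NONE  | node) = 1 - P(ARG | node)
#   P(label | node) = P(ARG | node) * P(label | node, ARG)   for label != NONE
# Identification and classification share the same feature vector.

def local_label_distribution(features, id_model, cls_model):
    """Distribution over labels for one node under the chained model.

    Assumed (illustrative) interfaces:
      id_model.prob_arg(features)  -> P(ARG | node)
      cls_model.probs(features)    -> {label: P(label | node, ARG)}
    """
    p_arg = id_model.prob_arg(features)
    dist = {"NONE": 1.0 - p_arg}
    for label, p in cls_model.probs(features).items():
        dist[label] = p_arg * p
    return dist
```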
3 Joint Model

Our joint model, in contrast to the local model, collectively scores a labeling of all nodes in the parse tree. The model is trained to re-rank a set of N labelings that are likely according to the local model. We find the exact top N most likely consistent local-model labelings using a simple dynamic program described in (Toutanova et al., 2005).

Most of the features we use are described in more detail in (Toutanova et al., 2005). Here we briefly describe these features and introduce several new joint features (denoted by *). A labeling L of all nodes in the parse tree specifies a candidate argument frame - the sequence of all nodes labeled with a non-NONE label according to L. The joint model features operate on candidate argument frames and look at the labels and internal features of the candidate arguments. We introduce them in the context of the example in Figure 2; a sketch of how such features can be read off a candidate frame follows at the end of Section 4. The candidate argument frame corresponding to the correct labeling for the tree is: [NP1-A1, VBD-V, PP1-A3, PP2-A4, NP2-AM-TMP].

* Core arguments label sequence: The sequence of labels of core arguments concatenated with the predicate voice. Example: [voice:active: A1,V,A3,A4]. A back-off feature which substitutes specific argument labels with a generic argument (A) label is also included.

* Core arguments label sequence with phrase types: The sequence of labels of core arguments together with annotated phrase types. Phrase types are annotated with the head word for PP nodes, and with the head POS tag for S and VP nodes. Example: [voice:active: NP-A1,V,PP-to-A3,PP-from-A4]. A back-off to generic A labels is also included, as is a variant that adds the predicate stem.

* Repeated core argument labels with phrase types: Annotated phrase types for nodes with the same core argument label. This feature captures, for example, the tendency of WHNP referring phrases to occur as the second phrase bearing the same label as a preceding NP phrase.

* Repeated core argument labels with phrase types and sister/adjacency information*: Similar to the previous feature, but also indicates whether all repeated arguments are sisters in the parse tree, or whether all repeated arguments are adjacent in terms of word spans. These features can provide robustness to parser errors: adjacent phrases that the parser has incorrectly split become more likely to receive the same label.

4 Combining Local and Joint Models

It is useful to combine the joint model's score with the local model's score, because the local model has been trained using all negative examples, whereas the joint model has been trained only on likely argument frames. Our final score is given by a mixture of the local and joint models' log-probabilities:

$\mathrm{score}_{\mathrm{SRL}}(L \mid t) = \alpha \cdot \mathrm{score}_{\ell}(L \mid t) + \mathrm{score}_{J}(L \mid t)$

where $\mathrm{score}_{\ell}(L \mid t)$ is the local score of L, $\mathrm{score}_{J}(L \mid t)$ is the corresponding joint score, and $\alpha$ is a tunable parameter. We search among the top N candidate labelings proposed by the local model for the labeling that maximizes this final score.
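As a minimal sketch tying Sections 3 and 4 together, the following shows how the core-arguments label-sequence feature might be read off a candidate argument frame, and how local and joint scores are mixed over the top-N candidates. The data shapes and scoring interfaces are our assumptions, not the paper's implementation.

```python
# Sketch of two pieces: the core-arguments label-sequence feature from a
# candidate argument frame, and the mixture of local and joint
# log-probability scores over the top-N candidates.  Illustrative only.

CORE_LABELS = {"V", "A0", "A1", "A2", "A3", "A4", "A5"}

def core_label_sequence(frame, voice):
    """Core-arguments label sequence with predicate voice.
    `frame` is a candidate argument frame: (node, label) pairs for all
    non-NONE nodes; adjuncts such as AM-TMP are excluded."""
    core = [lab for _, lab in frame if lab in CORE_LABELS]
    return "[voice:%s: %s]" % (voice, ",".join(core))

def backoff_label_sequence(frame, voice):
    """Back-off variant: specific argument labels replaced by generic 'A'."""
    core = ["V" if lab == "V" else "A"
            for _, lab in frame if lab in CORE_LABELS]
    return "[voice:%s: %s]" % (voice, ",".join(core))

def rerank(candidates, local_score, joint_score, alpha):
    """Pick the labeling L maximizing
    score_SRL(L|t) = alpha * score_l(L|t) + score_J(L|t), where both
    scores are log-probabilities and `candidates` holds the top-N
    labelings proposed by the local model."""
    return max(candidates,
               key=lambda L: alpha * local_score(L) + joint_score(L))

# The Figure 2 frame reproduces the paper's example strings:
frame = [("NP1", "A1"), ("VBD", "V"), ("PP1", "A3"),
         ("PP2", "A4"), ("NP2", "AM-TMP")]
print(core_label_sequence(frame, "active"))     # [voice:active: A1,V,A3,A4]
print(backoff_label_sequence(frame, "active"))  # [voice:active: A,V,A,A]
```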
5 Increasing Robustness to Parser Errors

It is apparent that role labeling is very sensitive to the correctness of the given parse tree. If an argument does not correspond to a constituent in the parse tree, our model will not be able to consider the correct phrase.

One way to address this problem is to utilize alternative parses. Recent releases of the Charniak parser (Charniak, 2000) include an option to provide the top k parses of a given sentence according to the probability model of the parser. We use these alternative parses as follows. Suppose $t_1, \ldots, t_k$ are parse trees for a sentence $s$, with probabilities $P(t_i \mid s)$ given by the parser. Then, for a fixed predicate $v$, let $L_i$ denote the best joint labeling of tree $t_i$, with score $\mathrm{score}_{\mathrm{SRL}}(L_i \mid t_i)$ according to our final joint model. We then choose the labeling $L_i$ that maximizes

$\arg\max_{i \in \{1, \ldots, k\}} \; \log P(t_i \mid s) + \mathrm{score}_{\mathrm{SRL}}(L_i \mid t_i)$

Considering the top k = 5 parse trees using this algorithm resulted in an absolute increase of up to 0.4 in F-measure. In future work, we plan to experiment with better ways to combine information from multiple parse trees.
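A minimal sketch of this multi-parse selection follows, under the assumption that per-tree labeling and scoring are available as a single function; the interfaces are illustrative, and the objective mirrors the combination reconstructed above.

```python
import math

def best_labeling_over_parses(parses, label_and_score):
    """Choose a labeling across the top-k parses of a sentence.

    `parses` is a list of (tree, prob) pairs from the parser, where
    prob = P(t_i | s).  `label_and_score(tree)` returns (L_i, s_i): the
    best joint labeling of the tree and its score_SRL value, a
    log-probability.  Both interfaces are illustrative assumptions.
    """
    best_labeling, best_total = None, -math.inf
    for tree, prob in parses:
        labeling, srl_score = label_and_score(tree)
        total = math.log(prob) + srl_score   # combine parser and SRL scores
        if total > best_total:
            best_labeling, best_total = labeling, total
    return best_labeling
```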