File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/05/w05-0636_evalu.xml
Size: 6,878 bytes
Last Modified: 2025-10-06 13:59:31
<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0636">
<Title>Joint Parsing and Semantic Role Labeling</Title>
<Section position="6" start_page="225" end_page="227" type="evalu">
<SectionTitle>4 Results and Discussion</SectionTitle>
<Paragraph position="0"> In this section we present results on several reranking methods for joint parsing and semantic role labeling. Table 3 compares F1 on the development set for our different reranking methods. The first four rows in Table 3 are baseline systems. We present baselines using gold trees (row 1 in Table 3) and predicted trees (row 2). As shown in previous work, gold trees perform much better than predicted trees. We also report two cheating baselines to explore the maximum possible performance of a reranking system. First, we report SRL performance on ceiling parse trees (row 3), i.e., when the parse tree from the k-best list that is closest to the gold tree is chosen.</Paragraph>
<Paragraph position="1"> This is the best expected performance of a parse reranking approach that maximizes parse F1. Second, we report SRL performance where the parse tree is selected to maximize SRL F1, computed using the gold frame (row 4). There is a significant gap between parse-F1-reranked trees and SRL-F1-reranked trees, which shows promise for joint reranking. However, the gap between SRL-F1-reranked trees and gold parse trees indicates that reranking of parse lists cannot by itself completely close the gap in SRL performance between gold and predicted parse trees.</Paragraph>
<Section position="1" start_page="226" end_page="226" type="sub_section">
<SectionTitle>4.1 Reranking based on score combination</SectionTitle>
<Paragraph position="0"> Equation 1 suggests a straightforward method for reranking: simply pick the parse tree from the k-best list that maximizes p(F,t|x), in other words, add the log probabilities from the parser and the base SRL system. More generally, we consider weighting the individual probabilities as</Paragraph>
<Paragraph position="1"> α log p(t|x) + (1 - α) log p(F|t,x). </Paragraph>
<Paragraph position="2"> Such a weighted combination is often used in the speech community to combine acoustic and language models.</Paragraph>
<Paragraph position="3"> This reranking method performs poorly, however. No choice of α performs better than α = 1, i.e., choosing the 1-best predicted parse tree. Indeed, the more weight given to the SRL score, the worse the combined system performs. The problem is that a bad parse tree often has many nodes that are obviously not constituents: thus p(F|t,x) for such a bad tree is very high, and therefore not reliable. As more weight is given to the SRL score, the unlabeled recall drops, from 71% when α = 1 to 55% when α = 0. Most of the decrease in F1 is due to this drop in unlabeled recall.</Paragraph>
</Section>
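A minimal sketch of the score combination in Section 4.1 applied to a k-best list follows. It is illustrative only, not the authors' implementation; the ParseCandidate fields and function names are hypothetical stand-ins for the parser and base SRL outputs.

from dataclasses import dataclass
from typing import List


@dataclass
class ParseCandidate:
    # Hypothetical container for one entry of the k-best parse list.
    log_p_parse: float  # log p(t|x) from the parser
    log_p_srl: float    # log p(F|t,x) from the base SRL system
    frame: object       # the 1-best SRL frame predicted on this tree


def rerank_by_score_combination(kbest: List[ParseCandidate], alpha: float) -> ParseCandidate:
    # Choose the candidate maximizing alpha * log p(t|x) + (1 - alpha) * log p(F|t,x).
    # alpha = 1 reduces to taking the 1-best parse tree; smaller alpha gives
    # more weight to the SRL score, which Section 4.1 reports hurts recall.
    def combined(c: ParseCandidate) -> float:
        return alpha * c.log_p_parse + (1.0 - alpha) * c.log_p_srl

    return max(kbest, key=combined)

Sweeping alpha over [0, 1] on the development set reproduces the comparison described above, with alpha = 1 recovering the 1-best-parse baseline.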
<Section position="2" start_page="226" end_page="227" type="sub_section">
<SectionTitle>4.2 Training a reranker using global features</SectionTitle>
<Paragraph position="0"> One potential solution to this problem is to add features of the entire frame, for example, to vote against predicted frames that are missing key arguments. But such features depend globally on the entire frame, and cannot be represented by local classifiers. One way to train these global features is to learn a linear classifier that selects a parse / frame pair from the ranked list, in the manner of Collins (2000). Reranking has previously been applied to semantic role labeling by Toutanova et al. (2005), from which we use several features. The difference between this paper and Toutanova et al. is that instead of reranking k-best SRL frames of a single parse tree, we are reranking 1-best SRL frames from the k-best parse trees.</Paragraph>
<Paragraph position="1"> Because of the computational expense of training on k-best parse-tree lists for each of 30,000 sentences, we train the reranker only on sections 15-18 of the Treebank (the same subset used in previous CoNLL competitions). We train the reranker using LogLoss, rather than the boosting loss used by Collins. We also restrict the reranker to consider only the top 25 parse trees.</Paragraph>
<Paragraph position="2"> This globally-trained reranker uses all of the features from the local model, and the following global features: (a) sequence features, i.e., the linear sequence of argument labels in the sentence (e.g. A0_V_A1), (b) the log probability of the parse tree, (c) has-arg features, that is, for each argument type a binary feature indicating whether it appears in the frame, (d) the conjunction of the predicate and the has-arg features, and (e) the number of nodes in the tree classified as each argument type.</Paragraph>
<Paragraph position="4"> The results of this system on the development set are given in Table 3 (row 6). Although this performs better than the score-combination method, it is still no better than simply taking the 1-best parse tree.</Paragraph>
<Paragraph position="5"> This may be due to the limited training set we used for the reranking model. A base SRL model trained only on sections 15-18 achieves 61.26 F1, so in comparison, reranking provides a modest improvement.</Paragraph>
<Paragraph position="6"> This system is the one that we submitted as our official submission. The results on the test sets are given in Table 4.</Paragraph>
</Section>
<Paragraph position="7"> 5 Summing over parse trees
In this section, we sketch a different approach to joint SRL and parsing that does not use reranking at all. Maximizing over parse trees means that a poor parse tree can be selected if its semantic labeling has an erroneously high score. But we are not actually interested in selecting a good parse tree; all we want is a good semantic frame.</Paragraph>
<Paragraph position="8"> This means that we should select the semantic frame that maximizes the posterior probability: p(F|x) = Σ_t p(F|t,x) p(t|x). That is, we should be summing over the parse trees instead of maximizing over them. The practical advantage of this approach is that even if one seemingly good parse tree does not have a constituent for a semantic argument, many other parse trees in the k-best list might, and all are considered when computing F*. Also, no single parse tree need have constituents for all of F*; because the method sums over all parse trees, it can mix and match constituents between different trees. The optimal frame F* can be computed by an O(N^3) parsing algorithm if appropriate independence assumptions are made on p(F|x). This requires designing an SRL model that is independent of the bracketing derived from any particular parse tree. Initial experiments performed poorly because the marginal model p(F|x) was inadequate. Detailed exploration is left for future work.</Paragraph>
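The marginalization above can be illustrated with a small sketch. It is not the authors' implementation: it restricts the sum to a k-best parse list rather than all trees, it only considers frames explicitly scored for some tree, and all names and data structures are hypothetical.

import math
from typing import Dict, List, Tuple


def best_frame_by_marginal(kbest: List[Tuple[float, Dict[str, float]]]) -> str:
    # Approximate argmax_F p(F|x) = sum_t p(F|t,x) p(t|x), restricting the sum
    # to the k-best parse list and to the candidate frames scored for each tree.
    # kbest holds one (log p(t|x), {frame_key: log p(F|t,x)}) pair per parse tree,
    # where frame_key is any string encoding of a semantic frame.
    log_z = math.log(sum(math.exp(log_p_tree) for log_p_tree, _ in kbest))
    marginal: Dict[str, float] = {}
    for log_p_tree, frames in kbest:
        for frame_key, log_p_frame in frames.items():
            # Each tree that supports the frame contributes p(F|t,x) * p(t|x) / Z.
            marginal[frame_key] = marginal.get(frame_key, 0.0) + math.exp(
                log_p_frame + log_p_tree - log_z
            )
    return max(marginal, key=marginal.get)

Because frames are accumulated across trees, a frame supported by many moderately probable parses can outscore the frame of the single best parse, which is the behavior the section argues for.
</Section>
</Paper>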