<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0857"> <Title>Generative Models for Semantic Role Labeling</Title> <Section position="3" start_page="0" end_page="1" type="metho"> <SectionTitle> 2 Role Labeler </SectionTitle> <Paragraph position="0"> Our general approach is to use a generative model defining a joint probability distribution over targets, frames, roles, and constituents. The advantage of such a model is its generality: it can determine the probability of any subset of the variables given values for the others. Three of our entries used the generative model illustrated in Figure 1, and the fourth used a model grouping all roles together, as described further below. The first model functions as follows. First, a target, T, is chosen, which then generates a frame, F. The frame generates a (linear) sequence of roles, and each role in turn generates an observed constituent. Note that, conditioned on a particular frame, the model is just a first-order Hidden Markov Model.</Paragraph> <Paragraph position="1"> The second generative model treats all roles as a group. It is no longer based on a Hidden Markov Model; instead, all roles are generated, in order, simultaneously. The role sequence in Figure 1 is therefore replaced by a single node containing all n roles. This can be compared to a case-based approach that memorizes all seen role sequences and calculates their likelihood. It is also similar to Gildea and Jurafsky's (2002) frame element groups, though we distinguish between different role orderings, whereas they do not. However, we still model constituent generation sequentially.</Paragraph> <Paragraph position="2"> The FrameNet corpus contains annotations for all of the model components described above. We represent each constituent by its phrasal category together with its head word. As in Gildea and Jurafsky's (2002) approach, we determine head words from the sentence's syntactic parse, using a simple heuristic when syntactic alignment with a parse is not available (the heuristic chooses the preposition for PPs and the last word of the phrase for all other phrases). We estimate most of the model parameters using a straightforward maximum likelihood estimate based on fully labeled training data. We smooth emission probabilities with phrase type labels due to the sparseness of head words. To assign role labels to a test example, consisting of a target, frame, and constituent sequence, we use the Viterbi algorithm. For further details, see Thompson (2003).</Paragraph> </Section> <Section position="4" start_page="1" end_page="2" type="metho"> <SectionTitle> 3 Constituent Classification for Role </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="1" end_page="2" type="sub_section"> <SectionTitle> Labeling </SectionTitle> <Paragraph position="0"> To address the &quot;hard&quot; task, we build a constituent classifier whose goal is to detect the role-bearing constituents of a sentence. We use a Naive Bayes classifier from the Weka Machine Learning toolkit (Witten and Frank, 2000) to classify every sentence constituent as role-bearing or not. In our cross-validation studies, Naive Bayes was both accurate and efficient. To create the training examples for the classifier, we parse every sentence in the SENSEVAL-3 training data with the Collins (1996) statistical parser. We call each node in the resulting parse tree a constituent.
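Before describing the classifier's training data, it may help to make the role labeler of Section 2 concrete. The following is a minimal sketch of Viterbi decoding under the first-order model, conditioned on a frame: roles form the hidden states and each role emits one constituent (phrase type, head word). The parameter tables trans, emit_word, and emit_phrase, the START marker, and the interpolation weight lam are hypothetical stand-ins for the maximum likelihood estimates and smoothing described above, not our exact parameterization.

import math

def emission(role, constituent, emit_word, emit_phrase, lam=0.5):
    """Interpolate head-word emissions with phrase-type emissions to
    smooth over sparse head words (assumed smoothing scheme)."""
    phrase, head = constituent
    p_word = emit_word.get((role, phrase, head), 0.0)
    p_phrase = emit_phrase.get((role, phrase), 1e-6)
    return lam * p_word + (1.0 - lam) * p_phrase

def viterbi(constituents, roles, trans, emit_word, emit_phrase):
    """Return the most probable role sequence for the observed constituents."""
    # best[i][r] = (log-prob of the best role path ending in role r at
    #               position i, backpointer to the previous role)
    best = [{} for _ in constituents]
    for i, c in enumerate(constituents):
        for r in roles:
            e = math.log(emission(r, c, emit_word, emit_phrase) + 1e-12)
            if i == 0:
                best[i][r] = (math.log(trans.get(("START", r), 1e-12)) + e, None)
            else:
                prev, score = max(
                    ((q, best[i - 1][q][0] + math.log(trans.get((q, r), 1e-12)))
                     for q in roles),
                    key=lambda x: x[1])
                best[i][r] = (score + e, prev)
    # Trace back from the best final state.
    last = max(roles, key=lambda r: best[-1][r][0])
    path = [last]
    for i in range(len(constituents) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))

# Hypothetical usage: roles = ["Agent", "Goal"], constituents = [("NP", "man"), ("PP", "to")]
# viterbi(constituents, roles, trans, emit_word, emit_phrase) -> e.g. ["Agent", "Goal"]

In practice the tables would be estimated from the fully labeled FrameNet training data, with the phrase-type backoff compensating for head-word sparseness.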
Once trained, the classifier can sift through a new constituent list and decide which constituents are likely to be role-bearing. The selected constituents are passed on to the Role Labeler for labeling with semantic roles, as described in Section 4.</Paragraph> <Paragraph position="1"> We train the classifier on examples extracted from the SENSEVAL-3 training data. Each example is a list of attributes corresponding to a constituent in a sentence's parse tree, along with its classification as role-bearing or not. We extract the attributes by traversing each sentence's parse tree from the root node down to nodes all of whose children are preterminals (we later fixed this to traverse the tree down to the preterminals themselves, as discussed further in Section 5). We create a training example for every visited node.</Paragraph> <Paragraph position="2"> We decided to use the following attributes from the parse trees and FrameNet examples: Target Position: The position of the target word as being BEFORE, AFTER, or CONTAINS (contained in) the constituent.</Paragraph> <Paragraph position="3"> Distance from Target: The number of words between the start of the constituent and the target word. Depth: The depth of the constituent in the parse tree.</Paragraph> <Paragraph position="4"> Height: The number of levels in the parse tree below the constituent.</Paragraph> <Paragraph position="5"> Word Count: The number of words in the constituent. Path to Target: Gildea and Jurafsky (2002) show that the path from a constituent node to the node corresponding to the target word is a good indicator that a constituent corresponds to a role. We use the 35 most frequently occurring paths in the training corpus as attribute values, as these cover about 68% of the paths seen in training. The remaining paths are specified as &quot;OTHER&quot;.</Paragraph> <Paragraph position="6"> Length of Path to Target: The number of nodes between the constituent and the target on the path. By generating examples in the manner described above, we create a data set that is heavily biased towards negative examples: 90.8% of the constituents are not role-bearing. The classifier can therefore obtain high accuracy by labeling everything as negative. This is undesirable, since no constituents would then be passed to the Role Labeler. However, passing all constituents to the labeler would cause it to try to label all of them and thus achieve lower accuracy. This is the classic precision-recall tradeoff. We chose to bias the classifier towards high recall by using a cost matrix that penalizes missed positive examples more heavily than false positives. The resulting classifier's cross-validation precision was 0.19 and its recall was 0.91. Without the cost matrix, precision was 0.30 and recall was 0.82. We are still short of our goal of perfect recall with reasonable precision, but this provides a good filtering mechanism for the next step of role labeling.</Paragraph> </Section> </Section> <Section position="5" start_page="2" end_page="2" type="metho"> <SectionTitle> 4 Combining Constituent Classification with Role Labeling </SectionTitle> <Paragraph position="0"> The constituent classifier correctly picks out most of the role-bearing constituents.
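It does so largely because of the cost-sensitive decision rule described in Section 3. As a minimal sketch of that rule (the posterior would come from the Naive Bayes model; the cost values below are illustrative assumptions, not those used for our submitted runs):

# Cost-sensitive thresholding for the constituent classifier (sketch).
COST_FALSE_NEGATIVE = 10.0   # assumed penalty for dropping a role-bearing constituent
COST_FALSE_POSITIVE = 1.0    # assumed penalty for passing an irrelevant constituent

def is_role_bearing(p_positive):
    """Predict the class with the lower expected cost.

    Expected cost of predicting negative  = p_positive * COST_FALSE_NEGATIVE
    Expected cost of predicting positive  = (1 - p_positive) * COST_FALSE_POSITIVE
    With the assumed costs above, the effective threshold on the
    role-bearing posterior drops from 0.5 to 1/11.
    """
    return p_positive * COST_FALSE_NEGATIVE > (1 - p_positive) * COST_FALSE_POSITIVE

Raising the false-negative cost lowers the effective probability threshold for the positive class, which is exactly the precision-for-recall trade observed in our cross-validation figures above.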
However, as we have seen, it still omits some constituents and, as it was designed to, includes several irrelevant constituents per sentence. For this paper, because we plan to improve the constituent classifier further, we did not use it to bias the Role Labeler at training time, but only used it to filter constituents at test time for the hard task.</Paragraph> <Paragraph position="1"> When using the classifier with the Role Labeler at test time, there are two possibilities. First, all constituents deemed relevant by the classifier could be presented to the labeler. However, because we aimed for high recall but possibly low precision, this would allow many irrelevant constituents as input. This both lowers accuracy and increases the computational complexity of labeling. The second possibility is thus to choose some reasonable subset of the positively identified constituents to present to the labeler. The options we considered were a top-down search, a bottom-up search, and a greedy search; we chose a top-down search for simplicity. In this case, the algorithm searches downward from the root of the parse tree until it finds a positively labeled constituent. While this assumes that no subtree of a role-bearing constituent is also role-bearing, we discovered that some role-bearing constituents do overlap with each other in the parse trees. However, in the SENSEVAL-3 training corpus, only 1.2% of the sentences contain a (single) overlapping constituent. In future work we plan to investigate alternative approaches for constituent choice. After filtering via our top-down technique, we present the resulting constituent sequence to the role labeler. Since the role labeler is trained on sequences containing only true role-bearing constituents but tested on sequences with potentially missing and potentially irrelevant constituents, this stage provides an opportunity for errors to creep into the process. However, because of the Markovian assumption, the presence of an irrelevant constituent has only local effects on the overall choice of a role sequence.</Paragraph> </Section> <Section position="6" start_page="2" end_page="2" type="metho"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> The SENSEVAL-3 committee chose 40 of the 100 most frequent frames from FrameNet II for the competition. In experiments with validation sets, our algorithm performed better using only the SENSEVAL-3 training data, as opposed to also using sentences from the remaining frames, so all our models were trained only on that data. We calculated performance using SENSEVAL-3's scoring software.</Paragraph> <Paragraph position="1"> We submitted two sets of answers for each task.</Paragraph> <Paragraph position="2"> We summarize each system's performance in Table 1. For the easy task, we used both the grouped (FEG Easy) and first-order (FirstOrder Easy) models. The grouped model performed better on experiments with validation sets, perhaps because many frames allow only a small number of role permutations for a given number of constituents.
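To make both the advantage and the rigidity of the grouped model concrete, here is a minimal sketch of how it could score a candidate labeling. The names seq_counts (counts of complete role sequences observed with a frame at training time) and emission (the smoothed constituent model sketched earlier) are assumed interfaces, not our exact implementation.

def grouped_score(role_seq, constituents, frame, seq_counts, emission):
    """P(role sequence | frame) times the product of constituent emissions."""
    total = sum(seq_counts[frame].values())
    # A role sequence never seen with this frame receives no probability mass,
    # which is what makes the grouped model behave like a case-based memory.
    p_seq = seq_counts[frame].get(tuple(role_seq), 0) / total if total else 0.0
    score = p_seq
    for role, constituent in zip(role_seq, constituents):
        score *= emission(role, constituent)
    return score

def best_grouped_labeling(constituents, frame, seq_counts, emission):
    """Consider only memorized role sequences whose length matches the input."""
    candidates = [s for s in seq_counts[frame] if len(s) == len(constituents)]
    return max(candidates,
               key=lambda s: grouped_score(s, constituents, frame, seq_counts, emission),
               default=None)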
In less artificial conditions, the grouped model would be less flexible in incorporating both relevant and irrelevant constituents.</Paragraph> <Paragraph position="3"> For the hard task, we used only the first-order model, due both to its greater flexibility and to the low precision of our classifier: if all positively classified constituents were passed to the grouped model, the sequence length would be greater than any seen at training time, when only correct constituents are given to the labeler. We used both the cost-sensitive classifier (CostSens Hard) and the regular constituent classifier (Hard) to filter constituents. There is a precision/recall tradeoff in using the different classifiers. While preparing our results, we were surprised by how poorly our labeler performed on validation sets. We found that our classifier was omitting about 70% of the role-bearing constituents from consideration, because they matched a parse constituent only at a preterminal node.</Paragraph> <Paragraph position="4"> We fixed this bug after submission, learned a new constituent classifier, and used the same role labeler as before. The improved results are shown in Table 2. Note that our recall has an upper limit of 85.8% due to mismatches between roles and parse tree constituents.</Paragraph> </Section> <Section position="7" start_page="2" end_page="2" type="metho"> <SectionTitle> 6 Future Work </SectionTitle> <Paragraph position="0"> We have identified three problems for future research. First, our constituent classifier should be improved to produce fewer false positives and to include a higher percentage of true positives. To do this, we first plan to enhance the feature set. We will also explore improved approaches to combining the results of the classifier with the role labeler. For example, in preliminary studies, a bottom-up search for positive constituents in the parse tree seems to yield better results than our current top-down approach. Second, since false positives cannot be entirely avoided, the labeler needs to better handle constituents that should not be labeled with a role. To solve this problem, we will adapt the idea of null-generated words from machine translation (Brown et al., 1993). Instead of having a word in the target language that corresponds to no word in the source language, we have a constituent that corresponds to no state in the role sequence.</Paragraph> <Paragraph position="1"> Finally, we will address roles that do not label a constituent, called null-instantiated roles. An example is the sentence &quot;The man drove to the station,&quot; in which the VEHICLE role does not have a constituent but is implicitly present, since the man obviously drove something to the station. This problem is more difficult, since it involves obtaining information not actually in the sentence. One possibility is to consider inserting null-instantiated roles at every step, restricted to roles seen as null-instantiated at training time. This restriction keeps the search space, which would otherwise be extremely large, manageable.</Paragraph> </Section> </Paper>