<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1117"> <Title>Semantic Role Labeling via FrameNet, VerbNet and PropBank</Title> <Section position="7" start_page="932" end_page="935" type="evalu"> <SectionTitle> 5 Experiments </SectionTitle> <Paragraph position="0"> In the previous sections we have presented the algorithm for annotating the verb predicates of FrameNet (FN) with Intersective Levin classes (ILCs). In order to show the effectiveness of this annotation and of the ILCs in general, we have performed several experiments.</Paragraph> <Paragraph position="1"> First, we trained (1) an ILC multiclassifier from FN, (2) an ILC multiclassifier from PB and (3) a frame multiclassifier from FN. We compared the results obtained when classifying the VN class with those obtained when classifying the frame. We show that ILCs are easier to detect than FN frames.</Paragraph> <Paragraph position="2"> Our second set of experiments concerns the automatic labeling of FN semantic roles on the FN corpus when using as features: gold frame, gold ILC, automatically detected frame and automatically detected ILC. We show that in all situations in which the VN class feature is used, the accuracy loss, compared to the usage of the frame feature, is negligible. This suggests that the ILC can successfully replace the frame feature for the task of semantic role labeling.</Paragraph> <Paragraph position="3"> Another set of experiments concerns the generalization property of the ILC. We show the impact of this feature when very little training data is available and its evolution as more and more training examples are added. We again perform the experiments for: gold frame, gold ILC, automatically detected frame and automatically detected ILC.</Paragraph> <Paragraph position="4"> Finally, we simulate the difficulty of free text by annotating PB with FN semantic roles. 
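The multiclassifiers mentioned above are built, as is usual with binary SVMs, by combining per-class binary classifiers. The following sketch illustrates that general one-vs-rest scheme only; the scorers, class names and scores are hypothetical and do not reproduce the paper's actual models.

```python
# Hypothetical sketch of a one-vs-rest multiclassifier: each class has a
# binary scorer, and the predicted class is the one with the highest score.
# Illustrative only; not the paper's SVM models.

def one_vs_rest_predict(scores_by_class):
    """Return the class whose binary scorer assigned the highest score."""
    return max(scores_by_class, key=scores_by_class.get)

def multiclassifier_accuracy(predictions, gold):
    """Accuracy of the combined multiclassifier over a test set."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# Toy example with made-up margin scores for three hypothetical ILC labels:
scores = [
    {"run-51.3.2": 0.9, "hit-18.1": -0.2, "break-45.1": 0.1},
    {"run-51.3.2": -0.5, "hit-18.1": 0.4, "break-45.1": 0.3},
]
preds = [one_vs_rest_predict(s) for s in scores]
```

Per-class F1 can then be computed from each binary scorer's decisions, while the combined prediction feeds the multiclassifier accuracy, matching the two evaluation measures used in the experiments.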
We used PB because it covers a different set of verbal predicates and also because it is very different from FN at the level of vocabulary and sometimes even syntax. These characteristics make PB a difficult testbed for the semantic role models trained on FN.</Paragraph> <Paragraph position="5"> In the following sections we present the results obtained for each of the experiments mentioned above.</Paragraph> <Section position="1" start_page="933" end_page="934" type="sub_section"> <SectionTitle> 5.1 Experimental setup </SectionTitle> <Paragraph position="0"> The corpora available for the experiments were PB and FN. PB contains about 54,900 predicates and gold parse trees. We used sections 02 to 22 (52,172 predicates) to train the ILC classifiers and Section 23 (2,742 predicates) for testing purposes.</Paragraph> <Paragraph position="1"> The number of ILCs is 180 in PB and 133 in FN, i.e., the classes that we were able to map.</Paragraph> <Paragraph position="2"> For the experiments on the FN corpus, we extracted 58,384 sentences from the 319 frames that contain at least one verb annotation. There are 128,339 argument instances of 454 semantic roles. In our evaluation we use only verbal predicates. Moreover, as there is no fixed split between training and testing, we randomly selected 20% of the sentences for testing and 80% for training. The sentences were processed using Charniak's parser (Charniak, 2000) to generate parse trees automatically. The classification models were implemented by means of the SVM-light-TK software available at http://ai-nlp.info.uniroma2.it/moschitti which encodes tree kernels in the SVM-light software (Joachims, 1999). We used the default parameters. The classification performance was evaluated using the F1 measure for the individual role and ILC classifiers and the accuracy for the multiclassifiers.</Paragraph> </Section> <Section position="2" start_page="934" end_page="934" type="sub_section"> <SectionTitle> 5.2 Automatic VerbNet class vs. 
automatic </SectionTitle> <Paragraph position="0"> FrameNet frame detection In these experiments, we classify ILCs on PB and frames on FN. For the training stage we use SVMs with Tree Kernels.</Paragraph> <Paragraph position="1"> The main idea of tree kernels is to model a function KT(T1,T2) which computes the number of common substructures between two trees T1 and T2. Thus, we can train SVMs with structures drawn directly from the syntactic parse tree of the sentence. The kernel that we employed in our experiments is based on the SCF structure devised in (Moschitti, 2004). We slightly modified SCF by adding the headwords of the arguments, which are useful for representing selectional preferences (more details are given in (Giuglea and Moschitti, 2006)). For frame detection on FN, we trained our classifier on 46,734 training instances and tested on 11,650 testing instances, obtaining an accuracy of 91.11%. For ILC detection the results are depicted in Table 4. The first six columns report the F1 measure of some verb class classifiers whereas the last column shows the global multiclassifier accuracy. We note that ILC detection is more accurate than frame detection on both FN and PB. Additionally, the ILC results on PB are similar to those obtained for the ILCs on FN. This suggests that the training corpus does not have a major influence. Also, the SCF-based tree kernel seems robust with respect to the quality of the parse trees: the performance drop on FN, which uses automatic parse trees, is very small compared with PB, which contains gold parse trees.</Paragraph> </Section> <Section position="3" start_page="934" end_page="934" type="sub_section"> <SectionTitle> 5.3 Automatic semantic role labeling on FrameNet </SectionTitle> <Paragraph position="0"> In the experiments involving semantic role labeling, we used SVMs with polynomial kernels. We adopted the standard features developed for semantic role detection by Gildea and Jurafsky (see Section 2). 
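The substructure-counting idea behind KT(T1,T2) can be sketched with the classic Collins and Duffy recursion, which sums over node pairs the number of common tree fragments rooted at them. This is a minimal illustration of the idea only, not the SCF kernel or the SVM-light-TK implementation used in the paper.

```python
# Minimal, illustrative subtree kernel in the style of Collins and Duffy:
# KT(T1, T2) sums, over all node pairs, the number of common tree fragments
# rooted at those nodes. A sketch of the idea, not the paper's SCF kernel.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def production(n):
    # A node's production: its label plus the labels of its children.
    return (n.label, tuple(c.label for c in n.children))

def delta(n1, n2):
    """Number of common fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0
    if not n1.children:          # matching leaves
        return 1
    result = 1
    for c1, c2 in zip(n1.children, n2.children):
        # For each child: either stop the fragment there (the +1)
        # or extend it with any fragment rooted at the child.
        result *= 1 + delta(c1, c2)
    return result

def all_nodes(n):
    yield n
    for c in n.children:
        yield from all_nodes(c)

def tree_kernel(t1, t2):
    return sum(delta(a, b) for a in all_nodes(t1) for b in all_nodes(t2))
```

For two identical trees S(A, B) with leaf children, the kernel counts four fragments rooted at S plus one at each leaf, giving 6; because the kernel decomposes over substructures, the SVM can learn directly from parse trees without a manually designed feature vector.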
Also, we considered some of the features designed by Pradhan et al. (2005): First and Last Word/POS in Constituent, Subcategorization, Head Word of Prepositional Phrases and the Syntactic Frame feature from (Xue and Palmer, 2004).</Paragraph> <Paragraph position="1"> For the rest of the paper, we will refer to these features as literature features (LF). The results obtained when using the literature features alone or in conjunction with the gold frame feature, gold ILC, automatically detected frame feature and automatically detected ILC are depicted in Table 5.</Paragraph> <Paragraph position="2"> The first four columns report the F1 measure of some role classifiers whereas the last column shows the global multiclassifier accuracy. The first row contains the number of training and testing instances and each of the other rows contains the performance obtained for a different feature combination. The results are reported for the labeling task only, as the argument-boundary detection task is not affected by the frame-like features (G&J).</Paragraph> <Paragraph position="3"> We note that the automatic frame produces an accuracy very close to the one obtained with the automatic ILC, suggesting that the ILC is a very good candidate for replacing the frame feature. Also, both automatic features are very effective, decreasing the error rate by 20%.</Paragraph> <Paragraph position="4"> To test the impact of the ILC on SRL with different amounts of training data, we additionally drew the learning curves with respect to different features: LF, LF+(gold) ILC, LF+automatic ILC trained on PB and LF+automatic ILC trained on FN. 
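The 20% figure above is a relative error-rate reduction. The arithmetic behind such a claim can be shown with hypothetical accuracies (the numbers below are chosen for illustration and are not the paper's results):

```python
# Relative error-rate reduction between a baseline and an improved system.
# The accuracies used below are hypothetical, chosen only to illustrate
# the arithmetic behind a "20% error-rate decrease" claim.

def relative_error_reduction(acc_baseline, acc_improved):
    err_base = 1.0 - acc_baseline
    err_new = 1.0 - acc_improved
    return (err_base - err_new) / err_base

# A baseline at 80% accuracy improved to 84% cuts the error from 0.20
# to 0.16, i.e., a 20% relative reduction.
reduction = relative_error_reduction(0.80, 0.84)
```

Reporting the reduction relative to the baseline error, rather than as an absolute accuracy gain, makes improvements comparable across systems with different baseline accuracies.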
As can be noted, the automatic ILC information provided by the ILC classifiers (trained on FN or PB) performs almost as well as the gold ILC.</Paragraph> </Section> <Section position="4" start_page="934" end_page="935" type="sub_section"> <SectionTitle> 5.4 Annotating PB with FN semantic roles </SectionTitle> <Paragraph position="0"> To show that our approach is suitable for free-text semantic role annotation, we automatically classified PB sentences with the FN semantic-role classifiers. In order to measure the quality of the annotation, we randomly selected 100 sentences and manually verified them.</Paragraph> <Paragraph position="1"> We measured the performance obtained with and without the automatic ILC feature. The sentences contained 189 arguments, of which 35 were incorrectly labeled when the ILC feature was used, compared to 72 incorrect in its absence, i.e., an accuracy of 81% with the ILC versus 62% without it. This demonstrates the importance of the ILC feature outside the scope of FN, where the frame feature is not available.</Paragraph> </Section> </Section> </Paper>