<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2611"> <Title>Towards Free-text Semantic Parsing: A Unified Framework Based on FrameNet, VerbNet and PropBank</Title> <Section position="2" start_page="0" end_page="78" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In recent years, considerable effort has been devoted to the design of lexical resources that can provide the training ground for automatic semantic role labelers. Unfortunately, most of the systems developed so far are confined to the scope of the resource that they use during the learning stage. A very recent example was provided by the CoNLL 2005 Shared Task on PropBank (Kingsbury and Palmer, 2002) role labeling (Carreras and Marquez, 2005). While the best F-measure recorded on a test set drawn from the same corpus as the training data (WSJ) was 80%, on the Brown corpus the F-measure dropped below 70%. The most significant causes of this performance decay were highly ambiguous predicates and unseen predicates (i.e. predicates that have no examples in the training set).</Paragraph> <Paragraph position="1"> On the FrameNet (Johnson et al., 2003) role labeling task, the Senseval-3 competition (Litkowski, 2004) registered similar results (~80%) by using the gold frame information as a given feature. No tests were performed outside FrameNet. In this paper, we show that when the frame feature is not used, the performance decay on different corpora reaches 30 points. Thus, the context knowledge provided by the frame is very important, and a free-text semantic parser using FrameNet roles depends on the accurate automatic detection of this information.</Paragraph> <Paragraph position="2"> To test the feasibility of such a task, we have trained an SVM (Support Vector Machine) Tree Kernel model for the automatic acquisition of the frame information. 
Although FrameNet contains three types of predicates (nouns, adjectives and verbs), we concentrated on the verb predicates and the roles associated with them. Therefore, we considered only the frames that have at least one verb lexical unit. Our experiments show that, given a FrameNet predicate-argument structure, the task of identifying the originating frame can be performed with very good results when the verb predicates have enough training examples, but becomes very challenging otherwise. The predicates not yet included in FrameNet and the predicates belonging to new application domains (which require new frames) are especially problematic, since no training data is available for them.</Paragraph> <Paragraph position="3"> We have thus studied new means of capturing the semantic context, other than the frame, which can be easily annotated on FrameNet and are available on a larger scale (i.e. have better coverage). The Intersective Levin classes (Dang et al., 1998) seem to be a very good candidate, as they are also found in other predicate resources such as PropBank and VerbNet (Kipper et al., 2000). Thus, we have designed a semi-automatic algorithm for assigning an Intersective Levin class to each FrameNet verb predicate.</Paragraph> <Paragraph position="4"> The algorithm creates a mapping between FrameNet frames and the Intersective Levin classes. This mapping allows us to connect FrameNet to VerbNet and PropBank and to obtain a larger training set for the Intersective Levin classes. This leads to better verb coverage and a more robust semantic parser. 
The newly created knowledge base allows us to overcome the shortcomings that arise when FrameNet, VerbNet and PropBank are used separately while, at the same time, benefiting from the extensive research involving each of them (Pradhan et al., 2004; Gildea and Jurafsky, 2002; Moschitti, 2004).</Paragraph> <Paragraph position="5"> Note that there are 3,672 distinct verb senses in PropBank and 2,351 distinct verb senses in FrameNet. Only 501 verb senses are common to the two corpora, which corresponds to 13.64% of PropBank and 21.31% of FrameNet.</Paragraph> <Paragraph position="6"> Thus, by training an Intersective Levin class classifier on both PropBank and FrameNet, we extend the number of available verb senses to 5,522.</Paragraph> <Paragraph position="7"> In the remainder of this paper, Section 2 summarizes previous work on automatic role detection with FrameNet. It also explains in more detail why models based exclusively on this corpus are not suitable for free-text parsing. Section 3 focuses on VerbNet and PropBank and how they can enhance the robustness of our semantic parser. Section 4 describes the mapping between frames and Intersective Levin classes, whereas Section 5 presents the experiments that support our thesis. Finally, Section 6 summarizes the conclusions.</Paragraph> </Section> </Paper>