<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1186"> <Title>Semantic Role Labeling Using Dependency Trees</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Dependency Bank (DepBank) </SectionTitle> <Paragraph position="0"> In this section, we describe the corpus that we automatically created by combining the syntactic annotations of the Penn TreeBank with the semantic annotations of the PropBank. Hereafter, we refer to this new corpus as DepBank.</Paragraph> <Paragraph position="1"> First, we convert constituency trees into dependency trees. The functional tags are removed from the constituency trees before the conversion, since the current state-of-the-art syntactic parsers do not exploit those tags. Second, we trace the dependency trees to determine the word sequences covered by the dependency relation nodes. Finally, we augment those nodes with the semantic role labels that cover the same sequences of words. The relations that do not align with any semantic role are tagged with the label &quot;O&quot;. In Figure 2, we illustrate a sample dependency tree from the DepBank. It corresponds to the predicate posted of the following sentence (semantic roles are also indicated): [A0 The dollar] [V posted] [A1 gains] [AM-LOC in quiet trading] [AM-ADV as concerns about equities abated] We note that the other predicate in the sentence is abated, and the same tree with different semantic labels is also instantiated in the DepBank for it. The dependency relation nodes are indicated by &quot;R:&quot; in the figure. The dependency relation types are paired with the corresponding semantic role labels. The only exception is the node that belongs to the predicate; the semantic label V is used with the lemma of the predicate. 
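The alignment step above can be sketched as follows. This is our own illustrative code, not the authors' implementation: node and argument spans are assumed to be (start, end) token offsets, and a node receives the role whose argument covers exactly the same word sequence, else the label &quot;O&quot;.

```python
def label_nodes(node_spans, arg_spans):
    """Label each dependency relation node with the semantic role whose
    argument span covers the same word sequence; otherwise use 'O'."""
    span_to_role = {span: role for role, span in arg_spans}
    return {node: span_to_role.get(span, "O") for node, span in node_spans.items()}

# Hypothetical spans for "The dollar posted gains in quiet trading ...", predicate "posted":
node_spans = {
    "R:mod":  (0, 2),   # "The dollar"
    "R:obj":  (3, 4),   # "gains"
    "R:pobj": (4, 7),   # "in quiet trading"
    "R:det":  (0, 1),   # "The" -- matches no argument
}
arg_spans = [("A0", (0, 2)), ("A1", (3, 4)), ("AM-LOC", (4, 7))]

labels = label_nodes(node_spans, arg_spans)
```

Because span matching is exact, any node whose word sequence differs from every argument span falls through to &quot;O&quot;, which is how non-argument relations are tagged.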
The lexical nodes include the word itself and its part-of-speech (POS) tag.</Paragraph> </Section> <Section position="4" start_page="0" end_page="2" type="metho"> <SectionTitle> 3 Semantic Role Labeling of Relations </SectionTitle> <Paragraph position="0"> In the proposed approach, we first linearize the dependency tree in a bottom-up, left-to-right manner into a sequence of dependency relations. (For the constituency-to-dependency conversion, engconst2dep, from the University of Maryland, is used. Special thanks to R. Hwa, A. Lopez and M. Diab.)</Paragraph> <Paragraph position="1"> During this process we filter out the dependency relations that are less likely to be arguments. The selection mechanism is based on simple heuristics derived from dependency trees. Then we extract a set of features for each dependency relation. Finally, we input the features to a bank of SVM classifiers. A one-versus-all SVM classifier is used for each semantic role.</Paragraph> <Section position="1" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 3.1 Dependency Relation Selection </SectionTitle> <Paragraph position="0"> In dependency tree representations, we observe that the semantic roles are highly localized with respect to the chosen predicate. We exploit this observation to devise a method for deciding whether or not a dependency relation is likely to fill a semantic role. As a measure of locality, we define the tree-structured family of a predicate: the set of dependency relation nodes consisting of the predicate's parent, children, grandchildren, siblings, siblings' children and siblings' grandchildren in its dependency tree. Any relation that does not belong to this set is skipped while we linearize the dependency tree in a bottom-up, left-to-right manner. Further selection is performed on the family members that are located at the leaves of the tree. For example, a leaf member with a det dependency relation is not considered for semantic labeling. 
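The family definition above can be made concrete with a short sketch. The code is our own (the tree is an assumed child-list mapping, not the paper's data structure); it collects exactly the parent, children, grandchildren, siblings, siblings' children and siblings' grandchildren of the predicate.

```python
def family(children_of, pred):
    """Tree-structured family of a predicate, given a node-to-children map."""
    parent = next((p for p, kids in children_of.items() if pred in kids), None)
    children = list(children_of.get(pred, []))
    grandchildren = [g for c in children for g in children_of.get(c, [])]
    siblings = [s for s in children_of.get(parent, []) if s != pred]
    sib_children = [c for s in siblings for c in children_of.get(s, [])]
    sib_grand = [g for c in sib_children for g in children_of.get(c, [])]
    members = set(children) | set(grandchildren) | set(siblings)
    members = members | set(sib_children) | set(sib_grand)
    if parent is not None:
        members.add(parent)
    return members

# Abstract example: P is the predicate's parent, S its sibling, C/G its
# children/grandchildren; GG1 is a great-grandchild and falls outside the family.
children_of = {
    "P":    ["pred", "S"],
    "pred": ["C1", "C2"],
    "C1":   ["G1"],
    "G1":   ["GG1"],
    "S":    ["SC"],
    "SC":   ["SG"],
}
fam = family(children_of, "pred")
```

Relations outside this set are skipped during the bottom-up, left-to-right linearization, which is what yields the data reduction reported below.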
Our selection mechanism reduces the data for semantic role labeling by a factor of approximately 3-4 while missing only about 1% of the semantic labels, since quite a large number of nodes in the dependency trees are not associated with any semantic role.</Paragraph> </Section> <Section position="2" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 3.2 Features </SectionTitle> <Paragraph position="0"> For each candidate dependency relation we extract a set of features. In the following, we explain these features and give examples of their values, referring to the dependency tree shown in Figure 1 (feature values for the relation node R:mod with the semantic label [A0] are given in parentheses). The features that are specific to the dependency relation (i.e. token-level features) are Type: the type of the dependency relation (mod). Family membership: how the dependency relation is related to the predicate in the family (child). Position: the position of the headword of the dependency relation with respect to the predicate position in the sentence (before). Headword: the modified (head) word in the relation (posted).</Paragraph> <Paragraph position="1"> Dependent word: the modifying word in the relation (dollar). POS tag of headword: (VBD). POS tag of dependent word: (NN). Path: the chain of relations from the relation node to the predicate (mod *). The features that are specific to the predicate (i.e. sentence-level features) are POS pattern of predicate's children: the left-to-right chain of the POS tags of the immediate words that depend on the predicate (NN-NNS-IN-IN). Relation pattern of predicate's children: the left-to-right chain of the relation labels of the predicate's children (mod-obj-pobj). POS pattern of predicate's siblings: the left-to-right chain of the POS tags of the headwords of the predicate's siblings 
(-). Relation pattern of predicate's siblings: the left-to-right chain of the relation labels of the predicate's siblings (-).</Paragraph> </Section> <Section position="3" start_page="1" end_page="2" type="sub_section"> <SectionTitle> 3.3 Classifier </SectionTitle> <Paragraph position="0"> We selected support vector machines (Vapnik, 1995) to implement the semantic role classifiers.</Paragraph> <Paragraph position="1"> The motivation for this selection was the ability of SVMs to handle an extremely large number of interacting or overlapping features with quite strong generalization properties. Support vector machines for SRL were first used in (Hacioglu and Ward, 2003) as word-by-word (W-by-W) classifiers. The system was then applied to constituent-by-constituent (C-by-C) classification in (Hacioglu et al., 2003) and phrase-by-phrase (P-by-P) classification in (Hacioglu, 2004). Several extensions of the basic system with state-of-the-art performance were reported in (Pradhan et al., 2003; Pradhan et al., 2004; Hacioglu et al., 2004). All SVM classifiers for semantic argument labeling were realized using TinySVM with a polynomial kernel of degree 2 and the general-purpose SVM-based chunker YamCha.</Paragraph> </Section> </Section> <Section position="5" start_page="2" end_page="2" type="metho"> <SectionTitle> 4 Experiments </SectionTitle> <Paragraph position="0"> Experiments were carried out using a part of the February 2004 release of the PropBank. Sections 15 through 18 were used for training, Section 20 was used for development and Section 21 was used for testing. This is exactly the same data used for the CoNLL2004 shared task on SRL; therefore, the results can be directly compared with the performance of systems that used or will use the same data. The system performance is evaluated using the precision, recall and F metrics. 
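As a recap of Sections 3.2 and 3.3, the sketch below is our own illustrative code (the feature names and the toy scorers are ours; the actual system trains one SVM per role with TinySVM and YamCha). It builds the token-level feature set for a candidate relation and makes a one-versus-all decision; the degree-2 polynomial kernel used by the paper's SVMs is shown for reference.

```python
def extract_features(rel):
    """Token-level features for one candidate dependency relation."""
    position = "after" if rel["dep_index"] > rel["pred_index"] else "before"
    return {
        "type": rel["type"],
        "family_membership": rel["membership"],
        "position": position,
        "headword": rel["head"],
        "dependent_word": rel["dep"],
        "head_pos": rel["head_pos"],
        "dep_pos": rel["dep_pos"],
        "path": rel["path"],
    }

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel, as used by the paper's SVM classifiers."""
    return (sum(a * b for a, b in zip(x, y)) + 1) ** 2

def classify(scorers, x):
    """One-versus-all decision: pick the role whose binary scorer is highest."""
    return max(scorers, key=lambda role: scorers[role](x))

# Values mirror the R:mod / [A0] example from Section 3.2.
rel = {"type": "mod", "membership": "child", "dep_index": 1, "pred_index": 2,
       "head": "posted", "dep": "dollar", "head_pos": "VBD", "dep_pos": "NN",
       "path": "mod *"}
feats = extract_features(rel)

# Toy stand-in scorers over a 2-dimensional input; real scorers are trained SVMs.
scorers = {"A0": lambda x: x[0] - x[1], "A1": lambda x: x[1] - x[0], "O": lambda x: 0.0}
```

One classifier per semantic role keeps each binary problem simple; at test time the role with the largest margin wins, with &quot;O&quot; absorbing non-argument relations.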
In the experiments, the gold standard constituency parses were used. (TinySVM and YamCha are available at http://cl.aist-nara.ac.jp/~taku-ku/software/.)</Paragraph> <Paragraph position="1"> Therefore, the results provide an upper bound on the performance with automatic parses. Table 1 presents the results on the DepBank development set. The results on the CoNLL2004 development set are also shown. After we project the predicted semantic role labels in the DepBank dev set onto the CoNLL2004 dev set (directly created from the PropBank), we observe a sharp drop in recall. The drop is due to the loss of approximately 8% of the semantic roles in the DepBank dev set during the conversion process; not all phrase nodes in constituency trees find an equivalent relation node in dependency trees. However, this mismatch is significantly smaller than the 23% mismatch reported in (Gildea and Hockenmaier, 2003) between the CCGBank and an earlier version of the PropBank.</Paragraph> </Section> </Paper>