<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0634">
  <Title>Semantic Role Chunking Combining Complementary Syntactic Views</Title>
  <Section position="4" start_page="217" end_page="218" type="intro">
    <SectionTitle>
2 System Description
</SectionTitle>
    <Paragraph position="0"> We again formulate the semantic labeling problem as a multi-class classification problem using Support Vector Machine (SVM) classifiers. TinySVM and YamCha (Kudo and Matsumoto, 2000; Kudo and Matsumoto, 2001) are used to implement the system. Using what is known as the ONE VS ALL classification strategy, n binary classifiers are trained, where n is the number of semantic classes including a NULL class.</Paragraph>
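The ONE VS ALL strategy described above can be sketched as follows. This is a minimal illustration, not the authors' setup: a plain perceptron stands in for each TinySVM binary classifier so the example runs with the standard library only, and the feature strings and the names `train_binary`, `train_one_vs_all`, and `predict` are hypothetical. NULL is trained exactly like any other class, and the predicted label is the class whose binary classifier scores highest.

```python
from collections import defaultdict

BIAS = "__bias__"  # constant feature so each binary scorer has an intercept

def score(weights, feats):
    """Linear score of a feature set under one binary classifier."""
    return weights[BIAS] + sum(weights[f] for f in feats)

def train_binary(examples, positive, epochs=20):
    """Perceptron stand-in for one SVM: +1 for `positive`, -1 for every other class."""
    w = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            y = 1 if label == positive else -1
            if y * score(w, feats) > 0:
                continue  # already on the correct side of the boundary
            w[BIAS] += y
            for f in feats:
                w[f] += y
    return w

def train_one_vs_all(examples):
    """Train n binary classifiers, one per class; NULL counts as a class."""
    classes = sorted({label for _, label in examples})
    return {c: train_binary(examples, c) for c in classes}

def predict(models, feats):
    """Assign the class whose binary classifier gives the highest score."""
    return max(models, key=lambda c: score(models[c], feats))

# Usage with three toy constituents (hypothetical features):
data = [({"pos=NP", "before_pred"}, "ARG0"),
        ({"pos=NP", "after_pred"}, "ARG1"),
        ({"pos=PP", "after_pred"}, "NULL")]
models = train_one_vs_all(data)
print(predict(models, {"pos=NP", "before_pred"}))  # → ARG0
```

With n classes this trains n independent scorers; an SVM would replace the perceptron update with margin maximization but leave the ONE VS ALL decision rule unchanged.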
    <Paragraph position="1"> The general framework is to train separate semantic role labeling systems for each of the parse tree views, and then to use the role arguments output by these systems as additional features in a semantic role classifier that uses a flat syntactic view. The constituent-based classifiers walk a syntactic parse tree and classify each node as NULL (no role) or as one of the set of semantic roles. Chunk-based systems classify each base phrase as being the B(eginning) of a semantic role, I(nside) a semantic role, or O(utside) any semantic role (i.e., NULL). This is referred to as an IOB representation (Ramshaw and Marcus, 1995). The constituent-level roles are mapped to the IOB representation used by the chunker. The IOB tags are then used as features for a separate base-phrase semantic role labeler (chunker), in addition to the standard set of features used by the chunker. An n-fold cross-validation paradigm is used to train the constituent-based role classifiers and the chunk-based classifier.</Paragraph>
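The mapping from constituent-level roles to the chunker's IOB representation can be sketched as below. The text does not spell out the alignment rule, so this sketch assumes one: a base phrase takes a role's label only if it lies entirely inside that role's token span, the first such phrase gets `B-`, and later ones get `I-`. The span encoding (token offsets, end exclusive) and the function name `roles_to_iob` are hypothetical.

```python
def roles_to_iob(chunks, roles):
    """Map constituent-level role spans onto base-phrase chunks as IOB tags.

    chunks: list of (start, end) token offsets, end exclusive, in sentence order.
    roles:  list of (start, end, label) spans from a constituent-based labeler.
    Assumption: a chunk inherits a role only if the chunk lies entirely inside
    the role's span; the first such chunk is B-, subsequent ones are I-.
    """
    tags = []
    started = set()  # indices of roles that have already emitted a B- tag
    for c_start, c_end in chunks:
        tag = "O"  # outside every role, i.e. NULL
        for i, (r_start, r_end, label) in enumerate(roles):
            if c_start >= r_start and r_end >= c_end:
                tag = ("I-" if i in started else "B-") + label
                started.add(i)
                break
        tags.append(tag)
    return tags

# "The company | bought | a small firm | in | Ohio" (5 base phrases, 8 tokens)
chunks = [(0, 2), (2, 3), (3, 6), (6, 7), (7, 8)]
roles = [(0, 2, "ARG0"), (2, 3, "V"), (3, 8, "ARG1")]
print(roles_to_iob(chunks, roles))
# → ['B-ARG0', 'B-V', 'B-ARG1', 'I-ARG1', 'I-ARG1']
```

One such tag sequence per parse view (Charniak, Collins) would then be attached to each base phrase as extra chunker features.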
    <Paragraph position="2"> For the system reported here, two full syntactic parsers were used: a Charniak parser and a Collins parser. Features were extracted by first generating the Collins and Charniak syntax trees from the word-by-word decomposed trees in the CoNLL data. The chunking system for combining all features was trained using a 4-fold paradigm. In each fold, separate SVM classifiers were trained for the Collins and Charniak parses using 75% of the training data. That is, one system assigned role labels to the nodes in Charniak-based trees, and a separate system assigned roles to nodes in Collins-based trees. The other 25% of the training data was then labeled by each of the systems. Iterating this process 4 times created the training set for the chunker. After the chunker was trained, the Charniak- and Collins-based semantic labelers were retrained using all of the training data.</Paragraph>
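The 4-fold procedure above can be sketched generically. Each sentence is labeled by a model that never saw it in training, so the chunker learns from role predictions with realistic error rates; the per-view labelers are then retrained on all the data. This is a sketch under assumptions: `train_fn` and `label_fn` are hypothetical stand-ins for the Charniak- and Collins-view SVM labelers, and folds are assigned round-robin by index, whereas the paper does not say how the 75%/25% splits were drawn.

```python
def jackknife_labels(data, k, train_fn, label_fn):
    """Return out-of-fold predictions for every item in `data`.

    train_fn(train_items) -> model   (hypothetical trainer for one parse view)
    label_fn(model, item) -> labels  (hypothetical labeler for that view)
    Fold assignment is round-robin by index (an assumption).
    """
    predictions = [None] * len(data)
    for fold in range(k):
        # Train on the other k-1 folds (75% of the data when k = 4) ...
        train = [x for i, x in enumerate(data) if i % k != fold]
        model = train_fn(train)
        # ... and label only the held-out fold (the remaining 25%).
        for i, x in enumerate(data):
            if i % k == fold:
                predictions[i] = label_fn(model, x)
    return predictions

def build_chunker_features(sentences, views, k=4):
    """views: dict view_name -> (train_fn, label_fn), e.g. 'charniak', 'collins'.

    Returns out-of-fold role predictions per view (chunker training features)
    and per-view models retrained on all sentences (used at test time).
    """
    feats = {name: jackknife_labels(sentences, k, tr, lb)
             for name, (tr, lb) in views.items()}
    final = {name: tr(sentences) for name, (tr, _) in views.items()}
    return feats, final
```

A quick sanity check is a "model" that just memorizes its training items: every out-of-fold lookup then fails, confirming that no sentence is labeled by a model trained on it.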
    <Paragraph position="3"> Two pieces of the system have problems scaling to large training sets: the final chunk-based classifier and the NULL VS NON-NULL classifier for the parse tree syntactic views. Two techniques were used to reduce the amount of training data: active sampling and NULL filtering. The active sampling process was performed as follows. We first trained a system using 10k seed examples from the training set. We then labeled an additional block of data using this system. Any sentences containing an error were added to the seed training set. The system was retrained and the procedure repeated until there were no misclassified sentences remaining in the training data. The set of examples produced by this procedure was used to train the final NULL VS NON-NULL classifier. The same procedure was carried out for the chunking system. After both of these were trained, we tagged the training data using them and removed all examples whose most likely label was NULL.</Paragraph>
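The active sampling loop and the NULL filter can be sketched as follows. The structure follows the paragraph: train on a seed set, label the next block, add only sentences with errors, retrain, and stop once a full pass over the remaining data finds no errors. The callables `train_fn`, `has_error`, and `predict_fn` are hypothetical stand-ins for the SVM trainer, a gold-versus-predicted comparison, and the classifier's top prediction. The filter assumes an example is dropped whenever the trained model's most likely label for it is NULL.

```python
def active_sample(pool, seed_size, block_size, train_fn, has_error):
    """Grow a training set by adding only sentences the current model gets wrong.

    pool: labeled training sentences, in order.
    train_fn(sentences) -> model; has_error(model, sentence) -> bool.
    Returns the selected sentences and the last model trained on them.
    """
    selected = list(range(min(seed_size, len(pool))))   # seed block (10k in the paper)
    remaining = list(range(len(selected), len(pool)))
    model = train_fn([pool[i] for i in selected])
    while remaining:
        newly_added = []
        for start in range(0, len(remaining), block_size):
            block = remaining[start:start + block_size]
            errors = [i for i in block if has_error(model, pool[i])]
            if errors:
                selected.extend(errors)
                newly_added.extend(errors)
                model = train_fn([pool[i] for i in selected])  # retrain after each block
        if not newly_added:
            break  # a full pass found no misclassified sentences
        added = set(newly_added)
        remaining = [i for i in remaining if i not in added]
    return [pool[i] for i in selected], model

def null_filter(examples, model, predict_fn):
    """Drop examples whose most likely label under the trained model is NULL."""
    return [ex for ex in examples if predict_fn(model, ex) != "NULL"]
```

Because every pass that adds sentences also shrinks `remaining`, the loop always terminates; sentences the model still misclassifies after being added are not re-examined.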
    <Paragraph position="4"> Table 1 lists the features used in the constituent-based systems. They are a combination of features introduced by Gildea and Jurafsky (2002), ones proposed in Pradhan et al. (2004) and Surdeanu et al. (2003), and the syntactic-frame feature proposed by Xue and Palmer (2004).</Paragraph>
    <Paragraph position="5"> These features are extracted from the parse tree being labeled. In addition, five features were extracted from the other parse tree (phrase, head word, head word POS, path</Paragraph>
  </Section>
</Paper>