<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0629">
  <Title>Semantic Role Labeling Using Support Vector Machines</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Support Vector Machines
</SectionTitle>
    <Paragraph position="0"> SVMs are one of the binary classifiers based on the maximum margin strategy introduced by Vapnik (Vapnik, 1995). This algorithm has achieved good performance in many classification tasks, e.g.</Paragraph>
    <Paragraph position="1"> named entity recognition and document classification. There are some advantages to SVMs in that (i) they have high generalization performance independent of the dimensions of the feature vectors and (ii) learning with a combination of multiple features is possible by using the polynomial kernel function (Yamada and Matsumoto, 2003). SVMs were used in the CoNLL-2004 shred task and achieved good performance (Hacioglu et al., 2004) (Kyung-Mi Park and Rim, 2004). We used YamCha (Yet Another Multipurpose Chunk Annotator) 1 (Kudo and Matsumoto, 2001), which is a general purpose SVM-based chunker. We also used TinySVM2 as a package for SVMs.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="198" type="metho">
    <SectionTitle>
3 System Description
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="197" type="sub_section">
      <SectionTitle>
3.1 Data Representation
</SectionTitle>
      <Paragraph position="0"> We changed the representation of original data according to Hacioglu et al. (Hacioglu et al., 2004) in  phrase method (Hacioglu et al., 2004).</Paragraph>
      <Paragraph position="1"> Word tokens were collapsed into base phrase (BP) tokens. The BP headwords were rightmost words. Verb phrases were not collapsed because some included more the one predicate.</Paragraph>
    </Section>
    <Section position="2" start_page="197" end_page="197" type="sub_section">
      <SectionTitle>
3.2 Feature Coding
</SectionTitle>
      <Paragraph position="0"> We prepared the training and development set by using files corresponding to: words, predicated partial parsing (part-of-speech, base chunks), predicate full parsing trees (Charniak models), and named entities.</Paragraph>
      <Paragraph position="1"> We will describe feature extraction according to Fig.</Paragraph>
      <Paragraph position="2">  from a predicate (see Fig. 2). We used full parses predicted by the Charniak parser. In this figure, the depth of paid , which is a predicate, is zero and the depth of April is -2.</Paragraph>
    </Section>
    <Section position="3" start_page="197" end_page="197" type="sub_section">
      <SectionTitle>
Table 1: Class Examples
</SectionTitle>
      <Paragraph position="0"> Person he, I, people, investors, we Organization company, Corp., Inc., companies, group Time year, years, time, yesterday, months Location Francisco, York, California, city, America Number %, million, billion, number, quarter Money price, prices, cents, money, dollars 9th Flat Path: This means the path from the current word to the predicate as a chain of the phrases.</Paragraph>
      <Paragraph position="1"> The chain begins from the BP of the current word to the BP of the predicate.</Paragraph>
      <Paragraph position="2"> 10th Semantic Class : We collected the most frequently occurring 1,000 BP headwords appearing in the training set and tried to manually classified. The five classes (person, organization, time, location, number and money) were relatively easy to classify. In the 1,000 words, the 343 words could be classified into the five classes. Remainder could not be classified. The details are listed in Table 1.</Paragraph>
      <Paragraph position="3"> Preceding class: The class (e.g. B-A0 or I-A1) of the token(s) preceding the current token. The number of preceding tokens is dependent on the window size. In this paper, the left context considered is two.</Paragraph>
    </Section>
    <Section position="4" start_page="197" end_page="198" type="sub_section">
      <SectionTitle>
3.3 Machine learning with YamCha
</SectionTitle>
      <Paragraph position="0"> YamCha (Kudo and Matsumoto, 2001) is a general purpose SVM-based chunker. After inputting the training and test data, YamCha converts them for</Paragraph>
      <Paragraph position="2"> (3rd), named entities (4th), token depth (5th), predicate (6th), position of tokens (7th), phrase distance (8th), flat paths (9th), semantic classes (10th), argument classes (11th).</Paragraph>
      <Paragraph position="3"> the SVM. The YamCha format for an example sentence is shown in Fig. 1. Input features are written in each column as words (1st), POS tags (2nd), base phrase chunks (3rd), named entities (4th), token depth (5th), predicate (6th), the position of tokens (7th), the phrase distance (8th), flat paths (9th), semantic classes (10th), and argument classes (11th).</Paragraph>
      <Paragraph position="4"> The boxed area in Fig. 1 shows the elements of feature vectors for the current word, in this case &amp;quot;share&amp;quot;. The information from the two preceding and two following tokens is used for each vector.</Paragraph>
    </Section>
    <Section position="5" start_page="198" end_page="198" type="sub_section">
      <SectionTitle>
3.4 Parameters of SVM
</SectionTitle>
      <Paragraph position="0"> a0 Degree of polynomial kernel (natural number): We can only use a polynomial kernel in Yam-Cha. In this paper, we adopted the degree of two.</Paragraph>
      <Paragraph position="1"> a0 Range of window (integer): The SVM can use the information on tokens surrounding the token of interest as illustrated in Fig. 1. In this paper, we adopted the range from the left two tokens to the right two tokens.</Paragraph>
      <Paragraph position="2"> a0 Method of solving a multi-class problem: We adopted the one-vs.-rest method. The BIO class is learned as (B vs. other), (I vs. other), and (O vs. other).</Paragraph>
      <Paragraph position="3"> a0 Cost of constraint violation (floating number): There is a trade-off between the training error and the soft margin for the hyper plane. We adopted a default value (1.0).</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>