<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2404">
  <Title>Chunking Japanese Compound Functional Expressions by Machine Learning</Title>
  <Section position="4" start_page="28" end_page="30" type="metho">
    <SectionTitle>
3 Chunking Japanese Compound Functional Expressions with SVMs
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="28" end_page="29" type="sub_section">
      <SectionTitle>
3.1 Support Vector Machines
</SectionTitle>
      <Paragraph position="0"> The principal idea of SVMs is to find a separating hyperplane that maximizes the margin between two classes (Vapnik, 1998). If the classes are not separable by a hyperplane in the original input space, the samples are transformed into a higher-dimensional feature space.</Paragraph>
      <Paragraph position="1"> Given that x is the context (a set of features) of an input sample, and that x_i and y_i (y_i in {1, -1}) indicate the context of the i-th training sample and its category, respectively, the decision function f in the SVM framework is defined as:</Paragraph>
      <Paragraph position="2"> f(x) = sgn( sum_{i=1..l} alpha_i y_i K(x_i, x) + b )</Paragraph>
      <Paragraph position="3"> where K is a kernel function, b in R is a threshold, and alpha_i are weights. The training samples x_i whose weights alpha_i are non-zero are called support vectors. To train an SVM is to find the alpha_i and the b by solving the optimization problem: maximizing the following under the constraints sum_{i=1..l} alpha_i y_i = 0 and 0 &lt;= alpha_i &lt;= C:</Paragraph>
      <Paragraph position="4"> L(alpha) = sum_{i=1..l} alpha_i - (1/2) sum_{i,j=1..l} alpha_i alpha_j y_i y_j K(x_i, x_j)</Paragraph>
      <Paragraph position="5"> The kernel function K is used to transform the samples into a higher-dimensional feature space. Among the many kinds of kernel functions available, we focus on the d-th degree polynomial kernel:</Paragraph>
      <Paragraph position="6"> K(x_i, x) = (x_i . x + 1)^d</Paragraph>
      <Paragraph position="7"> Through experimental evaluation on chunking Japanese compound functional expressions, we compared polynomial kernels with d = 1, 2, and 3. The kernels with d = 2 and d = 3 perform best, while the kernel with d = 3 requires much more computational cost than that with d = 2. Thus, throughout the paper, we show results with the quadratic kernel (d = 2).</Paragraph>
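The decision function with the polynomial kernel can be sketched directly in code. This is a minimal sketch following the formulas above; the support vectors, weights, labels, and threshold are illustrative values we supply for the example, not parameters learned from the paper's data.

```python
# Minimal sketch of the SVM decision function with the d-th degree
# polynomial kernel K(x_i, x) = (x_i . x + 1)^d. The support vectors,
# weights (alphas), labels (ys), and threshold b below are
# illustrative values, not learned parameters.

def poly_kernel(xi, x, d=2):
    """K(x_i, x) = (x_i . x + 1)^d, where . is the inner product."""
    dot = sum(a * b for a, b in zip(xi, x))
    return (dot + 1.0) ** d

def decision(x, support_vectors, alphas, ys, b, d=2):
    """f(x) = sgn( sum_i alpha_i y_i K(x_i, x) + b )."""
    s = sum(alpha * y * poly_kernel(sv, x, d)
            for sv, alpha, y in zip(support_vectors, alphas, ys))
    return 1 if s + b >= 0 else -1

# One illustrative support vector per class.
svs = [[1.0, 1.0], [0.0, 0.0]]
alphas = [0.5, 0.5]
ys = [1, -1]
b = -1.0
print(decision([1.0, 1.0], svs, alphas, ys, b))
print(decision([0.0, 0.0], svs, alphas, ys, b))
```

With these illustrative values, a point at the positive support vector is classified as 1 and a point at the negative support vector as -1.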
    </Section>
    <Section position="2" start_page="29" end_page="30" type="sub_section">
      <SectionTitle>
3.2 Chunking with SVMs
</SectionTitle>
      <Paragraph position="0"> This section describes the details of formalizing the chunking task using SVMs. In this paper, we use YamCha, an SVM-based chunking tool (Kudo and Matsumoto, 2001). In the SVM-based chunking framework, SVMs are used as classifiers that assign chunk-representing labels to each token. In our task of chunking Japanese compound functional expressions, each sentence is represented as a sequence of morphemes, where a morpheme is regarded as a token.</Paragraph>
      <Paragraph position="1">  For representing proper chunks, we employ the IOB2 representation, one of the representations that have been well studied in various chunking tasks of natural language processing (Tjong Kim Sang, 1999; Kudo and Matsumoto, 2001). This method uses the following set of three labels for representing proper chunks.</Paragraph>
      <Paragraph position="2"> I Current token is in the middle or at the end of a chunk consisting of more than one token.</Paragraph>
      <Paragraph position="3"> O Current token is outside of any chunk.</Paragraph>
      <Paragraph position="4"> B Current token is the beginning of a chunk. As we described in section 2.2, given a candidate expression, we classify the usages of the expression into two classes: functional and content. Accordingly, we distinguish chunks of the two types: the functional type chunk and the content type chunk. In total, we have the following five labels for representing those chunks: B-functional, I-functional, B-content, I-content, and O. Table 6 gives examples of those chunk labels representing chunks.</Paragraph>
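The label assignment described above can be sketched as follows; the sentence length and chunk spans in the example are invented for illustration.

```python
# Sketch of IOB2 label assignment over a morpheme sequence, using the
# five labels from the paper (B-/I-functional, B-/I-content, O). The
# sentence length and chunk spans below are invented for illustration.

def iob2_labels(n_tokens, chunks):
    """chunks: list of (start, end, chunk_type) spans, end exclusive,
    chunk_type either "functional" or "content"."""
    labels = ["O"] * n_tokens
    for start, end, chunk_type in chunks:
        labels[start] = "B-" + chunk_type
        for i in range(start + 1, end):
            labels[i] = "I-" + chunk_type
    return labels

# Six morphemes: a three-morpheme functional chunk over positions 1-3
# and a one-morpheme content chunk at position 4.
print(iob2_labels(6, [(1, 4, "functional"), (4, 5, "content")]))
```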
      <Paragraph position="5"> Finally, as for extending SVMs to multi-class classifiers, we experimentally compare the pairwise method and the one vs. rest method, and find that the pairwise method slightly outperforms the one vs. rest method. Throughout the paper, we show results with the pairwise method.</Paragraph>
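The pairwise (one-vs-one) method can be sketched as follows. This is a sketch of the voting scheme only: the binary "classifiers" are stand-in functions we invent for illustration, not trained SVMs.

```python
# Sketch of the pairwise (one-vs-one) method for extending binary
# classifiers to k classes: one binary classifier per class pair,
# majority vote over all pairs. The binary classifiers below are
# stand-in functions, not trained SVMs.
from itertools import combinations

def pairwise_predict(x, classes, binary_clfs):
    """binary_clfs[(a, b)](x) returns either a or b; the class with
    the most votes over all pairs wins."""
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[binary_clfs[(a, b)](x)] += 1
    return max(classes, key=lambda c: votes[c])

# Three chunk labels and stand-in pairwise classifiers that always
# prefer "B-functional" when it is one of the pair.
classes = ["B-functional", "I-functional", "O"]
clfs = {pair: (lambda x, p=pair: "B-functional" if "B-functional" in p else p[0])
        for pair in combinations(classes, 2)}
print(pairwise_predict("dummy", classes, clfs))
```

With the five chunk labels used in this paper, the pairwise method trains 5 * 4 / 2 = 10 binary SVMs, whereas the one vs. rest method trains 5.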
      <Paragraph position="7"> For the feature sets for training/testing of SVMs, we use the information available in the surrounding context, such as the morphemes and their part-of-speech tags, as well as the chunk labels.</Paragraph>
      <Paragraph position="8"> More precisely, suppose that we identify the chunk label of each morpheme in a sentence, where m_i is the morpheme appearing at the i-th position, F_i is the feature set at the i-th position, and c_i is the chunk label for the i-th morpheme. Roughly speaking, when identifying the chunk label c_i for the i-th morpheme, we use the feature sets F_{i-2}, F_{i-1}, F_i, F_{i+1}, and F_{i+2} at the positions i - 2, i - 1, i, i + 1, and i + 2, as well as the preceding two chunk labels c_{i-2} and c_{i-1}.</Paragraph>
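The feature window described above can be sketched as follows; the "PAD" value used at sentence boundaries is our assumption, not taken from the paper.

```python
# Sketch of the feature window used when classifying the chunk label
# c_i: the feature sets F_{i-2} .. F_{i+2}, plus the two preceding
# chunk labels c_{i-2} and c_{i-1}. The "PAD" value at sentence
# boundaries is our assumption.

def window_features(feature_sets, labels_so_far, i, pad="PAD"):
    """feature_sets: F_j for every position j in the sentence;
    labels_so_far: chunk labels already assigned to positions before i."""
    def F(j):
        return feature_sets[j] if j in range(len(feature_sets)) else pad

    def c(j):
        return labels_so_far[j] if j in range(len(labels_so_far)) else pad

    return [F(i - 2), F(i - 1), F(i), F(i + 1), F(i + 2), c(i - 2), c(i - 1)]

# Four positions; labels already assigned for positions 0 and 1.
print(window_features(["F0", "F1", "F2", "F3"], ["O", "B-functional"], 2))
```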
      <Paragraph position="14"> The feature set F_i at the i-th position is defined as a tuple of the morpheme feature MF(m_i) of the i-th morpheme m_i, the chunk candidate feature CF(i) at the i-th position, and the chunk context feature OF(i) at the i-th position:</Paragraph>
      <Paragraph position="15"> F_i = &lt; MF(m_i), CF(i), OF(i) &gt;</Paragraph>
      <Paragraph position="16"> The morpheme feature MF(m_i) consists of the lexical form, part-of-speech, conjugation type and form, base form, and pronunciation of m_i.</Paragraph>
      <Paragraph position="22"> The chunk candidate feature CF(i) and the chunk context feature OF(i) are defined considering the candidate compound functional expression, which is a sequence of morphemes including the morpheme m_i at the current position i. As we described in section 2, the class of Japanese compound functional expressions can be regarded as closed, and their number is at most a few thousand. Therefore, it is easy to enumerate all the compound functional expressions and their morpheme sequences. Chunk labels other than O should be assigned to a morpheme only when it constitutes at least one of those enumerated compound functional expressions. Suppose that a sequence of morphemes constitutes a candidate compound functional expression E, with certain morphemes at the immediate left/right contexts of E.</Paragraph>
      <Paragraph position="23"> Then, the chunk candidate feature CF(i) at the i-th position is defined as a tuple of the number of morphemes constituting E and the position of m_i within E:</Paragraph>
      <Paragraph position="24"> CF(i) = &lt; length of E, position of m_i within E &gt;</Paragraph>
      <Paragraph position="25"> The chunk context feature OF(i) at the i-th position is defined in terms of the immediate left/right contexts of E: it includes the morpheme features of the morphemes at the immediate left/right contexts of E, as well as the chunk candidate features at the immediate left/right contexts of E.</Paragraph>
      <Paragraph position="26"> Table 6 gives examples of chunk candidate features and chunk context features. It can happen that the morpheme at the current position i constitutes more than one candidate compound functional expression. In such cases, we prefer the candidate starting with the leftmost morpheme. If more than one candidate expression starts with the leftmost morpheme, we prefer the longest one, and we construct the chunk candidate features and chunk context features considering the preferred candidate only.</Paragraph>
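The leftmost-longest preference among overlapping candidates can be sketched as follows; the candidate spans in the example are invented for illustration.

```python
# Sketch of the leftmost-longest preference among overlapping
# candidate compound functional expressions. Candidates are given as
# (start, end) morpheme spans (end exclusive); the spans below are
# invented for illustration.

def select_candidate(candidates):
    """Prefer the candidate starting with the leftmost morpheme;
    among those, prefer the longest."""
    return min(candidates, key=lambda span: (span[0], -(span[1] - span[0])))

# Three overlapping candidates: the one starting leftmost and
# spanning the most morphemes wins.
print(select_candidate([(1, 3), (1, 4), (2, 4)]))
```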
    </Section>
  </Section>
</Paper>