<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1016"> <Title>Convolution Kernels with Feature Selection for Natural Language Processing Tasks</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Convolution Kernels </SectionTitle> <Paragraph position="0"> Convolution kernels have been proposed as a concept of kernels for discrete structures, such as sequences, trees and graphs. This framework defines the kernel function between input objects as the convolution of &quot;sub-kernels&quot;, i.e. the kernels for the decompositions (parts) of the objects.</Paragraph> <Paragraph position="1"> Let $X$ and $Y$ be discrete objects. Conceptually, convolution kernels $K(X, Y)$ enumerate all sub-structures occurring in $X$ and $Y$ and then calculate their inner product, which is simply written as: $$K(X, Y) = \langle \phi(X), \phi(Y) \rangle = \sum_i \phi_i(X) \cdot \phi_i(Y) \qquad (1)$$</Paragraph> <Paragraph position="3"> Here $\phi$ represents the feature mapping from the discrete object to the feature space; that is, $\phi(X) = (\phi_1(X), \ldots, \phi_i(X), \ldots)$.</Paragraph> <Paragraph position="5"> With sequence kernels (Lodhi et al., 2002), input objects $X$ and $Y$ are sequences, and $\phi_i(X)$ is a sub-sequence. With tree kernels (Collins and Duffy, 2001), $X$ and $Y$ are trees, and $\phi_i(X)$ is a sub-tree.</Paragraph> <Paragraph position="6"> When implemented, these kernels can be efficiently calculated in quadratic time by using dynamic programming (DP).</Paragraph> <Paragraph position="7"> Finally, since the size of the input objects is not constant, the kernel value is normalized using the following equation: $$\hat{K}(X, Y) = \frac{K(X, Y)}{\sqrt{K(X, X) \cdot K(Y, Y)}} \qquad (2)$$</Paragraph> <Paragraph position="9"> The value of $\hat{K}(X, Y)$ ranges from 0 to 1, and $\hat{K}(X, Y) = 1$ if and only if $X = Y$.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Sequence Kernels </SectionTitle> <Paragraph position="0"> To simplify the discussion, we restrict ourselves hereafter to sequence kernels.
Other convolution kernels are briefly addressed in Section 5.</Paragraph> <Paragraph position="1"> Many kinds of sequence kernels have been proposed for a variety of tasks. This paper follows the framework of word sequence kernels (Cancedda et al., 2003), and so processes gapped word sequences to yield the kernel value.</Paragraph> <Paragraph position="2"> Let $\Sigma$ be a finite set of symbols, and $\Sigma^n$ be the set of possible (symbol) sequences of size $n$ or less constructed from symbols in $\Sigma$. In this paper, &quot;size&quot; means the number of symbols in the sub-structure; namely, for a sequence, size $n$ means length $n$. $S$ and $T$ can represent any sequence, and $s_i$ and $t_j$ represent the $i$th and $j$th symbols in $S$ and $T$, respectively. Therefore, a sequence $S$ can be written as $S = s_1 \ldots s_i \ldots s_{|S|}$, where $|S|$ represents the length of $S$. If sequence $u$ is contained in sub-sequence $S[i : j] \stackrel{\mathrm{def}}{=} s_i \ldots s_j$ of $S$ (allowing the existence of gaps), the position of $u$ in $S$ is written as $\mathbf{i} = (i_1 : i_{|u|})$. The length of $S[\mathbf{i}]$ is $l(\mathbf{i}) = i_{|u|} - i_1 + 1$. For example, if $u = ab$ and $S = cacbd$, then $\mathbf{i} = (2 : 4)$ and $l(\mathbf{i}) = 4 - 2 + 1 = 3$.</Paragraph> <Paragraph position="3"> By using the above notations, sequence kernels can be defined as: $$K^{SK}(S, T) = \sum_{u \in \Sigma^n} \phi_u(S) \cdot \phi_u(T) = \sum_{u \in \Sigma^n} \sum_{\mathbf{i} \mid u = S[\mathbf{i}]} \lambda^{\gamma(\mathbf{i})} \sum_{\mathbf{j} \mid u = T[\mathbf{j}]} \lambda^{\gamma(\mathbf{j})} \qquad (3)$$</Paragraph> <Paragraph position="5"> where $\lambda$ is the decay factor that handles the gap present in a common sub-sequence $u$, and $\gamma(\mathbf{i}) = l(\mathbf{i}) - |u|$. In this paper, $\mid$ means &quot;such that&quot;. Figure 1 shows a simple example of the output of this kernel.</Paragraph> <Paragraph position="6"> However, in general, the number of features $|\Sigma^n|$, which is the dimension of the feature space, becomes very high, and it is computationally infeasible to calculate Equation (3) explicitly. An efficient recursive calculation has been introduced in (Cancedda et al., 2003).
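To clarify the discussion, we redefine the sequence kernels with our notation.</Paragraph> <Paragraph position="7"> The sequence kernel can be written as follows: $$K^{SK}(S, T) = \sum_{m=1}^{n} \sum_{1 \le i \le |S|} \sum_{1 \le j \le |T|} J_m(S_i, T_j) \qquad (4)$$</Paragraph> <Paragraph position="9"> where $S_i$ and $T_j$ represent the sub-sequences $S_i = s_1, s_2, \ldots, s_i$ and $T_j = t_1, t_2, \ldots, t_j$, respectively. Let $J_m(S_i, T_j)$ be a function that returns the value of common sub-sequences if $s_i = t_j$: $$J_m(S_i, T_j) = J'_{m-1}(S_i, T_j) \cdot I(s_i, t_j) \qquad (5)$$</Paragraph> <Paragraph position="11"> where $I(s_i, t_j)$ is a function that returns a matching value between $s_i$ and $t_j$. This paper defines $I(s_i, t_j)$ as an indicator function that returns 1 if $s_i = t_j$, otherwise 0.</Paragraph> <Paragraph position="12"> Then, $J'_m(S_i, T_j)$ and $J''_m(S_i, T_j)$ are introduced to calculate the common gapped sub-sequences between $S_i$ and $T_j$: $$J'_m(S_i, T_j) = \begin{cases} 1 \quad \text{if } m = 0 \\ 0 \quad \text{if } j = 0 \text{ and } m > 0 \\ \lambda J'_m(S_i, T_{j-1}) + J''_m(S_i, T_{j-1}) \quad \text{otherwise} \end{cases} \qquad (6)$$ $$J''_m(S_i, T_j) = \begin{cases} 0 \quad \text{if } i = 0 \\ \lambda J''_m(S_{i-1}, T_j) + J_m(S_{i-1}, T_j) \quad \text{otherwise} \end{cases} \qquad (7)$$</Paragraph> <Paragraph position="14"> If Equations (5) to (7) are calculated recursively, Equation (4) provides exactly the same value as Equation (3).</Paragraph> </Section> </Section> </Paper>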
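<!--
Illustration for Section 2.1. The gap-weighted sequence kernel described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (gapped_weights, sequence_kernel, sequence_kernel_dp, normalized_kernel) are hypothetical, symbols are compared by plain equality, inputs are assumed to be non-empty, and the recursion in sequence_kernel_dp is one form of J, J' and J'' that is consistent with the description in the text. The brute-force version enumerates every gapped sub-sequence of length n or less, so it is only practical for very short inputs.

```python
from functools import lru_cache
from itertools import combinations
from math import sqrt


def gapped_weights(seq, n, lam):
    """Sum lam**gamma(i) over all occurrences i of each sub-sequence u with |u| at most n."""
    weights = {}
    for m in range(1, n + 1):
        for idx in combinations(range(len(seq)), m):
            u = tuple(seq[k] for k in idx)
            span = idx[-1] - idx[0] + 1   # l(i): length of the covering span
            gamma = span - m              # gamma(i) = l(i) - |u|: number of gaps
            weights[u] = weights.get(u, 0.0) + lam ** gamma
    return weights


def sequence_kernel(s, t, n, lam):
    """Brute-force kernel: inner product of the explicit feature vectors."""
    ws, wt = gapped_weights(s, n, lam), gapped_weights(t, n, lam)
    return sum(w * wt[u] for u, w in ws.items() if u in wt)


def sequence_kernel_dp(s, t, n, lam):
    """Recursive kernel over prefixes S_i and T_k (assumed form of J, J', J'')."""

    @lru_cache(maxsize=None)
    def j(m, i, k):
        # Common sub-sequences of length m that end exactly at s_i and t_k.
        if i == 0 or k == 0:
            return 0.0
        return jp(m - 1, i, k) * (1.0 if s[i - 1] == t[k - 1] else 0.0)

    @lru_cache(maxsize=None)
    def jp(m, i, k):
        # J': carries the decay factor along the prefix of t.
        if m == 0:
            return 1.0
        if k == 0:
            return 0.0
        return lam * jp(m, i, k - 1) + jpp(m, i, k - 1)

    @lru_cache(maxsize=None)
    def jpp(m, i, k):
        # J'': carries the decay factor along the prefix of s.
        if i == 0:
            return 0.0
        return lam * jpp(m, i - 1, k) + j(m, i - 1, k)

    return sum(j(m, i, k)
               for m in range(1, n + 1)
               for i in range(1, len(s) + 1)
               for k in range(1, len(t) + 1))


def normalized_kernel(s, t, n, lam):
    """Kernel value scaled so that identical non-empty inputs give 1."""
    return sequence_kernel(s, t, n, lam) / sqrt(
        sequence_kernel(s, s, n, lam) * sequence_kernel(t, t, n, lam))
```

The two kernel functions should agree on any pair of sequences, which makes the brute-force version a convenient check on the recursive one, for example by comparing sequence_kernel("acb", "ab", 2, 0.5) with sequence_kernel_dp("acb", "ab", 2, 0.5).
-->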