<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1015">
  <Title>Making Tree Kernels practical for Natural Language Learning</Title>
  <Section position="2" start_page="0" end_page="113" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In recent years, tree kernels have been shown to be an interesting approach to modeling syntactic information in natural language tasks, e.g.</Paragraph>
    <Paragraph position="1"> syntactic parsing (Collins and Duffy, 2002), relation extraction (Zelenko et al., 2003), Named Entity recognition (Cumby and Roth, 2003; Culotta and Sorensen, 2004) and Semantic Parsing (Moschitti, 2004).</Paragraph>
    <Paragraph position="2"> The main advantage of tree kernels is the possibility of generating a large number of syntactic features and letting the learning algorithm select those most relevant to a specific application. In contrast, their major drawbacks are (a) a computational time complexity that is superlinear in the number of tree nodes and (b) an accuracy often lower than that provided by linear models over manually designed features.</Paragraph>
    <Paragraph position="3"> To solve problem (a), a linear-complexity algorithm for the subtree (ST) kernel computation was designed in (Vishwanathan and Smola, 2002).</Paragraph>
    <Paragraph position="4"> Unfortunately, the ST set is much poorer than the one generated by the subset tree (SST) kernel designed in (Collins and Duffy, 2002). Intuitively, an ST rooted in a node n of the target tree always contains all of n's descendants down to the leaves. This does not hold for SSTs, whose leaves can be internal nodes.</Paragraph>
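The gap between the two fragment spaces can be made concrete on a toy parse tree. The sketch below is illustrative only: the nested-tuple tree encoding and function names are our own, not the paper's implementation. Each node roots exactly one ST (the node with all its descendants), while the SST fragments rooted at a node multiply across children, since every child may either be left as a fragment leaf or expanded further.

```python
# Toy contrast of the ST and SST fragment spaces (illustrative sketch).
# A tree is a nested tuple (label, child1, child2, ...); a bare string
# is a terminal word.

def subtrees(node):
    """All STs of the tree: each non-leaf node together with ALL of its
    descendants down to the words."""
    if isinstance(node, str):          # a terminal roots no fragment
        return []
    result = [node]
    for child in node[1:]:
        result.extend(subtrees(child))
    return result

def sst_count(node):
    """Number of SST fragments rooted at `node`: the node's production is
    kept whole, but each child may either stay as a leaf of the fragment
    (possibly an internal node of the original tree) or be expanded with
    one of its own fragments."""
    if isinstance(node, str):
        return 0
    product = 1
    for child in node[1:]:
        product *= 1 + sst_count(child)
    return product

tree = ("S",
        ("NP", ("D", "a"), ("N", "cat")),
        ("VP", ("V", "sat")))

n_st = len(subtrees(tree))                            # one ST per non-leaf node
n_sst = sum(sst_count(st) for st in subtrees(tree))   # far more SST fragments
print(n_st, n_sst)
```

Even on this six-node tree the SST space is several times larger than the ST space, which illustrates both its richer information and its greater risk of irrelevant features.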
    <Paragraph position="5"> To solve problem (b), different tree substructure spaces should be studied to derive the tree kernel that provides the highest accuracy. On the one hand, SSTs provide learning algorithms with richer information, which may be critical to capture the syntactic properties of parse trees, as shown, for example, in (Zelenko et al., 2003; Moschitti, 2004). On the other hand, if the SST space contains too many irrelevant features, overfitting may occur and decrease the classification accuracy (Cumby and Roth, 2003). As a consequence, the fewer features of the ST approach may be more appropriate.</Paragraph>
    <Paragraph position="6"> In this paper, we aim to solve the above problems. We present (a) an algorithm for the evaluation of the ST and SST kernels which runs in linear average time and (b) a study of the impact of diverse tree kernels on the accuracy of Support Vector Machines (SVMs).</Paragraph>
    <Paragraph position="7"> Our fast algorithm computes the kernels between two syntactic parse trees in O(m + n) average time, where m and n are the numbers of nodes in the two trees. This low complexity allows SVMs to carry out experiments on hundreds of thousands of training instances, since it is no higher than the complexity of the polynomial kernel, widely used in large-scale experiments, e.g.</Paragraph>
    <Paragraph position="8"> (Pradhan et al., 2004). To confirm this hypothesis, we measured the impact of the algorithm on the time SVMs require to learn about 122,774 predicate-argument examples annotated in PropBank (Kingsbury and Palmer, 2002) and 37,948 instances annotated in FrameNet (Fillmore, 1982).</Paragraph>
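The actual algorithm is detailed in Section 2; the following is only a hedged sketch of the general idea behind the average-linear-time evaluation. The SST kernel's Delta(n1, n2) recursion is non-zero only when n1 and n2 share the same production, so bucketing the nodes of one tree by production lets us recurse over matching pairs only, which on average are close to m + n rather than m * n. All names and the nested-tuple tree encoding below are illustrative assumptions, not the paper's code.

```python
# Illustrative sketch: SST kernel restricted to node pairs that share a
# production (the source of the linear average time), with memoized Delta.
from collections import defaultdict

def collect_nodes(t, acc=None):
    """Pre-order list of the non-terminal nodes of a nested-tuple tree."""
    if acc is None:
        acc = []
    if not isinstance(t, str):
        acc.append(t)
        for child in t[1:]:
            collect_nodes(child, acc)
    return acc

def production(n):
    """Grammar rule expanded at n, e.g. ('NP', 'D', 'N')."""
    return (n[0],) + tuple(c if isinstance(c, str) else c[0] for c in n[1:])

def sst_kernel(t1, t2):
    memo = {}

    def delta(n1, n2):
        if isinstance(n1, str) or isinstance(n2, str):
            return 0                       # terminals root no fragment
        key = (id(n1), id(n2))
        if key not in memo:
            if production(n1) != production(n2):
                memo[key] = 0
            elif all(isinstance(c, str) for c in n1[1:]):
                memo[key] = 1              # matching pre-terminals
            else:                          # product over paired children
                p = 1
                for c1, c2 in zip(n1[1:], n2[1:]):
                    p *= 1 + delta(c1, c2)
                memo[key] = p
        return memo[key]

    # Bucket the second tree's nodes by production; on average only a few
    # nodes share one, so delta() is evaluated on few pairs.
    buckets = defaultdict(list)
    for n in collect_nodes(t2):
        buckets[production(n)].append(n)
    return sum(delta(n1, n2)
               for n1 in collect_nodes(t1)
               for n2 in buckets[production(n1)])

tree = ("S",
        ("NP", ("D", "a"), ("N", "cat")),
        ("VP", ("V", "sat")))
print(sst_kernel(tree, tree))   # fragments the tree shares with itself
```

Dropping the production check in delta() (i.e. comparing every node of one tree with every node of the other) recovers the quadratic evaluation that this bucketing avoids.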
    <Paragraph position="9"> Regarding the classification properties, we studied the argument-labeling accuracy of the ST and SST kernels and their combinations with the standard features (Gildea and Jurafsky, 2002). The results show that, on both the PropBank and FrameNet datasets, the SST-based kernel, i.e. the richest in terms of substructures, produces the highest SVM accuracy. When SSTs are combined with the manually designed features, we always obtain the best-performing classifier. This suggests that the many fragments included in the SST space are relevant and, since their manual design may be problematic (requiring greater programming effort and deeper knowledge of the linguistic phenomenon), tree kernels provide remarkable help in feature engineering.</Paragraph>
    <Paragraph position="10"> In the remainder of this paper, Section 2 describes the parse tree kernels and our fast algorithm. Section 3 introduces the predicate argument classification problem and its solution. Section 4 reports the comparative performance in terms of execution time and accuracy. Finally, Section 5 discusses related work and Section 6 summarizes the conclusions.</Paragraph>
  </Section>
</Paper>