<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3222"> <Title>The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In this work we are concerned with building statistical models for parse disambiguation: choosing a correct analysis from among the possible analyses of a sentence. Many machine learning algorithms for classification and ranking require data to be represented as real-valued vectors of fixed dimensionality. Natural language parse trees are not readily representable in this form, and the choice of representation is extremely important for the success of machine learning algorithms.</Paragraph> <Paragraph position="1"> For a large class of machine learning algorithms, such an explicit representation is not necessary, and it suffices to devise a kernel function K(x, y) which measures the similarity between inputs x and y. In addition to achieving efficient computation in high-dimensional representation spaces, the use of kernels allows an alternative view of the modelling problem: defining a similarity between inputs rather than a set of relevant features.</Paragraph> <Paragraph position="2"> In previous work on discriminative natural language parsing, one approach has been to define features centered around lexicalized local rules in the trees (Collins, 2000; Shen and Joshi, 2003), similar to the features of the best performing lexicalized generative parsing models (Charniak, 2000; Collins, 1997). Additionally, non-local features have been defined, measuring, e.g.,
parallelism and complexity of phrases in discriminative log-linear parse ranking models (Riezler et al., 2000).</Paragraph> <Paragraph position="3"> Another approach has been to define tree kernels: for example, in (Collins and Duffy, 2001), the all-subtrees representation of parse trees (Bod, 1998) is effectively utilized by the application of a fast dynamic programming algorithm for computing the number of common subtrees of two trees. Another tree kernel, more broadly applicable to Hierarchical Directed Graphs, was proposed in (Suzuki et al., 2003). Many other interesting kernels have been devised for sequences and trees, with applications to sequence classification and parsing. A good overview of kernels for structured data can be found in (Gaertner et al., 2002).</Paragraph> <Paragraph position="4"> Here we propose a new representation of parse trees which (i) allows the localization of broader useful context, (ii) paves the way for exploring kernels, and (iii) achieves superior disambiguation accuracy compared to models that use tree representations centered around context-free rules.</Paragraph> <Paragraph position="5"> Compared to the usual notion of discriminative models (placing classes on rich observed data), discriminative PCFG parsing with plain context-free rule features may look naive, since most of the features (in a particular tree) make no reference to the observed input at all. The standard way to address this problem is through lexicalization, which puts an element of the input on each tree node, so that all features do refer to the input. This paper explores an alternative way of achieving this that gives a broader view of tree contexts, extends naturally to exploring kernels, and performs better.</Paragraph> <Paragraph position="6"> We represent parse trees as lists of paths (leaf projection paths) from words to the top level of the tree, which include both the head path (where the word is a syntactic head) and the non-head path.
This allows us to capture, for example, cases of non-head dependencies which were also discussed in (Bod, 1998) and were used to motivate large subtree features, such as &quot;more careful than his sister&quot;, where &quot;careful&quot; is analyzed as the head of the adjective phrase, but &quot;more&quot; licenses the &quot;than&quot; comparative clause.</Paragraph> <Paragraph position="7"> This representation of trees as lists of projection paths (strings) allows us to explore string kernels on these paths and to combine them into tree kernels.</Paragraph> <Paragraph position="8"> We apply these ideas in the context of parse disambiguation for sentence analyses produced by a Head-driven Phrase Structure Grammar (HPSG), the grammar formalism underlying the Redwoods corpus (Oepen et al., 2002). HPSG is a modern constraint-based lexicalist (or &quot;unification&quot;) grammar formalism. We build discriminative models using Support Vector Machines for ranking (Joachims, 1999). We compare our proposed representation to previous approaches and show that it leads to substantial improvements in accuracy.</Paragraph> </Section> </Paper>
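To make the leaf-projection-path idea concrete, the following is a toy Python sketch, not the paper's implementation: each word is mapped to the sequence of node labels on its path from leaf to root, two paths are compared by counting shared label n-grams (a very simple string kernel), and a tree kernel is obtained by summing these path kernels over words occurring in both trees. All function names and the tuple-based tree encoding are our own illustrative assumptions; the head/non-head path split and repeated words are omitted for brevity.

```python
from collections import Counter

def leaf_paths(tree, ancestors=()):
    """Yield (word, projection path) pairs: the node labels from the
    word's leaf up to the root. Trees are (label, [children]) tuples."""
    label, children = tree
    if not children:                       # a leaf node is the word itself
        yield label, list(reversed(ancestors))
    else:
        for child in children:
            yield from leaf_paths(child, ancestors + (label,))

def ngrams(seq, n):
    """Multiset of label n-grams occurring in a path."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def string_kernel(p, q, max_n=2):
    """Count n-grams (n = 1..max_n) shared by two label paths."""
    return sum(sum((ngrams(p, n) & ngrams(q, n)).values())
               for n in range(1, max_n + 1))

def tree_kernel(t1, t2):
    """Combine path kernels into a tree kernel: sum the string kernel
    over words appearing in both trees (toy simplification)."""
    paths1, paths2 = dict(leaf_paths(t1)), dict(leaf_paths(t2))
    return sum(string_kernel(paths1[w], paths2[w])
               for w in paths1.keys() & paths2.keys())
```

Because the trees enter the kernel only through their path strings, any off-the-shelf string kernel (gap-weighted subsequences, character n-grams, etc.) could be substituted for `string_kernel` without changing the rest of the machinery, which is the flexibility the paper exploits.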