Hidden-Variable Models for Discriminative Reranking

1 Introduction

A number of recent approaches in statistical NLP have focused on reranking algorithms. In reranking methods, a baseline model is used to generate a set of candidate output structures for each input in training or test data. A second model, which typically makes use of more complex features than the baseline model, is then used to rerank the candidates proposed by the baseline. Reranking approaches have given improvements in accuracy on a number of NLP problems, including parsing (Collins, 2000; Charniak and Johnson, 2005), machine translation (Och and Ney, 2002; Shen et al., 2004), information extraction (Collins, 2002), and natural language generation (Walker et al., 2001).

The success of reranking approaches depends critically on the choice of representation used by the reranking model. Typically, each candidate structure (e.g., each parse tree in the case of parsing) is mapped to a feature-vector representation. Previous work has generally relied on two approaches to representation: explicitly hand-crafted features (e.g., in Charniak and Johnson (2005)) or features defined through kernels (e.g., see Collins and Duffy (2002)).

This paper describes a new method for the representation of NLP structures within reranking approaches. We build on the intuition that lexical items in natural language often fall into word clusters (for example, president and chairman might belong to the same cluster) or into distinct word senses (e.g., bank might have two distinct senses). Our method involves a hidden-variable model, where the hidden variables correspond to an assignment of words to either clusters or word senses. Lexical items are automatically assigned their hidden values using unsupervised learning within a discriminative reranking approach.

We make use of a conditional log-linear model for our task. Formally, the hidden variables within the log-linear model consist of global assignments, where a global assignment entails an assignment of every word in the sentence to some hidden cluster or sense value. The number of such global assignments grows exponentially with the length of the sentence being processed. Training and decoding with the model require summing over this exponential number of possible global assignments, which is a major technical challenge in our model. We show that the required summations can be computed efficiently and exactly using dynamic-programming methods (i.e., the belief propagation algorithm for Markov random fields (Yedidia et al., 2003)) under certain restrictions on the features in the model.
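To make the setup concrete, one plausible form for such a hidden-variable conditional log-linear model is the following (the notation here is ours, not necessarily the paper's exact parameterization):

\[
p(y \mid x;\, \bar{\Theta})
  = \frac{\sum_{h \in \mathcal{H}(x,y)} \exp\{\Phi(x, y, h) \cdot \bar{\Theta}\}}
         {\sum_{y' \in \mathcal{Y}(x)} \sum_{h' \in \mathcal{H}(x,y')} \exp\{\Phi(x, y', h') \cdot \bar{\Theta}\}}
\]

where x is the input sentence, y ranges over the candidate structures Y(x) proposed by the baseline, each h is a global assignment of hidden values to the words of x, Φ(x, y, h) is a feature vector, and Θ̄ is the parameter vector. The sums over h are exactly the exponential-size summations discussed above.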
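As a minimal illustration of why such sums can be tractable, the sketch below (ours, in Python; the paper's actual feature restrictions and graph structure may differ) computes a sum over all k^n global assignments of a chain-structured model in O(n k^2) time, using the forward algorithm, a special case of belief propagation:

    import math

    def sum_over_global_assignments(n_words, values, score):
        """Sum exp(total score) over all len(values)**n_words global
        assignments, assuming the score decomposes over adjacent word
        pairs (a chain). score(i, u, v) is a hypothetical local scoring
        function for hidden values u at word i-1 and v at word i."""
        # alpha[v]: summed exp-scores of all assignments to words 0..i
        # whose hidden value at word i is v.
        alpha = {v: 1.0 for v in values}
        for i in range(1, n_words):
            alpha = {v: sum(alpha[u] * math.exp(score(i, u, v))
                            for u in values)
                     for v in values}
        return sum(alpha.values())

A brute-force enumeration of all assignments (feasible only for tiny sentences) gives the same total; the same idea extends to the richer, but still restricted, feature dependencies the paper permits.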
Previous work on reranking has made heavy use of lexical statistics, but has treated lexical items as atoms. The motivation for our method comes from the observation that statistics based on lexical items are critical, but that these statistics suffer considerably from problems of data sparsity and word-sense polysemy. Our model has the ability to alleviate data-sparsity issues by learning to assign words to word clusters, and can mitigate problems with word-sense polysemy by learning to assign lexical items to underlying word senses based upon contextual information. A critical difference between our method and previous work on unsupervised approaches to word clustering or word-sense discovery is that our model is trained using a discriminative criterion, where the assignment of words to clusters or senses is driven by the reranking task in question.

As a case study, in this paper we focus on syntactic parse reranking. We describe three model types that can be captured by our approach. The first emulates a clustering operation, where the aim is to place similar words (e.g., president and chairman) into the same cluster. The second emulates a refinement operation, where the aim is to recover distinct senses underlying a single word (for example, distinct senses underlying the noun bank). The third makes use of an existing ontology, namely WordNet (Miller et al., 1993); in this case, the set of possible hidden values for each word corresponds to the possible WordNet senses for that word.

In experimental results on the Penn Wall Street Journal treebank parsing domain, the hidden-variable model gives an F-measure improvement of ≈ 1.25% beyond a baseline model (the parser described in Collins (1999)), and an ≈ 0.25% improvement beyond the reranking approach described in Collins (2000). Although the experiments in this paper are focused on parsing, the techniques we describe generalize naturally to other NLP structures such as strings or labeled sequences. We discuss this point further in Section 6.1.
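To make the three model types above concrete, here is a small illustrative sketch (ours; the mode names, the cluster count, and the use of NLTK's WordNet interface are assumptions for illustration, not the paper's implementation) of the hidden-value set each definition would associate with a word:

    # requires: nltk.download('wordnet')
    from nltk.corpus import wordnet as wn  # WordNet (Miller et al., 1993)

    def hidden_values(word, mode, n_clusters=32):
        if mode == "cluster":
            # Definition 1: a shared inventory of cluster identities, so
            # president and chairman can land in the same cluster.
            return list(range(n_clusters))
        if mode == "refine":
            # Definition 2: word-specific sense identities, refining a
            # single word (e.g., bank) into several latent senses.
            return [(word, s) for s in range(3)]
        if mode == "wordnet":
            # Definition 3: the word's possible WordNet senses.
            return wn.synsets(word) or [word]

For example, hidden_values("bank", "wordnet") returns the WordNet synsets of bank, including its river-side and financial-institution senses; under the first two definitions the model itself must learn, discriminatively, which hidden value each occurrence should take.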