<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1022">
<Title>Coarse-to-fine n-best parsing and MaxEnt discriminative reranking</Title>
<Section position="2" start_page="0" end_page="173" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> We describe a reranking parser that uses a regularized MaxEnt reranker to select the best parse from the 50-best parses returned by a generative parsing model. The 50-best parser is a probabilistic parser that on its own produces high-quality parses; the maximum probability parse trees (according to the parser's model) have an f-score of 0.897 on section 23 of the Penn Treebank (Charniak, 2000), which is still state-of-the-art. However, the 50 best (i.e., the 50 highest probability) parses of a sentence often contain considerably better parses (in terms of f-score); this paper describes a 50-best parsing algorithm with an oracle f-score of 0.968 on the same data.</Paragraph>
<Paragraph position="1"> The reranker attempts to select the best parse for a sentence from the 50-best list of possible parses for that sentence. Because the reranker only has to consider a relatively small number of parses per sentence, it is not necessary to use dynamic programming, which permits the features to be essentially arbitrary functions of the parse trees. While our reranker does not achieve anything like the oracle f-score, the parses it selects have an f-score of 0.910, which is considerably better than that of the maximum probability parses of the n-best parser.</Paragraph>
<Paragraph position="2"> In more detail, for each string s the n-best parsing algorithm described in section 2 returns the n highest probability parses Y(s) = {y_1(s), ..., y_n(s)}, together with the probability p(y) of each parse y according to the parser's probability model. The number n of parses was set to 50 for the experiments described here, but some simple sentences actually received fewer than 50 parses (so n is actually a function of s). Each yield or terminal string in the training, development, and test data sets is mapped to such an n-best list of parse/probability pairs; the cross-validation scheme described in Collins (2000) was used to avoid training the n-best parser on the sentences it was being used to parse.</Paragraph>
<Paragraph position="3"> A feature extractor, described in section 3, is a vector of m functions f = (f_1, ..., f_m), where each f_j maps a parse y to a real number f_j(y), the value of the jth feature on y. A feature extractor thus maps each y to a vector of feature values f(y) = (f_1(y), ..., f_m(y)).</Paragraph>
<Paragraph position="4"> Our reranking parser associates each parse with a score $v_\theta(y)$, which is a linear function of the feature values f(y). That is, each feature $f_j$ is associated with a weight $\theta_j$, and the feature values and weights define the score of each parse y as follows:
$$ v_\theta(y) \;=\; \theta \cdot f(y) \;=\; \sum_{j=1}^{m} \theta_j f_j(y). $$
</Paragraph>
<Paragraph position="7"> Given a string s, the reranking parser's output $\hat{y}(s)$ on s is the highest scoring parse among the n-best parses Y(s) for s, i.e.,
$$ \hat{y}(s) \;=\; \operatorname*{argmax}_{y \in Y(s)} v_\theta(y). $$
</Paragraph>
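<Paragraph position="8"> The linear scoring and argmax selection just described are simple enough to show directly. The following Python sketch is illustrative only: the names Parse, score, and rerank are ours, not the authors', and the feature functions and the n-best parser are assumed to be supplied elsewhere.
```python
from typing import Callable, List, Sequence

Parse = str  # stand-in type for a parse tree; an assumption for illustration

def score(theta: Sequence[float],
          features: Sequence[Callable[[Parse], float]],
          y: Parse) -> float:
    """Compute v_theta(y) = sum_j theta_j * f_j(y), the linear score."""
    return sum(t * f(y) for t, f in zip(theta, features))

def rerank(theta: Sequence[float],
           features: Sequence[Callable[[Parse], float]],
           nbest: List[Parse]) -> Parse:
    """Return y_hat(s): the highest-scoring parse in the n-best list Y(s)."""
    return max(nbest, key=lambda y: score(theta, features, y))
```
Because the n-best list is small (here, at most 50 parses), this exhaustive scoring loop replaces the dynamic programming a full parser would need, which is what allows the features to be arbitrary functions of whole parse trees.
</Paragraph>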
<Paragraph position="10"> The feature weight vector $\theta$ is estimated from the labelled training corpus as described in section 4. Because we use labelled training data, we know the correct parse $y^\star(s)$ for each sentence s in the training data. The correct parse $y^\star(s)$ is not always a member of the n-best parser's output Y(s), but we can identify the parses $Y^{+}(s)$ in Y(s) with the highest f-scores. Informally, the estimation procedure finds a weight vector $\theta$ that maximizes the score $v_\theta(y)$ of the parses $y \in Y^{+}(s)$ relative to the scores of the other parses in Y(s), for each s in the training data.</Paragraph>
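<Paragraph position="11"> As one concrete reading of this informal description (our illustrative reconstruction, not necessarily the precise objective of section 4), a regularized MaxEnt objective with exactly this behavior maximizes the conditional probability mass of the best-f-score parses relative to the whole n-best list:
$$ L(\theta) \;=\; \sum_{s} \log \frac{\sum_{y \in Y^{+}(s)} \exp v_\theta(y)}{\sum_{y \in Y(s)} \exp v_\theta(y)} \;-\; R(\theta), $$
where $R(\theta)$ is a regularization term (e.g., a quadratic penalty on the weights) that keeps the estimated weight vector small.
</Paragraph>
</Section>
</Paper>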