File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/05/h05-1087_intro.xml
Size: 4,027 bytes
Last Modified: 2025-10-06 14:02:56
<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1087">
<Title>Maximum Expected F-Measure Training of Logistic Regression Models</Title>
<Section position="2" start_page="0" end_page="692" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Log-linear models have been used in many areas of Natural Language Processing (NLP) and Information Retrieval (IR). Scenarios in which log-linear models have been applied often involve simple binary classification decisions or probability assignments, as in the following three examples: Ratnaparkhi et al. (1994) consider a restricted form of the prepositional phrase attachment problem where attachment decisions are binary; Ittycheriah et al. (2003) reduce entity mention tracking to the problem of modeling the probability of two mentions being linked; and Greiff and Ponte (2000) develop models of probabilistic information retrieval that involve binary decisions of relevance. What is common to all three approaches is the application of log-linear models to binary classification tasks.1 As Ratnaparkhi (1998, p. 27f.) points out, log-linear models of binary response variables are equivalent to, and in fact mere notational variants of, logistic regression models.</Paragraph>
<Paragraph position="2"> In this paper we focus on binary classification tasks, and in particular on the loss or utility associated with classification decisions. The three problems mentioned before (prepositional phrase attachment, entity mention linkage, and relevance of a document to a query) differ in one crucial aspect: the first is evaluated in terms of accuracy or, equivalently, symmetric zero-one loss, whereas the second and third are treated as information extraction/retrieval problems and evaluated in terms of recall and precision. Recall and precision are combined into a single overall utility function, the well-known F-measure.</Paragraph>
<Paragraph position="3"> It may be desirable to estimate the parameters of a logistic regression model by maximizing F-measure during training. This is analogous, and in a certain sense equivalent, to empirical risk minimization, which has been used successfully in related areas such as speech recognition (Rahim and Lee, 1997), language modeling (Paciorek and Rosenfeld, 2000), and machine translation (Och, 2003).</Paragraph>
<Paragraph position="4"> The novel contribution of this paper is a training procedure for (approximately) maximizing the expected F-measure of a probabilistic classifier based on a logistic regression model. We formulate a vector-valued utility function which has a well-defined expected value; F-measure is then a rational function of this expectation and can be maximized numerically under certain conventional regularizing assumptions.</Paragraph>
<Paragraph position="5"> [Footnote 1] These kinds of log-linear models are also known among the NLP community as &quot;maximum entropy models&quot; (Berger et al., 1996; Ratnaparkhi, 1998). This is an unfortunate choice of terminology, because the term &quot;maximum entropy&quot; does not uniquely determine a family of models unless the constraints subject to which entropy is being maximized are specified.</Paragraph>
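To make the equivalence noted by Ratnaparkhi concrete, here is a minimal sketch in notation of our own choosing (the feature function f and weight vector theta are illustrative and are not taken from the works cited above). A binary log-linear model

\[
p_\theta(y \mid x) = \frac{\exp\bigl(\theta \cdot f(x,y)\bigr)}{\exp\bigl(\theta \cdot f(x,0)\bigr) + \exp\bigl(\theta \cdot f(x,1)\bigr)}, \qquad y \in \{0,1\},
\]

can be rewritten, with g(x) = f(x,1) - f(x,0), as

\[
p_\theta(y{=}1 \mid x) = \frac{1}{1 + \exp\bigl(-\theta \cdot g(x)\bigr)},
\]

which is exactly a logistic regression model; the two parameterizations differ only in notation.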
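The following sketch, again in illustrative notation rather than the paper's own (the paper's formulation is developed in Sections 4 and 5), indicates how F-measure becomes a rational function of expected quantities. For training examples (x_1, y_1), ..., (x_n, y_n) with p_i = p_\theta(y{=}1 \mid x_i), write

\[
A(\theta) = \sum_{i:\, y_i = 1} p_i \ \ \text{(expected true positives)}, \qquad
M(\theta) = \sum_{i=1}^{n} p_i \ \ \text{(expected predicted positives)},
\]

and let T denote the number of examples with y_i = 1. Treating precision and recall as P = A(\theta)/M(\theta) and R = A(\theta)/T, the F-measure with weight \alpha \in [0,1] is

\[
F_\alpha(\theta) = \frac{1}{\alpha/P + (1-\alpha)/R} = \frac{A(\theta)}{\alpha\, M(\theta) + (1-\alpha)\, T},
\]

a differentiable rational function of the expectations A(\theta) and M(\theta) that can be maximized numerically, for instance by gradient ascent, typically with a regularization term added to keep the optimization well behaved.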
<Paragraph position="6"> We begin with a review of logistic regression (Section 2) and then discuss the use of F-measure for evaluation (Section 3). We reformulate F-measure as a function of an expected utility (Section 4), which is maximized during training (Section 5). We discuss the differences between our parameter estimation technique and maximum likelihood training on a toy example (Section 6) as well as on a real extraction task (Section 7). We conclude with a discussion of further applications and generalizations (Section 8).</Paragraph>
</Section>
</Paper>