File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/n06-2024_intro.xml
Size: 3,685 bytes
Last Modified: 2025-10-06 14:03:30
<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2024">
<Title>NER Systems that Suit User's Preferences: Adjusting the Recall-Precision Trade-off for Entity Extraction</Title>
<Section position="2" start_page="0" end_page="93" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Named entity recognition (NER) is the task of identifying named entities in free text--typically personal names, organizations, gene-protein entities, and so on. Recently, sequential learning methods, such as hidden Markov models (HMMs) and conditional random fields (CRFs), have been used successfully for a number of applications, including NER (Sha and Pereira, 2003; Pinto et al., 2003; McCallum and Li, 2003). In practice, these methods provide imperfect performance: precision and recall, even for well-studied problems on clean, well-written text, reach at most the mid-90s. While the performance of NER systems is often evaluated in terms of F1 measure (the harmonic mean of precision and recall), this measure may not match user preferences regarding precision and recall. Furthermore, learned NER models may also be suboptimal in terms of F1, as they are trained to optimize other measures (e.g., log-likelihood of the training data for CRFs).</Paragraph>
<Paragraph position="1"> Obviously, different applications of NER have different requirements for precision and recall. A system might require high precision if it is designed to extract entities as one stage of fact extraction, where facts are stored directly in a database. On the other hand, a system that generates candidate extractions which are passed to a semi-automatic curation system might prefer higher recall. In some domains, such as the anonymization of medical records, high recall is essential.</Paragraph>
<Paragraph position="2"> One way to manipulate an extractor's precision-recall tradeoff is to assign a confidence score to each extracted entity and then apply a global threshold to the confidence scores. However, confidence thresholding of this sort cannot increase recall. Also, while confidence scores are straightforward to compute in many classification settings, there is no inherent mechanism for computing the confidence of a sequential extractor. Culotta and McCallum (2004) suggest several methods for doing this with CRFs.</Paragraph>
<Paragraph position="3"> In this paper, we suggest an alternative, simple method for exploring and optimizing the relationship between precision and recall for NER systems. In particular, we describe and evaluate a technique called &quot;extractor tweaking&quot; that optimizes a learned extractor with respect to a specific evaluation metric. In a nutshell, we directly tweak the threshold term that is part of any linear classifier, including sequential extractors. Though simple, this approach has not been empirically evaluated before, to our knowledge. Further, although sequential extractors such as HMMs and CRFs are state-of-the-art methods for tasks like NER, there has been little prior research on tuning these extractors' performance to suit user preferences. The suggested algorithm optimizes system performance according to a user-provided evaluation criterion, using a linear search procedure.</Paragraph>
<Paragraph position="4"> Applying this procedure is not trivial, since the underlying function is not smooth. However, we show that the system's precision-recall tradeoff can indeed be tuned to user preferences using this method, given labelled data. Empirical results are presented for a particular NER task--recognizing person names--on three corpora, including email and newswire text.</Paragraph>
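To make the linear search concrete, the following is a minimal sketch in Python, with hypothetical names throughout: `extract` stands for some trained linear extractor whose entity-class score is shifted by a bias term before decoding, and `corpus` is labelled development data of (text, gold-entity-set) pairs. It probes a grid of candidate bias shifts and keeps the one that maximizes a user-chosen F-beta measure; since the objective is a step function of the bias rather than a smooth curve, no gradient-based search is attempted. This is an illustration of the general idea under these assumptions, not the paper's implementation.

```python
from typing import Callable, Sequence, Set, Tuple

def f_beta(precision: float, recall: float, beta: float) -> float:
    # F-beta = (1 + beta^2) * P * R / (beta^2 * P + R);
    # beta > 1 weights recall more heavily, beta < 1 weights precision,
    # and beta = 1 recovers the usual F1 measure.
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

def evaluate(extract: Callable[[str, float], Set[str]],
             corpus: Sequence[Tuple[str, Set[str]]],
             bias: float, beta: float) -> float:
    # Corpus-level precision/recall of the bias-shifted extractor
    # against the gold entity sets.
    tp = fp = fn = 0
    for text, gold in corpus:
        predicted = extract(text, bias)  # hypothetical extractor interface
        tp += len(predicted & gold)
        fp += len(predicted - gold)
        fn += len(gold - predicted)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return f_beta(precision, recall, beta)

def tweak_extractor(extract: Callable[[str, float], Set[str]],
                    corpus: Sequence[Tuple[str, Set[str]]],
                    beta: float,
                    lo: float = -5.0, hi: float = 5.0,
                    steps: int = 200) -> float:
    # Linear search over candidate bias shifts: evaluate each grid point
    # and return the shift that maximizes the user-specified F-beta.
    candidates = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return max(candidates, key=lambda b: evaluate(extract, corpus, b, beta))
```

Intuitively, a positive bias lowers the effective threshold for tagging a token as an entity, raising recall at the cost of precision, and a negative bias does the opposite; the grid search simply exposes that trade-off and lets the user's chosen beta pick the operating point.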
</Section>
</Paper>