<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0402"> <Title>Feature Engineering and Post-Processing for Temporal Expression Recognition Using Conditional Random Fields</Title> <Section position="4" start_page="9" end_page="10" type="intro"> <SectionTitle> 2 Background </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="9" end_page="9" type="sub_section"> <SectionTitle> 2.1 Task Description </SectionTitle> <Paragraph position="0"> In recent years, temporal aspects of information access have received increasing attention, especially in relation to news documents. In addition to factual content, news documents have a temporal context, reporting events that happened, are happening, or will happen in relation to the publication date. Temporal document retrieval concerns the inclusion of both the document publication date and the in-text temporal expressions in the retrieval model (Kalczynski and Chou, 2005). The task addressed in this paper is identifying the latter, i.e., extracting temporal expressions (timexes) from text. TERN, the Temporal Expression Recognition and Normalization Evaluation, is organized under the auspices of the Automatic Content Extraction program (ACE, http://www.nist.gov/speech/tests/ace/).</Paragraph> <Paragraph position="1"> The TERN evaluation provides specific guidelines for the identification and normalization of timexes, as well as tagged corpora for training and testing, and evaluation software. These guidelines and resources were used for the experiments described below.</Paragraph> <Paragraph position="2"> The TERN evaluation consisted of two distinct tasks: recognition and normalization. Timex recognition involves correctly detecting and delimiting timexes in text. 
Normalization involves assigning recognized timexes a fully qualified temporal value.</Paragraph> <Paragraph position="3"> Our focus in this paper is on the recognition task; it is defined, for human annotators, in the TIDES TIMEX2 annotation guidelines (Ferro et al., 2004).</Paragraph> <Paragraph position="4"> The recognition task is performed on corpora of transcribed broadcast news speech and newswire texts from ACE 2002, ACE 2003, and ACE 2004, marked up in SGML format and, for the training set, hand-annotated for TIMEX2s. An official scorer that evaluates recognition performance is provided as part of the TERN evaluation. It computes precision, recall, and F-measure both for TIMEX2 tags (i.e., for overlap with a gold standard TIMEX2 element) and for the extent of TIMEX2 elements (i.e., exact match of entire timexes).</Paragraph> </Section> <Section position="2" start_page="9" end_page="10" type="sub_section"> <SectionTitle> 2.2 Conditional Random Fields </SectionTitle> <Paragraph position="0"> We view timex recognition as a sequence labeling task in which each token in the text is classified as either part of a timex or not. One machine learning technique that has recently been introduced to tackle the problem of labeling and segmenting sequence data is conditional random fields (CRFs; Lafferty et al., 2001). CRFs are conditional probability distributions that take the form of exponential models. The special case of linear-chain CRFs takes the following form:</Paragraph> <Paragraph position="1"> \[ P(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, X, t) \right) \] </Paragraph> <Paragraph position="2"> where $Z(X)$ is the normalization factor, $X = \{x_1, \ldots, x_T\}$ is the observation sequence, $Y = \{y_1, \ldots, y_T\}$ is the label sequence, and $f_k$ and $\lambda_k$ are the feature functions and their weights, respectively. An important property of these models is that probabilities are computed based on a set of feature functions $f_k$ (usually binary valued), which are defined on both the observation sequence $X$ and the label sequence $Y$. 
These feature functions describe different aspects of the data and may overlap, providing a flexible way of describing the task.</Paragraph> <Paragraph position="3"> CRFs have been shown to perform well in a number of natural language processing applications, such as POS tagging (Lafferty et al., 2001), shallow parsing or NP chunking (Sha and Pereira, 2003), and named entity recognition (McCallum and Li, 2003).</Paragraph> <Paragraph position="4"> In this paper, CRFs are applied to the recognition of timexes; in our experiments we used the minorThird implementation of CRFs (Cohen, 2004).</Paragraph> </Section> </Section> </Paper>