<?xml version="1.0" standalone="yes"?> <Paper uid="H01-1031"> <Title>Guidelines for Annotating Temporal Information</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> The processing of temporal information poses numerous challenges for NLP. Progress on these challenges may be accelerated through the use of corpus-based methods. This paper introduces a set of guidelines for annotating time expressions with a canonicalized representation of the times they refer to.</Paragraph> <Paragraph position="1"> Applications that can benefit from such an annotated corpus include information extraction (e.g., normalizing temporal references for database entry), question answering (answering &quot;when&quot; questions), summarization (temporally ordering information), machine translation (translating and normalizing temporal references), and information visualization (viewing event chronologies).</Paragraph> <Paragraph position="2"> Our annotation scheme, described in detail in [Ferro et al. 2000], has several novel features: * It goes well beyond the scheme used in the Message Understanding Conference [MUC7 1998], not only in the range of expressions that are flagged but also, more importantly, in representing and normalizing the time values that the expressions communicate.</Paragraph> <Paragraph position="3"> * In addition to handling fully-specified time expressions (e.g., September 3rd, 1997), it also handles context-dependent expressions. This is significant because of the ubiquity of context-dependent time expressions; a recent corpus study [Mani and Wilson 2000] revealed that more than two-thirds of time expressions in print and broadcast news were context-dependent.
The context can be local (within the same sentence), e.g., &quot;In 1995, the months of June and July were devilishly hot&quot;, or global (outside the sentence), e.g., &quot;The hostages were beheaded that afternoon&quot;. One subclass of these context-dependent expressions consists of 'indexical' expressions, whose intended time value can be determined only by knowing when the speaker is speaking, e.g., now, today, yesterday, tomorrow, next Tuesday, and two weeks ago.</Paragraph> <Paragraph position="4"> Our scheme differs from the recent scheme of [Setzer and Gaizauskas 2000] in its in-depth focus on representations for the values of specific classes of time expressions, and in its application to a variety of genres, including print news, broadcast news, and meeting scheduling dialogs.</Paragraph> <Paragraph position="5"> The annotation scheme has been designed to meet the following criteria: Simplicity with precision: We have tried to keep the scheme simple enough to be executed confidently by humans, and yet precise enough for use in various natural language processing tasks.</Paragraph> <Paragraph position="6"> Naturalness: We assume that the annotation scheme should reflect those distinctions that a human could be expected to reliably annotate, rather than an artificially defined, smaller set of distinctions that automated systems might be expected to make. This means that some aspects of the annotation will be well beyond the reach of current systems.</Paragraph> <Paragraph position="7"> Expressiveness: The guidelines require annotators to specify time values as fully as possible, within the bounds of what they can confidently infer.
The use of 'parameters' and the representation of 'granularity' (described below) are tools to help ensure this.</Paragraph> <Paragraph position="8"> Reproducibility: In addition to leveraging the [ISO-8601 1997] format for representing time values, we have tried to ensure consistency among annotators by providing an example-based approach, with each guideline closely tied to specific examples. While the representation accommodates both points and intervals, the guidelines are aimed at using the point representation to the extent possible, further helping enforce consistency.</Paragraph> <Paragraph position="9"> The annotation process is decomposed into two steps: flagging a temporal expression in a document, and identifying the time value that the expression designates, or that the speaker intends for it to designate. The flagging of temporal expressions is restricted to those that contain a reserved time word used in a temporal sense, called a 'lexical trigger'; such triggers include words like day, week, weekend, now, Monday, current, and future.</Paragraph> </Section></Paper>
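<!--
Editorial sketch (not part of the original paper): the two annotation steps described in the final paragraph, flagging a lexical trigger and then identifying its time value, might be illustrated roughly as follows. The trigger list, the offset table, and the function names flag_triggers, resolve_indexical, and annotate are hypothetical simplifications introduced here; the actual guidelines in [Ferro et al. 2000] cover far more cases. In particular, this sketch does not check whether a trigger word is used in a temporal sense, and it assumes the time of speaking is supplied as a calendar date.

```python
import re
from datetime import date, timedelta

# Hypothetical, deliberately small trigger list built from the paper's examples
# plus a few indexicals; a real list would be much larger.
LEXICAL_TRIGGERS = {
    "day", "week", "weekend", "now", "monday", "current", "future",
    "today", "yesterday", "tomorrow",
}

# Assumed day offsets for the only indexicals this sketch can resolve.
INDEXICAL_DAY_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def flag_triggers(text):
    """Step 1: return (token, start, end) for each word that is a lexical trigger."""
    return [
        (m.group(0), m.start(), m.end())
        for m in re.finditer(r"[A-Za-z]+", text)
        if m.group(0).lower() in LEXICAL_TRIGGERS
    ]


def resolve_indexical(token, reference):
    """Step 2: map a supported indexical to an ISO 8601 calendar date, else None."""
    offset = INDEXICAL_DAY_OFFSETS.get(token.lower())
    if offset is None:
        return None
    return (reference + timedelta(days=offset)).isoformat()


def annotate(text, reference):
    """Run both steps; `reference` stands in for the time of speaking."""
    return [
        (token, start, end, resolve_indexical(token, reference))
        for token, start, end in flag_triggers(text)
    ]


if __name__ == "__main__":
    sentence = "Officials said today that talks ended yesterday."
    for row in annotate(sentence, date(1997, 9, 3)):
        print(row)
```

Only three indexicals are resolved here; any other flagged trigger keeps a value of None instead of a guessed date, in the spirit of specifying values only as far as they can be confidently inferred.
-->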