<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1031">
  <Title>Guidelines for Annotating Temporal Information</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. SEMANTIC DISTINCTIONS
</SectionTitle>
    <Paragraph position="0"> Three different kinds of time values are represented: points in time (answering the question &quot;when?&quot;), durations (answering &quot;how long?&quot;), and frequencies (answering &quot;how often?&quot;).</Paragraph>
    <Paragraph position="1"> Points in time are calendar dates and times-of-day, or a combination of both, e.g., Monday 3 pm, Monday next week, a Friday, early Tuesday morning, the weekend. These are all represented with values (the tag attribute VAL) in the ISO format, which allows for representation of date of the month, month of the year, day of the week, week of the year, and time of day, e.g., &lt;TIMEX2 VAL=&quot;2000-11-29-T16:30&quot;&gt;4:30 p.m. yesterday afternoon&lt;/TIMEX2&gt;.</Paragraph>
    <Paragraph position="2"> Durations also use the ISO format to represent a period of time.</Paragraph>
    <Paragraph position="3"> When only the period of time is known, the value is represented as a duration, e.g., &lt;TIMEX2 VAL=&quot;P3D&quot;&gt;a three-day&lt;/TIMEX2&gt; visit.</Paragraph>
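    <Paragraph> As a rough illustration of the two value shapes described above, the following Python sketch sorts a VAL string into a point or a duration. The function name classify_val and both regular expressions are illustrative assumptions that cover only the value forms quoted in this section; they are not part of the guidelines.
<![CDATA[
```python
import re

# Hypothetical patterns; they cover only the value shapes quoted in the text.
# Duration: an ISO-style period such as P3D (the letter P, then count+unit pairs).
DURATION_RE = re.compile(r"P(\d+[YMWD])+")
# Point: a four-character year (digits or X placeholders), optionally followed
# by hyphen-separated fields such as month, week, day, season, or time of day.
POINT_RE = re.compile(r"[0-9X]{4}(-[0-9A-Z:]+)*")


def classify_val(val: str) -> str:
    """Return 'duration', 'point', or 'unknown' for a TIMEX2 VAL string."""
    if DURATION_RE.fullmatch(val):
        return "duration"
    if POINT_RE.fullmatch(val):
        return "point"
    return "unknown"
```
]]>
    Under these assumptions, classify_val("P3D") yields "duration" and classify_val("2000-11-29-T16:30") yields "point". A string that matches neither pattern is reported as "unknown" rather than guessed at.</Paragraph>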
    <Paragraph position="4"> Frequencies reference sets of time points rather than particular points. SET and GRANULARITY attributes are used for such expressions, with the PERIODICITY attribute being used for regularly recurring times, e.g., &lt;TIMEX2 VAL=&quot;XXXX-WXX-2&quot;  The annotation scheme also addresses several semantic problems characteristic of temporal expressions: Fuzzy boundaries. Expressions like Saturday morning and Fall are fuzzy in their intended value with respect to when the time period starts and ends; the early 60's is fuzzy as to which part of the 1960's is included. Our format for representing time values includes parameters such as FA (for Fall), EARLY (for early, etc.), PRESENT_REF (for today, current, etc.), among others.</Paragraph>
    <Paragraph position="5"> For example, we have &lt;TIMEX2 VAL=&quot;1990-SU&quot;&gt;Summer of 1990&lt;/TIMEX2&gt;. Fuzziness in modifiers is also represented, e.g., &lt;TIMEX2 VAL=&quot;1990&quot; MOD=&quot;BEFORE&quot;&gt;more than a decade ago&lt;/TIMEX2&gt;. The intent here is that a given application may choose to assign specific values to these parameters if desired; the guidelines themselves don't dictate the specific values.</Paragraph>
    <Paragraph position="6"> Non-Specificity. Our scheme directs the annotator to represent the values, where possible, of temporal expressions that do not indicate a specific time. These non-specific expressions include generics, which state a generalization or regularity of some kind, e.g., &lt;TIMEX2 VAL=&quot;XXXX-04&quot; NON_SPECIFIC=&quot;YES&quot;&gt;April&lt;/TIMEX2&gt; is usually wet, and non-specific indefinites, like &lt;TIMEX2 VAL=&quot;1999-06-XX&quot; NON_SPECIFIC=&quot;YES&quot; GRANULARITY=&quot;G1D&quot;&gt;a sunny day in &lt;TIMEX2 VAL=&quot;1999-06&quot;&gt;June&lt;/TIMEX2&gt;&lt;/TIMEX2&gt;.</Paragraph>
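    <Paragraph> Because TIMEX2 tags can nest, as in the last example, a consumer of the annotations has to track which tags are open when collecting each tag's text extent. The Python sketch below shows one way this might be done with a stack. The function name parse_timex2 and the regular expressions are illustrative assumptions, and the sketch expects the inline tag syntax shown in the examples rather than any official tool format.
<![CDATA[
```python
import re

# Hypothetical patterns for the inline tag syntax used in the examples.
TAG_RE = re.compile(r"<(/?)TIMEX2\b([^>]*)>")   # opening or closing tag
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')        # NAME="value" pairs


def parse_timex2(text):
    """Return (attributes, extent_text) pairs, one per tag, in opening order."""
    results = []  # one [attrs, text_pieces] entry per tag
    stack = []    # indices into results for the tags currently open
    pos = 0
    for m in TAG_RE.finditer(text):
        # Text seen since the previous tag belongs to every open tag.
        chunk = text[pos:m.start()]
        for i in stack:
            results[i][1].append(chunk)
        pos = m.end()
        if m.group(1):  # closing tag
            if not stack:
                raise ValueError("closing TIMEX2 tag without an opening tag")
            stack.pop()
        else:           # opening tag
            results.append([dict(ATTR_RE.findall(m.group(2))), []])
            stack.append(len(results) - 1)
    if stack:
        raise ValueError("unclosed TIMEX2 tag")
    return [(attrs, "".join(pieces)) for attrs, pieces in results]
```
]]>
    Applied to the nested example above, this sketch returns two entries: the outer tag with the extent "a sunny day in June" and the inner tag with the extent "June". Text outside any tag, such as "is usually wet", is ignored.</Paragraph>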
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3. USEFULNESS
</SectionTitle>
    <Paragraph position="0"> Based on the guidelines, we have annotated a small reference corpus, consisting of 35,000 words of newspaper text and 78,000 words of broadcast news [TDT2 1999]. Portions of this corpus were used to train and evaluate a time tagger with a reported F-measure of .83 [Mani and Wilson 2000]; the corpus has also been used to order events for summarization.</Paragraph>
    <Paragraph position="1"> Others have used temporal annotation schemes for the much more constrained domain of meeting scheduling, e.g., [Wiebe et al. 1998], [Alexandersson et al. 1997], [Busemann et al. 1997]; our scheme has been applied to such domains as well.</Paragraph>
    <Paragraph position="2"> In particular, we have begun annotation of the 'Enthusiast' corpus of meeting scheduling dialogs used at CMU and by [Wiebe et al. 1998]. Only minor revisions to the guidelines' rules for tag extent have so far been required for these dialogs.</Paragraph>
    <Paragraph position="3"> This annotation scheme is also being leveraged in the Automatic Content Extraction (ACE) program of the U.S. Department of Defense, whose focus is on extraction of time-dependent relations between pairs of 'entities' (persons, organizations, etc.).</Paragraph>
    <Paragraph position="4"> Finally, initial feedback from Machine Translation system grammar writers [Levin, personal communication] indicates that the guidelines were found to be useful in extending an existing interlingua for machine translation.</Paragraph>
  </Section>
</Paper>