<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1309">
  <Title>From Temporal Expressions to Temporal Information: Semantic Tagging of News Messages</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper describes a semantic tagging system that extracts temporal information from news messages. Temporal expressions are defined for this system as chunks of text that express some sort of direct or inferred temporal information.</Paragraph>
    <Paragraph position="1"> The set of these expressions investigated in the present paper includes dates (e.g. 08.04.2001), prepositional phrases (PPs) containing some time expression (e.g. on Friday), and verbs referring to a situation (e.g. opened). Related work by Mani and Wilson (2000) focuses only on the core temporal expressions, neglecting the temporal information conveyed by prepositions (e.g. Friday vs. by Friday).</Paragraph>
    <Paragraph position="2"> The main part of the system is a temporal expression tagger that employs finite state transducers based on hand-written rules. The tagger was trained on economic news articles obtained from two German newspapers and an on-line news agency (Financial Times Deutschland, die tageszeitung and www.comdirect.de).</Paragraph>
    <Paragraph position="3"> Based on the syntactic classification of temporal expressions, a semantic representation of the extracted chunks is proposed. A clear-cut distinction between the syntactic tagging process and the semantic interpretation is maintained. The advantage of this approach is that a second level is created that represents the meaning of the extracted chunks. Having defined the semantic representation of the temporal expressions, further inferences, in particular on temporal relations, can be drawn. Establishing the temporal relations between all events mentioned in a news article is the ultimate goal of this enterprise. However, at the current stage of this work the semantic analysis is still in progress. For the time being, we focus on the anchoring of the temporal expressions in the absolute time line and present an already substantial subset of a full semantics that will eventually cover the entire set of temporal expressions extracted. Finally, the evaluation of the temporal expression tagger provides precision and recall rates for tagging temporal expressions and drawing temporal inferences.</Paragraph>
    <Paragraph position="4"> 2 Representing time in news articles
Since we focus on a particular text domain (i.e. news articles), the classification of temporal expressions can be kept to a manageable set of classes.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Classification of temporal expressions
</SectionTitle>
      <Paragraph position="0"> The main distinction we make is between time-denoting and event-denoting expressions. The first group comprises chunks expressing temporal information that can be stated with reference to a calendar or clock system. Syntactically speaking, these expressions are mainly realized as prepositional, adverbial or noun phrases (e.g. on Friday or today or the fourth quarter).</Paragraph>
      <Paragraph position="1"> The second group, event-denoting expressions, refers to events. These expressions have an implicit temporal dimension, since all situations possess a temporal component. For these expressions, however, there is no direct or indirect link to the calendar or clock system. These expressions are verb or noun phrases (e.g. increased or the election).</Paragraph>
      <Paragraph position="2"> Temporal reference can be expressed in three different ways: Explicit reference. Date expressions such as 08.04.2001 refer explicitly to entries of a calendar system. Time expressions such as 3 p.m. or midnight likewise denote a precise moment in our temporal representation system.</Paragraph>
      <Paragraph position="3"> Indexical reference. All temporal expressions that can only be evaluated via a given index time are called indexical. Expressions such as today, by last week or next Saturday need to be evaluated with respect to the article's time stamp.</Paragraph>
      <Paragraph position="4"> Vague reference. Some temporal expressions express only vague temporal information, and it is rather difficult to place the information expressed precisely on a time line. Expressions such as in several weeks, in the evening or by Saturday at the latest cannot be represented by points or exact intervals in time.</Paragraph>
      <Paragraph position="5"> For the given domain of news articles, the extraction of a time stamp for the given article is very important. This time stamp represents the production time of the news information and is used by the other temporal expressions as an index time to compute the correct temporal meaning of the expression. Note that an explicit date expression such as 24.12. can only be evaluated with respect to the year in which the article was written. This means that even an explicit temporal expression can contain some degree of indexicality.</Paragraph>
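      <Paragraph> The role of the time stamp as index time can be illustrated with a minimal Python sketch. It is not the tagger described in this paper: the regular expression, the function name resolve_date and the sample time stamp are assumptions made purely for illustration. The sketch anchors a German-style date expression and takes a missing year from the article's time stamp.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical pattern for German-style dates: DD.MM.YYYY or DD.MM. (year omitted).
DATE_PATTERN = re.compile(r"^(\d{1,2})\.(\d{1,2})\.(\d{4})?$")


def resolve_date(expression: str, time_stamp: date) -> Optional[date]:
    """Anchor a date expression; a missing year is taken from the time stamp."""
    match = DATE_PATTERN.match(expression)
    if match is None:
        return None  # not a date expression covered by this sketch
    day, month, year = match.groups()
    # An omitted year makes the expression indexical: fall back to the index time.
    resolved_year = int(year) if year else time_stamp.year
    return date(resolved_year, int(month), int(day))


article_time_stamp = date(2001, 4, 8)  # assumed production date of the article
print(resolve_date("08.04.2001", article_time_stamp))  # fully explicit date
print(resolve_date("24.12.", article_time_stamp))      # year supplied by the time stamp
```

Expressions that are not dates, such as on Friday, are simply rejected here; handling them would require the indexical and vague cases discussed above.</Paragraph>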
      <Paragraph position="6"> Two types of event-denoting expressions have to be distinguished: sentences on the one hand, and specific noun phrases on the other. In the former case, the verb is the lexical bearer of information about the event in question; in the latter case, specific nouns, especially those created by nominalisation, refer to an event.</Paragraph>
      <Paragraph position="7"> Since temporal information is the topic of the system described in this paper, only a subset of event-denoting nouns has to be considered.</Paragraph>
      <Paragraph position="8"> These expressions, such as election in the phrase after the election, serve as temporal reference pointers in building the temporal structure of a news article and can be marked by a specific attribute in their lexical entry. Furthermore, in the text classes we have investigated, there is a small number of event nouns that are used as domain-dependent pointers to elements of temporal structures. For the domain of business and stock market news, phrases such as opening of the stock exchange, opening bell, or the close are examples of domain-specific event expressions.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Representation of temporal information: the time domain
</SectionTitle>
      <Paragraph position="0"> The primary purpose of the present paper is to anchor the temporal information obtained from natural language expressions in news messages in absolute time, i.e. in a linearly ordered set of abstract time-entities, which we call time-set in the following. One of the major tasks in this anchoring process is to augment the temporal information in case of indexical and vague temporal descriptions (see section 4.3 for more details). Since these expressions do not specify an individual time-entity of the time-set, it is necessary to add temporal information until the temporal entity built up from natural language is fully specified, i.e. can be anchored in the time-set.</Paragraph>
      <Paragraph position="1"> 2.2.1 The granular system of temporal entities
The temporal information obtained from news messages is organised in a granular system of temporal entities including such granularity levels as GL-day, GL-week, GL-month and GL-year (see footnote 1).
Footnote 1: In the present paper we focus on the conception of granularity level in semantic and pragmatic inferences. Therefore, we do not discuss the formal notions of granular systems for temporal entities here. Compare, e.g., Bettini et al. (2000) for a framework of temporal granularity that could be used for the purposes we discuss here.</Paragraph>
      <Paragraph position="2"> Individual days are anchored by a date, e.g. date(2001,3,23), on the time line, i.e. the time-set. Further information, for example the day of the week, can also be included in an additional slot of the time entity: time = ['Fri', date(2001,3,23)]. Time entities of coarser granularity levels, e.g. weeks, are represented on the basis of intervals, which are determined by a start, that is, an entity of GL-day, and a specific duration: time = ['Mon', date(2001,4,2), '7 days'] (see footnote 2). The concept of temporal granularity is reflected linguistically, for example, in the use of demonstratives as determiners of time expressions in German: dieser Freitag ('this Friday') refers to that Friday which is located in the current week (i.e. the time entity of the next coarser level of temporal granularity). The same phenomenon holds for dieser Monatserste ('this first day of the month'). In the following we will apply the granularity structure of temporal expressions only with respect to the finer-than/coarser-than relation between levels of granularity, which is different from the is-part-of relation between temporal entities. For example, whereas between days and weeks there is a unique functional relationship, namely that there is exactly one week (as standard calendar unit) that an individual day is a part of, a week can temporally overlap with one or two months (technically, overlap can be realized by Allen-style temporal relations; see Allen (1983)). Nevertheless, GL-week finer than GL-month holds in the granularity system (see footnote 3).
Footnote 2: Whether the GL-week information remains implicit, i.e. is inferable from duration, or is made explicit, i.e. coded by a GL-week stamp, depends on design decisions that in turn depend on the conceptual richness of the domain modelling. For example, in a standardised world of ISO weeks, which start on Monday only, it is not necessary to use GL-week stamps. On the other hand, if ISO weeks and business weeks of five-day length are conceptual alternatives, then it is appropriate to use explicit granularity-level stamps.
Footnote 3: The phenomena of overlapping temporal entities of different granularity systems, for example the system of calendar time-entities vs. the system of business time-entities, or the astronomical system of seasons of the year vs. the meteorological seasons of the year, are especially relevant for processing vague and ambiguous temporal expressions. Due to the temporal and spatial limitations of this paper, we cannot go into the details here.</Paragraph>
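      <Paragraph> The time entities above can be mimicked in a few lines of Python. This is a rough sketch under stated assumptions, not the representation used by the system: the helper names day_entity, week_entity and this_weekday are hypothetical, weeks are assumed to be ISO weeks starting on Monday, and the granularity level is left implicit in the duration slot.

```python
from datetime import date, timedelta

# English weekday abbreviations, indexed by date.weekday() (Monday is 0).
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def day_entity(day: date) -> list:
    """Entity of GL-day with an additional day-of-week slot."""
    return [WEEKDAYS[day.weekday()], day]


def week_entity(day: date) -> list:
    """Entity of GL-week: the Monday starting the week of `day`, plus a duration."""
    monday = day - timedelta(days=day.weekday())
    return [WEEKDAYS[0], monday, "7 days"]


def this_weekday(name: str, index_time: date) -> list:
    """Sketch of 'dieser Freitag': the named weekday inside the week of the index time."""
    monday = index_time - timedelta(days=index_time.weekday())
    return day_entity(monday + timedelta(days=WEEKDAYS.index(name)))


print(day_entity(date(2001, 3, 23)))
print(week_entity(date(2001, 4, 5)))
print(this_weekday("Fri", date(2001, 3, 20)))
```

The last call resolves the weekday relative to the next coarser time entity, the week containing the index time, which is the reading of the demonstrative described above.</Paragraph>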
      <Paragraph position="3"> Temporal relations are explicitly marked by temporal prepositions (e.g. before, on or by). We use the following seven temporal relations: before, after, incl, at, starts, finishes, excl. The preposition on as in on Friday, for instance, denotes the inclusion relation incl, whereas the preposition by as in by Friday is represented as finishes.</Paragraph>
      <Paragraph position="4"> Note that the seven temporal relations employed by the current version are equivalent to sets of Allen's interval relations (Allen, 1983).4</Paragraph>
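      <Paragraph> As a minimal illustration, the mapping from temporal prepositions to relations can be written as a lookup table. The sketch below is an assumption-laden toy, not the system's lexicon: it contains only the two mappings stated above (on and by), the function name interpret_pp is hypothetical, and the correspondence to Allen's interval relations is not modelled.

```python
from typing import Optional, Tuple

# The seven temporal relations named in the text.
RELATIONS = ("before", "after", "incl", "at", "starts", "finishes", "excl")

# Only the two mappings given in the text; all other prepositions are left out.
PREPOSITION_TO_RELATION = {"on": "incl", "by": "finishes"}


def interpret_pp(phrase: str) -> Optional[Tuple[str, str]]:
    """Split a prepositional phrase into (relation, time expression), if known."""
    preposition, _, time_expression = phrase.partition(" ")
    relation = PREPOSITION_TO_RELATION.get(preposition.lower())
    if relation is None or not time_expression:
        return None  # bare time expression or preposition outside the toy lexicon
    return relation, time_expression


print(interpret_pp("on Friday"))
print(interpret_pp("by Friday"))
```

A bare expression such as Friday yields no relation in this sketch, which reflects the contrast between Friday and by Friday drawn in the introduction.</Paragraph>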
      <Paragraph position="6"/>
    </Section>
  </Section>
</Paper>