<?xml version="1.0" standalone="yes"?> <Paper uid="W01-1309"> <Title>From Temporal Expressions to Temporal Information: Semantic Tagging of News Messages</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Extraction of temporal information </SectionTitle> <Paragraph position="0"> Similar to other approaches to information extraction or tagging, a cascade of Finite State Transducers (FSTs) was employed. The following section provides a brief introduction to this technique before the overall system architecture is described in more detail.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Preliminaries </SectionTitle> <Paragraph position="0"> The temporal expression chunks are extracted via an FST. FSTs are basically automata whose transitions are labelled with a translation instruction.</Paragraph> <Paragraph position="1"> A label of the form a:b indicates such a translation from a to b. Take as an example the simple FST in figure 1. If the input contains the three consecutive characters F, S, and T, the same output is produced with this sequence of three characters put into brackets. The input stream &quot;FSTs are basically automata&quot; is, for instance, translated into &quot;[FST]s are basically automata&quot;. 
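The bracketing behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the transducer from figure 1: the function name bracket_fst is hypothetical, the state of the automaton is encoded as the buffer of input characters matched so far, and the simple restart on a mismatch is only adequate for patterns such as FST in which no proper prefix is also a suffix.

```python
def bracket_fst(text, pattern="FST"):
    """Copy text to the output, wrapping each occurrence of pattern in brackets."""
    out = []        # output tape
    pending = ""    # input read so far that matches a prefix of pattern (the state)
    for ch in text:
        if ch == pattern[len(pending)]:
            # Transition along the pattern: consume ch, emit nothing yet.
            pending += ch
            if pending == pattern:
                # Final transition: emit the whole pattern in brackets.
                out.append("[" + pattern + "]")
                pending = ""
        else:
            # Mismatch: flush the buffered characters unchanged and restart.
            out.append(pending)
            pending = ch if ch == pattern[0] else ""
            if not pending:
                out.append(ch)
    out.append(pending)  # flush an incomplete match at the end of the input
    return "".join(out)


print(bracket_fst("FSTs are basically automata"))
```

Called on the input stream from the text, the function returns the bracketed string given there. All other characters are copied through unchanged, which corresponds to identity transitions of the form a:a.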
Footnote 4: Allen (1983) proposes a temporal reasoning system that contains all 13 conceivable relations between intervals: b(efore), m(eets), o(verlaps), s(tarts), d(uring), f(inishes), the 6 reverse relations bi, mi, oi, si, di and fi, and eq(ual).</Paragraph> <Paragraph position="3"/> <Paragraph position="5"/> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Classes of temporal information </SectionTitle> <Paragraph position="0"> The FSTs defined are fed by the output of a Part of Speech (POS) tagger. The POS tagger specifies the syntactic category and a lemma for every word of the input text. The syntactic information is then stored in an XML file. Given the derived syntactic categories and the lemma information for every word of the text, several FSTs specialised for different classes of temporal expressions are run.</Paragraph> <Paragraph position="1"> Temporal Expressions. One FST consisting of 15 states and 61 arcs tags all occurrences of time-denoting temporal expressions. This FST uses the POS information stored in the XML file as well as a predefined class of temporal lemmas. The class of temporal lemmas includes days of the week (e.g. Friday), months (e.g. April) as well as general temporal descriptions such as midday, week or year. Since German is very productive in forming compound nouns, a simple morphological analysis tool was integrated into this FST as well. This tool captures expressions such as Rekordjahr ('record year') or Osterferien ('Easter holiday').</Paragraph> <Paragraph position="2"> The extracted temporal expression chunks are marked by the CHUNK tag and an attribute type = time. See the first row of table 2 for an example. Note that the attributes sem and time carry semantic information. The meaning of these values is explained in section 4.</Paragraph> <Paragraph position="3"> Document time stamp. 
The document time stamp for a given article is crucial for the computation of almost all temporal expressions (e.g. now). In particular, this index time is indispensable for the computation of all temporal expressions that express an indexical reference (see the second row of table 2). Footnote 8: This FST consists of 7 states and 15 arcs. It also extracts the name of the newspaper or agency as indicated by the attribute ag. So far only the newspaper names and agencies mentioned in the articles of the training set can be extracted. A future version of the temporal expression tagger should also be capable of tagging previously unknown names. However, note that this is rather a named entity recognition task and therefore goes beyond the scope of this paper. Verbal descriptions. Another FST that contains 4 states and 27 arcs marks all verbs as previously tagged by the POS tagger. As already pointed out, these temporal expressions denote events. The tag for such expressions is &lt;CHUNK type = event&gt; &lt;/CHUNK&gt; (see table 2, third row).</Paragraph> <Paragraph position="4"> Nominal descriptions. So far there is only an experimental FST that also extracts nominal descriptions of events such as the election. More tests have to be carried out to determine a subset of nouns for the given domain. These nouns should then also be used to denote events mentioned in the text which can be combined with time-denoting expressions, as in after the election in May.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 System output </SectionTitle> <Paragraph position="0"> After all expressions have been tagged, an HTML file is produced highlighting the respective expressions. See the snapshot in figure 2 (footnote 9). While reading the output stream from the FSTs, the system draws temporal inferences. In particular, expressions bearing indexical references are resolved and the event descriptions are matched with the time-denoting temporal expressions.</Paragraph> <Paragraph position="1"> Note that the values for the CHUNK attributes sem, time, and temp as indicated by the three examples in table 2 are PROLOG expressions. While translating the tagged text, a PROLOG predicate triggers other predicates that compute the correct temporal information. An additional HTML file is also generated that contains the derived temporal information in standard ISO format, provided an explicit reference was given or was resolved.</Paragraph> <Paragraph position="2"> In the case of vague reference (e.g. afternoon) the semantic description is kept (e.g. 20:01:04:03:afternoon) (footnote 10). In addition, the temporal relations holding between the events and times expressed by the text are stored as well.</Paragraph> <Paragraph position="3"> Footnote 9: Time-denoting expressions are indicated by a dark (or magenta) background, while event-denoting expressions are indicated by a lighter (or yellow) background. The document time stamp is tagged by a very dark (or green) background. Footnote 10: Future research will focus on the temporal inferences that can be drawn with these vague descriptions, taking into account the different granularity levels.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Semantic descriptions and temporal inferences </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Semantics for temporal expressions </SectionTitle> <Paragraph position="0"> With respect to processing temporal information, the crucial distinction between time-denoting and event-denoting expressions is that event-denoting expressions lack a direct link to temporal entities. An event-denoting expression (e.g. a verb) refers to an event of a certain type. 
The verb to meet, for instance, can be formalised as meet(e1). In order to add temporal information to the event, a function temp is defined that returns the time when the event occurred (i.e. the run-time of the event). A time-denoting expression such as on Monday that is combined with the event description carries temporal information that can further specify the run-time temp(e1) of the event e1.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Semantics for temporal prepositions </SectionTitle> <Paragraph position="0"> PPs are the carriers of temporal relations. The semantics for a preposition is, therefore, as follows:</Paragraph> <Paragraph position="2"> [...] was defined. The preposition by expresses, for instance, the finishes relation, as in by Friday.</Paragraph> <Paragraph position="3"> Temporal expressions that do not contain a preposition are assumed to express an inclusion relation, as in Die Pflegeversicherung war 1995 [. . . ] in Kraft getreten ('the statutory health insurance coverage of nursing care for the infirm took effect in 1995').</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Derivation of meaning </SectionTitle> <Paragraph position="0"> The temporal information expressed by a sentence as in example sequence (1) is derived via unification of the semantic attributes derived for the temporal expression chunks.</Paragraph> <Paragraph position="1"> 'The Nasdaq closed with a minus of 3.11 percent at 1782 points on Monday.' Two temporal expressions are marked by the tagger: am Montag ('on Monday') and geschlossen ('closed'). The former is a time-denoting expression that consists of a preposition and a time-denoting expression that is stored by the FST. 
The derivation of the semantics for this expression is done during the tagging process for the temporal expressions.</Paragraph> <Paragraph position="2"> First, the preposition am ('on'), which denotes an inclusion relation between an event and a time, is processed. The expressed temporal relation is represented by a PROLOG list (i.e. [incl,[E,T]]). After the following noun referring to a time (i.e. Monday) has been processed, the following semantic representation is obtained via unification: sem = [incl,[E,t1]], where t1 refers to the following time stamp: time = ['Mon', date(_,_,_), time(_,_,_), gl([_,'1 day',_])]. Footnote 11: Note that the underscore &quot;_&quot; refers to an anonymous variable in PROLOG.</Paragraph> <Paragraph position="3"> In the next step, the verbal expression tagger combines the temporal information derived for am Montag with the event representation for geschlossen. The following semantic representation is assigned to the verb geschlossen during the tagging of the verbal expressions: sem = close(e23) and temp = [_,[t(e23),_]]. This means that event e23 is of type closing and the run-time t(e23) of this event stands in some to-be-specified relation with another expression. Next, the temporal information extracted by the FST specialised in time-denoting expressions is unified with the value of the temp attribute. The result is [incl,[t(e23),t1]].</Paragraph> <Paragraph position="4"> So far, only the temporal relation that the event of closing happened within a time frame of one day has been determined. Since Montag contains an indexical reference, this reference has to be resolved. The document time stamp is needed here. All references regarding this index time are resolved during the generation of the HTML output file. Accordingly, the following time stamp is generated for am Montag: time = ['Mon', date(2001,4,2), time(_,_,_), gl([_,'1 day',_])]. 
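The unification steps and the resolution of Montag described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's PROLOG code: the names Var, walk, unify, substitute and resolve_weekday are hypothetical, terms are modelled as nested Python lists and strings, and both the document date 2001-04-03 and the rule "most recent such weekday on or before the document date" are assumed only so that the example arrives at date(2001,4,2).

```python
from datetime import date, timedelta


class Var:
    """Logic variable; a fresh Var() stands in for PROLOG's anonymous underscore."""


def walk(term, subst):
    # Follow variable bindings until an unbound variable or a value is reached.
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    # Return an extended substitution, or None if the two terms do not unify.
    a, b = walk(a, subst), walk(b, subst)
    if a is b:
        return subst
    if isinstance(a, Var):
        return {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return subst if a == b else None


def substitute(term, subst):
    # Replace every bound variable inside term by its value.
    term = walk(term, subst)
    if isinstance(term, list):
        return [substitute(t, subst) for t in term]
    return term


WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def resolve_weekday(name, doc_date):
    # Assumed rule: the most recent day called name on or before doc_date.
    offset = (doc_date.weekday() - WEEKDAYS.index(name)) % 7
    return doc_date - timedelta(days=offset)


E, T = Var(), Var()
prep_sem = ["incl", [E, T]]               # am ('on'): inclusion relation
s = unify(T, "t1", {})                    # Montag binds T to the time stamp t1
verb_temp = [Var(), ["t(e23)", Var()]]    # temp value assigned to geschlossen
s = unify(prep_sem, verb_temp, s)
relation = substitute(verb_temp, s)
monday = resolve_weekday("Mon", date(2001, 4, 3))  # hypothetical document date
print(relation, monday.isoformat())
```

Under these assumptions, relation evaluates to the list ['incl', ['t(e23)', 't1']] and monday to 2 April 2001, matching the values given in the text. The time-of-day slot is not modelled in this sketch.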
The time information is left open because the current granularity level is GL-day.</Paragraph> <Paragraph position="5"> However, this information could be further specified by modifiers, as in nächstes Jahr ('next year'). The third slot in gl is reserved for these modifiers. The first slot can be filled by temporal modifiers that refer to a subpart of the expressed temporal entity, as in Beginn des Jahres ('beginning of the year'). The resulting representation of an expression such as Beginn letzten Jahres ('beginning of last year') is gl([begin, year, last]).</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.4 Pragmatic inferences for anchoring indexicals: The case of 'last' </SectionTitle> <Paragraph position="0"> Temporal expressions of the type last Friday are similar to the phenomena discussed in the section above. German has three lexemes, namely letzt, vergangen and vorig, that express this idea. When they refer to a specific day, the differences between them are a matter of individual preference rather than real alternatives in meaning. Which day is referred to by using vorigen Montag? This depends on the time of utterance. In general, there seems to be a tendency to interpret this expression as synonymous with Monday of the previous week, i.e. to apply the previous-operation on the coarser level GL-week instead of on the level GL-day. But, if the expression is uttered on a Friday, our informants would prefer the Monday of the same week in their interpretation.</Paragraph> <Paragraph position="1"> Thus the granularity-level up strategy is not always successful. As an alternative strategy we propose the strategy of the gliding time window.</Paragraph> <Paragraph position="2"> Similar to the first proposal, a granularity of week size is relevant, but the relevant time entity in question is centered around the focused day of the week. 
In other words, looking forward and backward in time from the perspective of a Friday, the next Monday is nearer (or more activated) than the last Monday, although the latter is in the same calendar week. Thus, this Monday, i.e. the last Monday, has to be marked explicitly by vorige, and the Monday before it therefore has to be specified as Montag der vorigen Woche ('Monday of last week').</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> We evaluated the temporal expression tagger with respect to a small corpus consisting of 10 news articles taken from Financial Times Deutschland. We can report precision and recall rates regarding the recognition of simple temporal expressions and complex temporal expression phrases. Based on the extracted temporal expression chunks, the temporal information was derived and evaluated.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 Tagging results </SectionTitle> <Paragraph position="0"> First, the class of simple temporal expressions was tagged and analysed. Mani and Wilson (2000) call this class TIMEX expressions (of type TIME or DATE). We computed the precision and recall values for our data regarding this type of expression in order to make our results more comparable with those obtained by this earlier study. However, as pointed out earlier, we consider PPs carrying information regarding temporal relations as crucial for the derivation of temporal information. This class of complex temporal expressions provides more detailed information about the temporal information expressed by a text.</Paragraph> <Paragraph position="1"> Table 3 contains the results of the evaluation with respect to the two classes of temporal expressions. 
In total, 186 simple and 182 complex temporal expressions had previously been annotated.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Table 3: Tagger results for simple and complex temporal expressions </SectionTitle> <Paragraph position="1"> An error analysis showed that the main source of missed temporal expressions was the occurrence of combined temporal expressions, as in 2000/01. There were 6 cases in which the tagger did not correctly analyse this type of expression.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.2 Temporal information </SectionTitle> <Paragraph position="0"> The analysis of the temporal expressions included an evaluation of the temporal relations derived.</Paragraph> <Paragraph position="1"> Since all temporal prepositions and the class of temporal expressions that can be recognised by the FSTs come with a predefined semantics, precision and recall rates are the same. The overall performance showed a precision and recall rate of 84.49%. As indicated by table 4, errors were only made for expressions that express an indexical reference. In most cases these errors were due to a missing semantics for the respective expression. Since this part of the system is still work in progress, we have not yet defined a complete semantics for all temporal expressions.</Paragraph> <Paragraph position="2"> Hence the performance of the system regarding temporal inference is likely to improve in the future.</Paragraph> </Section> </Section> </Paper>