<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0902">
  <Title>Local Semantics in the Interpretation of Temporal Expressions</Title>
  <Section position="4" start_page="9" end_page="10" type="metho">
    <SectionTitle>
2 The DANTE System
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="9" end_page="10" type="sub_section">
      <SectionTitle>
2.1 Processing Steps
</SectionTitle>
      <Paragraph position="0"> Our goal is very close to that for which the TIMEX2 standard was developed: we want to annotate each temporal expression in a document with an indication of its interpretation, in the form of an extended ISO-format date and time string, normalised to some time zone. For example, suppose the temporal expression Friday at 11am occurs in the following sentence of an email message that was sent from Sydney on Monday 10th April 2006: (1) I expect that we will be able to present this at the meeting on Friday at 11am.</Paragraph>
      <Paragraph position="1"> In the context of our application, this temporal expression should be marked up as follows:
(2) &lt;TIMEX2 VAL=&quot;2006-04-14T01:00GMT&quot;&gt;Friday at 11am&lt;/TIMEX2&gt;
We have to do three things to achieve the desired result:
* First, we have to detect the extent of the temporal expression in the text. We refer to this process as temporal expression recognition.
* Then, we have to use information from the document context to turn the recognized expression into a fully specified date and time. We refer to this as temporal expression interpretation.
* Finally, we have to normalise this fully specified date and time to a predefined time zone, which in the case of the present example is Greenwich Mean Time. We refer to this as temporal expression normalisation (see footnote 2).
Footnote 2: Note that this third step is not required by the TIMEX guidelines, but is an additional requirement in the context of our particular application. This also means that our use of the term 'normalisation' here is not consistent with the standard usage in the TIMEX context; however, we would argue that our distinction between interpretation and normalisation describes more accurately the nature of the processes involved here.</Paragraph>
      <Paragraph position="2"> We observe that, at the time that the extent of a temporal expression within a text is determined, it is also possible to derive some semantic representation of that expression irrespective of the wider context within which it needs to be interpreted: for example, by virtue of having recognized an occurrence of the string Friday in a text, we already know that this is a reference to a specific day of the week. Most existing systems for the interpretation of temporal expressions probably make use of such a level of representation. Schilder's (2004) approach captures the semantics here in terms of a lambda expression like λx.Friday(x); Negri and Marseglia (2005) capture information at this stage of processing via a collection of temporary attributes.</Paragraph>
      <Paragraph position="3"> Each of the three steps above corresponds to a distinct processing component in the DANTE system architecture. These components communicate in terms of a number of distinct representations, which we now describe.</Paragraph>
    </Section>
    <Section position="2" start_page="10" end_page="10" type="sub_section">
      <SectionTitle>
2.2 The Text
</SectionTitle>
      <Paragraph position="0"> This level of representation corresponds simply to the strings that constitute temporal expressions in text. These are understood to be linguistic constructions whose referents are entities in the temporal domain: either points in time, or periods of time. In the above example, the text representation is simply the string Friday at 11am.</Paragraph>
    </Section>
    <Section position="3" start_page="10" end_page="10" type="sub_section">
      <SectionTitle>
2.3 Local Semantics
</SectionTitle>
      <Paragraph position="0"> We use this term to refer to a level of representation that corresponds to the semantic content that is derivable directly from the text representation; in the case of temporal expressions that are arguments to prepositions, this includes the interpretation of the preposition. Such representations are often incomplete, in that they do not denote a particular point or period on the time line; however, usually they do partially specify points or periods, and constrain the further interpretation of the string.</Paragraph>
    </Section>
    <Section position="4" start_page="10" end_page="10" type="sub_section">
      <SectionTitle>
2.4 In-Document Semantics
</SectionTitle>
      <Paragraph position="0"> We use this term to refer to the fully explicit interpretation of the text string, to the extent that this can be determined from the document itself, in conjunction with any metadata associated with the document. This level of representation corresponds to the information encoded in the attributes of the TIMEX2 tag as defined in the TIMEX guidelines.</Paragraph>
    </Section>
    <Section position="5" start_page="10" end_page="10" type="sub_section">
      <SectionTitle>
2.5 Global Semantics
</SectionTitle>
      <Paragraph position="0"> The TIMEX guidelines do not have anything to say beyond the representation described in the previous section. In our application, however, we are also required to normalise all temporal expressions to a specific time zone. This requires that some further temporal arithmetic be applied to the semantics of the found expressions. To calculate this, we simply have to determine the difference between the time zone of the document containing the temporal reference and the target time zone, here Greenwich Mean Time. The document may not always be explicitly marked with information about the time zone of its creation; in such cases, this has to be inferred from information about the location of the author or sender of the message.</Paragraph>
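      <Paragraph position="1"> The time zone arithmetic described above can be illustrated with a minimal sketch, assuming the fully specified local value and the document's offset from GMT are already known. The function name normalise_to_gmt and its parameters are hypothetical and are not part of DANTE; the offset of ten hours for Sydney on the date of the running example is an assumption of the illustration.

```python
from datetime import datetime, timedelta, timezone


def normalise_to_gmt(local_iso: str, utc_offset_hours: float) -> str:
    """Convert a fully specified local date-time into a GMT VAL string.

    utc_offset_hours is the offset of the document's time zone from GMT.
    It is an assumed input here; in practice it may have to be inferred
    from the location of the author or sender.
    """
    source_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(local_iso, "%Y-%m-%dT%H:%M")
    local = local.replace(tzinfo=source_zone)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MGMT")


# "Friday at 11am" in the running example, interpreted as 14th April 2006
# local time, with the Sydney offset taken to be ten hours ahead of GMT.
print(normalise_to_gmt("2006-04-14T11:00", 10))  # 2006-04-14T01:00GMT
```

 The result matches the VAL shown in example (2). Note that the date component can change as well as the time, for instance when an early-morning local time maps to the previous day in GMT.</Paragraph>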
    </Section>
  </Section>
  <Section position="5" start_page="10" end_page="11" type="metho">
    <SectionTitle>
3 Representing Temporal Expressions
</SectionTitle>
    <Paragraph position="0"> In this section, we describe a conceptualisation of the semantics of temporal expressions in terms of recursive attribute-value matrices.</Paragraph>
    <Section position="1" start_page="10" end_page="11" type="sub_section">
      <SectionTitle>
3.1 Temporal Entities
</SectionTitle>
      <Paragraph position="0"> As is conventional in this area of research, we view the temporal world as consisting of two basic types of entities, these being points in time and durations; each of these has an internal hierarchical structure. We can represent these in the following  The example above corresponds to the semantics of the temporal expression 3pm Thursday 13th May 2006 GMT; in the ISO date and time format used in the TIMEX2 standard, this would be written as follows:  Each atomic feature in the attribute-value structure thus corresponds to a specific position in the ISO format date-time string.</Paragraph>
    </Section>
    <Section position="2" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
3.2 Underspecification
</SectionTitle>
      <Paragraph position="0"> Of course, very few temporal expressions in text are fully specified. The attribute-value matrix representation makes it easy to represent the content of underspecified temporal expressions. For example, the content of the temporal expression Thursday in a sentence like We will meet on Thursday can be expressed as follows:  In the cases just described, the semantic representation corresponds to the entire temporal noun phrase in each case. The same form of representation is easy to use in a compositional semantic framework: each constituent in a larger temporal expression provides a structure that can be unified with the structures corresponding to the other constituents of the expression to provide a semantics for the expression as a whole. The values of the atomic elements in such an expression come from the lexicon; multiword sequences that are best considered atomic (such as, for example, idioms) can also be assigned semantic representations in the same way. The value of a composite structure is produced by unifying the values of its constituents. Unifying the two structures above, for example, gives us the following representation  So, these structures provide a convenient representation for what we have referred to above as the local semantics of a temporal expression, and correspond to the output of the recognition stage of our processing architecture.</Paragraph>
      <Paragraph position="1"> Footnote: ... of such an expression is captured by the context-free grammar rule 'NP → NP NP'. Other treatments are possible.</Paragraph>
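      <Paragraph position="2"> The unification of constituent structures can be sketched as follows, assuming attribute-value matrices are encoded as nested Python dictionaries. The feature names (point, date, dayname, daynum, month) and the function unify are hypothetical choices for this illustration; the source does not give the exact feature inventory.

```python
from typing import Any, Dict


class UnificationFailure(Exception):
    """Raised when two structures assign different values to one attribute."""


def unify(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new structure containing the information in both a and b."""
    result = dict(a)
    for key, value in b.items():
        if key not in result:
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            # Both sides are complex values: unify them recursively.
            result[key] = unify(result[key], value)
        elif result[key] != value:
            raise UnificationFailure(f"{key}: {result[key]!r} vs {value!r}")
    return result


# Hypothetical structures for "Thursday" and "13th May".
thursday = {"point": {"date": {"dayname": "Thursday"}}}
may_13 = {"point": {"date": {"daynum": 13, "month": 5}}}
print(unify(thursday, may_13))
```

 Unification fails when the constituents disagree on an atomic value, which is the behaviour one would want for inconsistent combinations; how DANTE itself handles such cases is not stated in the text.</Paragraph>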
    </Section>
    <Section position="3" start_page="11" end_page="11" type="sub_section">
      <SectionTitle>
3.3 Interpretation
</SectionTitle>
      <Paragraph position="0"> We can now define the task of interpretation in terms of the content of these structures. We assume a granularity ordering over what we might think of as the defining attributes in a temporal representation:
(4) year &gt; month &gt; daynum &gt; hour &gt; minute &gt; second
These are, of course, precisely the elements that are represented explicitly in an ISO date-time expression. Interpretation of a partially specified temporal expression then requires ensuring that there is a value for every defining attribute that is of greater granularity than the smallest granularity present in the partially specified representation. We refer to this as the granularity rule in interpretation. In the case of the example in the previous section, the granularity rule tells us that in order to compute the full semantic value of the expression we have to determine a value for YEAR, but not for HOUR, MINS or SECS. This interpretation process may require a variety of forms of reasoning and inference, as discussed below, and is qualitatively different from the computation of the local semantics.</Paragraph>
      <Paragraph position="1"> In the context of our application, a third stage, the normalisation process, then requires taking the further step of adding a ZONE attribute with a specific value, and translating the rest of the construction into this time zone if it represents a time in another time zone.</Paragraph>
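      <Paragraph position="2"> The granularity rule can be expressed as a short sketch, assuming the defining attributes of a local semantic representation are available as a set of names. The function attributes_to_resolve is a hypothetical helper, and the example inputs are our own.

```python
# The ordering in (4), from coarsest to finest.
GRANULARITY = ["year", "month", "daynum", "hour", "minute", "second"]


def attributes_to_resolve(present):
    """Return the defining attributes that interpretation must still supply.

    An attribute needs a value if it is coarser than the finest attribute
    present in the partial representation and is not itself present.
    """
    indices = [GRANULARITY.index(name) for name in present if name in GRANULARITY]
    if not indices:
        return []
    finest = max(indices)
    return [name for name in GRANULARITY[:finest] if name not in present]


print(attributes_to_resolve({"month", "daynum"}))  # ['year']
print(attributes_to_resolve({"hour", "minute"}))   # ['year', 'month', 'daynum']
```

 Attributes finer than the finest one present are never requested, which corresponds to leaving them out of the final value altogether.</Paragraph>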
    </Section>
  </Section>
  <Section position="6" start_page="11" end_page="13" type="metho">
    <SectionTitle>
4 A Compact Encoding
</SectionTitle>
    <Paragraph position="0"> The structures described in the previous section are relatively unwieldy in comparison to the simple string structures used as values in the TIMEX standard. To enable easy evaluation of a system's ability to construct these intermediate semantic representations, we would like to use a representation that is immediately usable by existing evaluation tools. To achieve this goal, we define a number of extensions to the standard TIMEX2 string representation for values of the VAL attribute; these extensions allow us to capture the range of distinctions we need. To save space, we also use these representations here to show the coverage of the annotation scheme that results.</Paragraph>
    <Paragraph position="1"> In our implementation, we represent the local semantic content via an additional set of attributes on TIMEX elements that mirrors exactly the set of attributes used by the TIMEX2 standard: thus we have T-VAL, T-ANCHOR_VAL and so on. This means that markup applied to a text distinguishes intermediate and final semantic values, making it possible to evaluate on just intermediate values, just final values, or both. In what follows, we will also use these intermediate attributes to make clear which level of representation is under discussion.</Paragraph>
    <Section position="1" start_page="12" end_page="12" type="sub_section">
      <SectionTitle>
4.1 Partially Specified Dates and Times
</SectionTitle>
      <Paragraph position="0"> As noted above, many references to dates or times are not fully specified in a text, with the result that some parts will have to be computed from the context during the interpretation stage. Typical examples are as follows: (5) a. We'll see you in November.</Paragraph>
      <Paragraph position="1"> b. I expect to see you at half past eight.</Paragraph>
      <Paragraph position="2"> In the recursive attribute-value notation introduced above, the missing information in each case corresponds to those features that are absent in the structure as determined by the granularity rule introduced in Section 3.3.</Paragraph>
      <Paragraph position="3"> In our string-based notation, we use lowercase xs to indicate those elements for which a value needs to be found, but which are not available at the time the local semantics are computed; and we capture the granularity requirement by omitting from the string representation those elements that do not require a value (see footnote 5). Table 1 provides a range of examples that demonstrate various forms of underspecification.</Paragraph>
      <Paragraph position="4"> A lowercase x thus corresponds to a variable.</Paragraph>
      <Paragraph position="5"> By analogy with this extension, we also use a lowercase t instead of the normal ISO date-time separator of T to indicate that the time may need further specification: consider the third and fourth examples in Table 1, where it is not clear whether the time specified is a.m. or p.m.</Paragraph>
      <Paragraph position="6"> Footnote 5: Note that a lowercase x does not mean the same thing as the use of an uppercase X in the TIMEX2 guidelines: an uppercase X means effectively that no value can be determined. Of course, if no value can be found for a variable element during the interpretation process, then the corresponding lowercase x will be replaced by an uppercase X.</Paragraph>
      <Paragraph position="7"> For partially-specified dates and times, the string-based encoding thus both captures the local semantic content of the temporal expression, and provides a specification of what information the interpretation process has to add. If the temporal focus is encoded in the same form of representation, then producing the final interpretation is often a simple process of merging the two structures, with the values already specified in the intermediate representation taking precedence over those in the representation of the temporal focus. Expressions involving references to named months require a decision as to whether to look for the next or previous instance of the month, typically determined by the tense of the major clause containing the reference.</Paragraph>
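      <Paragraph position="8"> The merge with the temporal focus can be sketched position by position over the two strings, assuming both use the same ISO-like layout. The function fill_from_focus and the sample T-VAL strings are hypothetical illustrations, not values taken from Table 1, and the sketch deliberately ignores the choice between the next and the previous instance of a named month.

```python
def fill_from_focus(t_val: str, focus: str) -> str:
    """Replace each lowercase x in t_val with the focus character at that position.

    Characters already specified in t_val take precedence over the focus.
    Elements omitted from t_val (by the granularity rule) stay omitted.
    """
    if len(t_val) > len(focus):
        raise ValueError("temporal focus is less specific than the expression")
    return "".join(f if c == "x" else c for c, f in zip(t_val, focus))


# A month-only reference such as "in November", focus 10th April 2006.
print(fill_from_focus("xxxx-11", "2006-04-10"))  # 2006-11

# A time-only reference, with the date taken entirely from the focus.
print(fill_from_focus("xxxx-xx-xxT20:30", "2006-04-10T09:00"))  # 2006-04-10T20:30
```

 A fuller interpreter would also need to resolve the lowercase t separator and fall back to an uppercase X where the focus supplies no value.</Paragraph>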
    </Section>
    <Section position="2" start_page="12" end_page="13" type="sub_section">
      <SectionTitle>
4.2 Representing Weekdays
</SectionTitle>
      <Paragraph position="0"> In recognition that the year-based calendar and the week-based calendar are not aligned, our intermediate representation embodies a special case borrowed from the TIMEX2 notation for days of the week that require context for their specification. Consider example (6a), uttered on Friday 14th April 2006; the intermediate semantic representation is provided in example (6b), and the final interpretation is provided in example (6c).</Paragraph>
      <Paragraph position="1">  (6) a. We left on Tuesday.</Paragraph>
      <Paragraph position="2"> b. T-VAL=&quot;D2&quot;
c. VAL=&quot;2006-04-11&quot;
This is not as convenient as the ISO-like encoding, and requires special case handling in the interpreter; however, a more comprehensive single representation would require abandoning the ISO-like encoding and the benefits it brings, so we choose to use the two formats in concert.</Paragraph>
      <Paragraph position="3">  The same notation supports references to parts of specific days, as presented in example (7).  (7) a. We left on Tuesday morning.</Paragraph>
      <Paragraph position="4"> b. T-VAL=&quot;D2TMO&quot;
c. VAL=&quot;2006-04-11TMO&quot;</Paragraph>
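      <Paragraph position="5"> The special-case handling for weekday values can be sketched as below. This assumes that the digit in a value such as D2 is the ISO weekday number (Monday is 1), that the direction of search is supplied by the caller (for example from tense), and that a reference to the focus day's own weekday means a full week away. The function resolve_weekday is hypothetical and handles only the date part of the value.

```python
from datetime import date, timedelta


def resolve_weekday(t_val: str, focus: date, direction: str) -> str:
    """Resolve a 'Dn' intermediate value against a temporal focus date."""
    target = int(t_val[1])  # assumed: ISO weekday number, Monday = 1
    if direction == "past":
        # Days to step back; 0 is mapped to 7 (a week earlier).
        delta = (focus.isoweekday() - target) % 7 or 7
        resolved = focus - timedelta(days=delta)
    else:
        # Days to step forward; 0 is mapped to 7 (a week later).
        delta = (target - focus.isoweekday()) % 7 or 7
        resolved = focus + timedelta(days=delta)
    return resolved.isoformat()


# Example (6): "We left on Tuesday", uttered on Friday 14th April 2006.
print(resolve_weekday("D2", date(2006, 4, 14), "past"))  # 2006-04-11
```

 The output agrees with the VAL in example (6c). A part-of-day suffix such as TMO in example (7) would simply be carried over onto the resolved date.</Paragraph>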
    </Section>
    <Section position="3" start_page="13" end_page="13" type="sub_section">
      <SectionTitle>
4.3 Relative Dates and Times
</SectionTitle>
      <Paragraph position="0"> A relative date or time reference is one that requires a calendar arithmetic operation to be carried out with respect to some temporal focus in the text.</Paragraph>
      <Paragraph position="1"> Typical examples are as follows: (8) a. We'll see him tomorrow.</Paragraph>
      <Paragraph position="2"> b. We saw him last year.</Paragraph>
      <Paragraph position="3"> c. We'll see him next Thursday.</Paragraph>
      <Paragraph position="4"> d. We saw him last November.</Paragraph>
      <Paragraph position="5"> We distinguish three subtypes here: relative dates and times whose local semantics can be expressed in an ISO-like format; relative references to days and months by name; and less specific references to past, present and future times.</Paragraph>
      <Paragraph position="6"> For the first of these, we extend the ISO format with a preceding '+' or '−' to indicate the direction from the current temporal focus. Some examples of dates are provided in Table 2, and some examples of date-time combinations are provided in Table 3. Note that both the date and time elements in a relative reference can be independently either absolute or relative: compare the representations for in six hours time and at 6am today.</Paragraph>
      <Paragraph position="7"> This representation leads to a very intuitive coordinate-based arithmetic for computing the final semantic interpretation of a given expression: the interpreter merely adds the temporal focus and the intermediate value element-by-element from the smallest unit upwards, using carry arithmetic where appropriate.</Paragraph>
    </Section>
    <Section position="4" start_page="13" end_page="13" type="sub_section">
      <SectionTitle>
String Representation
</SectionTitle>
      <Paragraph position="0">
sixty seconds later | +0000-00-00T+00:00:60
five minutes ago | +0000-00-00T−00:05
in six hours time | +0000-00-00T+06:00
at 6 a.m. today | +0000-00-00T06:00
last night | −0000-00-01TNI</Paragraph>
      <Paragraph position="1"> Relative references to named days and months require a different treatment, in line with the notation introduced in Section 4.2. Table 4 shows the intermediate values used for a number of such expressions.</Paragraph>
      <Paragraph position="2"> A further variation on this notation also allows us to specify a local semantics for expressions like the first Tuesday in temporal expressions like the first Tuesday in July, or like the last year in the last year of the millennium; see Table 5. To produce final interpretations of these, the interpreter has to construct the set of elements that correspond to the head noun (for example, a list of the ISO dates that correspond to the Tuesdays in a given month), and then select the nth element from that set.</Paragraph>
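      <Paragraph position="3"> The construct-then-select procedure for ordinal expressions can be sketched as follows. The helper names weekdays_in_month and select_nth are hypothetical, and the year 2006 is assumed for the example since the expression itself does not supply one.

```python
import calendar
from datetime import date


def weekdays_in_month(year, month, iso_weekday):
    """List every date in the month that falls on the given ISO weekday."""
    last_day = calendar.monthrange(year, month)[1]
    days = (date(year, month, d) for d in range(1, last_day + 1))
    return [d for d in days if d.isoweekday() == iso_weekday]


def select_nth(candidates, n):
    """Select the nth element (1-based); n = -1 selects the last element."""
    return candidates[n - 1] if n > 0 else candidates[n]


# "the first Tuesday in July" and "the last Tuesday in July", year assumed.
tuesdays = weekdays_in_month(2006, 7, 2)
print(select_nth(tuesdays, 1).isoformat())   # 2006-07-04
print(select_nth(tuesdays, -1).isoformat())  # 2006-07-25
```

 Separating the construction of the candidate set from the selection step means the same selection function could serve other head nouns, such as the years of a millennium.</Paragraph>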
    </Section>
  </Section>
  <Section position="7" start_page="13" end_page="14" type="metho">
    <SectionTitle>
5 Handling Embedded Constructions
</SectionTitle>
    <Paragraph position="0"> The TIMEX specification allows for the embedding of one TIMEX within another. Consider an example like the following:  The bulk of the embedded TIMEXs provided as examples in the TIMEX guidelines are, like this one, of the form [NP PP], where the head NP contains a TIMEX, and the PP contains another TIMEX that is modified by the head NP. Syntactically, these structures are of the form shown in  For our purposes, it is convenient to first think of these structures as consisting of three, rather than two, TIMEXs, corresponding to the three subscripted NP nodes in this tree. The outermost TIMEX, corresponding to NP0, is the one whose value we are ultimately interested in; this is computed by combining the semantics of the two constituent TIMEXs, corresponding to NP1 and NP2, and the preposition indicates how this combination should be carried out.</Paragraph>
    <Paragraph position="1"> Structurally, the recognizer may first determine that there are two separate TIMEXs here:  Each of these TIMEXs can be given the appropriate local semantics by the recognizer; the recognizer then reorganizes this structure to mirror the embedding required by the TIMEX guidelines, to produce the structure shown in example (10) above; effectively, NP1 disappears as a distinct constituent, and its intermediate semantics are inherited by NP0.</Paragraph>
    <Paragraph position="2"> We then leave it to the interpreter to combine the intermediate semantics of NP0 with the intermediate semantics of NP2 to produce a final semantics for NP0: schematically, we have
(11) NP0(VAL) = NP0(T-VAL) ⊕ NP2(T-VAL)
where '⊕' is the combinatory operation that corresponds to the preposition used. The operation required is specified by the recognizer as the value of the temporary attribute T-REL, which represents the semantics of the preposition.</Paragraph>
    <Paragraph position="3"> The following three examples demonstrate a variety of possibilities, showing both the intermediate (T-VAL) and final (VAL) semantic interpretations in each case:  Note that, when the embedded TIMEX is fully specified, as in the last example here, it would be possible for the recognizer to calculate the final value of the whole expression; however, for consistency we leave this task to the interpreter. The semantics of the indicated T-REL depend on the types of its arguments. In the cases above, for example, the operation is one of selecting an ordinally-specified element of a list; but where the entity is a period rather than a point, as in the first six months of 2005, the operation is one of delimiting the period in question.</Paragraph>
    <Paragraph position="4"> Of course, other forms of embedding are possible. In appositions, the syntactic structure can be thought of as [NP NP]; as in the case of embedded PPs, the TIMEX representation effectively promotes the semantics of the first NP to be the semantics of the whole. Again, we show both VAL and T-VAL values here, and the relevant T-REL.</Paragraph>
    <Paragraph position="5">  Here, the fact that the T-REL is EQUAL causes the interpreter to combine the values of the two TIMEXs, with points taking precedence over durations.</Paragraph>
  </Section>
</Paper>