<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1018">
  <Title>Understanding Temporal Expressions in Emails</Title>
  <Section position="3" start_page="136" end_page="137" type="intro">
    <SectionTitle>
2 Temporal Expressions in Emails
</SectionTitle>
    <Paragraph position="0"> The extent of temporal expressions considered in this paper includes most of the expressions using temporal terms such as 2005, summer, evening, 1:30pm, tomorrow, etc. These expressions can be classified into the following categories: * Explicit: These expressions can be immediately anchored, i.e., positioned on a timeline. E.g., June 2005, 1998 Summer, etc.</Paragraph>
    <Paragraph position="1"> * Deictic: These expressions form a specific relation with the speech time (timestamp of an email). E.g., tomorrow, last year, two weeks from today.</Paragraph>
    <Paragraph position="2"> * Relative: These include the other expressions that form a specific relation with a temporal focus, i.e., the implicit time central to the discussion. E.g., from 5 to 7, on Wednesday, etc. Unlike the speech time, a temporal focus can shift freely during the discourse.</Paragraph>
    <Paragraph position="3"> * Durational: These are expressions that describe a certain length of time. E.g., for about an hour, less than 20 minutes. This is different from an interval expression, where both the starting point and the ending point are given (e.g., from 5 to 7). Most durational expressions are used to build more complex expressions, e.g., for the next 20-30 minutes.</Paragraph>
    <Paragraph position="4"> It is worth emphasizing the crucial difference between deictic and relative expressions: anchoring the former relies only on the fixed speech time, while normalizing the latter requires the usually hidden focus. As illustrated below, the latter task can be much more challenging: &quot;I'm free next week. Let's meet on Wednesday.&quot; &quot;Are you free on Wednesday?&quot; In the first example, &quot;Wednesday&quot; denotes a different date than in the second, since the first sentence sets up a different focus. To make things even more interesting, verbal tense can also play a role, e.g., &quot;He finished the report on Wednesday.&quot; There are other types of temporal expressions, such as recurrence (&quot;every Tuesday&quot;) and rate expressions (&quot;twice on Wednesday&quot;), that are not supported in our system, although they are planned for future work (Sec. 6).</Paragraph>
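    <Paragraph> The contrast between the two "Wednesday" examples can be made concrete with a minimal Python sketch. It is an illustration only, not the system described in this paper: the helper names next_weekday_on_or_after and start_of_next_week, the choice of Monday, 8 September 1997 as the speech time, and the rule that a bare weekday resolves to the first matching day on or after its reference point are all assumptions made for the example.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday_on_or_after(start: date, weekday_name: str) -> date:
    """Return the first date on or after `start` that falls on `weekday_name`.

    Hypothetical resolution rule: always look forward from the reference point.
    """
    target = WEEKDAYS.index(weekday_name.lower())
    offset = (target - start.weekday()) % 7
    return start + timedelta(days=offset)


def start_of_next_week(speech_time: date) -> date:
    """Return the Monday of the week after the one containing `speech_time`."""
    monday_this_week = speech_time - timedelta(days=speech_time.weekday())
    return monday_this_week + timedelta(days=7)


# Assumed speech time (email timestamp): Monday, 8 September 1997.
speech_time = date(1997, 9, 8)

# "Are you free on Wednesday?"
# No earlier expression sets a focus, so the speech time is the reference point.
no_focus = next_weekday_on_or_after(speech_time, "Wednesday")

# "I'm free next week. Let's meet on Wednesday."
# "next week" is deictic and shifts the temporal focus; "Wednesday" is then
# resolved against that focus instead of the speech time.
focus = start_of_next_week(speech_time)
with_focus = next_weekday_on_or_after(focus, "Wednesday")

print(no_focus.isoformat(), with_focus.isoformat())
```

Under these assumptions the unfocused reading is 1997-09-10 and the focused reading is 1997-09-17: the same surface expression yields two dates a week apart, and only the second requires tracking the focus across sentences.</Paragraph>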
    <Paragraph position="5"> To appreciate the different nature of emails as a genre, it is instructive to compare the distributions of temporal expressions in emails and in newswire texts. The email corpora we used for development and testing were collected from MBA students of Carnegie Mellon University over the years 1997 and 1998. The 277 students, organized in approximately 50 teams of 4 to 6 members, were participating in a 14-week course and running simulated companies in a variety of market scenarios (Kraut et al., 2004). The original dataset, the CSpace email corpus, contains approximately 15,000 emails. We manually picked 1,196 emails that are related to scheduling; these include scheduling meetings, presentations, or general planning for the groups. The emails were then randomly divided into five sets (email1 to email5), and only four of them were used in this work: email1 was used to establish our baseline, email2 and email5 were used for development, and part of email4 was used for testing. Table 1 shows some basic statistics of these three datasets, and an edited sample email is shown in Fig. 1 (names altered). The percentages in some rows of Table 1 do not add up to 100% because some expressions, such as coordinations, can be classified into more than one type.</Paragraph>
    <Paragraph position="6"> Date: Thu, 11 Sep 1997 00:14:36 -0500 I have put an outline out in the n10f1 OpReview directory... (omitted) We have very little time for this. Please call me Thursday night to get clarification. I will need graphs and prose in files by Saturday Noon.</Paragraph>
    <Paragraph position="7"> - Mary ps. Mark and John, I waited until AFTER midnight to send this.</Paragraph>
    <Paragraph position="8"> The most apparent difference between these emails and newswire texts is the percentage of explicit expressions in the two genres. Mani et al. (2003) reported that such expressions account for about 25% of the expressions in their newswire corpus (the North American News Corpus). In contrast, explicit expressions account on average for only around 9.5% in the three email datasets. This is not surprising, given that people tend to use under-specified expressions in emails for reasons of economy. Also note that there are roughly as many relative expressions as non-relative ones. Since non-relative expressions (including deictic expressions) can be anchored without tracking the temporal focus over a discourse, and can therefore be dealt with in a fairly straightforward way, we may take 50% as a somewhat generous baseline performance for any anchoring system. (It is generous because even simple calendric arithmetic, such as anchoring last summer, still requires non-trivial modeling of human calendars; see Sec. 3.)</Paragraph>
    <Paragraph position="9"> Another difference between emails and newswire texts is that the former are a medium for communication: an email can be used as a reply, can be attached within another email, or can even be addressed to multiple recipients. All of this complicates our task a great deal. Other notable differences are that hour ambiguity tends to appear more often in emails (&quot;I'll be home at 2.&quot;), and that people tend to be more creative when they compose short messages, using tables (e.g., an entire column of numbers to denote the number of minutes allotted to each presenter), bullet lists, abbreviations, and different month/day formats (&quot;1/9&quot; can mean January 9 or September 1). Emails also contain more &quot;human errors&quot;, such as misspellings (&quot;Thusday&quot; to mean Thursday) and confusion about dates (e.g., using &quot;tomorrow&quot; when sending emails around midnight). Overall, it is very difficult to recover from this type of error.</Paragraph>
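    <Paragraph> The month/day ambiguity mentioned above ("1/9" as January 9 or September 1) can be illustrated with a short Python sketch that lists every valid calendar reading of such an expression. The function name candidate_dates and the decision to supply the year explicitly are assumptions made for this illustration; the sketch only enumerates the readings and does not attempt to choose between them.

```python
from datetime import date
from typing import List


def candidate_dates(expr: str, year: int) -> List[date]:
    """Return every valid reading of an 'a/b' expression in the given year.

    Both month/day and day/month orders are tried; impossible combinations
    (for example month 13) are silently dropped.
    """
    first, second = (int(part) for part in expr.split("/"))
    readings: List[date] = []
    for month, day in ((first, second), (second, first)):
        try:
            candidate = date(year, month, day)
        except ValueError:
            continue  # not a real calendar date in this order
        if candidate not in readings:  # "5/5" has only one distinct reading
            readings.append(candidate)
    return readings


print(candidate_dates("1/9", 1997))   # two readings: ambiguous
print(candidate_dates("13/9", 1997))  # one reading: only day/month is valid
```

An expression is ambiguous exactly when the returned list has more than one element; resolving it would need further evidence, such as the email timestamp or the sender's habits, which this sketch leaves out.</Paragraph>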
  </Section>
</Paper>