<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1018">
  <Title>Understanding Temporal Expressions in Emails</Title>
  <Section position="7" start_page="141" end_page="19970825" type="evalu">
    <SectionTitle>5 Evaluation</SectionTitle>
    <Paragraph position="0">
      The temporal expressions in all of the datasets were initially tagged using rules developed for MinorThird (http://minorthird.sourceforge.net/), and subsequently corrected manually by two of the authors. We then developed a prototype system and established our baseline over email1 (50%). The system at that time had no focus tracking mechanism (i.e., it always used the timestamp as the focus) and did not use any tense information. This result confirms the estimate given in Sec. 2. We then gradually developed TEA to its current form using email1, email2 and email5. During the four-month development we added the focus tracking mechanism, incorporated the tense information into each TCNL formula via the coordinate prefixes, and introduced several representational improvements. Finally, we tested the system on the unseen dataset email4 and obtained the results shown in Table 4. Note that the percentages reported in the table are accuracies, i.e., the number of correctly anchored expressions over the total number of temporal expressions in a dataset, since we assume correct tagging of all of the expressions.
    </Paragraph>
    <Paragraph position="1">
      Our best result was achieved on the dev set email5 (85.45%), and the accuracy on the test set email4 was 76.34%.
    </Paragraph>
    <Paragraph position="2">
      Table 4 also lists the types of errors made by our system. Parsing errors are mistakes made when transducing temporal expressions into their TCNL formulae with the finite-state parser; human errors are described in Sec. 2; the rest are anchoring errors. The accuracy numbers all compare favorably to the baseline (50%). To put this performance in perspective, (Wiebe et al., 1998) performed a similar task over transcribed scheduling-related phone conversations, reporting an average accuracy of 80.9% over the CMU test set and 68.9% over the NMSU test set. Although strictly speaking the two results cannot be compared due to differences in the nature of the corpora (transcription vs. typing), we nevertheless believe their task represents a closer match to ours than the other work done on the newswire genre.
    </Paragraph>
    <Paragraph position="3">
      It should also be noted that we adopted a recency-based focus model similar to that of (Wiebe et al., 1998). Although simple to implement, this naive approach proved to be a major contributor to the anchoring errors in our experiments. An example is given below (the anchored times are shown in subscript): This research can not proceed until the trade-offs are known on Monday.
    </Paragraph>
    <Paragraph position="4">
      The last expression received an incorrect date: it should be the same date the expression &quot;on Monday&quot; refers to. Our system made this error because it blindly used the most recently mentioned time ((min..19970822)) as the focus to anchor the formula +f{mon}. This error then propagated to the anchoring of the subsequent expressions.
    </Paragraph>
  </Section>
</Paper>
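For reference, the accuracy measure described in the first paragraph, restated from the prose definition, is simply:

```latex
\[
  \text{accuracy} \;=\; \frac{\#\,\text{correctly anchored temporal expressions}}
                             {\#\,\text{temporal expressions in the dataset}}
\]
```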
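To make the recency-based focus model and its failure mode concrete, here is a minimal sketch, not the authors' TEA implementation: it assumes Python, reduces each expression to a target weekday, and takes one plausible reading of +f{mon} ("the Monday on or after the focus"). The names `anchor_next_weekday` and `anchor_all` are hypothetical, introduced only for illustration.

```python
from datetime import date, timedelta

MONDAY = 0  # Python's date.weekday() convention: Monday = 0 ... Sunday = 6

def anchor_next_weekday(focus: date, weekday: int) -> date:
    """One plausible reading of a formula like +f{mon}:
    the given weekday on or after the current focus."""
    return focus + timedelta(days=(weekday - focus.weekday()) % 7)

def anchor_all(target_weekdays, timestamp: date):
    """Naive recency-based focus tracking: start from the email timestamp,
    and after anchoring each expression, make the result the new focus."""
    focus = timestamp
    anchored = []
    for wd in target_weekdays:
        focus = anchor_next_weekday(focus, wd)  # most recent time is the focus
        anchored.append(focus)
    return anchored

# The error pattern from the text: the focus had drifted to the most recently
# mentioned time, 1997-08-22 (a Friday), so +f{mon} was anchored against it,
# yielding 1997-08-25. Once an expression is anchored to the wrong date, that
# date becomes the new focus, and the error propagates to later expressions.
print(anchor_next_weekday(date(1997, 8, 22), MONDAY))  # -> 1997-08-25
```

The design choice this illustrates is exactly the one the paragraph criticizes: the focus is always the most recently mentioned time, with no model of which earlier time the discourse is actually about.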