<?xml version="1.0" standalone="yes"?> <Paper uid="N06-2008"> <Title>Temporal Classification of Text and Automatic Document Dating</Title> <Section position="6" start_page="31" end_page="31" type="concl"> <SectionTitle> 5 Evaluation, Results and Conclusion </SectionTitle> <Paragraph position="0"> The system was trained using 67,000 news items selected randomly from the GigaWord corpus. The evaluation took place on 678,924 news items extracted from items marked as being of type &quot;story&quot; or &quot;multi&quot;. Table 1 presents a summary of results. The actual date was extracted from each news item in the GigaWord corpus, and the day of week (DOW), week number and quarter were calculated from that date. Average errors for each type of classifier were calculated automatically. For a result to be considered correct, the value ranked first by the system had to equal the actual value for that type of period.</Paragraph> <Paragraph position="1"> The results show that reasonably accurate dates can be guessed at the quarterly and yearly levels. The weekly classifier had the worst performance of all classifiers. The combined classifier uses a simple weighted formula to guess the final document date using input from all classifiers. The weights for the combined classifier have been set on the basis of this evaluation. The temporal classification and analysis system presented in this paper can handle any Indo-European language in its present form. Further work is being carried out to extend the system to Chinese and Arabic. Current research aims to improve the accuracy of the classifier by using the non-periodic components and by improving the combined classification method.</Paragraph> </Section> </Paper>