Temporal Feature Modification for Retrospective Categorization

5 Related Work

The use of metadata and other complementary (non-content) information to improve text categorization is an interesting and well-known problem. The specific use of temporal information, even if only implicitly, for tasks closely related to TC has been explored through adaptive information filtering (AIF) and topic detection and tracking (TDT).

5.1 Adaptive Information Filtering

There exists a large body of work on information filtering, which "is concerned with the problem of delivering useful information to a user while preventing an overload of irrelevant information" (Lam et al., 1996). Of particular interest here is adaptive information filtering (AIF), which handles the problems of concept drift (a gradual change in the data set a classifier must learn from) and concept shift (a more radical change).

Klinkenberg and Renz test eight different classifiers on their ability to adapt to changing user preferences for news documents (Klinkenberg and Renz, 1998). They try different "data management techniques" for the concept drift scenario, selectively altering the size of the set of examples (the adaptive window) that a classifier trains on, using a heuristic that accounts for the degree of dissimilarity between the current batch of examples and previous batches. Klinkenberg and Joachims later abandon this approach because it relies on "complicated heuristics", and instead concentrate their analysis on support vector machines (Klinkenberg and Joachims, 2000).

Stanley uses an innovative approach that eschews the need for an adaptive window of training examples and instead relies on a voting system for decision trees (Stanley, 2001). The weight of each classifier's vote is proportional to its record of correctly classifying previous examples (see the sketch at the end of this subsection). He notes that this technique does not depend on decision trees; any combination of classifiers can be inserted into the system.

The concept drift and shift scenarios used in the published literature are often unrealistic and not based upon actual user data. Topic Detection and Tracking, described in the following subsection, must work not with the behavior of one individual, but with texts that report on real external events and are not subject to artificial manipulation. This multifaceted, unsupervised character of TDT makes it a more appropriate precursor with which to compare our work.
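The weighted-voting scheme described above can be made concrete with a short sketch. This is a minimal illustration in the spirit of Stanley (2001), not the original system: the class name, the smoothed accuracy counters, and the update policy are assumptions introduced here, and any member classifiers exposing a predict method could be plugged in.

# Minimal sketch (assumed names): each member classifier votes, and its vote
# is weighted by its running record of correct predictions on past examples.
from collections import defaultdict

class WeightedVoteEnsemble:
    def __init__(self, classifiers):
        self.classifiers = classifiers          # any trained classifiers with .predict(x)
        self.correct = [1] * len(classifiers)   # smoothed count of correct predictions
        self.seen = [1] * len(classifiers)      # smoothed count of examples seen

    def predict(self, x):
        votes = defaultdict(float)
        for i, clf in enumerate(self.classifiers):
            weight = self.correct[i] / self.seen[i]   # vote weight = accuracy so far
            votes[clf.predict(x)] += weight
        return max(votes, key=votes.get)

    def update(self, x, true_label):
        # Once the true label arrives, adjust each member's record; members
        # that still track the drifting concept gain influence over time.
        for i, clf in enumerate(self.classifiers):
            self.seen[i] += 1
            if clf.predict(x) == true_label:
                self.correct[i] += 1

Because influence is tied to recent predictive success, members that no longer fit the drifting concept lose weight without any explicit windowing of the training data.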
5.2 Topic Detection and Tracking

Franz et al. note that Topic Detection and Tracking (TDT) is fundamentally different from AIF in that the "adaptive filtering task focuses on performance improvements driven by feedback from real-time human relevance assessments. TDT systems, on the other hand, are designed to run autonomously without human feedback" (Franz et al., 2001). Having roots in information retrieval, text categorization, and information filtering, the initial TDT studies used broadcast news transcripts and written news corpora to accomplish tasks ranging from news story clustering to boundary segmentation. Of most relevance to the present work is the topic tracking task: given a small number (1-4) of training stories known to be about a particular event, the system must make a binary decision about whether each story in an incoming stream is about that event.

Many TDT systems make use of temporal information, at least implicitly. Some employ a least-recently-used (Chen and Ku, 2002) or decay (Allan et al., 2002) function to restrict the lexicon available to the system at any given point in time to those terms most likely to be of use in the topic tracking task (a sketch of this idea appears at the end of this section).

There are many projects with a foundation in TDT that go beyond the initial tasks and corpora. For example, TDT-inspired language modeling techniques have been used to train a system to make intelligent stock trades based upon temporal analysis of financial texts (Lavrenko et al., 2000). Retrospective timeline generation has also become popular, as exhibited by Google's Zeitgeist feature and browsers of TDT news corpora (Swan and Allan, 2000; Swan and Jensen, 2000).

The first five years of TDT research are nicely summarized by Allan (2002).
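As a concrete illustration of the decay-weighting idea mentioned above, the sketch below keeps only those terms whose recency weight remains above a threshold. It is a minimal sketch under stated assumptions: the exponential form, the half-life, the threshold, and the name restrict_lexicon are illustrative choices, not parameters taken from any of the cited systems.

# Minimal sketch (assumed form): down-weight terms by how long ago they were
# last seen, and drop those that have decayed below a threshold.
import math

def restrict_lexicon(term_last_seen, current_time, half_life=7.0, threshold=0.25):
    """Keep only terms whose recency weight is still above a threshold.

    term_last_seen: dict mapping term -> time (e.g., day index) it last appeared
    current_time:   the present time in the same units
    """
    decay_rate = math.log(2) / half_life   # weight halves every half_life units
    active = {}
    for term, last_seen in term_last_seen.items():
        weight = math.exp(-decay_rate * (current_time - last_seen))
        if weight >= threshold:
            active[term] = weight           # recently seen terms stay in the lexicon
    return active

# Example: terms last seen at days 29, 20, and 0, evaluated at day 30.
lexicon = restrict_lexicon({"election": 29, "earthquake": 20, "olympics": 0}, current_time=30)

A least-recently-used policy would instead evict the terms used longest ago once the lexicon reaches a fixed size; the decay variant simply makes the forgetting gradual.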