<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-3008">
  <Title>Investigations on Event Evolution in TDT</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Problems in TDT
</SectionTitle>
    <Paragraph position="0"> Events take place in the world, and some of them are reported in the news. A TDT system does not perceive the events themselves; rather, it makes an effort to deduce them from the continuous news-stream - which is in a sense like the shadows on the wall in Plato's cave analogy. Given this setting, what is it that we are trying to model? Typically, text categorization is conducted using some machine learning system (Sebastiani, 2002; Yang and Liu, 1999). Such a system is taught to recognize the difference between two or more predefined classes or categories by providing it with a good number of pre-labeled samples to learn from. As to classes and word frequencies, this training material is assumed to follow the same underlying distribution as the material that is to be categorized. More formally, the documents $D = \{d_1, d_2, \ldots, d_{|D|}\}$ and their labels $C = \{c_1, c_2, \ldots, c_{|C|}\}$ yield to an unknown distribution. This distribution is expressed as a function $\Phi$ that assigns a value to each document-label pair $\langle d_i, c_j \rangle \in D \times C$; the system builds a hypothesis $\hat{\Phi}$ that approximates $\Phi$, practically, with the 'highest' accuracy. This accuracy is evaluated with pre-labeled testing material.</Paragraph>
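For concreteness, the supervised setting sketched above can be illustrated with a few lines of code. This is a minimal sketch only, assuming scikit-learn, a toy labeled corpus, and a TF-IDF representation with a linear classifier; none of these choices come from the paper.

```python
# Minimal sketch of supervised text categorization: a hypothesis is learned
# from pre-labeled documents and evaluated on a held-out, pre-labeled test set
# assumed to follow the same distribution. Toy data; illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

train_docs = ["stocks fell sharply today", "the team won the championship",
              "parliament passed the budget", "the striker scored twice"]
train_labels = ["economy", "sports", "politics", "sports"]
test_docs = ["shares rallied after the report", "the coach praised the defence"]
test_labels = ["economy", "sports"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

classifier = LinearSVC()          # plays the role of the learned hypothesis
classifier.fit(X_train, train_labels)
print(accuracy_score(test_labels, classifier.predict(X_test)))
```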
    <Paragraph position="1"> Now, with TDT the problem is different. Let us assume that the documents and events yield to an unknown distribution represented by the function</Paragraph>
    <Paragraph position="2"> $\Psi : D \times E \rightarrow \{-1, 1\}$ </Paragraph>
    <Paragraph position="3"> that assigns each document $d_i \in D$ a boolean value indicating whether it discusses event $e_j \in E$ or not. The problem is that the domain of events, $E = \{e_1, e_2, \ldots, e_{|E|}\}$, is time-dependent. The hypothesis $\hat{\Psi} : D \times E \rightarrow \{-1, 1\}$ built from the training data does not work with the evaluation data, because the two data sets do not discuss the same events. Moreover, the events are very small in size compared to categories, and their identity, that is, the most important terms, evolves over time. We can, however, model the similarity between two documents. By examining the pair-wise comparisons in the training set, we can formulate a hypothesis $\hat{\Theta} : D \times D \rightarrow \{-1, 1\}$ that assigns the pair $\langle d_i, d_j \rangle \in D \times D$ a boolean value of 1 if the documents discuss the same event, and -1 otherwise. Any two documents of the same event are (ideally) similar in a similar way. This somewhat trivial observation has some implications worth mentioning.</Paragraph>
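As a minimal sketch of such a pair-wise hypothesis, the following code decides whether two documents discuss the same event using a bag-of-words cosine similarity and a fixed threshold; both the representation and the threshold are illustrative assumptions, stand-ins for whatever a real TDT system would learn from pair-wise comparisons.

```python
# Minimal sketch of a pair-wise same-event decision: label a document pair
# 1 if the two documents are judged to discuss the same event, -1 otherwise.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def same_event(doc_i: str, doc_j: str, threshold: float = 0.2) -> int:
    """Return 1 if the documents are judged to discuss the same event, -1 otherwise."""
    vec_i, vec_j = Counter(doc_i.lower().split()), Counter(doc_j.lower().split())
    return 1 if cosine(vec_i, vec_j) >= threshold else -1

print(same_event("train collision near Bonn kills two",
                 "two dead in train collision near Bonn"))   # 1
print(same_event("train collision near Bonn kills two",
                 "parliament debates the new budget"))       # -1
```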
    <Paragraph position="4"> Firstly, by definition news documents report changes, something new with respect to what is already known.</Paragraph>
    <Paragraph position="5"> This would lead one to think that the identity of an event eludes all static representations and that the representation for a topic would have to adapt automatically to the various changes in the reporting of the event.</Paragraph>
    <Paragraph position="6"> Secondly, so far the parameters and thresholds of the state-of-the-art methods in IR have tried to capture this similarity of similarity, but there does not seem to be a representation expressive enough (Allan et al., 2000).</Paragraph>
    <Paragraph position="7"> Thirdly, detection and tracking are based on pair-wise comparisons, which require exhaustive computation. Yang et al. (2002) suggested topic-categories that could be used to limit the search space of first-story detection. However, building topic-categories automatically is difficult. In the following, we outline some suggestions for addressing these problems: event modeling, event representation, and decreasing the computational cost.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Events and Topics
</SectionTitle>
    <Paragraph position="0"> Although the concept of event might seem intuitively clear and self-explanatory, formulating a sound definition appears to be difficult. Predating TDT research, numerous historians and researchers of political science have wrestled with the definitions (Falk, 1989; Gerner et al., 1994). What seems to be somewhat agreed upon is that an event is some sort of activity conducted by some agent and taking place somewhere at some time.</Paragraph>
    <Paragraph position="1"> Definition 1 An event is something that happens at some specific time and place (Yang et al., 1999).</Paragraph>
    <Paragraph position="2"> This initial definition was adopted for the TDT project, and it is intuitively quite sound. Practically all of the events in the TDT test set exhibit temporal proximity ("burstiness") and compactness. However, there are also a number of problematic cases that this definition seems to neglect: events which either have a long-lasting nature (Intifada, Kosovo-Macedonia, the struggle in Colombia), escalate into several large-scale threads or campaigns (September 11), or are not tightly spatiotemporally constrained (BSE epidemics).</Paragraph>
    <Paragraph position="3"> The events in the world are not as autonomous as this definition assumes. They are often interrelated and do not necessarily decay within weeks or a few months.</Paragraph>
    <Paragraph position="4"> Some of these problematic events would classify as activities (Papka, 1999), but when encountering a piece of news, we do not know a priori whether it is a short-term event or a long-term activity, the start of a complex chain of events or just a simple incident.</Paragraph>
    <Paragraph position="5"> Definition 2 An event is a specific thing that happens at a specific time and place along with all necessary pre-conditions and unavoidable consequences (Cieri et al., 2002).</Paragraph>
    <Paragraph position="6"> This is basically a variant of Definition 1 that in some sense tries to address the autonomy assumption. Yet, it opens a number of questions as to what the necessary preconditions are for a given event, an oil crisis, for example. What are the necessary preconditions and unavoidable consequences of Secretary Powell's visit to the Middle East or of a suicide bombing in Ramallah? Definition 3 A topic is an event or an activity, along with all related events and activities (Cieri et al., 2002).</Paragraph>
    <Paragraph position="7"> Here, Cieri et al. explicate the connection between a topic and an event: they are more or less synonyms. Rules of interpretation have been issued to help draw the line and to attain consistency. In TDT, there are eleven topic types that tell what kinds of other events and activities are relevant. The topic type of a topic is determined by its seminal event. Since the TDT2 and TDT3 corpora are produced along this guideline, this is in a sense the de facto definition.</Paragraph>
    <Paragraph position="8"> Definition 4 A topic is a series of events, a narrative that evolves and may fork into several distinct topics.</Paragraph>
    <Paragraph position="9"> Definition 4 makes an attempt at addressing the changing or evolving nature of a topic. A seminal event can lead to several things at the same time, and the connection between the various outcomes and the initial cause becomes less and less obvious as the events progress. As a practical consequence, event evolution (Yang et al., 1999; Papka, 1999) causes changes in the vocabulary, especially in the crucial, identifying terms.</Paragraph>
    <Paragraph position="10"> The news documents are temporally linearly ordered, and the news stories can be said to form series of different lengths. Identifying these chains as topics is motivated by Falk's investigations of historical events (Falk, 1989). A narrative begins as soon as the first story is encountered. Then the narrative develops in one or more directions: simple events, like plane accidents, might not have as many sub-plots as a political scandal, a war or an economic crisis. Then, at some point, one could say the latest story is so different from the initial one that it is considered a first story for a new event. However, there could remain some sort of link indicating that these two topics (narratives) are somehow related. Hence, this kind of a narrative has a beginning, a middle and an end. An example of event evolution is illustrated in Figure 1.</Paragraph>
    <Paragraph position="11"> Initially, in phase 1 we have only one document, a first story $A$, and it constitutes an event that is depicted by the dashed line. Then, in phase 2, document $B$ is found relevant to this event. Since it is found similar to $A$, there is a link between them. In phase 3 there are two more relevant documents, $C$ and $D$. The former is found more similar to $B$ than to $A$, and thus it continues the off-spring started by $B$. On the contrary, $D$ appears closer to $A$ and thus it starts a new direction. Phase 4 shows two stories, $E_1$ and $F_1$, outside the dashed ellipse. This represents a situation where the vocabulary of the two expelled documents is diverging from the rest of the documents, i.e., the inner cohesion of the topic is violated too much. The dotted ellipse represents the domain of possible topical shifts, i.e., stories that lead too far from the original topic. They are still regarded as part of the topic, but are on the brink of diverging from it and hence are candidates for new first stories or seminal events.</Paragraph>
    <Paragraph position="12"> Finally, in phase 5 the separation takes place: three new documents, $E_2$, $E_3$ and $E_4$, are found similar to $E_1$. As a result, document $E_1$ is separated into its own topic. Note that there are no follow-ups for $F_1$, and therefore it is not cut off.</Paragraph>
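As a minimal sketch of the linking behaviour walked through above, the following code attaches each incoming story to its most similar earlier story and starts a new topic when even the best link falls below a cohesion bound. The word-overlap similarity and the bound are illustrative assumptions, and the intermediate candidate/follow-up stage of Figure 1 is omitted for brevity.

```python
# Minimal sketch of nearest-neighbour story linking: each story joins the
# topic of its most similar predecessor, or becomes a candidate first story.
from typing import Callable, List

def link_stories(stories: List[str],
                 similarity: Callable[[str, str], float],
                 cohesion_bound: float = 0.3) -> List[List[int]]:
    """Group story indices into topics by nearest-neighbour linking."""
    topics: List[List[int]] = []
    topic_of: List[int] = []                 # topic index of each processed story
    for i, story in enumerate(stories):
        best_sim, best_j = 0.0, -1
        for j in range(i):                   # compare against all earlier stories
            s = similarity(story, stories[j])
            if s > best_sim:
                best_sim, best_j = s, j
        if best_j >= 0 and best_sim >= cohesion_bound:
            topic = topic_of[best_j]         # continue the off-spring of story j
        else:
            topic = len(topics)              # candidate first story: new topic
            topics.append([])
        topics[topic].append(i)
        topic_of.append(topic)
    return topics

def word_overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

print(link_stories(["plane crash in the alps",
                    "rescuers reach plane crash site in the alps",
                    "central bank raises interest rates"], word_overlap))
# [[0, 1], [2]]
```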
    <Paragraph position="13"> The problem of text summarization is similar to detecting topical shifts: traces of all the main topics occurring in the given text need to be retained in the summary. On the other hand, text segmentation shares some qualities with topic shift detection. Lexical cohesion has been employed in summarization (Boguraev and Neff, 2000) as well as in text segmentation (Stokes et al., 2002).</Paragraph>
    <Paragraph position="14"> A model of Definition 4 has many open issues. For example, what is the topic representation and what kind of impact will there be on the evaluation? We will try to address the former question in the following.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Multi-vector Event Model
</SectionTitle>
    <Paragraph position="0"> It has been difficult to detect two distinct train accidents or bombings as different events (Allan et al., 1998a).</Paragraph>
    <Paragraph position="1"> The terms occurring in the two documents are so similar that the term-space or the weighting-scheme in use fails to represent the required very delicate distinction.</Paragraph>
    <Paragraph position="2"> Furthermore, Allan, Lavrenko and Papka suspect that only a small number of terms is adequate to make the distinction between different news events (Allan et al., 1998b). Intuitively, when reporting two different train accidents, it would seem that the location and the time, and possibly some names of people, are the terms that make up the difference. Papka observes that increasing the weights of noun phrases and dates improves the classification accuracy, while decreasing them makes the accuracy decline (Papka, 1999).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Event Vector
</SectionTitle>
      <Paragraph position="0"> A news document reporting an event states at the very barest what happened, where it happened, when it happened, and who was involved. The automatic extraction of these facts for natural language understanding is quite troublesome and time-consuming, and could still perform poorly. Previous detection and tracking approaches have tried to encapsulate these facts in a single vector. In order to attain the delicate distinctions mentioned above, to avoid the problems with term-space maintenance and still maintain robustness, we assign each of these questions a semantic class, i.e., a group of semantically related words, similarly to the approach suggested by Makkonen et al. (2002).</Paragraph>
      <Paragraph position="1"> The semantic class of LOCATIONS contains all the places mentioned in the document, and thus gives an idea of where the event took place. Similarly, TEMPORALS, i.e., temporal expressions, name a point or an interval of time and bind the document onto the time-axis. NAMES are proper noun phrases that represent the people or organizations involved in the news story. What happened is represented by 'normal' words, which we call TERMS.</Paragraph>
      <Paragraph position="2"> This approach has an impact on the document and the event representations. Instead of having just one vector, we issue four sub-vectors - one for each semantic class as illustrated in Figure 2.</Paragraph>
      <Paragraph position="3"> Figure 2: An event vector with four sub-vectors built from the news extract "Yasser Arafat appointed his longtime deputy Mahmoud Abbas as prime minister Wednesday, . . ." (AP: Wednesday, March 19, 2003).</Paragraph>
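A minimal sketch of this four-part representation is given below. The keyword lists used for the toy 'extraction' are purely illustrative placeholders; the model assumes proper named-entity and temporal-expression recognition.

```python
# Minimal sketch of the four-part event vector: one sub-vector per semantic
# class instead of a single term vector. Keyword-based toy extraction only.
from dataclasses import dataclass, field
from typing import Set

@dataclass
class EventVector:
    locations: Set[str] = field(default_factory=set)   # where it happened
    temporals: Set[str] = field(default_factory=set)   # when it happened
    names: Set[str] = field(default_factory=set)       # who was involved
    terms: Set[str] = field(default_factory=set)       # what happened

KNOWN_LOCATIONS = {"ramallah", "london"}
KNOWN_NAMES = {"arafat", "abbas", "powell"}
KNOWN_TEMPORALS = {"wednesday", "march"}

def build_event_vector(text: str) -> EventVector:
    vector = EventVector()
    for token in text.lower().replace(",", "").split():
        if token in KNOWN_LOCATIONS:
            vector.locations.add(token)
        elif token in KNOWN_NAMES:
            vector.names.add(token)
        elif token in KNOWN_TEMPORALS:
            vector.temporals.add(token)
        else:
            vector.terms.add(token)
    return vector

print(build_event_vector("Yasser Arafat appointed his longtime deputy "
                         "Mahmoud Abbas as prime minister Wednesday"))
```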
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Similarity of Hands
</SectionTitle>
      <Paragraph position="0"> Without getting too deep into philosophical discussions as to what meaning is and how it arises, one could claim that the meaning of a word lies in the word's relation to other words. This meaning, that is, this relation, can be represented in an ontology, where similar terms relate to each other in a different manner than dissimilar ones.</Paragraph>
      <Paragraph position="1"> The similarity of event vectors is determined class-wise: each semantic class has its own similarity measure, and the overall similarity could be, for example, the weighted sum of these measures. The interesting thing is that now we can introduce semantics into the vector-based similarity by mapping the terms of a semantic class onto a formal space. Each pair of terms in this space has a similarity, i.e., a distance. Two TEMPORAL terms relate to each other on the time-axis, and the similarity of two LOCATION terms can be based on a geographical proximity represented in an ontology. For example, the utterances next week and the last week of March 2003 do not coincide on the surface, but when evaluated with respect to the utterance time, the expressions refer to the same temporal interval. Similarly, London and Thames can be found relevant based on a spatial ontology. Similarity in these ontologies could be a distance on the time-axis or a distance in a tree, as we have previously noted (Makkonen et al., 2003).</Paragraph>
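As a sketch of such ontology-grounded, class-wise similarities, the code below compares temporal expressions as intervals on the time-axis and locations by their distance in a small containment tree. The toy tree, the interval encoding and the decay function are illustrative assumptions, not the ontologies of Makkonen et al. (2003).

```python
# Minimal sketch of class-wise similarities grounded in simple "ontologies":
# temporal expressions as date intervals, locations in a containment tree.
from datetime import date

def temporal_similarity(a: tuple, b: tuple) -> float:
    """Overlap of two (start, end) date intervals divided by their union."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    overlap = max((end - start).days + 1, 0)
    union = (max(a[1], b[1]) - min(a[0], b[0])).days + 1
    return overlap / union

GEO_PARENT = {"london": "england", "thames": "england", "england": "uk",
              "uk": "europe", "paris": "france", "france": "europe",
              "europe": None}

def ancestors(place: str) -> list:
    chain = [place]
    while GEO_PARENT.get(chain[-1]):
        chain.append(GEO_PARENT[chain[-1]])
    return chain

def spatial_similarity(a: str, b: str) -> float:
    """Decays with the number of tree edges between the two places."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    common = next((x for x in chain_a if x in chain_b), None)
    if common is None:
        return 0.0
    distance = chain_a.index(common) + chain_b.index(common)
    return 1.0 / (1.0 + distance)

next_week = (date(2003, 3, 24), date(2003, 3, 30))
last_week_of_march = (date(2003, 3, 24), date(2003, 3, 30))
print(temporal_similarity(next_week, last_week_of_march))  # 1.0: same interval
print(spatial_similarity("london", "thames"))              # close in the toy tree
```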
      <Paragraph position="2"> Now, let us present the above discussion more formally. Each term in the document is a member of exactly one semantic class. Hence, a document is composed of the union of the semantic classes, or, equivalently, it is a structure of a language specified by the unary relations that represent the semantic classes.</Paragraph>
      <Paragraph position="3"> Definition 5 Let $U$ be a universe and let $\tau$ be a language consisting of $n$ unary relations $S_1, S_2, \ldots, S_n$. A document is then a $\tau$-structure $\mathcal{A} = \langle U, \tau \rangle$.</Paragraph>
      <Paragraph position="6"> Now, consider $U$ as the set of natural language terms and $\tau$ as the set of semantic classes. A document representation would be a $\tau$-structure consisting of terms,</Paragraph>
      <Paragraph position="7"> $\mathcal{A} = S_1 \cup S_2 \cup \cdots \cup S_n,$ </Paragraph>
      <Paragraph position="8"> i.e., a document is simply a union of the semantic classes.</Paragraph>
      <Paragraph position="9"> Definition 6 Let $\sigma_i$ be a function $\sigma_i : U \times U \rightarrow \mathbb{R}$ that indicates the similarity of two elements in $S_i$. The similarity of two $\tau$-structures $\mathcal{A}$ and $\mathcal{B}$ is a function</Paragraph>
      <Paragraph position="10"> $\sigma(\mathcal{A}, \mathcal{B}) = \langle s_1, s_2, \ldots, s_n \rangle \in \mathbb{R}^n$, where each component $s_i$ is the class-wise similarity given by $\sigma_i$.</Paragraph>
      <Paragraph position="11"> This type of similarity we call the similarity of hands. (The name comes from a simple game where one would have to determine the similarity of two hands of cards of arbitrary size (up to 52) drawn from two distinct decks, assuming that there is a designated similarity measure for each suit. For example, with hearts, low cards could be of more value. Furthermore, the suits could be weighted, i.e., clubs could be trump, and unchallenged clubs would lead to dissimilarity.)</Paragraph>
      <Paragraph position="12"> Hence, the similarity of two documents, $\mathcal{A}$ and $\mathcal{B}$, would be a vector $\langle s_1, s_2, \ldots, s_n \rangle \in \mathbb{R}^n$. There are many ways to go about turning the vector into a single score (Makkonen et al., 2002). One way is to define the similarity as a weighted sum of each value of $\sigma_i$, i.e.,</Paragraph>
      <Paragraph position="13"> $\mathit{sim}(\mathcal{A}, \mathcal{B}) = \sum_{i=1}^{n} w_i s_i$, (2)</Paragraph>
      <Paragraph position="14"> where $w_i$ is the weight of the semantic class $S_i$. The values $s_i$ have also been interpreted as van Rijsbergen's (van Rijsbergen, 1980) similarity coefficients. Unlike detective stories, news documents give away the plot in the first few sentences. Therefore, the similarity measure could exploit the ranking, that is, the ordinal of the sentence in which the term appears, in weighting the terms.</Paragraph>
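A minimal sketch of the weighted-sum combination of Equation 2: the class weights below are illustrative assumptions, and finding good values for them is precisely the optimization problem discussed in the text.

```python
# Minimal sketch of collapsing the class-wise similarity vector into a single
# score with a weighted sum. The weights are illustrative assumptions.
CLASS_WEIGHTS = {"locations": 0.2, "temporals": 0.2, "names": 0.3, "terms": 0.3}

def overall_similarity(class_similarities: dict) -> float:
    """class_similarities maps each semantic class to its sigma_i value."""
    return sum(CLASS_WEIGHTS[c] * s for c, s in class_similarities.items())

print(overall_similarity({"locations": 1.0, "temporals": 0.5,
                          "names": 0.0, "terms": 0.4}))   # 0.42
```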
      <Paragraph position="15"> Currently, we are experimenting with the similarity of hands technique as a relevance score (Yang et al., 2000) for ranking the $k$ nearest neighbours for each semantic class.</Paragraph>
      <Paragraph position="16"> In other words, we find the $k$ nearest events with respect to TEMPORALS, the $k$ nearest events with respect to NAMES, etc. In a sense, each semantic class votes for $k$ candidates based on the relevance score and the respective weight of the semantic class. Once we have the four sets of candidates, we elect the one with the highest number of votes.</Paragraph>
      <Paragraph position="17"> Hence, let $\mathcal{E} = \{\mathcal{B}_1, \mathcal{B}_2, \ldots, \mathcal{B}_m\}$ be the set of previous $\tau$-structures (i.e., events). The function elects from $\mathcal{E}$ the candidate event that receives the highest number of class-wise votes.</Paragraph>
      <Paragraph position="19"> Quite obviously, the intersection is too strong a function in this case: the class-wise candidate sets need not share any event, and some vector $\langle D_i \rangle$ could contain an empty component, which would make the intersection empty as well. However, we believe that it would be easier to find optimal weights for the semantic classes via this voting scheme than by trying to optimize Equation 2, because there are fewer parameters.</Paragraph>
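A minimal sketch of this voting scheme is given below. The per-class scoring functions, the class weights and the value of k in the toy usage are illustrative assumptions.

```python
# Minimal sketch of per-class voting: each semantic class ranks the previous
# events by its own relevance score and votes for its k nearest ones; the
# event with the most (weighted) votes wins.
from collections import defaultdict
from typing import Callable, Dict, List

def vote_for_event(document, events: List,
                   class_scorers: Dict[str, Callable],
                   class_weights: Dict[str, float],
                   k: int = 3):
    votes = defaultdict(float)
    for cls, scorer in class_scorers.items():
        ranked = sorted(range(len(events)),
                        key=lambda idx: scorer(document, events[idx]),
                        reverse=True)
        for idx in ranked[:k]:              # this class votes for its k nearest events
            votes[idx] += class_weights[cls]
    return max(votes, key=votes.get) if votes else None

# Toy usage: two candidate events, scored by two classes only.
events = [{"names": {"arafat", "abbas"}, "terms": {"minister"}},
          {"names": {"powell"}, "terms": {"visit", "middle", "east"}}]
doc = {"names": {"abbas"}, "terms": {"prime", "minister"}}
overlap = lambda cls: (lambda d, e: len(d[cls] & e[cls]))
winner = vote_for_event(doc, events,
                        {"names": overlap("names"), "terms": overlap("terms")},
                        {"names": 0.6, "terms": 0.4}, k=1)
print(winner)   # 0: the first event collects both classes' votes
```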
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Dynamic Hierarchies
</SectionTitle>
    <Paragraph position="0"> One of the problems that plagues many TDT efforts is the need to compare each incoming document with all the preceding documents. Even if we issue a time-window and have a straightforward similarity measure, the number of required comparisons increases drastically as new documents come in. There have been efforts to decrease the amount of work, for example by using centroid vectors (Yang et al., 2000) and by building an ad hoc classifier for each topic-category (Yang et al., 2002).</Paragraph>
    <Paragraph position="1"> We suggest that we adopt text categorization on top of topic detection and tracking, as illustrated in Figure 3.</Paragraph>
    <Paragraph position="2"> There have been good results in text categorization (see, e.g., Yang and Liu, 1999; Sebastiani, 2002). The pre-defined categories would form the static hierarchy - the IPTC Subject Reference System, for example - on top of all event-based information organization, and the models for the categories could be built on the basis of the test set.</Paragraph>
    <Paragraph position="3"> Below the static hierarchy there would be a dynamic hierarchy that evolves as new documents come in and  new topics are detected. There is also a time-window to limit the temporal scope. Once a topic expires, it is removed from the dynamic hierarchy and archived to a news repository of lower operational priority.</Paragraph>
    <Paragraph position="4"> The use of a static hierarchy has some of the benefits of the topic-categories of Yang et al. (2002). It decreases the search space and enables a category-specific weighting-scheme for terms. For example, when a document is categorized into the class 'science', there is no need to compare it against the events of any other class; ideally, all the relevant events have also been categorized into the same class.</Paragraph>
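A minimal sketch of this two-layer organization: an incoming document is routed to a static category first, compared only against the live topics of that category, and topics that fall outside the time-window are archived. The categorizer, the topic test and the 30-day window are illustrative assumptions.

```python
# Minimal sketch of a static hierarchy on top of a dynamic, time-windowed one.
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)

class DynamicHierarchy:
    def __init__(self, categorize, same_topic):
        self.categorize = categorize       # document -> static category name
        self.same_topic = same_topic       # (document, topic) -> bool
        self.live = {}                     # category -> list of (topic, last_update)
        self.archive = []

    def add(self, document, timestamp: datetime):
        category = self.categorize(document)
        topics = self.live.setdefault(category, [])
        # expire topics that have left the time-window
        expired = [t for t in topics if timestamp - t[1] > WINDOW]
        for t in expired:
            topics.remove(t)
            self.archive.append(t[0])
        # compare only against the live topics of this category
        for i, (topic, _) in enumerate(topics):
            if self.same_topic(document, topic):
                topic.append(document)
                topics[i] = (topic, timestamp)
                return topic
        new_topic = [document]             # first story of a new topic
        topics.append((new_topic, timestamp))
        return new_topic

hierarchy = DynamicHierarchy(
    categorize=lambda d: "science",
    same_topic=lambda d, t: len(set(d.split()) & set(t[0].split())) > 2)
hierarchy.add("comet discovered by amateur astronomer in finland", datetime(2003, 3, 1))
hierarchy.add("amateur astronomer names newly discovered comet", datetime(2003, 3, 3))
print(hierarchy.live)   # one 'science' topic holding both stories
```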
  </Section>
</Paper>