Story Link Detection and New Event Detection are Asymmetric
Francine Chen
PARC
3333 Coyote Hill Rd
Palo Alto, CA 94304
fchen@parc.com
Ayman Farahat
PARC
3333 Coyote Hill Rd
Palo Alto, CA 94304
farahat@parc.com
Thorsten Brants
PARC
3333 Coyote Hill Rd
Palo Alto, CA 94304
thorsten@brants.net
Abstract
Story link detection has been regarded as a
core technology for other Topic Detection and
Tracking tasks such as new event detection. In
this paper we analyze story link detection and
new event detection in a retrieval framework
and examine the effect of a number of tech-
niques, including part of speech tagging, new
similarity measures, and an expanded stop list,
on the performance of the two detection tasks.
We present experimental results showing that the utility of the techniques differs between the two tasks, consistent with our analysis.
1 Introduction
Topic Detection and Tracking (TDT) research is spon-
sored by the DARPA TIDES program. The research has
five tasks related to organizing streams of data such as
newswire and broadcast news (Wayne, 2000). A link
detection (LNK) system detects whether two stories are
“linked”, or discuss the same event. A story about a plane
crash and another story about the funeral of the crash victims are considered to be linked. In contrast, a story about Hurricane Andrew and a story about Hurricane Agnes are not linked because they are two different events. A new
event detection (NED) system detects when a story dis-
cusses a previously unseen event. Link detection is con-
sidered to be a core technology for new event detection
and the other tasks.
Several groups are performing research on the TDT tasks of link detection and new event detection (e.g., Carbonell et al., 2001; Allan et al., 2000). In this paper, we compare the link detection and new event detection tasks in an information retrieval framework, examine the criteria for improving a NED system based on a LNK system, and give specific directions for improving each system separately. We also investigate the utility of a number of techniques for improving the systems.
2 Common Processing and Models
The Link Detection and New Event Detection systems that we developed for TDT2002 share many processing steps. Preprocessing tokenizes the data, recognizes and normalizes abbreviations, removes stop words, replaces spelled-out numbers by digits, adds part-of-speech tags, replaces the tokens by their stems, and then generates term-frequency vectors. Document frequency counts are incrementally updated as new sources of stories are presented to the system. Additionally, separate source-specific counts are used, so that, for example, the counts for New York Times stories are computed separately from those for CNN stories. The source-specific, incremental document frequency counts are used to compute a TF-IDF term vector for each story.
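
The source-specific, incremental document frequency bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `SourceDocFreq`, the choice to update a source's counts before vectorizing a story, and the smoothed IDF log((1 + N_s) / (1 + df_s(w))) are all assumptions, and tokens are assumed to be already preprocessed (stemmed, stop words removed).

```python
import math
from collections import Counter, defaultdict


class SourceDocFreq:
    """Incremental document-frequency counts kept separately per news source.

    Hypothetical helper for illustration; the IDF smoothing is an assumption.
    """

    def __init__(self):
        self.doc_count = Counter()       # number of stories seen so far, per source
        self.df = defaultdict(Counter)   # per-source document frequency of each term

    def add_story(self, source, tokens):
        """Update the counts for `source` with one newly arrived story."""
        self.doc_count[source] += 1
        self.df[source].update(set(tokens))  # each term counted once per story

    def tfidf(self, source, tokens):
        """TF-IDF vector (term -> weight) computed from `source`'s counts only."""
        n = self.doc_count[source]
        df = self.df[source]
        tf = Counter(tokens)
        # Assumed smoothed IDF: log((1 + N_s) / (1 + df_s(w))).
        return {t: c * math.log((1 + n) / (1 + df[t])) for t, c in tf.items()}
```

Because each source has its own `doc_count` and `df`, a term that is common in CNN transcripts does not lower its weight in New York Times stories, which is the separation the paragraph describes.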
Stories are compared using either the cosine similarity
$$\mathrm{sim}_{\cos}(d_1, d_2) = \frac{\sum_w f(w, d_1)\, f(w, d_2)}{\sqrt{\sum_w f(w, d_1)^2 \, \sum_w f(w, d_2)^2}}$$
or the Hellinger similarity
$$\mathrm{sim}_{\mathrm{Hell}}(d_1, d_2) = \sum_w \sqrt{\frac{f(w, d_1)}{\sum_{w'} f(w', d_1)} \cdot \frac{f(w, d_2)}{\sum_{w'} f(w', d_2)}}$$
for terms $w$ in documents $d_1$ and $d_2$, where $f(w, d)$ denotes the weight of term $w$ in the TF-IDF term vector of document $d$. To help compensate for stylistic differences between various sources (e.g., newspaper vs. broadcast news), translation errors, and automatic speech recognition errors (Allan et al., 1999), we subtract the average observed similarity values, in a spirit similar to the use of thresholds conditioned on the sources (Carbonell et al., 2001).
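
The two similarity measures and the mean-subtraction step can be sketched on sparse term-weight dictionaries as below. The Hellinger function follows the formula above and assumes non-negative weights. The paper does not say how the average similarity values are maintained; keying a running mean on the unordered pair of sources, and subtracting the mean of earlier observations only, are assumptions, and `SourcePairMean` is a hypothetical helper.

```python
import math
from collections import defaultdict


def cosine_sim(u, v):
    """Cosine similarity of two sparse term-weight vectors (dict term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hellinger_sim(u, v):
    """Sum over shared terms of sqrt(p(w|d1) * p(w|d2)); weights must be non-negative."""
    total_u = sum(u.values())
    total_v = sum(v.values())
    if total_u <= 0.0 or total_v <= 0.0:
        return 0.0
    # Terms missing from either document contribute zero to the sum.
    return sum(math.sqrt((w / total_u) * (v[t] / total_v))
               for t, w in u.items() if t in v)


class SourcePairMean:
    """Hypothetical helper: subtract the running mean similarity of a source pair."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def adjust(self, sim, source_a, source_b):
        key = tuple(sorted((source_a, source_b)))  # order-independent source pair
        n = self.count[key]
        mean = self.total[key] / n if n else 0.0    # mean of earlier observations
        self.total[key] += sim
        self.count[key] += 1
        return sim - mean
```

Both functions return 1 for identical non-empty vectors and 0 for vectors with no terms in common, so their raw scores live on the same [0, 1] scale before the source-dependent offset is removed.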
3 New Event Detection
In order to decide whether a new document $d$ describes a new event, it is compared to all previous documents, and the document $d^*$ with the highest similarity is identified. If the score $\mathrm{score}(d) = -\mathrm{sim}(d, d^*)$ exceeds a threshold $\theta_s$, then there is no sufficiently similar previous document, and $d$ is classified as a new event.
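
The decision rule above can be sketched as a single pass over the story stream. This is an illustrative sketch, not the authors' system: stories are assumed to be sparse term-weight dictionaries, cosine similarity stands in for whichever measure is used, and the first story is treated as new because there is nothing to compare it with. Since the score is a negated similarity, a useful $\theta_s$ for similarities in [0, 1] lies in [-1, 0); the value -0.5 in the usage is made up.

```python
import math


def cosine_sim(u, v):
    """Cosine similarity of two sparse term-weight vectors (dict term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def detect_new_events(stories, theta, sim=cosine_sim):
    """Flag each story as new (True) or not new (False), in arrival order.

    score(d) = -max over previous stories d' of sim(d, d');
    d is a new event if score(d) > theta.
    """
    history, flags = [], []
    for story in stories:
        if history:
            score = -max(sim(story, old) for old in history)
        else:
            score = math.inf  # no previous story: the first story is always new
        flags.append(score > theta)
        history.append(story)
    return flags
```

For example, with a plane-crash story, a follow-up about the victims' funeral, and an unrelated hurricane story, `detect_new_events(stories, theta=-0.5)` flags the first and third stories as new. Comparing against the full history costs O(n) similarity computations per story.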
4 Link Detection
In order to decide whether a pair of stories $d_1$ and $d_2$
are linked, we compute the similarity between the two
documents using the cosine and Hellinger metrics. The
similarity metrics are combined using a support vector
machine and the margin is used as a confidence measure
that is thresholded.
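
The combination step can be illustrated with a small linear SVM over the two features (cosine, Hellinger). The paper does not state the kernel or the training procedure, so the linear kernel, the Pegasos-style subgradient descent on the regularized hinge loss, the hyperparameters, and the toy training pairs are all assumptions; a real system would use an SVM package. The margin w·x + b is the confidence score that gets thresholded.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss (fixed step size).

    X: list of feature tuples, e.g. (cosine_sim, hellinger_sim); y: labels +1 / -1.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Hinge loss is active: move toward classifying x with margin >= 1.
                w = [wi - lr * (lam * wi - t * xi) for wi, xi in zip(w, x)]
                b += lr * t
            else:
                # Only the regularizer contributes to the subgradient.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def svm_margin(w, b, x):
    """Signed SVM margin, used as the link-confidence score."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def is_linked(w, b, cos_sim, hell_sim, threshold=0.0):
    """Declare the story pair linked when the margin exceeds the threshold."""
    return svm_margin(w, b, (cos_sim, hell_sim)) > threshold
```

In use, `w, b = train_linear_svm(X, y)` is fitted on labeled story pairs, and `threshold` is then tuned on held-out data to minimize the detection cost rather than left at 0.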
5 Evaluation Metric
TDT system evaluation is based on the number of false alarms and misses produced by a system. In link detection, the system should detect linked story pairs; in new event detection, the system should detect new stories. A detection cost
$$C_{Det} = C_{Miss} \cdot P_{Miss} \cdot P_{target} + C_{FA} \cdot P_{FA} \cdot P_{non\text{-}target} \qquad (1)$$
is computed, where the costs $C_{Miss}$ and $C_{FA}$ are set to 1 and 0.1, respectively. $P_{Miss}$ and $P_{FA}$ are the computed miss and false alarm probabilities. $P_{target}$ and $P_{non\text{-}target}$ are the a priori target and non-target probabilities, set to 0.02 and 0.98, respectively. The detection cost is normalized by dividing by $\min(C_{Miss} \cdot P_{target},\, C_{FA} \cdot P_{non\text{-}target})$ so that a perfect system scores 0 and a random baseline scores 1. Equal weight is given to each topic by accumulating error probabilities separately for each topic and then averaging them. The minimum detection cost is the decision cost when the decision threshold is set to the optimal confidence score.
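
Eq. 1, its normalization, and the topic-weighted minimum cost can be sketched as follows. The function names are hypothetical and this is not the official TDT scoring software. Two conventions are assumptions not stated above: a trial counts as a "yes" decision when its score is at or above the threshold, and the candidate thresholds are the observed scores plus +infinity (never answering "yes").

```python
import math

C_MISS, C_FA = 1.0, 0.1
P_TARGET, P_NON_TARGET = 0.02, 0.98


def normalized_cost(p_miss, p_fa):
    """Eq. 1 divided by min(C_Miss * P_target, C_FA * P_non-target)."""
    c_det = C_MISS * p_miss * P_TARGET + C_FA * p_fa * P_NON_TARGET
    return c_det / min(C_MISS * P_TARGET, C_FA * P_NON_TARGET)


def topic_weighted_min_cost(topics):
    """topics: one list of (score, is_target) trials per topic.

    Miss and false alarm rates are computed per topic, averaged across topics,
    and the lowest normalized cost over all candidate thresholds is returned.
    """
    thresholds = sorted({s for trials in topics for s, _ in trials}) + [math.inf]
    best = math.inf
    for th in thresholds:
        miss_rates, fa_rates = [], []
        for trials in topics:
            targets = [s for s, is_t in trials if is_t]
            non_targets = [s for s, is_t in trials if not is_t]
            if targets:
                miss_rates.append(sum(s < th for s in targets) / len(targets))
            if non_targets:
                fa_rates.append(sum(s >= th for s in non_targets) / len(non_targets))
        p_miss = sum(miss_rates) / len(miss_rates) if miss_rates else 0.0
        p_fa = sum(fa_rates) / len(fa_rates) if fa_rates else 0.0
        best = min(best, normalized_cost(p_miss, p_fa))
    return best
```

For instance, a system that never answers "yes" has $P_{Miss} = 1$ and $P_{FA} = 0$, and `normalized_cost(1, 0)` returns 1.0.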
6 Differences between LNK and NED
The conditions for false alarms and misses are reversed
for the LNK and NED tasks. In the LNK task, incor-
rectly flagging two stories as being on the same event is
considered a false alarm. In contrast, in the NED task, in-
correctly flagging two stories as being on the same event
will cause a true first story to be missed. Conversely, in-
correctly labeling two stories that are on the same event
as not linked is a miss, but for the NED task, incorrectly
labeling two stories on the same event as not linked may
result in a false alarm.
In this section, we analyze the utility of a number of techniques for the LNK and NED tasks in an information retrieval framework. The detection cost in Eqn. 1 assigns a higher cost to false alarms, since $C_{Miss} \cdot P_{target} = 1 \times 0.02 = 0.02$ and $C_{FA} \cdot P_{non\text{-}target} = 0.1 \times 0.98 = 0.098$. A LNK system should therefore minimize false alarms by identifying only linked stories, which results in high precision for LNK. In contrast, a NED system will minimize false alarms by identifying all stories that are linked, which translates to high recall for LNK. Based on this observation, we investigated a number of precision- and recall-enhancing techniques for the LNK and NED systems, namely part-of-speech tagging, an expanded stop list, and normalizing abbreviations and transforming spelled-out numbers into numerals. We also investigated the use of different similarity measures.

Figure 1: CDF for cosine and Hellinger similarity on the LNK task for on-topic and off-topic pairs (x-axis: score; y-axis: CDF).

Figure 2: CDF for cosine and Hellinger similarity on the NED task for on-topic and off-topic pairs (x-axis: similarity; y-axis: CDF(similarity)).
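
The cost weighting discussed at the start of this section can be made explicit by substituting the parameter values from Section 5 into the normalized detection cost:

$$C_{Norm} = \frac{C_{Miss} P_{Miss} P_{target} + C_{FA} P_{FA} P_{non\text{-}target}}{\min(C_{Miss} P_{target},\; C_{FA} P_{non\text{-}target})} = \frac{0.02\, P_{Miss} + 0.098\, P_{FA}}{0.02} = P_{Miss} + 4.9\, P_{FA}$$

A one-point reduction in the false alarm probability therefore lowers the normalized cost 4.9 times as much as a one-point reduction in the miss probability.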
6.1 Similarity Measures
The systems developed for TDT primarily use cosine
similarity as the similarity measure. In work on text seg-
mentation (Brants et al., 2002), better performance was
observed with the Hellinger measure. Table 1 shows
that for LNK, the system based on cosine similarity per-
formed better; in contrast, for NED, the system based on
Hellinger similarity performed better.
The LNK task requires high precision, which corresponds to a large separation between the on-topic and off-topic distributions, as shown for the cosine metric in Figure 1. The NED task requires high recall (low CDF values for on-topic pairs). Figure 2, which is based on pairs that contain the current story and its most similar story in the story history, shows a greater separation in this region with the Hellinger metric. For example, at 10% recall, the Hellinger metric has a 71% false alarm rate, compared to 75% for the cosine metric.

Table 1: Effect of different similarity measures on topic-weighted minimum normalized detection costs on the TDT 2002 dry run data. Change is the cost with cosine minus the cost with Hellinger; positive values mean Hellinger has the lower (better) cost.
System   Cosine   Hellinger   Change (%)
LNK      0.3180   0.3777      -0.0597 (-18.8)
NED      0.7059   0.5873      +0.1186 (+16.8)

Table 2: Effect of using part-of-speech on minimum normalized detection costs on the TDT 2002 dry run data.
System   -PoS     +PoS     Change (%)
LNK      0.3180   0.3334   -0.0154 (-4.8%)
NED      0.6403   0.5873   +0.0530 (+8.3%)
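
Curves like those in Figures 1 and 2 are empirical CDFs of similarity scores, computed separately for on-topic and off-topic pairs. A minimal sketch, assuming the scores are already collected in non-empty lists; the function name is hypothetical. Evaluating both CDFs at the same threshold shows how well a measure separates the two populations at that operating point.

```python
import bisect


def empirical_cdf(samples):
    """Return a function F with F(x) = fraction of samples <= x (samples non-empty)."""
    xs = sorted(samples)
    n = len(xs)

    def cdf(x):
        # bisect_right counts how many sorted samples are <= x.
        return bisect.bisect_right(xs, x) / n

    return cdf
```

For example, with made-up on-topic scores `[0.6, 0.7, 0.8, 0.9]` and off-topic scores `[0.1, 0.2, 0.3, 0.65]`, both CDFs evaluated at 0.5 give 0.0 and 0.75: no on-topic pair falls below that threshold, while three quarters of the off-topic pairs do.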
6.2 Part-of-Speech (PoS) Tagging
To reduce confusion among some word senses, we tagged the terms as one of five categories: adjective, noun, proper noun, verb, or other, and then combined the stem and part-of-speech to create a “tagged term”. For example, ‘N train’ represents the term ‘train’ when used as a noun. The LNK and NED systems were tested using the tagged terms. Table 2 shows that PoS tagging has opposite effects on the two tasks: it increases the LNK cost by 4.8% but reduces the NED cost by 8.3%.
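
Constructing a tagged term can be sketched as a lookup from tagger output to the five categories. Only the "N" prefix for nouns comes from the text; the letters used for the other categories and the assumption that the tagger emits Penn Treebank tags are illustrative, and both function names are hypothetical.

```python
def pos_category(ptb_tag):
    """Map a Penn Treebank tag (assumed tagger output) to one of five categories."""
    if ptb_tag in ("NNP", "NNPS"):
        return "P"  # proper noun (hypothetical letter)
    if ptb_tag.startswith("NN"):
        return "N"  # noun, as in the 'N train' example
    if ptb_tag.startswith("JJ"):
        return "A"  # adjective (hypothetical letter)
    if ptb_tag.startswith("VB"):
        return "V"  # verb (hypothetical letter)
    return "O"      # everything else


def tagged_term(stem, ptb_tag):
    """Combine a stem and its category into a single 'tagged term' feature."""
    return f"{pos_category(ptb_tag)} {stem}"
```

With this scheme "train" as a noun and "train" as a verb become the distinct features "N train" and "V train", so they no longer match each other in the term vectors.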
6.3 Stop Words
The broadcast news documents in the TDT collection
have been transcribed using Automatic Speech Recog-
nition (ASR). There are systematic differences between
ASR and manually transcribed text. For example, “30” is spelled out as “thirty”, and “CNN” is represented as three separate tokens “C”, “N”, and “N”. To handle these differences, an “ASR stoplist” was created by identifying terms with statistically different distributions in a parallel corpus of manually and automatically transcribed documents, the TDT2 corpus. Table 3 shows that using an ASR stoplist reduces the topic-weighted minimum detection cost for LNK but increases it for NED.
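
Building such a stoplist can be sketched with a per-term significance test on token counts from the two transcription types. The paper does not name the statistical test; the 2x2 chi-square test, the 3.84 critical value (1 degree of freedom, p < 0.05), and the use of token-level counts are assumptions, and `asr_stoplist` is a hypothetical name.

```python
def asr_stoplist(asr_counts, manual_counts, critical=3.84):
    """Terms whose relative frequency differs between ASR and manual transcripts.

    asr_counts / manual_counts: dict term -> token count in each corpus.
    A 2x2 chi-square statistic is computed per term and compared to `critical`.
    """
    n_asr = sum(asr_counts.values())
    n_man = sum(manual_counts.values())
    n = n_asr + n_man
    stop = set()
    for term in set(asr_counts) | set(manual_counts):
        a = asr_counts.get(term, 0)     # occurrences of term in ASR text
        b = manual_counts.get(term, 0)  # occurrences of term in manual text
        c = n_asr - a                   # all other ASR tokens
        d = n_man - b                   # all other manual tokens
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            continue  # degenerate table: no evidence either way
        chi2 = n * (a * d - b * c) ** 2 / denom
        if chi2 > critical:
            stop.add(term)
    return stop
```

In a parallel corpus, an ASR-only token such as the spelled-out letter "c" would be flagged, while a function word used at the same rate in both transcription types would not.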
We also performed “enhanced preprocessing” to normalize abbreviations and transform spelled-out numbers into numerals, which improves both precision and recall. Table 3 shows that enhanced preprocessing performs worse than the ASR stoplist for Link Detection, but yields the best results for New Event Detection.
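
The paper does not detail the normalization rules, so the sketch below assumes two illustrative ones: runs of two or more single capital letters are joined (so ASR output "C N N" becomes "CNN"), and spelled-out numbers from zero to ninety-nine become digits. The function name is hypothetical, and ambiguous cases such as "one" used as a pronoun are deliberately ignored.

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}


def _is_letter(tok):
    """True for a single upper-case letter token such as 'C'."""
    return len(tok) == 1 and tok.isalpha() and tok.isupper()


def enhanced_preprocess(tokens):
    """Join spelled-out abbreviations and convert spelled-out numbers (0-99)."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if _is_letter(tok):
            j = i
            while j < len(tokens) and _is_letter(tokens[j]):
                j += 1
            if j - i >= 2:  # e.g. "C", "N", "N" -> "CNN"
                out.append("".join(tokens[i:j]))
                i = j
                continue
        low = tok.lower()
        if low in TENS:
            value = TENS[low]
            # Absorb a following unit word: "twenty five" -> 25.
            if i + 1 < len(tokens) and 1 <= UNITS.get(tokens[i + 1].lower(), 0) <= 9:
                value += UNITS[tokens[i + 1].lower()]
                i += 1
            out.append(str(value))
        elif low in UNITS:
            out.append(str(UNITS[low]))
        else:
            out.append(tok)
        i += 1
    return out
```

For example, `enhanced_preprocess(["C", "N", "N", "reported", "thirty", "deaths"])` returns `["CNN", "reported", "30", "deaths"]`, which matches the manually transcribed form of the same sentence.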
Table 3: Effect of using an “ASR stoplist” and “enhanced preprocessing” for handling ASR differences on the TDT 2001 evaluation data.
ASR stoplist    No      Yes             No
Preprocessing   Std     Std             Enh
LNK             0.312   0.299 (+4.4%)   0.301 (+3.3%)
NED             0.606   0.641 (-5.5%)   0.587 (+3.1%)
7 Summary and Conclusions
We have presented a comparison of story link detection
and new event detection in a retrieval framework, show-
ing that the two tasks are asymmetric in the optimiza-
tion of precision and recall. We performed experiments
comparing the effect of several techniques on the perfor-
mance of LNK and NED systems. Although many of the
processing techniques used by our systems are the same,
the results of our experiments indicate that some tech-
niques affect the performance of LNK and NED systems
differently. These differences may be due in part to the
asymmetry in the tasks and the corresponding differences
in whether improving precision or recall for the link task
is more important.
8 Acknowledgments
We thank James Allan of UMass for suggesting that pre-
cision and recall may partially explain the asymmetry of
LNK and NED.
References
James Allan, Hubert Jin, Martin Rajman, Charles Wayne,
Dan Gildea, Victor Lavrenko, Rose Hoberman, and
David Caputo. 1999. Topic-based novelty detection.
Summer workshop final report, Center for Language
and Speech Processing, Johns Hopkins University.
James Allan, Victor Lavrenko, and Hubert Jin. 2000.
First story detection in TDT is hard. In CIKM, pages
374–381.
Thorsten Brants, Francine Chen, and Ioannis Tsochan-
taridis. 2002. Topic-based document segmentation
with probabilistic latent semantic analysis. In CIKM,
pages 211–218, McLean, VA.
Jaime Carbonell, Yiming Yang, Ralf Brown, Chun Jin, and Jian Zhang. 2001. CMU TDT report. Slides at the TDT-2001 meeting, CMU.
Charles Wayne. 2000. Multilingual topic detection
and tracking: Successful research enabled by corpora
and evaluation. In LREC, pages 1487–1494, Athens,
Greece.
