Multilingual NameTag TM 
Multilingual Internet Surveillance System 
Multimedia Fusion System 
SRA International 
4300 Fair Lakes Court 
Fairfax, VA 22033 
703-502-1180 (tel) 
703-803-1793 (fax) 
http://www.sra.com 
1 Multilingual NameTag TM 
SRA's Multilingual NameTag is name tagging- 
software that can handle multiple languages, includ- 
ing English, Japanese, Spanish, Chinese and Thai. 
It finds and disambiguates in texts the names of peo- 
ple, organizations, and places, as well as time and 
numeric expressions with very high accuracy. The 
design of the system makes possible the dynamic 
recognition of names: NameTag does not rely on 
long lists of known names. Instead, NameTag makes 
use of a flexible pattern specification language to 
identify novel names that have not been encountered 
previously. This makes maintenance and porting to 
new domains very easy. In addition, NameTag can 
recognize and link variants of names in the same 
document automatically. For instance, it can link 
"IBM" to "International Business Machines" and 
"President Clinton" to "Bill Clinton." 
NameTag incorporates a language-independent 
C++ pattern-matching engine along with the 
language-specific lexical, pattern, and other re- 
sources necessary for each language. The Japanese, 
Chinese, and Thai versions integrate word seg- 
reenters to deal with the challenges of these lan- 
guages. 
NameTag is an extremely fast and robust system 
that can be easily integrated with other applications 
through its API. For example, it can be integrated 
with an information retrieval system to improve re- 
trieval accuracy and with a machine translation sys- 
tem to prevent name translation mistakes. Versions 
are available for UNIX and Windows 95 and NT 
platforms. 
2 Multilingual Internet Surveillance 
System 
The Multilingual Internet Surveillance System uses 
SRA's NameTag, the powerful SQL capability of an 
RDBMS, and a Java-enhanced Web-based GUI to 
provide an intelligent surveillance capability. The 
special features include: 
• Built-in Java-based Web crawler: By using 
this built-in Web crawler, the user can choose 
key WWW sites for surveillance. It automat- 
ically retrieves Web documents for intelligent 
indexing. The crawler has a built-in scheduler 
and make uses of multiple threads for the quick- 
est possible acquisition of documents. 
• Concept-based intelligent indexing by 
NameTag: SRA's NameTag indexes retrieved 
Web documents and extracts the most impor- 
tant information, i.e. the proper names. In ad- 
dition, NameTag can be customized to identify 
collections of other domain specific terms which 
are of interest to a particular Internet surveil- 
lance system (e.g., financial, legal, medical or 
military terms). 
+ 
• Pro-active monitoring and alert capabili- 
ties: Using a variety of data mining techniques, 
the system can monitor daily activities on the 
Internet (what's new and hot today?) and alert 
the user to unusual activity as it is happening. 
• Powerful SQL queries through an easy-to- 
use Web-based GUI: Once alerts go off, the 
user can perform more in-depth analysis by re- 
trieving relevant information through the user- 
friendly GU\[. Powerful SQL capability along 
with concept-based indexing ensures high pre- 
cision and time saving. 
• Automated hyperlinking for intelligent 
browsing: Another way to analyze the infor- 
mation effectively is to browse texts by following 
hyperlinks automatically created by the system. 
Hyperlinks are added for each proper name and 
custom term found by NameTag. 
• Multilingual capability for monolingual 
speakers: By incorporating multilingual ver- 
sions of NameTag and machine translation 
modules, monolingual speakers can also re- 
trieve, browse, and analyze the content of for- 
eign language documents. 
The multilingual capability allows the user to 
gather and assimilate information in foreign lan- 
31 
guages without further effort. For example, by sim- 
ply clicking on one of the hyperlinks, the user can 
view a list of other articles in any language that con- 
tain the same term (either original and translated). 
By entering queries in English, the user can obtain 
all documents in any language that contain the En- 
glish terms or their translations. 
The Multilingual Internet Surveillance System 
provides a truly unique way to analyze and discover 
necessary information effectively and efficiently from 
a vast information repositories on the Internet. For 
example, it can answer types of questions which can- 
not be asked of traditional search engines, such as 
"Which companies are mentioned along with Inter- 
net and Netscape?" or "Which people are related to 
the Shinshintou Party?" 
In addition, the concept-based indexing allows 
high-precision search; the user can ask for docu- 
ments that contain "Dole," the former senator, in- 
stead of "Dole," the pineapple company. In short, 
the system can eliminate most of the noise associated 
with traditional search engines and focus attention 
on precisely the information of interest. 
The Web-based client runs on multiple platforms. 
The server currently runs on a SUN Solaris platform 
(other server ports are underway). 
time. The data is segmented as it is received and can 
be simultaneously stored and forwarded to viewers 
on the network. The server also handles data input 
through textual newswire feeds. 
The Web-based client runs on multiple platforms. 
The server currently runs on a SUN Solaris platform. 
Contact: 
Chinatsu Aone 
(technical) 
aonec@sra.com 
Dave Conetsco 
(administrative) 
dave_conetsco@sra.com 
3 Multimedia Fusion System 
The Multimedia Fusion System (MMF) combines an 
automated clustering algorithm with a summariza- 
tion module to automatically group multimedia in- 
formation by content and simultaneously determine 
concise keyword summaries of each cluster. MMF 
assists the user who must assimilate a vast amount 
of information from different sources quickly and ef- 
fectively. As MMF generates clusters in an unsuper- 
vised fashion, i.e., no pre-defined user profile need 
be used, the system can adapt to new and changing 
world events with no extra effort. 
Specifically, the system takes newspaper articles 
and CNN Headline News, and creates a hierarchi- 
cal cluster tree in which related stories are clustered 
together at tree nodes regardless of their sources. 
MMF consists of four main components: keyword 
selection, document clustering, cluster summariza- 
tion, and cluster display. The resulting cluster tree 
is visualized in a Java-based interactive GUI. The 
user can follow a cluster tree hierarchy and expand 
clusters all the way down to individual documents. 
For newspaper articles, the text is shown while for 
CNN Headline News, both the closed-captioned text 
and the captured video are displayed in-line with a 
browser plug-in. Each displayed cluster also has its 
concise keyword summary next to the corresponding 
tree node. 
In addition to its clustering capabilities, the MMF 
server is also responsible for capturing video, audio, 
and closed-captions from a live satellite feed in real 
3
