Lexical Discovery with an Enriched Semantic Network 
Doug Beeferman 
School of Computer Science 
Carnegie Mellon University 
5000 Forbes Avenue 
Pittsburgh, PA 15213 
dougb@cs.cmu.edu
Abstract 
The study of lexical semantics has produced a sys- 
tematic analysis of binary relationships between con- 
tent words that has greatly benefited lexical search 
tools and natural language processing algorithms. 
We first introduce a database system called FreeNet 
that facilitates the description and exploration of fi- 
nite binary relations. We then describe the design 
and implementation of Lexical FreeNet, a semantic 
network that mixes WordNet-derived semantic re- 
lations with data-derived and phonetically-derived 
relations. We discuss how Lexical FreeNet has aided 
in lexical discovery, the pursuit of linguistic and fac- 
tual knowledge by the computer-aided exploration 
of lexical relations. 
1 Motivation 
This paper discusses Lexical FreeNet, a database 
system designed to enhance lexical discovery. By 
this we mean the pursuit of linguistic and factual 
knowledge with the computer-aided exploration of 
lexical relations. Lexical FreeNet is a semantic net- 
work that leverages WordNet and other knowledge 
and data sources to facilitate the discovery of non- 
trivial lexical connections between words and con- 
cepts. 
A semantic network allied with the proper user 
interface can be a useful tool in its own right. By 
organizing words semantically rather than alphabet- 
ically, WordNet provides a means by which users can 
navigate its vocabulary logically, establishing con- 
nections between concepts and not simply character 
sequences. Exploring the WordNet hyponym tree 
starting at the word mammal, for instance, reveals 
to us that aardvarks are mammals; exploring WordNet's
meronym relation at the word mammal reveals
to us that mammals have hair. From these two 
explorations we can accurately conclude that aard- 
varks have hair. 
Lexical exploration need not be limited to one step 
at a time, however. Viewing a semantic network as 
a computational structure awaiting graph-theoretic 
queries gives us the freedom to demand services beyond
mere lookup. "Does the aardvark have hair?",
or "What is the closest connection between aardvarks
and hair?" or "How interchangeably can the
words aardvark and anteater be used?" are all
reasonable questions with answers staring us in the
face. Of course, the idea of finding shortest paths 
in semantic networks (through so-called activation- 
spreading or intersection search) is not new. But 
these questions have typically been asked of very 
limited graphs, networks for domains far narrower 
than the lexical space of English, say. We feel that 
formalizing how WordNet can be employed for this 
broader sort of lexical discovery is a good start. We 
also feel that it is necessary first to enrich the net- 
work with information that, as we shall see, cannot 
be easily gleaned from WordNet's current battery of 
relations. The very large electronic corpora and wide 
variety of linguistic resources that today's comput- 
ing technology has enabled will in turn enable this. 
The remainder of this paper is organized as fol- 
lows. We shall first describe in Section 2 the FreeNet 
database system for the expression and analysis of 
relational data. In Section 3 we'll describe the de- 
sign and construction of an instance of this database 
called Lexical FreeNet. We'll conclude by providing 
examples of applications of Lexical FreeNet to lexi- 
cal discovery. 
2 FreeNet 
FreeNet, an acronym for finite relation expression 
network, is a system for describing and exploring 
finite binary relations. Here we mean relation in the 
mathematical sense, i.e. a set of ordered pairs. We 
concern ourselves with finite sets of pairs of tokens 
drawn from a finite set of tokens, or vocabulary. 
2.1 Tokens and relations 
A token in FreeNet is simply a normalized string 
of characters drawn from a finite vocabulary. The 
vocabulary might be a dictionary of English words, 
a set of movie titles, or a set of names of researchers. 
The system is assumed to implement normalization 
as a function from input strings to strings. 
A relation in FreeNet is a finite set of ordered pairs 
of tokens, or links. Each relation has a name that, 
like a token, is simply a normalized string of charac- 
ters drawn from a finite vocabulary (which we shall 
do better to call an alphabet, for reasons made clear 
below.) 
Use of the FreeNet system can be seen to consist 
of three distinct processing phases: the relation com- 
putation stage, in which a set of relations is derived 
from some knowledge or data source and transduced 
to an explicit set of labeled ordered pairs; the graph 
construction stage, in which this set of labeled pairs 
is transduced to an efficient multigraph representation;
and the query stage, in which a user can interact
with the system to find paths in the multigraph
that match a certain specification. 
FreeNet consists of software to do the second and 
third phases. Implementation of a specific instance 
of FreeNet requires the user to write software to do 
the first phase, but support software exists for an 
optional filtering substage that constrains the input 
pair set in certain ways--eliminating pairs that con- 
tain stopwords, enforcing limits on the fanout of to- 
kens, and enforcing strength thresholds, for instance. 
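The filtering substage can be pictured with a minimal sketch. It assumes each candidate link arrives as a (source, relation, target, strength) tuple; the function name `filter_links` and its parameters are hypothetical and are not taken from the FreeNet support software.

```python
from collections import defaultdict

def filter_links(links, stopwords=frozenset(), max_fanout=None, min_strength=None):
    """Filter (source, relation, target, strength) tuples.

    Hypothetical helper: drops links that touch a stopword, links below a
    strength threshold, and all but the strongest `max_fanout` links for
    each (source, relation) combination.
    """
    by_source = defaultdict(list)
    for src, rel, dst, strength in links:
        if src in stopwords or dst in stopwords:
            continue
        if min_strength is not None and strength < min_strength:
            continue
        by_source[(src, rel)].append((strength, dst))

    kept = []
    for (src, rel), outgoing in by_source.items():
        outgoing.sort(reverse=True)  # strongest links first
        if max_fanout is not None:
            outgoing = outgoing[:max_fanout]
        kept.extend((src, rel, dst, strength) for strength, dst in outgoing)
    return kept
```

The surviving tuples can then be handed to the graph construction stage as labeled pairs, with the strength dropped.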
The second phase, graph building, simply entails 
providing a set of triples (two tokens and a relation)
to the system. The order in which the triples appear 
in the input does not matter, as it is the program's 
responsibility to reorder the links as necessary and 
to store the graph efficiently. 
The third phase, querying, is the chief novel con- 
tribution, and is described below. 
2.2 Regular expressions 
The power behind FreeNet lies in the user's ability to
compose primitive relations into more complex
relations for use in queries.
The primary mechanism for building complex re- 
lations is the regular expression over the alphabet 
of relation names. Just as a regular expression over 
ASCII characters specifies a regular set of strings 
recursively in terms of other sets, so too can a reg- 
ular expression over relation names specify a set of 
ordered pairs recursively in terms of other sets and 
various operators. 
The following grammar specifies allowable regular 
expressions in FreeNet. 
regexp ::= <rel>              (relation name)
         | (regexp)           (parenthesization)
         | regexp regexp      (concatenation)
         | regexp | regexp    (union)
         | regexp & regexp    (conjunction)
         | regexp*            (transitive closure)
         | regexp-            (inverse)
         | regexp'            (complement)
         | regexpX            (sibling)
These regexp-building operators are described be- 
low. 
Concatenation 
The concatenation operator is used to compose two 
relations directly. The expression r1 r2 denotes
the set of pairs (a, b) such that for some token c,
(a, c) ∈ r1 and (c, b) ∈ r2. For example, a network
implementing a genealogy database might offer
primitive parent and brother relations. In that
case, the relation denoted by the regular expression
(parent brother) is what we know as the uncle
relation.
Conjunction 
Conjunction takes the intersection of two relations: 
plainly, the intersection of their respective pair sets. 
The expression r1 & r2 denotes the set of pairs
(a, b) such that (a, b) ∈ r1 and (a, b) ∈ r2.
Supposing that in a lexical semantic net we have 
the relations required_by and requires, then a 
symmetric symbiotic_with relation might be im- 
plemented as their conjunction. 
Union 
The union operator is used to join two relations. The 
expression r1 | r2 denotes the set of pairs (a, b)
such that (a, b) ∈ r1 or (a, b) ∈ r2. In an Erdős-number-like
application, for example, two authors
may be "related" if they have coauthored a paper or 
if one has cited the other. 
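The three binary operators map directly onto set operations. The sketch below assumes a relation is held in memory as a Python set of (a, b) pairs, which is not how FreeNet stores its graph; the function names and the toy genealogy data are invented for illustration, and a pair (a, c) in `parent` is read as "c is a parent of a".

```python
def concat(r1, r2):
    """Pairs (a, b) such that (a, c) is in r1 and (c, b) is in r2 for some c."""
    by_first = {}
    for c, b in r2:
        by_first.setdefault(c, set()).add(b)
    return {(a, b) for a, c in r1 for b in by_first.get(c, ())}

def conj(r1, r2):
    """Conjunction: pairs present in both relations."""
    return r1 & r2

def union(r1, r2):
    """Union: pairs present in either relation."""
    return r1 | r2

# Toy data (assumed): bob is a parent of alice, carl is a brother of bob.
parent = {("alice", "bob")}
brother = {("bob", "carl")}
uncle = concat(parent, brother)
```

Here `uncle` links alice to carl, matching the (parent brother) example.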
Transitive closure 
We commonly reason about the transitive closure 
of relations. The transitive closure operator imple- 
ments homogeneous reachability--is there a path be- 
tween the tokens using links only of a certain type? 
Namely, let r^1 denote the relation r and r^i for
i > 1 denote the relation (r r^(i-1)). Then r*
denotes the union of all r^i as i ranges from 1 to
infinity. (Note that since we assume finite relations,
this set is always finite.) In the genealogy example,
parent* would be what we consider the "ancestor"
relation.
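A minimal sketch of the closure computation, again assuming a relation is a Python set of pairs. It takes the union of r composed with itself one or more times and iterates until no new pair appears; identity pairs (a, a) are included only when a cycle produces them, which is an assumed reading of the definition.

```python
def transitive_closure(r):
    """Union of r, (r r), (r r r), ... computed to a fixed point."""
    successors = {}
    for a, b in r:
        successors.setdefault(a, set()).add(b)

    closure = set(r)
    frontier = set(r)
    while frontier:
        # Extend every newly found pair by one more link of r.
        extended = {(a, b) for a, c in frontier for b in successors.get(c, ())}
        frontier = extended - closure
        closure |= frontier
    return closure
```

The loop terminates because the vocabulary is finite, so only finitely many pairs can ever be added.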
Inverse, Complement, and Sibling 
A few more unary operators are minor conveniences 
in building relations. The inverse operator swaps 
every pair: r- denotes the set of pairs (a, b) such
that (b, a) ∈ r. Taking the union of a relation with
its inverse produces a new relation that is guaranteed 
to be symmetric. 
The complement operator produces a set containing
all pairs but those in a certain relation. r' denotes
the set of pairs (a, b) such that (a, b) ∉ r. (The
vocabulary is assumed to be fixed after the graph is 
built, and so the universe is well-defined.) 
The sibling operator produces pairs that have in 
common their relation with a certain other token. 
rX denotes the set of pairs (a, b) such that a ≠ b and
there exists a c such that (a, c) ∈ r and (b, c) ∈ r.
Thus the (parent-)X relation is the genealogical sibling
relation formed by applying the inverse operator and 
then the sibling operator to the "parent" relation. 
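The unary operators can be sketched the same way, assuming a relation is a Python set of pairs and the vocabulary is available as a set of tokens. The function names are hypothetical.

```python
def inverse(r):
    """Swap every pair."""
    return {(b, a) for a, b in r}

def complement(r, vocabulary):
    """All pairs over the fixed vocabulary that are not in r."""
    universe = {(a, b) for a in vocabulary for b in vocabulary}
    return universe - r

def sibling(r):
    """Pairs (a, b), a != b, that share some c with (a, c) and (b, c) in r."""
    by_target = {}
    for a, c in r:
        by_target.setdefault(c, set()).add(a)
    return {(a, b)
            for group in by_target.values()
            for a in group for b in group if a != b}
```

Materializing the complement is quadratic in the vocabulary size, so this form is only practical for toy vocabularies.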
Note 
A simple structural induction can be used to prove 
that any relation built from these operators is also 
a relation. Additional operators to support set ad- 
dition and subtraction of constant pair sets are also 
available. 
2.3 Queries 
Queries in FreeNet are path specifications expressed 
as a sequence of tokens or token variables with inter- 
leaved relation regexps. More precisely, every query 
is of the form (W <regexp>)* W, where W is either a
constant token or a variable wi, and <regexp> is a
regular expression over relations, as defined above. 
FreeNet returns a shortest path (or all paths) in 
the multigraph that match the query, binding the 
variables in the query to concrete tokens. The out- 
put includes the names of all of the primitive relation 
links traversed. 
Queries in the Internet version of FreeNet can take 
one of four forms, each parameterized by one or two 
tokens; but these demonstrate what are expected 
to be common queries. Below, the "ANY" regexp 
is the union of all available (or selected) primitive 
relations. The comma (",") represents the universal
relation, linking all pairs of tokens; the comma
relation can thus be used in FreeNet queries to im- 
plement conjunction of clauses. 
• Shortest path: This query takes two arguments 
s and t, and outputs the result of the query
"s ANY* t". This finds a shortest path, using
any of the selected relations, between the source 
and the target. 
• Fanout: This query takes a single argument s 
and outputs the result of "s ANY w1". This
simply shows all words related in some way to 
the source. 
• Intersection search: This query takes two arguments
s and t and outputs the result of
"s ANY w1 , t ANY w1". This is useful for
finding what two tokens "have in common" in 
terms of primitive relationships with other to- 
kens. The two relations involved in such a path 
need not be identical. 
• Coercion: This query takes two arguments s
and t, two relations re1 and re2, and outputs
the result of "s re1 w1 re2 w2 re1 t". This
is useful for a wide variety of constraint-solving, 
such as, in the lexical semantic net case, pun 
and rhyme generation. 
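The fanout and intersection templates listed above can be sketched over a small in-memory graph. The representation (a dict from token to a list of (relation, target) links), the function names, and the toy links with their relation labels are all assumptions made for illustration.

```python
def fanout(graph, s, allowed=None):
    """Links leaving s, optionally restricted to the allowed relations."""
    return [(rel, t) for rel, t in graph.get(s, [])
            if allowed is None or rel in allowed]

def intersection_search(graph, s, t, allowed=None):
    """Tokens w1 one link away from both s and t ("s ANY w1 , t ANY w1")."""
    from_s = {}
    for rel, w in fanout(graph, s, allowed):
        from_s.setdefault(w, []).append(rel)
    results = []
    for rel_t, w in fanout(graph, t, allowed):
        for rel_s in from_s.get(w, []):
            # The two relations need not be identical.
            results.append((w, rel_s, rel_t))
    return results

# Toy graph (assumed links and labels).
graph = {
    "frog": [("TRG", "pond"), ("SPC", "amphibian")],
    "turtle": [("TRG", "pond"), ("SPC", "reptile")],
}
```

Calling `intersection_search(graph, "frog", "turtle")` reports pond together with the relation used on each side.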
2.4 Implementation issues 
A FreeNet multigraph is stored sparsely for efficient 
offline (disk) access as a list of variable-length adjacency
lists. Each element in an adjacency list is
a single 32-bit word that describes an arc by com- 
bining its destination token ID and relation ID; the 
source token ID for an arc is implicit in its row. 
An index of offsets into the list is precomputed and 
stored together with hash tables for the token and 
relation namespaces. At no point in query process- 
ing is more than a single line of the list (equivalently, 
a set of links emanating from the same source node) 
in memory at once. 
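As an illustration of packing an arc into one 32-bit word, the sketch below assumes 4 bits for the relation ID and 28 bits for the destination token ID. The paper does not give the actual bit split, so both widths and all names here are assumptions.

```python
import struct

RELATION_BITS = 4                  # assumed: room for up to 16 relations
TOKEN_BITS = 32 - RELATION_BITS    # assumed: 28 bits for the token ID
TOKEN_MASK = (1 << TOKEN_BITS) - 1

def pack_arc(dest_id, relation_id):
    """Combine a destination token ID and a relation ID into one word."""
    if not 0 <= dest_id <= TOKEN_MASK:
        raise ValueError("destination token ID out of range")
    if not 0 <= relation_id < (1 << RELATION_BITS):
        raise ValueError("relation ID out of range")
    return (relation_id << TOKEN_BITS) | dest_id

def unpack_arc(word):
    """Recover (destination token ID, relation ID) from a packed word."""
    return word & TOKEN_MASK, word >> TOKEN_BITS

def pack_row(arcs):
    """Serialize one adjacency list as little-endian 32-bit words."""
    words = [pack_arc(dest, rel) for dest, rel in arcs]
    return struct.pack("<%dI" % len(words), *words)
```

With this split a row of n arcs occupies 4n bytes, and the source token never needs to be stored because it is implied by the row.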
Graph construction 
A number of optimizations in the layout of the multigraph
on disk are essential if arbitrary searches over
large multigraphs are to be efficient. Of particu- 
lar concern is disk seek time, because traversing the 
graph entails accessing different rows of the adja- 
cency list representation in rapid succession. One 
simple preprocessing step is to sort each row of the
representation by the word identifier's row location,
so that all of the nodes emanating from a fixed source
can be accessed with a unidirectional sweep.
A trickier concern is the ordering of the rows 
themselves. We desire to order the rows so that re- 
lated words tend to appear near each other so that 
seek time between them is minimized. We can formalize
this problem by asking for an ordering that
minimizes the average offset difference between the
endpoints of a randomly chosen edge in the multigraph. This problem
is at least as computationally hard as the well-studied,
NP-complete bandwidth problem in graph
theory (Papadimitriou, 1976), which is to find a linear
ordering of the vertices of a given graph such that
the maximum difference in the ordering between any 
two adjacent vertices is minimal. We are studying 
approximation algorithms (Blum et al., to appear) 
that allow this preprocessing step to be carried out 
efficiently during database construction. 
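The two objectives can be stated concretely. The sketch below evaluates a given ordering rather than searching for a good one, and it measures distance in row positions instead of byte offsets; both simplifications, and the function names, are assumptions.

```python
def average_offset_difference(edges, position):
    """Mean distance in the ordering between the endpoints of an edge."""
    return sum(abs(position[a] - position[b]) for a, b in edges) / len(edges)

def bandwidth(edges, position):
    """Largest distance in the ordering between two adjacent vertices."""
    return max(abs(position[a] - position[b]) for a, b in edges)

# Toy path graph a - b - c under two candidate row orderings.
edges = [("a", "b"), ("b", "c")]
good = {"a": 0, "b": 1, "c": 2}
bad = {"a": 0, "c": 1, "b": 2}
```

On this toy graph the ordering `good` gives an average of 1.0 and a bandwidth of 1, while `bad` gives 1.5 and 2.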
Querying 
Supporting arbitrary FreeNet queries that allow the 
full range of regular expression operators is a nontrivial
data structures problem, because it is prohibitively
expensive to add new links with the occurrence
of a new regexp. Instead, the graph is
static. Each relation in the "alphabet" of relations
is converted to an ASCII character, and stock regexp
processing software is used to convert each regexp
in a query to a state machine. A query is converted
to a single state machine by concatenating
its constituent regexp state machines, interleaving 
"constraint points" that enforce the identity of mul- 
tiple bindings of the same variable. A dynamic set of 
state IDs and backtrace IDs is associated with each 
token to support breadth-first search. 
The query templates above are implemented 
without all this machinery, by simply performing 
breadth-first-search on the graph, maintaining a sin- 
gle backtrace ID for each node, and allowing or pro- 
hibiting certain relations as specified by the user. 
Coercion is implemented as a hard-coded path con- 
straint. 
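The template path search can be sketched as a plain breadth-first search that keeps one backtrace entry per node. The in-memory graph (a dict from token to a list of (relation, target) links) stands in for the on-disk adjacency lists, and the function name and toy links are assumptions.

```python
from collections import deque

def shortest_path(graph, source, target, allowed=None):
    """Return a shortest list of (token, relation, token) links, or None."""
    if source == target:
        return []
    backtrace = {source: None}  # token -> (previous token, relation used)
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for rel, nxt in graph.get(node, []):
            if allowed is not None and rel not in allowed:
                continue  # relation prohibited by the user
            if nxt in backtrace:
                continue  # already reached by a path at least as short
            backtrace[nxt] = (node, rel)
            if nxt == target:
                path = []
                while backtrace[nxt] is not None:
                    prev, used = backtrace[nxt]
                    path.append((prev, used, nxt))
                    nxt = prev
                return path[::-1]
            queue.append(nxt)
    return None

# Toy graph (assumed links) for a part-of / comprises query.
graph = {
    "saskatoon": [("PAR", "saskatchewan")],
    "saskatchewan": [("PAR", "canada"), ("COM", "saskatoon")],
    "canada": [("COM", "saskatchewan"), ("COM", "manitoba")],
    "manitoba": [("PAR", "canada"), ("COM", "winnipeg")],
    "winnipeg": [("PAR", "manitoba")],
}
```

Restricting `allowed` to a subset of relation names plays the role of the ANY relation chosen by the user.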
3 Lexical FreeNet 
Lexical FreeNet is an instance of FreeNet supporting 
a range of lexical semantic applications. It achieves 
this by mixing statistically-derived and knowledge- 
derived relations. 
Tokens 
The tokens in Lexical FreeNet are the words that 
appear in at least one of the program's various data 
sources. This includes over 130,000 words from 
the CMU Pronouncing Dictionary v0.6d (CMU,
1997), 160,000 words and multiple-word phrases 
from WordNet 1.6, and 60,000 words from the broad- 
cast news transcripts used to train the trigger rela- 
tion. The intersection between these three sources is 
significant, of course, and in total there are slightly 
under 200,000 distinct tokens, including phrases. 
[Figure 1, panel (a): table of link counts per relation; legible entries include TRG 354800, SYN 249156, ANA 91072, and 197598 distinct tokens. Panel (b): matrix of relation crossover counts.]
Figure 1: Statistics on the relations in Lexical
FreeNet. (a) The number of links in each relation.
(b) Relation crossover counts. Each cell reports the
number of word pairs that exist in both relations.
One of the 5 pairs counted in the cell at (ANT, COM),
for example, is (DAY, NIGHT).
Relations 
Lexical FreeNet includes seven semantic relations, 
two phonetic relations, and one orthographic rela- 
tion. These relations connect the token set with 
about seven million links, costing 30 MB of disk 
space. A summary of the relations is shown in Figure 1.
Below we use a bidirectional arrow (<==>) to
indicate a symmetric relation, and a unidirectional
arrow (==>) to indicate an asymmetric relation.
"Synonym of" (<=SYN=>)
This relation is computed by taking, for each syn- 
onym set (or synset) in all WordNet 1.6 word cat- 
egories, the cross-product of the synonym set with 
itself, excluding reflexive links (self-loops). That is 
to say, we include all pairs of lexemes in each synset 
except the links from a lexeme to itself. Thus we mix 
different lexeme senses into the same soup, conflat- 
ing, for example, the noun and verb senses of BIKE 
in bike <=SYN=> bicycle and bike <=SYN=> pedal.
"Triggers" (=TRG=>)
Trigger pairs are ordered word pairs that co-occur 
significantly in data; that is, they are pairs that ap- 
pear near each other in text more frequently than 
would be expected if the words were unrelated. 
Given a large corpus of text data, we built the asymmetric
trigger relation by finding the pairs in the
cross-product of the vocabulary that have the high- 
est average mutual information, as in (Rosenfeld, 
1994; Beeferman et al., 1997). Mutual information 
is one measure of whether an observed co-occurrence 
of two vocabulary words is not due to chance. Word 
pairs with high mutual information are likely to be 
semantically related in some way. 
We chose 160 million words of Broadcast News 
data (LDC, 1997) for this computation, and defined 
co-occurrence as "occurring within 500 words", ap- 
proximately the average document length. We se- 
lected the top 350,000 trigger pairs from the rank- 
ing to use in the relation, putting the size of the 
relation on par with the synonym relation.¹ Some
of the top trigger pairs discovered by this procedure
are shown in Figure 2. In our implementation we
limit the number of trigger links emanating from a 
token to the top 50, and prune away links that in- 
clude any member of a handcoded stopword set that 
includes function words. 
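For concreteness, one common way to score a candidate pair is the average mutual information between the binary events "s occurs in the window" and "t occurs in the window". The sketch below assumes simple window counts as input; the estimation details in the cited work may differ, and this symmetric score does not by itself capture the ordering of a trigger pair.

```python
import math

def average_mutual_information(n, n_s, n_t, n_st):
    """Average mutual information, in bits, between two binary events.

    n: number of windows; n_s, n_t: windows containing s, t;
    n_st: windows containing both (all counts are assumed inputs).
    """
    p_s, p_t = n_s / n, n_t / n
    joint = {
        (True, True): n_st / n,
        (True, False): (n_s - n_st) / n,
        (False, True): (n_t - n_st) / n,
        (False, False): (n - n_s - n_t + n_st) / n,
    }
    total = 0.0
    for (has_s, has_t), p_xy in joint.items():
        if p_xy == 0:
            continue  # a zero-probability cell contributes nothing
        p_x = p_s if has_s else 1 - p_s
        p_y = p_t if has_t else 1 - p_t
        total += p_xy * math.log2(p_xy / (p_x * p_y))
    return total
```

Independent words score 0 bits, and two words that always appear together in half of the windows score 1 bit.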
s            t

Los          Angeles
United       States
White        House
President    Clinton
New          York
health       care

campaign     Bush
Haitian      Aristide
films        film
fed          rates
court        evidence
care         insurance
Figure 2: The top six trigger pairs (s,t), ranked 
by mutual information, in the Lexical FreeNet trig- 
ger relation, and the 500th through 505th-ranked 
pairs. The highest-ranked pairs tend to be distance- 
one bigram phrases, while the remainder co-occur at 
greater distances.
"Specializes" (=SPC=>) and "Generalizes" (=GEN=>)
The specialization relation captures the lexical in- 
heritance system underlying WordNet nouns (Miller, 
1990) and verbs (Fellbaum, 1990). It is computed 
by taking, for each pair of WordNet synsets that ap- 
pear as parent and child in the WordNet hyponym 
trees, the cross-product of the pair. For example, 
shoe =SPC=> footwear.
The generalization relation is simply the inverse
of the specialization relation, or SPC-. For example:
tree =GEN=> cypress.
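The cross-product constructions used for the synonym, specialization, and generalization relations can be sketched as follows. The synsets are passed in as plain lists of lexemes; nothing here reads WordNet itself, and the function names and toy synsets are assumptions.

```python
from itertools import permutations, product

def synonym_links(synsets):
    """All ordered pairs of distinct lexemes within each synset."""
    links = set()
    for synset in synsets:
        links.update(permutations(set(synset), 2))
    return links

def specialization_links(child_parent_synsets):
    """Cross-product of each (child synset, parent synset) pair."""
    links = set()
    for child, parent in child_parent_synsets:
        links.update(product(child, parent))
    return links

def inverse(relation):
    """Swap every pair, e.g. to derive generalization from specialization."""
    return {(b, a) for a, b in relation}

# Toy synsets (assumed), mixing two senses of "bike" as described above.
syn = synonym_links([["bike", "bicycle"], ["bike", "pedal"]])
spc = specialization_links([(["shoe"], ["footwear", "footgear"])])
gen = inverse(spc)
```

Because senses are not kept apart, "bike" ends up linked to both "bicycle" and "pedal".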
¹ We used the Trigger Toolkit, available at
http://www.cs.cmu.edu/~aberger/software.html, for
this computation.
"Part of" (=PAR=>) and "Comprises" (=COM=>)
The =PAR=> relation captures meronymy, another inheritance
system which can informally be thought
of as a "part of" tree over nouns. It is computed
by taking, for each pair of WordNet synsets that are
related in WordNet by the meronym relation, the
cross-product of the pair. For example, shoe =PAR=>
footwear. The "comprises" relation is simply its
inverse, PAR-, as in tree =COM=> cypress.
"Antonym of" (<=ANT=>)
The antonym relation uses the antonym relation de- 
fined in WordNet for nouns, verbs, adjectives, and 
adverbs. It is computed by taking, for each pair of 
WordNet synsets that are related in WordNet by the 
antonym relation, the cross-product of the pair. For 
example, clear <=ANT=> opaque.
"Phonetically similar to" and
"Rhymes with"
To allow users to cross the dimensions of sound and 
meaning in their queries, two phonetic relations are 
added to the mix in Lexical FreeNet. These rela- 
tions, while amusing for shortest path queries, are 
not expected to contribute to the text processing 
applications discussed later in this paper. Both re- 
lations leverage the phonetic and lexical stress tran- 
scriptions in the CMU Pronouncing Dictionary. 
The "phonetically similar to" relation is computed by adding every
pair of words in the vocabulary that have pronunci- 
ations which differ in edit distance by at most some 
number of edits. Edit distance is computed us- 
ing a dynamic programming algorithm as the mini- 
mum number of substitutions, insertions, and dele- 
tions (unweighted, and blind to nearness in substitu- 
tion) to the first word's phonetic sequence required 
to reach the second word's phonetic sequence. In 
our current implementation we limit the relation to 
pairs with edit distance at most 1, e.g. cancel <==>
candle.
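The unweighted edit distance described above is the standard dynamic program, applied to phoneme sequences instead of characters. In the sketch below the ARPAbet-style transcriptions are written by hand for illustration and should be treated as assumptions, not as dictionary lookups.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, insertions, and deletions
    needed to turn sequence a into sequence b (all costs equal to 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]

# Hand-written transcriptions (assumed).
cancel = ["K", "AE1", "N", "S", "AH0", "L"]
candle = ["K", "AE1", "N", "D", "AH0", "L"]
similar = edit_distance(cancel, candle) <= 1
```

Computing this for every pair of words is quadratic in the vocabulary size, so a real build would need some pruning; the paper does not describe how this was done.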
The "rhymes with" relation is computed by adding each pair
of words that have pronunciations such that their
phonetic suffixes including and following the primary
stressed syllables match, e.g. Reno <==> casino.
"Anagram of" (<=ANA=>)
The final relation, <=ANA=>, is almost, but not quite,
completely useless, symmetrically linking lexemes
that use the same distribution of letters, as in
Geraldine <=ANA=> realigned. This is perhaps best
described as a "wormhole" in lexical space.
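The rhyme and anagram tests can be sketched briefly. The rhyme key below assumes pronunciations written as phoneme lists whose primary-stressed vowel carries a trailing "1", and it starts the suffix at that vowel instead of at the start of the syllable; the hand-written transcriptions and all names are assumptions.

```python
from collections import defaultdict
from itertools import permutations

def rhyme_key(phonemes):
    """Suffix from the primary-stressed vowel onward, or None if absent."""
    for i, p in enumerate(phonemes):
        if p.endswith("1"):
            return tuple(phonemes[i:])
    return None

def rhymes(a, b):
    """True when two different pronunciations share a rhyme key."""
    key = rhyme_key(a)
    return key is not None and key == rhyme_key(b) and a != b

def anagram_links(words):
    """Symmetric links between distinct words with the same letter multiset."""
    groups = defaultdict(set)
    for w in words:
        groups["".join(sorted(w.lower()))].add(w.lower())
    links = set()
    for group in groups.values():
        links.update(permutations(sorted(group), 2))
    return links

# Hand-written transcriptions (assumed).
reno = ["R", "IY1", "N", "OW0"]
casino = ["K", "AH0", "S", "IY1", "N", "OW0"]
```

Grouping words by their sorted letters avoids comparing every pair of words when building the anagram relation.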
Extensions 
A portion of the wealth of WordNet was discarded 
in Lexical FreeNet--the verb entailment relation, for 
instance. Adjectives are somewhat slighted by the 
system, as their WordNet description in terms of 
bipolar attributes (Gross and Miller, 1990) is largely 
ignored. 
Other possible semantic relations include the more 
specialized knowledge-engineered links that appear 
in typically narrow-coverage semantic nets, such as 
"acts on", "uses", "stronger than", and the like. 
Data-driven approaches to relation induction that 
dig deeper than the collocation extraction of the trig- 
ger computation may prove useful and interesting. 
One approach (Richardson, 1997; Richardson et al., 
1993) bootstraps a parser to induce many uncon- 
ventional semantic relations from dictionary data. A 
link grammar (Sleator and Temperley, 1991) applied 
to data can conceivably be used to extract some in- 
teresting relations that live at the syntax/semantics 
interface. 
4 Lexical discovery 
A World Wide Web interface to Lexical FreeNet, 
depicted in Figure 3, is available and has become a 
popular online resource since its release in late January
1998.² The program allows the user to issue
one of the four template queries to the database described
in Section 2.3. One of these query templates
("Fanout") requires only a single source token as in- 
put, and this has become a popular lookup tool, pro- 
viding some of the functionality of a thesaurus and 
rhyming dictionary. The other query functions re- 
quire source and target tokens to be specified. Each 
token can itself contain spaces in the case of phrasal 
inputs, which are normalized to the underscore character
in processing. The four basic queries allow the
user to specify a subset of the ten primitive relations
to permit in the output paths by clicking a
series of checkboxes. Upon submission, the state 
of the checkboxes sets the ANY relation to be the 
union of checked relations. 
An additional "Spell check" query mode allows 
the user to find database tokens that have similar (or 
exact) spelling to a given input token, where simi- 
larity is measured by an orthographic edit distance. 
Upon submission, the system finds and displays 
the path or paths resulting from the query with 
arrow glyphs representing the various relations. 
Queries typically finish within an acceptable time 
window of three to ten seconds. The results screen 
summarizes the query and allows the user to re- 
submit it with modifications, improving the ease of 
database "navigation" over having to return to the 
title screen. 
Feedback from the Web site indicates that the sys- 
tem has been used as an aid in writing poetry and 
lyrics; devising product names; generating puzzles 
for elementary school language arts classes; writ- 
ing greeting cards; devising insults and compliments; 
and, above all, just exploring. Following are selected 
examples of the system's output in various configu- 
rations. 
² See http://www.link.cs.cmu.edu/lexfn/
Figure 3: The front page of the Web interface to
Lexical FreeNet
Shortest path queries
The shortest path query is the primary vehicle for establishing
connections between words and concepts:
• Shortest path queries that allow all lexical relations
can be used to aid in generating puns
and quips involving the two endpoint concepts.
For example, below is the shortest path between
Clinton and Lewinsky using all relations:
CLINTON --> HOUSE --> CABIN -->
KACZYNSKI --> LEWINSKY
• Shortest path queries allowing only the hyponymy
relations can connect any two nouns in
the WordNet hyponymy tree through their least
common ancestor. For example, animals can be
connected taxonomically, as in the shortest path
between potto and langur using only the specialization
(=SPC=>) and generalization (=GEN=>)
relations:
POTTO =SPC=> LEMUR =SPC=> PRIMATE =GEN=> MONKEY =GEN=>
OLD_WORLD_MONKEY =GEN=> LANGUR
• Shortest path queries allowing only the meronymy
relations can connect many noun pairs.
For example, geographical connections can be
made between place names to find the smallest
enclosing region, as in the shortest path between
Saskatoon and Winnipeg using only the comprises
(=COM=>) and part-of (=PAR=>) relations:
SASKATOON =PAR=> SASKATCHEWAN =PAR=> CANADA =COM=>
MANITOBA =COM=> WINNIPEG
• It is counter-intuitive but true that most common
words can be connected using only the synonym
relation (<=SYN=>). This demonstrates the
high degree of polysemy exhibited by familiar
words. Consider the shortest synonym path between
one and zero, a computer scientist's favorite
antonym pair. Every successive word pair
exhibits a different sense:
ZERO <=SYN=> CIPHER <=SYN=> CALCULATE <=SYN=>
DIRECT <=SYN=> LEAD <=SYN=> STAR <=SYN=> ACE <=SYN=> ONE
• Using only the trigger (=TRG=>) relation, one can
connect concepts that occur in the domain of
the data used to train the trigger pairs, in this
case broadcast news:
SMOKING =TRG=> CIGARETTES =TRG=> MACHINES =TRG=>
COMPUTERS
• The trigger relation enriches the WordNet- 
derived vocabulary of common nouns with topi- 
cal proper names, as in the shortest paths shown 
below. Trigger pairs are often expressible in 
terms of a sequence of one or more WordNet- 
derived relations. In many cases, however, 
news-based triggers defy any fixed set of hand- 
coded lexical relations. 
TITANIC =TRG=> SANK =TRG=> SHIP =TRG=> VALDEZ =TRG=>
COFFEE
NADER =TRG=> REGULATIONS =TRG=>
ENVIRONMENTAL =TRG=> GORE
FALWELL =TRG=> CHRISTIAN =TRG=>
CONSERVATIVE =TRG=> GINGRICH
• But when the WordNet-derived semantic rela- 
tions are permitted in addition to the trigger 
relation, shortest paths become shorter, over- 
coming the inherent limitations of the data- 
derived triggers. In the case below, the pair 
(relativity, physics) did not occur suffi- 
ciently often in training data for the pair to 
make the grade as a trigger. 
EINSTEIN --> RELATIVITY --> PHYSICS -->
VELOCITY --> SPEED_OF_LIGHT
• For amusement, the phonetic relations, rhymes-with
and sounds-like, can be used
alone to produce "word ladders" of sequentially
similar words, as in the example below. In combination
with the semantic relations, the phonetic
relations can aid in creating rhymed poetry
and puns.
KNIFE <==> NINE <==> SPINE <==> SPOON
Intersection queries 
Intersection queries can be used in Lexical FreeNet 
to find the set of concepts and words that two inputs 
both directly relate to in some way. We use the 
notation (w1 =r1=>, w2 =r2=>) w3 to mean that "w1 is
related to w3 by relation r1, and w2 is related to w3
by relation r2."
• For concrete nouns, the results are often expected
but sometimes subtle:
(FROG -->, TURTLE -->) POND
(ORANGE =TRG=>, APPLE -->) JUICE
(BANANA -->, ONION -->) PEEL
(BOOK -->, TELEVISION -->) STORY
(TREE -->, TOOTH -->) CROWN
• Triggers can be a useful tool for discovering
what two names in the news have in common,
or two names in history:
(STARR =TRG=>, MCDOUGAL =TRG=>) WHITEWATER
(CHURCHILL =TRG=>, STALIN =TRG=>)
HITLER, ROOSEVELT, TRUMAN, POTSDAM
• In some cases, identification questions can be 
formulated as intersection queries. For exam- 
ple, "What's the name of that congresswoman 
from Colorado I'm always hearing about?" can 
be asked as an intersection query with arguments
(congresswoman, Colorado). "What's
the capital of the state of Nebraska?" can be
asked as an intersection query with arguments (Nebraska, state_capital):
(COLORADO -->, CONGRESSWOMAN -->)
SCHROEDER
(NEBRASKA =COM=>, STATE_CAPITAL -->) LINCOLN
Rhyme coercion queries 
The phonetic relations in Lexical FreeNet are par- 
ticularly useful for finding rhyming words with cer- 
tain target meanings. The coercion function on the 
Web interface is hardcoded such that the relation 
re1 (see Section 2.3) is simply the union of all semantic
relations, and re2 is the union of all phonetic
relations. Thus, given two endpoint words (w1, w2),
the system tries to find words (w1', w2'), with respectively
related meanings, that rhyme or sound alike.
For example, if you wanted to write a poem about
petting a lion, you might do a coercion query with
the words touch and lion. Amongst a few others,
you'll get back the suggestions (RUB, CUB), since
TOUCH --> RUB and LION --> CUB; and (PAT, CAT),
since TOUCH --> PAT and LION --> CAT. Most rhyme
coercion queries to the online system have produced 
at least one result in this manner. 
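A coercion query of the form "s re1 w1 re2 w2 re1 t" can be sketched over relations held as Python sets of pairs. The toy semantic and phonetic links are invented for illustration, and the semantic links are listed in the direction the query template needs.

```python
def coercion(re1, re2, s, t):
    """Pairs (w1, w2) with (s, w1) in re1, (w1, w2) in re2, (w2, t) in re1."""
    from_s = {w for a, w in re1 if a == s}
    into_t = {w for w, b in re1 if b == t}
    return sorted((w1, w2) for w1, w2 in re2
                  if w1 in from_s and w2 in into_t)

# Toy relations (assumed): re1 is semantic, re2 is phonetic.
semantic = {("touch", "rub"), ("touch", "pat"),
            ("cub", "lion"), ("cat", "lion")}
phonetic = {("rub", "cub"), ("cub", "rub"),
            ("pat", "cat"), ("cat", "pat")}
suggestions = coercion(semantic, phonetic, "touch", "lion")
```

On this toy data `suggestions` contains the rhyming pairs (pat, cat) and (rub, cub).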
5 Conclusion 
We have introduced a database system called 
FreeNet that facilitates the description and exploration
of finite binary relations, and also an instance
of the system called Lexical FreeNet that supports a 
range of lexical semantic applications. The program 
has proven itself to be a useful and entertaining resource
for lexical discovery by Internet users. We
hope to employ the system as a common algorithmic
core for three text processing applications as well:
segmentation, summarization, and information ex- 
traction. 
Acknowledgments 
The author thanks Michael Turniansky for early 
feedback on this work; Adam Berger for developing 
the Trigger Toolkit; Carl Burch for help with the 
phonetic and orthographic edit distance functions; 
Bob Harper and John Lafferty for useful discussions; 
and the many users of the World Wide Web inter- 
face who have provided entertaining feedback on the 
system. 

References 
D. Beeferman, A. Berger, and J. Lafferty. 1997. A 
model of lexical attraction and repulsion. In Pro- 
ceedings of the ACL, Madrid, Spain. 
A. Blum, G. Konjevod, R. Ravi, and S. Vempala. 
to appear. Semi-definite relaxations for minimum 
bandwidth and other vertex-ordering problems. 
In Proc. of the 30th ACM Symposium on the Theory
of Computing, pages 95-100.
CMU. 1997. Carnegie Mellon University
Pronouncing Dictionary v0.6d.
http://www.speech.cs.cmu.edu/cgi-bin/cmudict.
C. Fellbaum. 1990. English verbs as a semantic net. 
International Journal of Lexicography, 3,4:278-301.
D. Gross and K. Miller. 1990. Adjectives in 
WordNet. International Journal of Lexicography,
3,4:265-277.
LDC. 1997. DARPA Continuous Speech Recogni- 
tion Corpus-IV: Radio Broadcast News (CSRIV 
Hub-4). http://morph.ldc.upenn.edu/. 
G. Miller. 1990. Nouns in WordNet: a lexical inher- 
itance system. International Journal of Lexicog- 
raphy, 3,4:245-264. 
C. Papadimitriou. 1976. The NP-completeness of 
the bandwidth minimization problem. Comput- 
ing, 16:263-270. 
S. Richardson, L. Vanderwende, and W. Dolan. 
1993. Combining dictionary-based and example- 
based methods for natural language analysis. In 
Proc. Fifth International Conference on Theoret- 
ical and Methodological Issues in Machine Trans- 
lation, pages 69-79. 
S. Richardson. 1997. Determining Similarity and
Inferring Relations in a Lexical Knowledge Base.
Ph.D. thesis, The City University of New York.
R. Rosenfeld. 1994. Adaptive Statistical Language 
Modeling: a Maximum Entropy Approach. Ph.D. 
thesis, Carnegie Mellon University, April. 
D. Sleator and D. Temperley. 1991. Parsing English 
with a link grammar. Technical Report CMU- 
CS-91-196, School of Computer Science, Carnegie 
Mellon University. 
