Experiments in Automated Lexicon Building for Text Searching 
Barry Schiffman and Kathleen R. McKeown 
Department of Computer Science 
Columbia University 
New York, NY 10027, USA 
{bschiff,kathy}@cs.columbia.edu
Abstract 
This paper describes experiments in the automatic
construction of lexicons that would be useful in
searching large document collections for text fragments
that address a specific information need, such
as an answer to a question.
1 Introduction 
In developing a system to find answers in text to
user questions, we uncovered a major obstacle: Document
sentences that contained answers did not often
use the same expressions as the question. While
answers in documents and questions use terms that
are related to each other, a system that searches for
answers based on the question wording will often
fail. To address this problem, we developed techniques
to automatically build a lexicon of associated
terms that can be used to help find appropriate text
segments.
The mismatch between question and document
wording was brought home to us in an analysis of a
testbed of question/answer pairs. We had a collection
of newswire articles about the Clinton impeachment
to use as a small-scale corpus for development
of a system. We asked several people to pose questions
about this well-known topic, but we did not
make the corpus available to our contributors. We
wanted to avoid questions that tracked the terminology
in the corpus too closely to simulate questions
to a real-world system. The result was a set of questions
whose wording rarely matched the
phrasing in the corpus. We had expected that we
would be able to make most of these lexical connections
with the help of Wordnet (Miller, 1990).
For example, consider a simple question about testimony:
"Did Secret Service agents give testimony
about Bill Clinton?" There is no reason to expect
that the answer would appear baldly stated as "Secret
Service agents did testify ..." What we need
to know is what testimony is about, where it occurs,
who gives it. The answer would be likely to be found
in a passage mentioning juries, or prosecutors, like
these found in our Clinton corpus:
Starr immediately brought Secret Service
employees before the grand jury for questioning.
Prosecutors repeatedly asked Secret Service
personnel to repeat gossip they may
have heard.
Yet, the Wordnet synsets for "testimony" offer:
"evidence, assertion, averment and asseveration,"
not a very helpful selection here. Wordnet hypernyms
become general quickly: "declaration," "indication"
and "information" are only one step up in
the hierarchy. Following these does not lead us into
a courtroom.
We asked our contributors for a second round of
questions, but this time made the corpus available
to them, explaining that we wanted to be sure the
answers were contained in the collection of articles.
The result was a set of questions that much more
closely matched the wording in the corpus. This was,
in fact, what the 1999 DARPA question-answering
competition did in order to ensure that their questions
could be answered (Singhal, 1999). The second
question-answering conference adopted a new
approach to gathering questions and verifying separately
that they are answerable.
Our intuition is that if we can find the typical
lexical neighborhoods of concepts, we can efficiently
locate a concept described in a query or a question
without needing to know the precise way the answer
is phrased and without relying on a costly, hand-built
concept hierarchy.
The example above illustrates the point. Testimony
is given by witnesses, defendants, eyewitnesses.
It is solicited by prosecutors, counsels,
lawyers. It is heard by judges, juries at trials, hearings,
and recorded in depositions and transcripts.
What we wanted was a complete description of the
world of testimony - the who, what, when and
where of the word. Or, in other words, the
"meta-aboutness" of terms.
To this end, we experimented using shallow linguistic
techniques to gather and analyze word co-occurrence
data in various configurations. Unlike
previous collocation research, we were interested
in an expansive set of relationships between words
rather than a specific relationship. More important,
we felt that the information we needed could be derived
from an analysis that crossed clause and sentence
boundaries. We hypothesized that news articles
would be coherent so that the sequences of
sentences and clauses would be linked conceptually.
We examined the nouns in a number of configurations
- paragraphs, sentences, clauses and sequences
of clauses - and obtained the strongest results from
configurations that count co-occurrences across the
surface subjects of sequences of two to six clauses.
Multi-clause configurations were generally more
accurate across a variety of experiments.
In the next section, we briefly review related re- 
search. In section 3 we describe our experiments. 
In section 4, we discuss the problem of evaluation, 
and look ahead to future directions in the concluding 
sections. 
2 Related Work 
There has been a large body of work in the collection
of co-occurrence data from a broad spectrum of
perspectives, from information retrieval to the development
of statistical methods for investigating word
similarity and classification. Our efforts fall somewhere
in the middle.
Compared with document retrieval tasks, we are 
more closely focused on the words themselves and 
on specific concepts than on document "aboutness." 
Jing and Croft (1994) examined words and phrases
in paragraph units, and found that the association 
data improves retrieval performance. Callan (1994) 
compared paragraph units and fixed windows of text 
in examining passage-level retrieval. 
In the question-answering context, Morton (1999)
collected document co-occurrence statistics to uncover
part-whole and synonymy relationships to use
in a question-answering system. The key difference
here was that co-occurrence was considered on
a whole-document basis. Harabagiu and Maiorano
(1999) argued that indexing in question answering
should be based on paragraphs.
One recent approach to automatic lexicon building
has used seed words to build up larger sets of
semantically similar words in one or more categories
(Riloff and Shepherd, 1997). In addition, Strzalkowski
and Wang (1996) used a bootstrapping technique
to identify types of references, and Riloff and
Jones (1999) adapted bootstrapping techniques to
lexicon building targeted to information extraction.
In the same vein, researchers at Brown University
(Caraballo and Charniak, 1999), (Berland and
Charniak, 1999), (Caraballo, 1999) and (Roark and
Charniak, 1998) focused on target constructions, in
particular complex noun phrases, and searched for
information not only on identifying classes of nouns,
but also hypernyms, noun specificity and meronymy.
We have a different perspective from these lines of
inquiry. They were specifying various semantic relationships
and seeking ways to collect similar pairs.
We have a less restrictive focus and are relying on
surface syntactic information about clauses.
For more than a decade, a variety of statistical
techniques have been developed and refined. The
focus of much of this work was to develop the
methods themselves. Church and Hanks (1989) explored
the use of mutual information statistics in
ranking co-occurrences within five-word windows.
Smadja (1992) gathered co-occurrences within five-word
windows to find collocations, particularly in
specific domains. Hindle (1990) classified nouns
on the basis of co-occurring patterns of subject-verb
and verb-object pairs. Hatzivassiloglou and
McKeown (1993) clustered adjectives into semantic
classes, and Pereira et al. (1993) clustered nouns on
their appearance in verb-object pairs. We are trying
to be less restrictive in learning multiple salient
relationships between words rather than seeking a
particular relationship.
In a way, our idea is the mirror image of Barzilay
and Elhadad (1997), who used Wordnet to identify
lexical chains that would coincide with cohesive text
segments. We assumed that documents are cohesive
and that co-occurrence patterns can uncover word
relationships.
3 Experiments 
The focus of our experiment was on units of text in
which the constituents must fit together in order for
the discourse to be coherent. We made the assumption
that the documents in our corpus were coherent
and reasoned that if we had enough text, covering
a broad range of topics, we could pick out
domain-independent associations. For example, testimony
can be about virtually anything, since anything can
wind up in a court dispute. But over a large enough
collection of text, the terms that directly relate to
the "who," "what" and "where" of testimony per
se should appear in segments with testimony more
frequently than chance.
These associations do not necessarily appear in a
dictionary or thesaurus. When humans explain an
unfamiliar word, they often use scenarios and analogies.
We divided the experiments into two groups: one
group that looks at co-occurrences within a single
unit, and another that looks at a sequence of units.
In the first group of experiments, we considered
paragraphs, sentences and clauses, each with and
without prepositional phrases.
• Single paragraphs with/without PP 
• Single sentences with/without PP 
• Single clauses with/without PP 
In the second group, we considered two clauses
and sequences of subject noun phrases from two to
six clauses. In this group, we had:
• Two clauses with/without PP
• A sequence of subject NPs from 2 clauses
• A sequence of subject NPs from 3 clauses
• A sequence of subject NPs from 4 clauses
• A sequence of subject NPs from 5 clauses
• A sequence of subject NPs from 6 clauses
The intuition for the second group is that a topic
flows from one grammatical unit to another so that
the salient nouns, particularly the surface subjects,
in successive clauses should reveal the associations
we are seeking.
To illustrate the method, consider the three-clause
configuration: Say that word_i appears in clause_n.
We maintain a table of all word pairs and increment
the entries for (word_i, word_j), where word_j is a subject
noun in clause_n, clause_n+1, or clause_n+2. No
effort was made to resolve pronominal references, and
these were skipped.
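The counting step above can be sketched roughly as follows. This is a minimal illustration rather than the system's actual code: it assumes each document has already been reduced to a list of clauses, each holding its lemmatized subject nouns with pronouns removed. The function name `count_matchups`, the unordered pair keys, the skipping of self-pairs, and the use of raw occurrence counts as word frequencies are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations, product


def count_matchups(clauses, window=3):
    """Count noun co-occurrences across a sliding window of clauses.

    clauses: list of lists; clauses[n] holds the subject nouns of clause n.
    window:  number of consecutive clauses treated as one unit
             (3 for the three-clause configuration).
    Returns (pair_counts, word_counts). Pairs are stored unordered,
    which is an assumption of this sketch.
    """
    pair_counts = Counter()
    word_counts = Counter()
    for n, subjects in enumerate(clauses):
        word_counts.update(subjects)
        # Pairs of distinct nouns inside clause n itself.
        for a, b in combinations(subjects, 2):
            if a != b:
                pair_counts[tuple(sorted((a, b)))] += 1
        # Pairs between clause n and each of the next window-1 clauses.
        for later in clauses[n + 1 : n + window]:
            for a, b in product(subjects, later):
                if a != b:
                    pair_counts[tuple(sorted((a, b)))] += 1
    return pair_counts, word_counts


# Hypothetical input: subject nouns of four consecutive clauses.
example = [["prosecutor"], ["witness", "testimony"], ["jury"], ["testimony"]]
pairs, words = count_matchups(example, window=3)
```

Changing `window` from 2 to 6 would correspond to the five subject-sequence configurations; documents would be processed one at a time so that windows do not cross article boundaries.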
We used nouns only because preliminary tests
showed that pairings between nouns seemed to stand
out. We included tokens that were tagged as proper
names when they also have common meanings.
For example, consider the Linguistic Data Consortium
at the University of Pennsylvania. Data, Consortium
and University would be on the list used to
build the table of matchups with other nouns, but
Pennsylvania would not. We also collected noun
modifiers as well as head nouns, as they can carry
more information than the surface heads, such as
"business group", "science class" or "crime scene."
The corpus consisted of all the general-interest articles
from the New York Times newswire in 1996
in the North American News Corpus, and did not
include either sports or business news. We first removed
duplicate articles. The data from 1996 was
too sparse for the sequence-of-subjects configurations.
To balance the experiments better, we added
another year's worth of newswire articles, from 1995,
for the sequence-of-subjects configurations so that we
had more than one million matchups for each configuration
(Table 1).
The process is fully automatic, requiring no supervision
or training examples. The corpus was
tagged with a decision-tree tagger (Schmid, 1994)
and parsed with a finite-state parser (Abney, 1996)
using a specially written context-free grammar that
focused on locating clause boundaries. The grammar
also identified extended noun phrases in the subject
position, verb phrases and other noun phrases
and prepositional phrases. The nouns in the tagged,
parsed corpus were reduced to their syntactic roots
(removing plurals from nouns) with a lookup table
created from Wordnet (Miller, 1990) and CELEX
(1995). We performed this last step mainly to address
the sparse data problem. There were a substantial
number of pairings that occurred only once.
We eliminated from consideration all such singletons,
although it did not appear to have much effect
on the overall outcome.
Config           Matchups
Para +pp         6.5 million
Sent             1.7 million
Sent +pp         4 million
1 Clause         1.1 million
1 Clause +pp     2.8 million
2 Clause         1.9 million
2 Clause +pp     5 million
Subj 2 Clause    1.1 million*
Subj 3 Clause    1.6 million*
Subj 4 Clause    2.1 million*
Subj 5 Clause    2.6 million*
Subj 6 Clause    3.1 million*
Table 1: Number of matchups found; the "*" denotes
the inclusion of 1995 data
There were about 1.2 million paragraphs, 2.2 million
sentences and 3.4 million clauses in the selected
portions of the 1996 corpus. The total number of
words was 57 million. Table 2 shows the number of
distinct nouns.
           All Extracted    Counts > 1
No pps     74,500           44,400
W/pps      91,700           53,900
Subjs      51,000           30,800
Table 2: Distinct Nouns, 1996 Data
To score the matchups in our initial experiments,
we used the Dice Coefficient, which produces values
from 0 to 1, to measure the association between pairs
of words and then produced an ordered association
list from the co-occurrence table, ranked according
to the scores of the entries.
\[
\mathrm{score}_{ij} = \frac{2 \cdot \mathrm{freq}(word_i, word_j)}{\mathrm{freq}(word_i) + \mathrm{freq}(word_j)}
\]
One problem was immediately apparent: The
quality of the association lists varied greatly. The
scoring was doing an acceptable job in ranking the
words within each list, but the scores varied greatly
from one list to another. Our initial strategy was
to choose a cutoff, which we set at 21 for each list,
and we tried several alternatives to weed out weak
associations.
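As a rough sketch of the Dice scoring and the fixed cutoff, assuming the pair and word counts are held in plain dictionaries (the names `dice`, `association_lists`, and `min_count` are illustrative and not taken from the system):

```python
def dice(pair_count, freq_i, freq_j):
    """Dice coefficient for one word pair."""
    return 2.0 * pair_count / (freq_i + freq_j)


def association_lists(pair_counts, word_counts, cutoff=21, min_count=2):
    """Build a ranked association list for every word.

    pair_counts: {(word_a, word_b): co-occurrence count}, unordered pairs.
    word_counts: {word: frequency}.
    cutoff:      maximum list length (21 in the initial experiments).
    min_count:   pairs seen fewer times are dropped (singleton removal).
    """
    lists = {}
    for (a, b), n in pair_counts.items():
        if n < min_count:
            continue
        score = dice(n, word_counts[a], word_counts[b])
        lists.setdefault(a, []).append((b, score))
        lists.setdefault(b, []).append((a, score))
    # Highest score first; ties broken alphabetically for determinism.
    return {
        word: sorted(entries, key=lambda e: (-e[1], e[0]))[:cutoff]
        for word, entries in lists.items()
    }
```

Because the score is normalized by the two word frequencies, it ranks words sensibly within one list, but the absolute values are not comparable from one target word to another, which is the problem the alternatives below try to address.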
In one method, we filtered the association lists
by cross-referencing, removing from the association
list for word_i any word_j that failed to reciprocate
and to give a high rank to word_i on its association
list. Another similar approach was to try to combine
evidence from different experiments by taking
the results from two configurations into consideration.
A third strategy was to calculate the mutual
information between the target word and the other
words on its association list.
\[
\mathrm{score}_{xy} = p(xy) \cdot \log\left(\frac{p(xy)}{p(x)\,p(y)}\right)
\]
Using the mutual information computation provided
a way of using a single measure that was able
to compare matchups across lists. We set a threshold
of 1 × 10^-6 for all matchups. Thus these association
lists vary in length, depending on the distributions
for the words: they were allowed to grow up to 40
entries, while some ended up with only one or two words.
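The mutual information scoring and the single threshold can be sketched as below. How the probabilities were estimated is not stated in the text, so this example assumes simple relative frequencies: p(xy) is the pair count over all pair counts, and p(x), p(y) are word counts over all word counts. The function names are hypothetical.

```python
import math


def mi_score(pair_count, count_x, count_y, total_pairs, total_words):
    """p(xy) * log(p(xy) / (p(x) p(y))) with relative-frequency estimates."""
    p_xy = pair_count / total_pairs
    p_x = count_x / total_words
    p_y = count_y / total_words
    return p_xy * math.log(p_xy / (p_x * p_y))


def mi_list(target, pair_counts, word_counts, threshold=1e-6, max_len=40):
    """Association list for `target`, keeping matchups at or above threshold."""
    total_pairs = sum(pair_counts.values())
    total_words = sum(word_counts.values())
    scored = []
    for (a, b), n in pair_counts.items():
        if target not in (a, b):
            continue
        other = b if a == target else a
        score = mi_score(n, word_counts[a], word_counts[b],
                         total_pairs, total_words)
        if score >= threshold:
            scored.append((other, score))
    scored.sort(key=lambda e: (-e[1], e[0]))
    return scored[:max_len]  # lists may grow up to 40 entries
```

Since the same threshold applies to every target word, list length falls out of the data: a word with many strong partners fills its list, while a sparse word keeps only one or two entries.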
4 Evaluation 
The evaluation of a system like ours is problematic.
The judgments we made to determine correctness
were not only highly subjective but time-consuming.
We had 12 large lexicons from the different configurations.
We had chosen a random sample of 10
percent of the 2,700 words that occurred at least
100 times in the corpus, and manually constructed
an answer key, which ended up with almost 30,000
entries.
From the resulting 270 words, we discarded 15 of
those that coincided with common names of people,
such as "carter," which could refer to the former
American president, Chris Carter (creator of
the television show "X-Files"), among others. We
thought it better to delay making decisions on how
to handle such cases, especially since it would require
distinguishing one Carter from another. Such words
presented several difficulties. Unless the individuals
involved were well-known, it was often impossible to
distinguish whether the system was making errors
or whether the resulting descriptive terms were informative.
Tables 3 and 4 show an example from the answer
key for the word "faculty."
The overall results from the first stage of the process,
before the cross-referencing filter, are shown in
Table 5, ranging from 73% to 80% correct. The configurations
that included prepositional phrases and
those that used sequences of subject noun phrases
outperformed the configurations that relied on subjects
and objects in a single grammatical unit. These
differences were statistically significant, with p <
0.01 in all cases.
enrollment       hiring        administrator
journalism       alumnus       student
school           union         math
engineering      curriculum    trustee
group            seminar       thesis
tenure           staff         department
mathematician    educator      member
ivy              arts          college
chancellor       report        senate
activism         university    chairman
professor        teaching      law
regent           doctorate     administration
academic         committee     semester
board            campus        undergraduate
salary           council       research
president        adviser       mathematics
course           advisor       sociology
dean             study         science
teacher          cannon        provost
vote
Table 3: Answer Key for Faculty: OK
load             trafficway    unrest
architecture     diversity     hurdle
shield           minority      revision
disburse         percent       woman
clement
Table 4: Answer Key for Faculty: Wrong
The overall results after cross-referencing, in Table 6,
showed improvements of 5 to 10 percentage
points, while the effect of the number of matchups
was diminished. Here, the subject-sequence configurations
showed a distinct advantage. While more
noise might be expected when a large segment of text
is considered, these results support the notion that
the underlying coherence of a discourse can be recovered
with the proper selection of linguistic features.
The improvements in each configuration over the
corresponding configuration in the first stage were
all statistically significant, with p < 0.01. Likewise,
the edge the sequence-of-subjects configurations had
over the other configurations was also statistically
significant.
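The cross-referencing filter described in Section 3 can be sketched as follows. The text does not say how high a reciprocal rank had to be, so the `top_k` parameter here is an assumption, as is the function name; the input is a dictionary of ranked association lists of (word, score) pairs.

```python
def cross_reference(lists, top_k=21):
    """Keep word_j on word_i's list only if word_i appears within the
    top_k entries of word_j's own association list.

    lists: {word: [(associated_word, score), ...]} ranked best first.
    """
    filtered = {}
    for word_i, entries in lists.items():
        kept = []
        for word_j, score in entries:
            # Words that word_j itself ranks highly.
            partners = [w for w, _ in lists.get(word_j, [])[:top_k]]
            if word_i in partners:
                kept.append((word_j, score))
        filtered[word_i] = kept
    return filtered
```

A word that has no list of its own cannot reciprocate and is dropped, which is one reason the filtered lists are shorter than the originals.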
The results from combining the evidence from different
configurations, in Table 7, showed a much
higher accuracy but a sharp drop in the total number
of associated words found. The most fruitful
pairs of experiments were those that combined distinct
approaches, for example, the five-subject configuration
with either full paragraphs or with sentences
with prepositional phrases. It will remain
unclear until we conduct a task-based evaluation
whether the smaller number of associations will be
harmful.
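One simple way to combine evidence from two configurations is to keep only the associations that both of them propose. The text does not spell out the exact combination rule, so the intersection below is an assumption that merely matches the reported effect (higher accuracy, fewer words); the function name is hypothetical.

```python
def combine_evidence(lists_a, lists_b):
    """Intersect the association lists from two configurations.

    lists_a, lists_b: {word: [(associated_word, score), ...]}.
    The order and scores of configuration A are preserved.
    """
    combined = {}
    for word in lists_a.keys() & lists_b.keys():
        seen_in_b = {w for w, _ in lists_b[word]}
        combined[word] = [(w, s) for w, s in lists_a[word] if w in seen_in_b]
    return combined
```

Pairing configurations that look at the text differently, such as a subject sequence and full paragraphs, makes agreement between the two lists more informative than agreement between two near-identical configurations.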
Config           OK      Wrong   Pct OK
Para +pp         3832    1054    78
Sent             3773    1270    75
Sent +pp         3973    1070    79
1 Clause         3652    1371    73
1 Clause +pp     3935    1108    78
2 Clauses        3695    1328    74
2 Clauses +pp    3983    1018    80
Subj 2 Cl        3877    1139    77
Subj 3 Cl        3899    1117    78
Subj 4 Cl        3905    1082    78
Subj 5 Cl        3904    1076    78
Subj 6 Cl        3909    1066    79
Table 5: Results Before Cross Referencing
Config           OK      Wrong   Pct OK
Para +pp         3651}   73/1    83
Sent             3328    742     82
Sent +pp         3751    818     82
1 Clause         3067    748     80
1 Clause +pp     3659    826     82
2 Clauses        3048    554     85
2 Clauses +pp    3232    604     84
Subj 2 Cl        2910    450     87
Subj 3 Cl        3020    440     87
Subj 4 Cl        3050    428     88
Subj 5 Cl        ;1:12t3 442     88
Subj 6 Cl        3237    449     88
Table 6: Results After Cross Referencing
The final experiment, computing the mutual information
statistic for the matchups of a key word
with co-occurring words, was perhaps the most interesting
because it gave us the ability to apply a
single threshold across different key words, saving
the effort of performing the cross-referencing calculations
and providing a deeper assortment in some
cases. In most of the configurations, mutual information
gave us more words, and greater precision
at the same time, but most of all, gave us a reasonable
threshold to apply throughout the experiment.
While the accuracies in most of the configurations
were close to one another, those that used only single
units tended to be weaker than the multi-clause
units. Note that the paragraph configuration was
tested with far more data than any of the others.
Our system makes no effort to account for lexical
ambiguity. The uses we intend for our lexicon
should provide some insulation from the effects of
polysemy, since searches will be conducted on a number
of terms, which should converge to one meaning.
It is clear that in lists for key words with multiple
senses, the dominant sense, where there is one,
appears much more frequently, such as "faculty,"
where the meaning of "teacher" is more frequent
than the meaning of "ability." Figure 1 shows the
top 21 words in the sequence-of-six-subjects configuration,
before the cross-referencing filter was applied. Twenty of
the 21 entries were scored acceptable.
After the cross-referencing is applied, doctorate,
education and revision were eliminated.
Config       OK      Wrong   Pct OK
Para         2003    183     92
Sent         1962    222     90
Sent+        2033    213     91
1 Clause     1791    218     89
1 Clause+    2004    198     91
2 Clause     2028    277     88
2 Clause+    2129    24,1    90
Table 7: Results of combining evidence; all configurations
were combined with the sequence of six subjects
Config           OK      Wrong   Pct OK
Para +pp         4923    807     86
Sent             5193    990     84
Sent +pp         4876    775     86
1 Clause         52!)!/  1233    81
1 Clause +pp     5047    878     85
2 Clauses        5025    928     84
2 Clauses +pp    4668    728     87
Subj 2 Cl        5229    939     85
Subj 3 Cl        5187    860     85
Subj 4 Cl        5119    808     86
Subj 5 Cl        500"{   764     87
Subj 6 Cl        4980    736     87
Table 8: Results with mutual information
The results from the single clause configuration
(Figure 2) were almost as strong, with three errors,
and a fair amount of overlap between the two.
The word "admiral" was more difficult for the experiment
using the Dice coefficient. The list shows
some of the confusion arising from our strategy on
proper nouns. Admiral would be expected to occur
with many proper names, including some that
are spelled like common nouns, but the single
clause +pp configuration produced a puzzling
list (Figure 3).
The sparseness of the data is also apparent, but it
was the dog references that appeared quite strange
at a glance: Inspection of the articles showed that
they came from an article on the pets of famous
people. Note that the dogs did not appear in top
ranks of the sequence of subjects configuration in
the Dice experiment (Figure 4), nor were they in the
results from the experiments with cross-referencing,
combining evidence and mutual information.
After cross-referencing, the much-shorter list for
the Subj-6 configuration had "aviator", "break-up",
"commander", "decoration", "equal-opportunity",
"fleet", "merino", "navy", "pearl", "promotion",
"rear" and "short".
The combined-evidence list contained only eight
words: "navy", "short", "aviator", "merino", "dishonor",
"decoration", "sub" and "break-up".
Using the mutual information scoring, the list
in the Subj-6 configuration for admiral had only
nine words: "navy", "general", "commander",
"vice", "promotion", "officer", "fleet", "military"
and "smith."
Finally, the even-sparser mutual information list
for the paragraph configuration lists only "navy"
and "suicide."
faculty - trustee(51) 0.053; campus(41) 0.045;
college(113) 0.034; member(369) 0.028; professor(102)
0.028; university(203) 0.027; student(206)
0.025; regent(19) 0.025; tenure(15) 0.025; chancellor(28)
0.023; administrator(34) 0.023; provost(12)
0.023; dean(27) 0.021; alumnus(13) 0.021; math(12)
0.017; *revision(8) 0.013; salary(13) 0.013; sociology(7)
0.013; educator(11) 0.012; doctorate(6)
0.011; teaching(9) 0.011;
Figure 1: The top-ranked matchups for "faculty"
from the Subj-6-Clause configuration before
cross-referencing. The numbers in parentheses
are the number of matchups and the real
numbers following are the scores. Errors are
marked with an asterisk (bold in the original).
faculty - trustee(31) 0.033; member(266) 0.025;
administrator(31) 0.023; college(42) 0.012; dean(15)
0.012; tenure(8) 0.011; ivy(6) 0.011; staff(a3) 0.01;
semester(6) 0.01; regent(7) 0.01; salary(12) 0.01;
math(7) 0.008; professor(a1) 0.008; *load(6) 0.007;
curriculum(5) 0.006; *revision(4) 0.006;
*minority(11) 0.006;
Figure 2: The top-ranked matchups for "faculty"
under the single clause configuration. Errors
are marked with an asterisk (bold in the original).
5 Conclusion 
Our results are encouraging. We were able to decipher
a broad type of word association, and showed
that our method of searching sequences of subjects
outperformed the more traditional approaches in
finding collocations. We believe we can use this technique
to build a large-scale lexicon to help in difficult
information retrieval and information extraction
tasks like question answering.
The most interesting aspect of this work lies in
the system's ability to look across several clauses
and strengthen the connections between associated
words. We are able to deal with input that contains
numerous errors from the tagging and shallow
parsing processes. Local context has been studied
extensively in recent years with sophisticated statistical
tools and the availability of enormous amounts
of text in digital form. Perhaps we can expand this
perspective to look at a window of several
sentences by extracting the correct linguistic units in
order to explore a large range of processing
problems.
admiral - navy(all) 0.027; ayMon(4) 0.024; cheating(5)
0.02; gallantry(3) 0.016; chow(4) 0.015;
serviceman(4) 0.013; short(3) 0.013; wardroom(2)
0.012; american(2) 0.012; enos(2) 0.012;
self-assessment(2) 0.011; merino(2) 0.011; ocelot(2)
0.011; wolfhound(2) 0.011; igloo(2) 0.011;
paprika(2) 0.011; spaniel(2) 0.01; medal(8) 0.01;
awe(a) 0.01; pedigree(2) 0.009; terrier(2) 0.009;
Figure 3: Top-ranked matchups for "admiral"
under the clause +pp configuration.
admiral - navy(88) 0.071; short(7) 0.03; promotion(11)
0.027; happiness(8) 0.026; fleet(11) 0.024;
aviator(5) 0.022; ambition(8) 0.019; merino(3)
0.019; dishonor(3) 0.018; rear(4) 0.018;
decoration(4) 0.015; sub(a) 0.013; airman(3) 0.013;
graveses(2) 0.012; submariner(2) 0.012;
equal-opportunity(2) 0.012; break-up(2) 0.012;
commander(18) 0.012; pearl(7) 0.012; prophecy(4) 0.012;
torturer(2) 0.012;
Figure 4: The list for admiral from the Subj-6
configuration.
6 Future Work 
• We will have the scoring key itself evaluated by
people who are not involved in the research.
• We are planning to conduct task-based evaluation
in question answering.
• We are considering deploying a named entity
module to provide some classification of which
proper nouns should be counted and which
should not.
• We plan to experiment with ways to examine
verbs and make use of
surface objects in the configurations with sequences
of clauses, as well as strengthen the
finite-state grammar.
• We will explore using the system to extract
biographic information.
Acknowledgments 
This material is based upon work supported by the
National Science Foundation under grants Nos.
IIS-96-19124 and IRI-96-18797, and work jointly supported
by the National Science Foundation and the
National Library of Medicine under grant No.
IIS-98-17434. Any opinions, findings, and conclusions
or recommendations expressed in this material are
those of the authors and do not necessarily reflect
the views of the National Science Foundation.

References 

Steven Abney. 1996. Partial parsing via finite-state
cascades. In Proceedings of the ESSLLI '95 Robust
Parsing Workshop.

Regina Barzilay and Michael Elhadad. 1997. Using
lexical chains for text summarization. In Proceedings
of the Intelligent Scalable Text Summarization
Workshop. ACL.

Matthew Berland and Eugene Charniak. 1999.
Finding parts in very large corpora. Technical Report
TR CS99-02, Brown University.

James P. Callan. 1994. Passage-level evidence in
document retrieval. In Proceedings of the Seventeenth
Annual International ACM SIGIR Conference,
Dublin, Ireland. ACM.

Sharon Caraballo and Eugene Charniak. 1999.
Determining the specificity of nouns from text. In
Proceedings of Conference on Empirical Methods
in Natural Language Processing.

Sharon Caraballo. 1999. Automatic acquisition of
a hypernym-labeled noun hierarchy from text. In
Proceedings of the 37th Annual Meeting of the
Association for Computational Linguistics, June.

CELEX. 1995. The CELEX lexical database -
Dutch, English, German. Centre for Lexical Information,
Max Planck Institute for Psycholinguistics,
Nijmegen.

Kenneth W. Church and Patrick Hanks. 1989. Word
association norms, mutual information and lexicography.
In Proceedings of the 27th meeting of
the ACL.

Sanda M. Harabagiu and Steven J. Maiorano. 1999.
Finding answers in large collections of texts: Paragraph
indexing + abductive inference. In Question
Answering Systems. AAAI, November.

Vasileios Hatzivassiloglou and Kathleen R. McKeown.
1993. Towards the automatic identification
of adjectival scales: Clustering adjectives according
to meaning. In Proceedings of the 31st Annual
Meeting of the ACL.

Donald Hindle. 1990. Noun classification from
predicate-argument structures. In Proceedings of
the 28th Annual Meeting of the ACL.

Yufeng Jing and W. Bruce Croft. 1994. An association
thesaurus for information retrieval, tech. rep.
no 94-17. Technical report, Amherst: University
of Massachusetts, Center for Intelligent Information
Retrieval.

G. Miller. 1990. Wordnet: An on-line lexical
database. International Journal of Lexicography.

Thomas S. Morton. 1999. Using coreference for
question answering. In Proceedings of the Workshop
on Coreference and Its Applications, pages
85-89, College Park, Maryland, June. Association
for Computational Linguistics.

Fernando Pereira, Naftali Tishby, and Lillian Lee.
1993. Distributional clustering of English words.
In Proceedings of the 31st Annual Meeting of the
ACL.

Ellen Riloff and Rosie Jones. 1999. Learning dictionaries
for information extraction by multi-level
bootstrapping. In Proceedings of the Sixteenth
National Conference on Artificial Intelligence.
AAAI.

Ellen Riloff and Jessica Shepherd. 1997. A corpus-based
approach for building semantic lexicons. In
Proceedings of the Second Conference on Empirical
Methods in Natural Language Processing.

Brian Roark and Eugene Charniak. 1998. Noun-phrase
co-occurrence statistics for semi-automatic
semantic lexicon construction. In
Proceedings of the 36th Annual Meeting of the
Association for Computational Linguistics and
the 17th International Conference on Computational
Linguistics.

Helmut Schmid. 1994. Probabilistic part-of-speech
tagging using decision trees. In Proceedings of the
International Conference on New Methods in Language
Processing.

Amit Singhal. 1999. Question and answer track
home page. WWW.

Frank Smadja. 1992. Retrieving collocations from
text: Xtract. Computational Linguistics, Special
Issue.

Tomek Strzalkowski and Jin Wang. 1996. A self-learning
universal concept spotter. In Proceedings
of the International Conference on Computational
Linguistics (Coling 1996).
