Using Semantically Motivated Estimates to Help 
Subcategorization Acquisition 
Anna Korhonen 
Computer Laboratory, University of Cambridge 
Pembroke Street, Cambridge CB2 3QG, UK 
alk23@cl, cam. ac. uk 
Abstract 
Research into the automatic acquisition of 
subcategorization frames from corpora is 
starting to produce large-scale computational 
lexicons which include valuable frequency in- 
formation. However, the accuracy of the 
resulting lexicons shows room for improve- 
ment. One source of error lies in the lack 
of accurate back-off estimates for subcatego- 
rization frames, delimiting the performance 
of statistical techniques frequently employed 
in verbal acquisition. In this paper, we 
propose a method of obtaining more accu- 
rate, semantically motivated back-off esti- 
mates, demonstrate how these estimates can 
be used to improve the learning of subcatego- 
rization frames, and discuss using the method 
to benefit large-scale lexical acquisition. 
1 Introduction 
Manual development of large subcategorised 
lexicons has proved difficult because pred- 
icates change behaviour between sublan- 
guages, domains and over time. Yet parsers 
depend crucially on such information, and 
probabilistic parsers would greatly benefit 
from accurate information concerning the rel- 
ative frequency of different subcategorization 
frames (SCFs) for a given predicate. 
Over the past years acquiring subcatego- 
rization dictionaries from textual corpora has 
become increasingly popular (e.g. Brent, 
1991, 1993; Ushioda et al., 1993; Briscoe and 
Carroll, 1997; Manning, 1993; Carroll and 
Rooth 1998; Gahl, 1998; Lapata, 1999, Sarkar 
and Zeman, 2000). The different approaches 
vary according to the methods used and the 
number of SCFs being extracted. Regardless 
of this, there is a ceiling on the performance 
of these systems at around 80% token recall*. 
*Token recall is the percentage of SCF tokens in a 
sample of manually analysed text that were correctly 
acquired by the system. 
One significant source of error lies in the 
statistical filtering methods frequently used 
to remove noise from automatically acquired 
SCFs. These methods are reported to be par- 
ticularly unreliable for low frequency scFs 
(Brent, 1991, 1993; Briscoe and Carroll, 1997; 
Manning, 1993; Manning and Schiitze, 1999; 
Korhonen, Gorrell and McCarthy, 2000), re- 
sulting in a poor overall performance. 
According to Korhonen, Gorrell and Mc- 
Carthy (2000), the poor performance of sta- 
tistical filtering can be largely explained by 
the zipfian nature of the data, coupled with 
the fact that many statistical tests are based 
on the assumption of two zipfian distributions 
correlating: the conditional SCF distribution 
of an individual verb (p(scfilverbj)) and the 
unconditional SCF distribution of all verbs in 
general (p(scfl)). Contrary to this assump- 
tion, however, there is no significant correla- 
tion between the two distributions. 
Korhonen, Gorrell and McCarthy (2000) 
have showed that a simple method of filtering 
SCFs on the basis of their relative frequency 
performs more accurately than statistical fil- 
tering. This method sensitive to the sparse 
data problem is best integrated with smooth- 
ing. Yet the performance of the sophisticated 
smoothing techniques which back-off to an un- 
conditional distribution also suffer from the 
lack of correlation between p(scfi\[verbj) and p(scf0. 
In this paper, we propose a method for ob- 
taining more accurate back-off estimates for 
SCF acquisition. Taking Levin's verb classifi- 
cation (Levin, 1993) as a starting point, we 
show that in terms of SCF distributions, in- 
dividual verbs correlate better with other se- 
mantically similar verbs than with all verbs 
in general. On the basis of this observation, 
we propose classifying verbs according to their 
semantic class and using the conditional SCF 
distributions of a few other members in the 
216 
same class as back-off estimates of the class 
(p( sc filsernantic class j)). 
Adopting the SCF acquisition system of 
Briscoe and Carroll (1997) we report an ex- 
periment which demonstrates how these esti- 
mates can be used in filtering. This is done 
by acquiring the conditional SCF distributions 
for selected test verbs, smoothing these dis- 
tributions with the unconditional distribution 
of the respective verb class, and applying a 
simple method for filtering the resulting set 
of SCFs. Our results show that the proposed 
method improves the acquisition of SCFs sig- 
nificantly. We discuss how this method can be 
used to benefit large-scale SCF acquisition. 
We begin by reporting our findings that the 
SCF distributions of semantically similar verbs 
correlate well (section 2). We then introduce 
the method we adopted for constructing the 
back-~off estimates for the data used in our ex- 
periment (section 3.1), summarise the main 
features of the SCF acquisition approach (sec- 
tion 3.2), and describe the smoothing tech- 
niques adopted (section 3.3). Finally, we re- 
view the empirical evaluation (section 4) and 
discuss directions for future work (section 5). 
2 Examining SCF Correlation 
between Semantically Similar 
Verbs 
To examine the degree of SCF correlation 
between semantically similar verbs, we took 
Levin's verb classification (1993) as a start- 
ing point. Levin verb classes are based on 
the ability of a verb to occur in specific 
diathesis alternations, i.e. specific pairs of 
syntactic frames which are assumed to be 
meaning preserving. The classification pro- 
vides semantically-motivated sets of syntactic 
frames associated with individual classes. 
While Levin's shows that there is corre- 
lation between the SCFs related to the verb 
sense, our aim is to examine whether there is 
also correlation between the SCFs specific to 
the verb form. Unlike Levin, we are concerned 
with polysemic scF distributions involving all 
senses of verbs. In addition, we are not only 
interested in the degree of correlation between 
sets of SCFs, but also in comparing the rank- 
ing of SCFs between distributions. Neverthe- 
less, Levin classes provide us a useful starting 
point. 
Focusing on five broad Levin classes 
change of possession, assessment, killing, mo- 
tion, and destroy verbs - we chose four test 
verbs from each class and examined the de- 
gree with which the SCF distribution for these 
verbs correlates with the SCF distributions for 
two other verbs from the same Levin class. 
The latter verbs were chosen so that one of 
the verbs is a synonym, and the other a hyper- 
nym, of a test verb. We used WordNet (Miller 
et al., 1990) for defining and recognising these 
semantic relations. We defined a hypernym 
as a test verb's hypernym in WordNet, and a 
synonym as a verb which, in WordNet, shares 
this same hypernym with a test verb. We also 
examined how well the SCF distribution for 
the different test verbs correlates with the SCF 
distribution of all English verbs in general and 
with that of a semantically different verb (i.e. 
a verb belonging to a different Levin class). 
We used two methods for obtaining the 
scF distributions. The first was to acquire 
an unfiltered subcategorization lexicon for 20 
million words of the British National Corpus 
(BN¢) (Leech, 1992) data using Briscoe and 
Carroll's (1997) system (introduced in sec- 
tion 3.2). This gives us the observed distribu- 
tion of SCFs for individual verbs and that for 
all verbs in the BNC data. The second method 
was to manually analyse around 300 occur- 
rences of each test verb in the BNC data. This 
gives us an estimate of the correct SCF distri- 
butions for the individual verbs. The estimate 
for the correct distribution of SCFs over all 
English verbs was obtained by extracting the 
number of verbs which are members of each 
SCF class in the ANLT dictionary (Boguraev et 
al., 1987). 
The degree of correlation was examined by 
calculating the Kullback-Leibler distance (KL) 
(Cover and Thomas, 1991) and the Spearman 
rank correlation coefficient (Re) (Spearman, 
1904) between the different distributions 2. 
The results given in tables 1 and 2 were ob- 
tained by correlating the observed SCF distri- 
butions from the BNC data. Table 1 shows 
an example of correlating the SCF distribu- 
tion of the motion verb .fly against that of (i) 
its hypernym move, (ii) synonym sail, (iii) all 
verbs in general, and (iv) agree, which is not 
related semantically. The results show that 
the SCF distribution for .fly clearly correlates 
better with the SCF distribution for move and 
sail than that for all verbs and agree. The av- 
2Note that Io., >_ 0, with IO., near to 0 denoting 
strong association, and -1 _< RC < 1, with RC near to 
0 denoting a-low degree of association and ttc near to 
-1 and 1 denoting strong association. 
217 
1 I I KL I cl 
fly move 0.25 0.83 
.fly sail 0.62 0.61 
.fly all verbs 2.13 0.51 
.fly agree 2.27 0.12 
Table 1: Correlating the SCF distribution of.fly against other SCF distributions 
\[-' KL I RC 
hype~nym 0.65 0.71 
synonym 0.71 0.66 
all verbs 1.59 0.41 
semantically different verb 1.74 0.38 
Table 2: Overall 
erage results for all test verbs given in table 2 
indicate that the degree of SCF correlation is 
the best with semantically similar verbs. Hy- 
pernym and synonym relations are nearly as 
good, the majority of verbs showing slightly 
better SCF correlation with hypernyms. The 
SCF correlation between individual verbs, and 
verbs in general, is poor, but still better than 
with semantically unrelated verbs. 
These findings with the observed SCF dis- 
tributions hold as well with the correct SCF 
distributions, as seen in table 3. The results 
show that in terms of SCF distributions, verbs 
in all classes examined correlate better with 
their hypernym verbs than with all verbs in 
general. 
As one might expect, the polysemy of the 
individual verbs affects the degree of SCF cor- 
relation between semantically similar verbs. 
The degree of SCF correlation is higher with 
those verbs whose predominant 3 sense is in- 
volved with the Levin class examined. For 
example, the SCF distribution for the killing 
verb murder correlates better with that for 
the verb kill than that for the verb execute, 
whose predominant sense is not involved with 
killing verbs. 
These results show that the verb sense spe- 
cific SCF correlation observed by Levin ex- 
tends to the verb form specific SCF correlation 
and applies to the ranking of SCFs as well. 
This suggests that we can obtain more accu- 
rate back-off estimates for verbal acquisition 
by basing them on a semantic verb type. To 
find out whether such semantically motiwted 
SPredomlnant sense refers here to the most frequent 
sense of verbs in WordNet. 
correlation results 
estimates can be used to improve SCF acqui- 
sition, we performed an experiment which we 
describe below. 
3 Experiment 
3.1 Back-off Estimates for the Data 
The test data consisted of a total of 60 verbs 
from 12 broad Levin classes, listed in table 4. 
Two of the examined Levin classes were col- 
lapsed together with another similar Levin 
class, making the total number of test classes 
10. The verbs were chosen at random, sub- 
ject to the constraint that they occurred fre- 
quently enough in corpus data 4 and when ap- 
plicable, represented different sub-classes of 
each examined Levin class. To reduce the 
problem of polysemy, we required that the 
predominant sense of each verb corresponds to 
the Levin class in question. This was ensured 
by manually verifying that the most frequent 
sense of a verb in WordNet corresponds to the 
sense involved in the particular Levin class. 
To obtain the back-off estimates, we chose 
4-5 representative verbs from each verb class 
and obtained correct SCF distributions for 
these verbs by manually analysing around 300 
occurrences of each verb in the ant data. We 
merged the resulting set of SCF distributions 
to construct the unconditional SCF distribu- 
tion for the verb class. This approach was 
taken to minimise the sparse data problem 
and cover SCF variations within verb classes 
and due to polysemy. The bazk-off estimates 
4We required at least 300 occurrences for each verb. 
This was merely to guarantee accurate enough testing, 
as we evaluated our results against manual analysis of 
corpus data (see section 4). 
218 
Verb class 
change of possession 
assessment 
killing 
destroy 
motion 
AVERAGE 
Hypernym 
KL RC 
0.61 0.64 
0.28 0.71 
0.70 0.63 
0.30 0.60 
0.29 0.73 
0.44 0.66 
All Verbs 
KL RC 
1.16 0.38 
0.73 0.48 
1.14 0.37 
1.19 0.29 
1.72 0.42 
1.19 0.39 
Table 3: Correlation results for five verb classes 
Verb class 
putting 
sending and carrying, 
exerting force 
change of possession 
assessment, 
searching 
social interaction 
killing 
destroy 
appearance, disappearance 
and occurrence 
motion 
aspectuM 
Test verbs 
place, lay, drop, pour, load, fill 
send, ship, carry, bring, transport 
pull, push 
give, lend, contribute, donate, offer 
provide, supply, acquire, buy 
analyse 
fish, explore, investigate 
agree, communicate, struggle, marry, meet, visit 
kill, murder, slaughter, strangle 
demolish, destroy, ruin, devastate 
arise, emerge, disappear, vanish 
arrive, depart, march, move, slide, swing 
travel, walk, fly, sail, dance 
begin, end, start, terminate, complete 
Table 4: Test data 
for motion verbs, for example, were obtained 
by merging the SCF distributions of the verbs 
march, move, fly, slide and sail. Each verb 
used in obtaining the estimates was excluded 
when testing the verb itself. For example, 
when acquiring subcategorization for the verb 
fly, estimates were obtained only using verbs 
march, move, slide and sail. 
3.2 Framework for SCF Acquisition 
Briscoe and Carroll's (1997) verbal acquisition 
system distinguishes 163 SCFs and returns rel- 
ative frequencies for each SCF found for a given 
predicate. The SCFs are a superset of classes 
found in the ANLT and COMLEX (Grishman 
et al., 1994) dictionaries. They incorporate 
information about control of predicative ar- 
guments, as well as alternations such as ex- 
traposition and particle movement. The sys- 
tem employs a shallow parser to obtain the 
subcategorization information. Potential SCF 
entries are filtered before the final SCF lexi- 
con is produced. While Briscoe and Carroll 
(1997) used a statistical filter based on bi- 
nomial hypothesis test, we adopted another 
method, where the conditional SCF distribu- 
tion from the system is smoothed before fil- 
tering the SCFS, using the different techniques 
introduced in section 3.3. After smoothing, 
filtering is performed by applying a threshold 
to the resulting set of probability estimates. 
We used training data to find an optimal av- 
erage threshold for each verb class examined. 
This filtering method allows us to examine 
the benefits of smoothing without introducing 
problems based on the statistical filter. 
3.3 Smoothing 
3.3.1 Add One Smoothing 
Add one smoothing s has the effect of giving 
some of the probability space to the SCFs un- 
seen in the conditional distribution. As it as- 
sumes a uniform prior on events, it provides a 
baseline smoothing method against which the 
5See (Manning and Schiltze, 1999) for detailed in- 
formation about the smoothing techniques discussed 
here. 
219 
more sophisticated methods can be compared. 
Let c(x=) be the frequency of a SCF given a 
verb, N the total number of SCF tokens for 
this verb in the conditional distribution, and 
C the total number of SCF types. The esti- 
mated probability of the SCF is: 
P(xn) - c(xn) + 1 N + C (1) 
3.3.2 Katz Backing-off 
In Katz backing-off (Katz, 1987), some of the 
probability space is given to the SCFs unseen 
or of low frequency in the conditional distri- 
bution. This is done by backing-off to an un- 
conditional distribution. Let p(xn) be a prob- 
ability of a SCF in the conditional distribution, 
and p(xnv) its probability in the unconditional 
distribution, obtained by maximum likelihood 
estimation. The estimated probability of the 
scF is calculated as follows: 
P(xn)= { (1-d) xp(xn) ifc(x=) > cl c~ x p(xnp) otherwise (2) 
The cut off frequency ci is an empiri- 
cally defined threshold determining whether 
to back-off or not. When counts are lower 
than cl they are held too low to give an accu- 
rate estimate, and we back-off to an uncondi- 
tional distribution. In this case, we discount 
p(x~) a certain amount to reserve some of the 
probablity space for unseen and very low fre- 
quency scFs. The discount (d) is defined em- 
pirically, and a is a normalization constant 
which ensures that the probabilities of the re- 
sulting distribution sum to 1. 
3.3.3 Linear Interpolation 
While Katz backing-off consults different es- 
timates depending on their specificity, linear 
interpolation makes a linear combination of 
them. Linear interpolation is used here for the 
simple task of combining a conditional distri- 
bution with an unconditional one. The esti- 
mated probability of the SCF is given by 
P(xn) = Al(p(z,~)) + )~2(p(xnp)) (3) 
where the Ai denotes weights for differ- 
ent context sizes (obtained by optimising the 
smoothing performance on the training data 
for all zn) and sum to 1. 
4 Evaluation 
4.1 Method 
To evaluate the approach, we took a sample of 
20 million words of the BNC and extracted all 
sentences containing an occurrence of one of 
the 60 test verbs on average of 3000 citations 
of each. The sentences containing these verbs 
were processed by the SCF acquisition system, 
and the smoothing methods were applied be- 
fore filtering. We also obtained results for a 
baseline without any smoothing. 
The results were evaluated against a man- 
ual analysis of the corpus data. This was 
obtained by analysing up to a maximum of 
300 occurrences for each test verb in BN¢ 
or LOB (Garside et al., 1987), Susanne and 
SEC (Taylor and Knowles, 1988) corpora. We 
calculated type precision (percentage of SCFs 
acquired which were also exemplified in the 
manual analysis) and recall (percentage of the 
SCFs exemplified in the manual analysis which 
were also acquired automatically), and com- 
bined them into a single measure of overall 
performance using the F measure (Manning 
and Schiitze, 1999). 
F = 2. precision, recall (4) 
precision -4- recall 
We estimated accuracy with which the sys- 
tem ranks true positive classes against the cor- 
rect ranking. This was computed by calculat- 
ing the percentage of pairs of SCFs at posi- 
tions (n, m) such that n < m in the system 
ranking that occur in the same order in the 
ranking from the manual analysis. This gives 
us an estimate of the accuracy of the relative 
frequencies of SCFs output by the system. In 
addition to the system results, we also calcu- 
lated KL and Rc between the acquired unfil- 
tered SCF distributions and the distributions 
obtained from the manual analysis. 
4.2 Results 
Table 5 gives average results for the 60 test 
verbs using each method. The results indi- 
cate that both add one smoothing and Katz 
backing-off improve the baseline performance 
only slightly. Linear interpolation outper- 
forms these methods, achieving better results 
on all measures. The improved KL indicates 
that the method improves the overall accu- 
racy of SCF distributions. The results with 
rtc and system accuracy show that it helps 
to correct the ranking of SCFs. The fact 
220 
Method KL RC accuracy 
Baseline 0.63 0.72 79.2 
Add one 0.64 0.74 79.0 
Katz backing-off 0.61 0.75 79.0 
Linear interpolation 0.51 0.82 84.4 
System results (%) 
\]precision I recall F measure 
78.5 63.3 70.1 
85.3 59.7 70.2 
76.4 67.6 71.7 
87.8 68.7 77.1 
Table 5: Average results with different methods using semantically motivated back-off estimates 
for smoothing 
Method KL ac accuracy 
Baseline 0.63 0.72 79.2 
Katz backing-off 0.68 0.69 77.2 
Linear interpolation 0.79 0.64 76.7 
System results (%) 
I precision I recall IF measure 
78.5 63.3 70.1 
75.2 61.7 67.8 
71.4 64.1 67.6 
Table 6: Average results using the SCF distribution of all verbs as back-off estimates for smooth- 
ing 
that both precision and recall show clear im- 
provement over the baseline results demon- 
strates that linear interpolation can be suc- 
cessfully combined with the filtering method 
employed. These results seem to suggest that 
a smoothing method which affects both the 
highly ranked SCFs and SCFs of low frequency 
is profitable for this task. 
In this experiment, the semantically moti- 
vated back-off estimates helped to reduce the 
sparse data problem significantly. While a to- 
tal of 151 correct SCFs were missing in the test 
data, only three were missing after smoothing 
with Katz backing-off or linear interpolation. 
For comparison, we re-run these experi- 
ments using the general SCF distribution of 
all verbs as back-off estimates for smooth- 
ing 6. The average results for the 60 test verbs 
given in table 6 show that when using these 
estimates, we obtain worse results than with 
the baseline method. This demonstrates that 
while such estimates provide an easy solution 
to the sparse data problem, they can actually 
degrade the accuracy of verbal acquisition. 
Table 7 displays individual results for the 
different verb classes. It lists the results ob- 
tained with KL and Rc using the baseline 
method and linear interpolation with semanti- 
cally motivated estimates. Examining the re- 
sults obtained with linear interpolation allows 
us to consider the accuracy of the back-off es- 
6These estimates were obtained by extracting the 
number of verbs which are members of each SCF class 
in the ANLT dictionary. See section 2 for details. 
timates for each verb class. Out of ten verb 
classes, eight show improvement with linear 
interpolation, with both KL and Rc. However, 
two verb classes - aspectual verbs, and verbs 
of appearance, disappearance and occurrence 
- show worse results when linear interpolation 
is used. 
According to Levin (1993), these two verb 
classes need further classification before a full 
semantic account can be made. The prob- 
lem with aspectual verbs is that the class 
contains verbs taking sentential complements. 
As Levin does not classify verbs on basis 
of their sentential complement-taking proper- 
ties, more classification work is required be- 
fore we can obtain accurate SCF estimates for 
this type of verb. 
The problem with verbs of appearance is 
more specific to the verb class. Levin remarks 
that the definition of appearance verbs may 
be too loose. In addition, there are signifi- 
cant syntactic differences between the verbs 
belonging to the different sub-classes. 
This suggests that we should examine the 
degree of SCF correlation between verbs from 
different sub-classes before deciding on the fi- 
nal (sub-)class for which we obtain the es- 
timates. As the results with the combined 
Levin classes show, estimates can also be suc- 
cessfully built using verbs fromdifferent Levin 
classes, provided that the classes are similar 
enough. 
221 
RC 
Verb class baseline linear i. 
putting 0.68 0.70 
sending and carrying, 0.72 0.96 
exerting force 
change of possession 0.61 0.75 
assessment, 0.61 0.70 
searching 
social interaction 0.72 0.80 
killing 0.91 0.95 
destroy 0.70 0.97 
appearance, disappearance 
and occurrence 
0.91 
aspectual 
KL 
baseline linear i. 
0.70 0.66 
0.64 0.50 
0.61 0.60 
0.81 0.62 
0.65 0.58 
0.69 0.67 
0.95 0.2O 
0.14 0.17 0.83 
motion 0.66 0.58 0.56 0.66 
0.48 0.54 0.86 0.89 
Table 7: Baseline and linear interpolation results for the verb classes 
5 Conclusion 
In this paper, we have shown that the verb 
form specific SCF distributions of semantically 
similar verbs correlate well. On the basis 
of this observation, we have proposed using 
verb class specific back-off estimates in SCF 
acquisition. Employing the SCF acquisition 
framework of Briscoe and Carroll (1997), we 
have demonstrated that these estimates can 
be used to improve SCF acquisition signifi- 
cantly, when combined with smoothing and 
a simple filtering method. 
We have not yet explored the possibility 
of using the semantically motivated estimates 
with statistical filtering. In principle, this 
should help to improve the performance of the 
statistical methods which make use of back-off 
estimates. If filtering based on relative fre- 
quencies still achieves better results, it would 
be worth investigating ways of handling the 
low frequency data for integration with this 
method. As Korhonen, Gorrell and McCarthy 
(2000) discuss, any statistical filtering method 
would work better at low frequences than the 
one applied, since this simply disregards all 
low frequency SCFS. 
In addition to refining the filtering method, 
our future work will focus on integrating 
this approach with large-scale scF acquisition. 
This will involve (i) defining the set of seman- 
tic verb classes across the lexicon, (ii) obtain- 
ing back-off estimates for each verb class, and 
(iii) implementing a method capable of auto- 
matically classifying verbs to semantic classes. 
The latter can be done by linking the Word- 
Net synonym sets with semantic classes, using 
a similar method to that employed by Dorr 
(1997). With the research reported, verbs 
were classified to semantic classes according 
to their most frequent sense. While this ap- 
proach proved satisfactory, our future work 
will include investigating ways of addressing 
the problem of polysemy better. 
The manual effort needed for obtaining the 
back-off estimates was quite high for this pre- 
liminary experiment. However, our recent in- 
vestigation shows that the total number of se- 
mantic classes across the whole lexicon is un- 
likely to exceed 50. This is because many of 
the Levin classes have proved similar enough 
in terms of SCF distributions that they can be 
combined together. Therefore the additional 
effort required to carry out the proposed work 
seems justified, given the accuracy enhance- 
ment reported. 
6 Acknowledgements 
I thank Ted Briscoe and Diana McCarthy for 
useful comments on this paper. 

References 
Boguraev, B., Briscoe, E., Carroll, J., Carter, 
D. and Grover, C. 1987. The derivation of a 
grammatically-indexed lexicon from the Long- 
man Dictionary of Contemporary English. In 
Proceedings of the 25th Annual Meeting of 
the Association .for Computational Linguis- 
tics, Stanford, CA. 193-200. 
Brent, M. 1991. Automatic acquisition of 
subcategorization frames from untagged text. 
In Proceedings of the 29th Annual Meeting 
of the Association for Computational Linguis- 
tics, Berkeley, CA. 209-214. 
Brent, M. 1993. From grammar to lexicon: 
unsupervised learning of lexical syntax. Com- 
putational Linguistics 19.3: 243-262. 
Briscoe, E. and Carroll, J. 1993. Gener- 
alised probabilistic Lt~ parsing for unification- 
based grammars. Computational Linguistics 
19.1: 25-60. 
Briscoe, E.J. and J. Carroll 1997. Automatic 
extraction of subcategorization from corpora. 
In Proceedings of the 5th ACL Conf. on Ap- 
plied Natural Language Processing, Washing- 
ton, DC. 356-363. 
Briscoe, T., Carroll, J. and Korhonen, A. 
1997. Automatic extraction of subcategoriza- 
tion frames from corpora - a framework and 
3 experiments. Sparkle WP5 Deliverable. 
Available in http://www.ilc.pi.cnr.it/. 
Carroll, G. and Rooth, M. 1998. Valence 
induction with a head-lexicMized PCFG. In 
Proceedings of the 3rd Conference on Empir- 
ical Methods in Natural Language Processing, 
Granada, Spain. 
Cover, Thomas, M. and Thomas, J.A. 1991. 
Elements of Information Theory. Wiley- 
Interscience, New York. 
Dorr, B. 1997. Large-scale dictionary con- 
struction for foreign  tutoring and 
interlingual machine translation. Machine 
Translation 12.4: 271-325. 
Gahl, S. 1998. Automatic extraction of sub- 
corpora based on subcategorization frames 
from a part-of-speech tagged corpus. In Pro- 
ceedings of the COLING-ACL'98, Montreal, 
Canada. 
Grishman, R., Macleod, C. and Meyers, A. 
1994. Comlex syntax: building a computa- 
tional lexicon. In Proceedings of the Interna- 
tional Conference on Computational Linguis- 
tics, COLING-94, Kyoto, Japan. 268-272. 
Garside, R., Leech, G. and Sampson, G. 1987. 
The computational analysis of English: A 
corpus-based approach. Longman, London. 
Katz, S. M. 1987. Estimation of probabili- 
ties from sparse data for the  model 
component of speech recogniser. IEEE Trans- 
actions on Acoustics, Speech, and Signal Pro- 
cessing 35.3: 400-401. 
Korhonen, A., Gorrell, G. and McCarthy, 
D. 2000. Statistical filtering and subcatego- 
rization frame acquisition. In Proceedings of 
the Joint SIGDAT Conference on Empirical 
Methods in Natural Language Processing and 
Very Large Corpora, Hong Kong. 
Leech, G. 1992. 100 million words of English: 
the British NationM Corpus. Language Re- 
search 28(1): 1-13. 
Levin, B. 1993. English Verb Classes and 
Alternations. Chicago University Press, 
Chicago. 
Manning, C. 1993. Automatic acquisition of 
a large subcategorization dictionary from cor- 
pora. In Proceedings of the 31st Annual Meet- 
ing of the Association for Computational Lin- 
guistics, Columbus, Ohio. 235-242. 
Manning, C. and Schiitze, H. 1999. Founda- 
tions of Statistical Natural Language Process- 
ing. New York University, Ms. 
Miller, G., Beckwith, R., Felbaum, C., Gross, 
D. and Miller, K. 1993. Introduction to 
WordNet: An On-Line Lexical Database. 
ftp//clarity.princeton.edu/pub/WordNet/ 
5papers.ps. 
Sampson, G. 1995. English for the computer. 
Oxford, UK: Oxford University Press. 
Sarkar, A. and Zeman, D. 2000. Auto- 
matic Extraction of Subcategorization Frames 
for Czech. In Proceedings of the Inter- 
national Conference on Computational Lin- 
guistics, COLING-O0, Saarbrucken, Germany. 
691-697. 
Spearman, C. 1904. The proof and mea- 
surement of association between two things. 
American Journal of Psychology 15: 72-101. 
Taylor, L. and Knowles, G. 1988. Manual 
of information to accompany the SEC cor- 
pus: the machine-readable corpus of spoken 
English. University of Lancaster, UK, Ms. 
Ushioda, A., Ewns, D., Gibson, T. and 
Waibel, A. 1993. The automatic acquisition of 
frequencies of verb subcategorization frames 
from tagged corpora. In Boguraev, B. and 
Pustejovsky, J. eds. SIGLEX A CL Workshop 
on the Acquisition of Lexical Knowledge from 
Text. Columbus, Ohio: 95-106. 
