Statistical Filtering and Subcategorization Frame Acquisition 
Anna Korhonen and Genevieve Gorrell 
Computer Laboratory, University of Cambridge 
Pembroke Street, Cambridge CB2 3QG, UK 
alk23@cl, cam. ac. uk, genevieve, gorrell@netdecisions, co. uk 
Diana McCarthy 
School of Cognitive and Computing Sciences 
University of Sussex, Brighton, BN1 9QH, UK 
dianam@cogs, susx. ac. uk 
Abstract 
Research "into the automatic acquisition of 
subcategorization frames (SCFS) from corpora 
is starting to produce large-scale computa- 
tional lexicons which include valuable fre- 
quency information. However, the accuracy 
of the resulting lexicons shows room for im- 
provement. One significant source of error 
lies in the statistical filtering used by some re- 
searchers to remove noise from automatically 
acquired subcategorization frames. In this pa- 
per, we compare three different approaches to 
filtering out spurious hypotheses. Two hy- 
pothesis tests perform poorly, compared to 
filtering frames on the basis of relative fre- 
quency. We discuss reasons for this and con- 
sider directions for future research. 
1 Introduction 
Subcategorization information is vital for suc- 
cessful parsing, however, manual develop- 
ment of large subcategorized lexicons has 
proved difficult because predicates change be- 
haviour between subs, domains and 
over time. Additionally, manually devel- 
oped sucategorization lexicons do not provide 
the relative frequency of different SCFs for a 
given predicate, essential in a probabilistic ap- 
proach. 
Over the past years acquiring subcatego- 
rization dictionaries from textual corpora has 
become increasingly popular. The different 
approaches (e.g. Brent, !991, 1993; Ushioda 
et al., 1993; Briscoe and Carroll, 1997; Man- 
ning, 1993; Carroll and Rooth, 1998; Gahl, 
1998; Lapata, 1999; Sarkar and Zeman, 2000) 
vary largely according to the methods used 
and the number of SCFS being extracted. Re- 
gardless of this, there is a ceiling on the perfor- 
mance of these systems at around 80% token 
recall 1 
zWhere token recall is the percentage .of SCF to- 
kens in a sample of manually analysed text that were 
The approaches to extracting SCF informa- 
tion from corpora have frequently employed 
statistical methods for filtering (e.g. Brent, 
1993; Manning 1993; Briscoe and Carroll, 
1997; Lapata, 1999). This has been done to 
remove the noise that arises when dealing with 
naturally occurring data, and from mistakes 
made by the SCF acquisition system, for ex- 
ample, parser errors. 
Filtering is usually done with a hypothe- 
sis test, and frequently with a variation of 
the binomial filter introduced by Brent (1991, 
1993). Hypothesis testing is performed by for- 
mulating a null hypothesis, (H0), which is as- 
sumed true unless there is evidence to the con- 
trary. If there is evidence to the contrary, 
H0 is rejected and the alternative hypothe- 
sis (H1) is accepted. In SCF acquisition, H0 is 
that there is no association between a particu- 
lar verb (verbj) and a SCF (SCFi), meanwhile 
H1 is that there is such an association. For 
SCF acquisition, the test is one-tailed since H1 
states the direction of the association, a pos- 
itive correlation between verbj and scfi. We 
compare the expected probability of scfi oc- 
curring with verbj if H0 is true, to the ob- 
served probability of co-occurrence obtained 
from the corpus data. If the observed proba- 
bility is greater than the expected probability 
we reject Ho and accept H1, and if not, we 
retain H0. 
Despite the popularity of this method, it 
has been reported as problematic. Accord- 
ing to one account (Briscoe and Carroll, 1997) 
the majority of errors arise because of the sta- 
tistical filtering process, which is reported to 
be particularly unreliable for low frequency 
SCFs (Brent, 1993; Briscoe and Carroll, 1997; 
Manning, 1993; Manning and Schiitze, 1999). 
Lapata (1999) reported that a threshold on 
the relative frequencies produced slightly bet- 
ter results than those achieved with a Brent- 
correctly acquired by the system. 
199 
style binomial filter when establishing SCFs for 
diathesis alternation detection. Lapata deter- 
mined thresholds for each SCF using the fre- 
quency of the SCF in COMLEX Syntax dictio- 
nary (Grishman et al., 1994). 
Adopting the SCF acquisition system of 
Briscoe and Carroll, we have experimented 
with an alternative hypothesis test, the bi- 
nomial log-likelihood ratio (LLR) test (Dun- 
ning, 1993). Sarkar and Zeman (2000) have 
also used this test when filtering SCFs auto- 
matically acquired for Czech. This test has 
been recommended for use in NLP since it 
does not assume a normal distribution, which 
invalidates many other parametric tests for 
use with natural  phenomena. LLR 
can be used in a form (-2logA) which is 
X 2 distributed. Moreover, this asymptote is 
appropriate at quite low frequencies, which 
makes the hypothesis test particularly useful 
when dealing with natural  phenom- 
ena, where low frequency events are common- 
place. 
A problem with using hypothesis testing for 
filtering automatically acquired SCFs is ob- 
taining a good estimation of the expected oc- 
currence of scfi with verbj. This is often 
performed using the unconditional distribu- 
tion, that is the probability distribution over 
all SCFS, regardless of the verb. It is as- 
sumed that verbj must occur with scfi sig- 
nificantly more than is expected given this 
estimate. Our paper addresses the problem 
that the conditional distribution, dependent 
on the verb, and unconditional distribution 
are rarely correlated. Therefore statistical fil- 
ters which assume such correlation for H0 will 
be susceptible to error, 
In this paper, we compare the results of 
the Brent style binomial filter of Briscoe and 
Carroll and the LLR filter to a simple method 
which uses a threshold on the relative frequen- 
cies of the verb and SCF combinations. We 
do this within the framework of the Briscoe 
and Carroll SCF acquisition system, which is 
described in section 2.1. The details of the 
two statistical filters are described in section 
2.2, along with the details of the threshold ap- 
plied to the relative frequencies output from 
the SCF acquisition system. The details of the 
experimental evaluation are supplied in sec- 
tion 3. We discuss our findings in section 3.3 
and conclude with directions for future work 
(section 4). 
2 Method 
2.1 Framework for SCF Acquisition 
Briscoe and Carroll's (1997) verbal acquisition 
system distinguishes 163 SCFs and returns rel- 
ative frequencies for each SCF found for a given 
predicate. The SCFs are a superset of classes 
found in the Alvey NL Tools (ANLT) dictio- 
nary, Boguraev et al. (1987) and the COML~X 
Syntax dictionary, Grishman et al. (1994). 
They incorporate information about control 
of predicative arguments, as well as alterna- 
tions such as extraposition and particle move- 
ment. The system employs a shallow parser to 
obtain the subcategorization information. Po- 
tential SCF entries are filtered before the final 
SCF lexicon is produced. The filter is the only 
component of this system which we experi- 
ment with here. The three filtering methods 
which we compare are described below. 
2.2 Filtering Methods 
2.2.1 Binomial Hypothesis Test 
Briscoe and Carroll (1997) used a binomial 
hypothesis test (BHT) to filter the acquired 
SCFs. They applied BHT as follows. The sys- 
tem recorded the total number of sets of SCF 
cues (n) found for a given predicate, and the 
number of these sets for a given SCF (ra). The 
system estimated the error probability (pe) 
that a cue for a SCF (scfi) occurred with a 
verb which did not take scfi. pe was esti- 
mated in two stages, as shown in equation 1. 
Firstly, the number of verbs which are mem- 
bers of the target SCF in the ANLT dictionary 
were extracted. This number was converted 
to a probability of class membership by divid- 
ing by the total number of verbs in the dic- 
tionary. The complement of this probability 
provided an estimate for the probability of a 
verb not taking scfi. Secondly, this proba- 
bility was multiplied by an estimate for the 
probability of observing the cue for scfi. This 
was estimated using the number of cues for i 
extracted from the Susanue corpus (Sampson, 
1995), divided by the total number of cues. 
pe = (1 - Iverbsl    i  cZass il I eSlc e l, for il (1) 
The probability of an event with probability p 
happening exactly rn times out of n attempts 
is given by the following binomial distribution: 
20O 
n~ P(m,n,p) = m!(n- m)! pro(1 _p)n-m (2) 
The probability of the event happening m or 
more times is: 
= (3) 
k=rn 
Finally, P(m+, n,p e) is the probability that 
m or more occurrences of cues for scfi will oc- 
cur with a verb which is not a member ofscfi, 
given n occurrences of that verb. A threshold 
on this probability, P(m+,n, pe), was set at 
less than or equal to 0.05. This yielded a 95% 
or better confidence that a high enough pro- 
portion of cues for scfi have been observed for 
the verb to be legitimately assigned scfi. 
Other approaches which use a binomial fil- 
ter differ in respect of the calculation of the 
error probability. Brent (1993) estimated the 
error probabilities for each SCF experimen- 
tally from the behaviour of his SCF extrac- 
tor, which detected simple morpho-syntactic 
cues in the corpus data. Manning (1993) in- 
Creased the number of available cues at the ex- 
pense of the reliability of these cues. To main- 
tain high levels of accuracy, Manning applied 
higher bounds on the error probabilities for 
certain cues. These bounds were determined 
experimentally. A similar approach was taken 
by Briscoe, Carroll and Korhonen (1997) in a 
modification to the Briscoe and Carroll sys- 
tem. The overall performance was increased 
by changing the estimates of pe according to 
the performance of the system for the target 
SCF. In the work described here, we use the 
original BHT proposed by Briscoe and Carroll. 
2.2.2 The Binomial Log Likelihood 
Ratio as a Statistical Filter 
Dunning (1993) demonstrates the benefits of 
the LLR statistic, compared to Pearson's chi- 
squared, on the task of ranking bigram data. 
The binomial log-likelihood ratio test is 
simple to calculate. For each verb and SCF 
combination four counts are required. These 
are the number of times that: 
1. the target verb occurs with the target SCF 
(kl) 
2. the target verb occurs with any other SCF (nl 
- kl) 
3. any other verb occurs with the target SCF 
(k2) 
4. any other verb occurs with any other SCF 
- k2) 
The statistic -21ogA is calculated as follows:- 
log-likelihood = 
where 
2\[logL(pl, kl, nl ) 
+logL(p2, k2, n2) 
-logL(p, kl, nl) 
-logL(p, k2, n2) \] (4) 
logL(p, n, k) = k x logp + (n - k) x log(1 -p) 
and 
kl k2 kl + k2 Pl=--, P2------, P-- 
nl n2 nl -4- n2 
The LLR statistic provides a score that re- 
flects the difference in (i) the number of bits 
it takes to describe the observed data, using 
pl = p(SCFIverb ) and p2 = p(SCFl-~verb ), 
and (ii) the number of bits it takes to de- 
scribe the expected data using the probability 
p = p(scFlany verb). 
The LLR statistic detects differences be- 
tween pl and p2. The difference could 
potentially be in either direction, but we are 
interested in LLRS where pl > p2, i.e. where 
there is a positive association between the SCF 
and the verb. For these cases, we compared 
the value of -2logA to the threshold value 
obtained from Pearson's Chi-Squared table, 
to see if it was significant at the 95% level 2. 
2.2.3 Using a Threshold on the 
Relative Frequencies as a 
Baseline 
In order to examine the baseline performance 
of this system without employing any notion 
of the significance of the observations, we 
used a threshold on relative frequencies. This 
was done by extracting the SCFS, and rank- 
ing them in the order of the probability of 
their occurrence with the verb. The probabil- 
ities were estimated using a maximum likeli- 
hood estimate (MLE) from the observed rela- 
tive frequencies. A threshold, determined em- 
pirically, was applied to these probability esti- 
mates to filter out the low probability entries 
for each verb. .... 
2See (Gorrell, 1999) for details of this" method. 
201 
3 Evaluation 
3.1 Method 
To evaluate the different approaches, we took 
a sample of 10 million words of the BNC cor- 
pus (Leech, 1992). We extracted all sentences 
containing an occurrence of one of fourteen 
verbs 3. The verbs were chosen at random, 
subject to the constraint that they exhibited 
multiple complementation patterns. After the 
extraction process, we retained 3000 citations, 
on average, for each verb. The sentences con- 
taining these verbs were processed by the SCF 
acquisition system, and then we applied the 
three filtering methods described above. We 
also obtained results for a baseline without 
any filtering. 
The results were evaluated against a man- 
ual analysis of corpus data 4. This was ob- 
tained by analysing up to a maximum of 300 
occurrences for each of the 14 test verbs in 
LOB (Garside et al., 1987), Susanne and SEC 
(Taylor and Knowles, 1988) corpora. Follow- 
ing Briscoe and Carroll (1997), we calculated 
precision (percentage of SCFS acquired which 
were also exemplified in the manual analysis) 
and recall (percentage of the SCFs exemplified 
in the manual analysis which were acquired 
automatically). We also combined precision 
and recall into a single measure of overall per- 
formance using the F measure (MA.nniug and 
Schiitze, 1999). 
F = 2.precision. recall (5) 
precision + recall 
3.2 Results 
Table 1 gives the raw results for the 14 verbs 
using each method. It shows the number of 
true positives (TP), .false positives (FP), and 
.false negatives (FN), as determined accord- 
ing to the manual analysis. The results for 
high frequency SCFs (above 0.01 relative fre- 
quency), medium frequency (between 0.001 
and 0.01) and low frequency (below 0.001) 
SCFs are listed respectively in the second, 
3These verbs were ask, begin, believe, cause, expect, 
find, give, help, like, move, produce, provide, seem, 
swing. 
4The importance of the manual analysis is outlined 
in Briscoe and Carroll (1997). We use the same man- 
ual analysis as Briscoe and Carroll, Le. one from the 
Susanne, LOB, and SEC corpora. A manual analysis of 
the BNC data might produce better results. However, 
since the BNC is a heterogeneous corpus we felt it was 
reasonable to test the data on a different corpus, which 
is also heterogeneous. 
third and fourth columns, and the final col- 
umn includes the total results for all frequency 
ranges. 
Table 2 shows precision and recall for the 14 
verbs and the F measure, which combines pre- 
cision and recall. We also provide the baseline 
results, if all SCFs were accepted. 
From the results given in tables 1 and 2, the 
MLE approach outperformed both hypothesis 
tests. For both BHT and LLR there was an 
increase in FNs at high frequencies, and an 
increase in FPs at medium and low frequen- 
cies, when compared to MLE. The number of 
errors was typically larger for LLR than BHT. 
The hypothesis tests reduced the number of 
FNS at medium and low frequencies, however, 
this was countered by the substantial increase 
in FPs that they gave. While BHT nearly al- 
ways acquired the three most frequent SCFs of 
verbs correctly, LLR tended to reject these. 
While the high number of FNS can be ex- 
plained by reports which have shown LLR to 
be over-conservative (Ribas, 1995; Pedersen, 
1996), the high number of FPs is surprising. 
Although theoretically, the strength of LLR 
lies in its suitability for low frequency data, 
the results displayed in table 1 do not suggest 
that the method performs better than BHT on 
low frequency frames. 
MLE thresholding produced better results 
than the two statistical tests used. Preci- 
sion improved considerably, showing that the 
classes occurring in the data with the high- 
est frequency are often correct. Although MLE 
thresholding clearly makes no attempt to solve 
the sparse data problem, it performs better 
than BHT or LLR overall. MLE is not adept at 
finding low frequency SCFS, however, the other 
methods are problematic in that they wrongly 
accept more than they correctly reject. The 
baseline, of accepting all SCFS, obtained a high 
recall at the expense of precision. 
3.3 Discussion 
Our results indicate that MLE outperforms 
both hypothesis tests. There are two explana- 
tions for this, and these are jointly responsible 
for the results. 
Firstly, the SCF distribution is zipfian, as 
are many distributions concerned with nat- 
ural  (Manning and Schiitze, 1999). 
Figure 1 shows the conditional distribution 
for the verb find. This ~mf~ltered SCF prob- 
ability distribution was obtained from 20 M 
words of BNC data output from the SCF sys- 
202 
High Freq 
TP FP FN 
BHT 75 29 23 
LLR 66 30 32 
MLE 92 31 6 
Medium Freq Low Freq 
TP FP FN TP FP I FN 
11 37 31 4 23 15 
9 52 33 2 23 17 
0 0 42 0 0 19 
Totals 
TP FP I FN 
m 90 89 69 
77 105 82 
92 31 67 
Table 1: Raw results for 14 test verbs 
~r31ff.t: Precision % Recall % F measure 
BHT 50.3 56.6 53.3 
LLR 42.3 48.4 45.1 
MLE 74.8 57.8 65.2 
baseline 24.3 83.5 37.6 
Table 2: Precision, Recall, and F measure 
0.1 
0.01 
& 
0.001 
0.0001 
......... i ........ !. 
o 
I , r i i , , i,l , , i i i i , 
10 100 
rank 
0.1 
0.01 
o.oo~ 
0.01~ 
10 4 
\ 
, , , , t ,,r , i , , i , ,,1 
10 100 
rank 
Figure 1: Hypothesised SCF distribution for 
find 
tern. The unconditional distribution obtained 
from the observed distribution of SCFs in the 
20 M words of BNC is shown in figure 2. The 
figures show SCF rank on the X-axis versus 
SCF frequency on the Y-axis, using logarith- 
mic scales. The line indicates the closest Zipf- 
like power law fit to the data. 
Secondly, the hypothesis tests make the 
false assumption (H0) that the unconditional 
and conditional distributions are correlated. 
The fact that a significant improvement in 
performance is made by correcting the prior 
probabilities according to the performance of 
the system (Briscoe, Carroll and Korhonen, 
Figure 2: Hypothesised unconditional SCF dis- 
tribution 
1997) suggests the discrepancy between the 
unconditional and the conditional distribu- 
tions. 
We examined the correlation between the 
manual analysis for the 14 verbs, and the 
unconditional distribution of verb types over 
all SCFs estimated from the ANLT using the 
Spearman Rank Correlation Coefficient. The 
results included in table 3 show that only a 
moderate correlation was found averaged over 
all verb types. 
Both LLR and BHT work by comparing the 
observed value of p(scfi\[verbj) to that ex- 
pected by chance. They both use the observed 
203 
\[ Verb Rank Correlation 
ask 0.10 
begin 0.83 
believe 0.77 
cause 0.19 
expect 
find 
0.45 
0.33 
give 0.06 
help 0.43 
like 0.56 
move 0.53 
produce 0.95 
provide 0.65 
seem 0.16 
swing Average 0.50 
0.47 
Table 3: Rank correlation between the condi- 
tional SCF distributions of the test verbs and 
the unconditional distribution 
value for p(sc.filverbj) from the system's out- 
put, and they both use an estimate for the un- 
conditional probability distribution (p(scfi)) 
for estimating the expected probability. They 
differ in the way that the estimate for the un- 
conditional probability is obtained, and the 
way that it is used in hypothesis testing. 
For BHT, the null hypothesis is that the ob- 
served value ofp(scfiIverbj) arose by chance, 
because of noise in the data. We estimate the 
probability that the value observed could have 
arisen by chance using p(m+, n,pe), pe is cal- 
culated using: 
• the SCF acquisition system's raw (until- 
tered) estimate for the unconditional dis- 
tribution, which is obtained from the Su- 
sanne corpus and 
• the ANLT estimate of the unconditional 
distribution of a verb not taking scf~, 
across all SCFs 
For LLR, both the conditional (pl) and un- 
conditional distributions (p2) are estimated 
from the BNC data. The unconditional proba- 
bility distribution uses the occurrence of scfi 
with any verb other than our target. 
The binomial tests look at one point in the 
SCF distribution at a time, for a given verb. 
The expected value is determined using the 
unconditional distribution, on the assumption 
that if the null hypothesis is true then this dis- 
tribution will correlate with the conditional 
distribution. However, this is rarely the case. 
Moreover, because of the zipfian nature of 
the distributions, the frequency differences at 
any point can be substantial. In these exper- 
iments, we used one-tailed tests because we 
were looking for cases where there was a pos- 
itive association between the SCF and verb, 
however, in a two-tailed test the null hypoth- 
esis would rarely be accepted, because of the 
substantial differences in the conditional and 
unconditional distributions. 
A large number of false negatives occurred 
for high frequency SCFs because the probabil- 
ity we compared them to was too high. This 
probability was estimated from the combina- 
tion of many verbs genuinely occurring with 
the frame in question, rather than from an es- 
timate of background noise from verbs which 
did not occur with the frame. We did not use 
an estimate from verbs which do not take the 
SCF, since this would require a priori knowl- 
edge about the phenomena that we were en- 
deavouring to acquire automatically. For LLR 
the unconditional probability estimate (p2) 
was high, simply because this SCF was a com- 
mon one, rather than because the data was 
particularly noisy. For BHT, R e was likewise 
too high as the SCF was also common in the 
Susanne data. The ANLT estimate went some- 
way to compensating for this, thus we ob- 
tained fewer false negatives with BHT than 
LLR. 
A large number of false positives occurred 
for low frequency SCFs because the estimate 
for p(scf) was low. This estimate was more 
readily exceeded by the conditional estimate. 
For BHT false positives arose because of the 
low estimate of p(scf) (from Susanne) and 
because the estimate of p(-,SCF) from ANLT 
did not compensate enough for this. For LLR, 
there was no mean~ to compensate for the fact 
that p2 was lower than pl. 
In contrast, MLE did not compare two dis- 
tributions. Simply rejecting the low frequency 
data produced better results overall by avoid- 
ing the false positives with the low frequency 
data, and the false negatives with the high 
frequency data. 
4 Conclusion 
This paper explored three possibilities for fil- 
tering out the SCF entries produced by a SCF 
acquisition system. These were (i) a version 
of Brent's binomial filter, commonly used for 
this purpose, (ii) the binomial log-likelihood 
204 
ratio test, recommended for use with low fre- 
quency data and (iii) a simple method using 
a threshold on the MLEs of the SCFS output 
from the system. Surprisingly, the simple MLE 
thresholding method worked best. The BHT 
and LLR both produced an astounding mlm- 
ber of FPs, particularly at low frequencies. 
Further work on handling low frequency 
data in SCF acquisition is warranted. A non- 
parametric statistical test, such as Fisher's ex- 
act test, recommended by Pedersen (1996), 
might improve on the results obtained using 
parametric tests. However, it seems from our 
experiments that it would be better to avoid 
hypothesis tests that make use of the uncon- 
ditional distribution. 
One possibility is to put more effort into the 
estimation of pe, and to avoid use of the un- 
conditional distribution for this. In some re- 
cent experiments, we tried optimising the es- 
timates for pe depending on the performance 
of the system for the target SCF, using the 
method proposed by Briscoe, Carroll and Ko- 
rhonen (1997). The estimates of pe were ob- 
tained from a training set separate to the held- 
out BNC data used for testing. Results using 
the new estimates for pe gave an improvement 
of 10% precision and 6% recall, compared to 
the BHT results reported here. Nevertheless, 
the precision result was 14% worse for preci- 
sion than MLE, though there was a 4% im- 
provement in recall, making the overall per- 
formance 3.9 worse than MLE according to the 
F measure. Lapata (1999) also reported that 
a simple relative frequency cut off produced 
slightly better results than a Brent style BHT. 
If MLE thresholding persistently achieves 
better results, it would be worth investi- 
gating ways of handling the low frequency 
data, such as smoothing, for integration with 
this method. However, more sophisticated 
smoothing methods, which back-off to an un- 
Conditional distribution, will also suffer from 
the lack of correlation between conditional 
and unconditional SCF distributions. Any sta- 
tistical test would work better at low frequen- 
cies than the MLE, since this simply disregards 
all low frequency SCFs. In our experiments, ff 
we had used MLE only for the high frequency 
data, and BHT for medium and low, then over- 
all we would have had 54% precision and 67% 
recall. It certainly seems worth employing hy- 
pothesis tests which do not rely on the un- 
conditional distribution for the low frequency 
SCFS. 
5 Acknowledgements 
We thank Ted Briscoe for many helpful dis- 
cussions and suggestions concerning this work. 
We also acknowledge Yuval Krymolowski for 
useful comments on this paper. 

References 
Boguraev, B., Briscoe, E., Carroll, J., Carter, 
D. and Grover, C. 1987. The derivation of a 
grammatically-indexed lexicon from the Long- 
man Dictionary of Contemporary English. In 
Proceedings of the 25th Annual Meeting of 
the Association for Computational Linguis- 
tics, Stanford, CA. 193-200. 
Brent, M. 1991. Automatic acquisition of 
subcategorization frames from untagged text. 
In Proceedings of the 29th Annual Meeting 
of the Association for Computational Linguis- 
tics, Berkeley, CA. 209-214. 
Brent, M. 1993. From gra.mmar to lexicon: 
unsupervised learning of lexical syntax. Com- 
putational Linguistics 19.3: 243-262. 
Briscoe, E.J. and J. Carroll 1997. Automatic 
extraction of subcategorization from corpora. 
In Proceedings of the 5th ACL Conf. on Ap- 
plied Nat. Lg. Proc., Washington, DC. 356- 
363. 
Briscoe, E., Carroll, J. and Korhonen, A. 
1997. Automatic extraction of subcategoriza- 
tion frames from corpora - a framework and 
3 experiments. '97 Sparkle WP5 Deliverable, 
available in http://www.ilc.pi.cnr.it/. 
Carroll, G. and Rooth, M. 1998. Valence 
induction with a head-lexicalized PCFG. In 
Proceedings of the 3rd Conference on Empir- 
ical Methods in Natural Language Processing, 
Granada, Spain. 
Dunning, T. 1993. Accurate methods for the 
Statistics of Surprise and Coincidence. Com- 
putational Linguistics 19.1: 61-74. 
Gahl, S. 1998. Automatic extraction of sub- 
corpora based on subcategorization frames 
from a part-of-speech tagged corpus. In Pro- 
ceedings of the COLING-A CL'98, Montreal, 
Canada. 
Garside, R., Leech, G. and Sampson, G. 1987. 
The computational analysis of English: A 
corpus-based approach. Longman, London. 
Gorrell, G. 1999. Acquiring Subcategorisation 
from Textual Corpora. MPhil dissertation, 
University of Cambridge, UK. 
205 
Grishman, R., Macleod, C. and Meyers, A. 
1994. Comlex syntax: building a computa- 
tional lexicon. In Proceedings of the Interna- 
tional Conference on Computational Linguis- 
tics, COLING-94, Kyoto, Japan. 268-272. 
Lapata, M. 1999. Acquiring lexical gener- 
alizations from corpora: A case study for 
diathesis alternations. In Proceedings of the 
37th Annual Meeting of the Association for 
Computational Linguistics, Maryland. 397- 
404. 
Leech, G. 1992. 100 million words of English: 
the British National Corpus. Language Re- 
search 28(1): 1-13. 
Manning, C. 1993. Automatic acquisition of 
a large subcategorization dictionary from cor- 
pora. In Proceedings of the 31st Annual Meet- 
ing of the Association .for Computational Lin- 
guistics, Columbus, Ohio. 235-242. 
Manning, C. and Schiitze, H. 1999. Founda- 
tions of Statistical Natural Language Process- 
ing. MIT Press, Cambridge MA. 
Pedersen, T. 1996. Fishing for Exactness. In 
Proceedings of the South-Central SAS Users 
Group Conference SCSUG-96, Austin, Texas. 
Ribas, F. 1995. On Acquiring Appropriate Se- 
lectional Restrictions from Corpora Using a 
Semantic Taxonomy. Ph.D thesis, University 
of Catalonia. 
Sampson, G. 1995. English for the computer. 
Oxford University Press, Oxford UK. 
Sarkar, A. and Zeman, D. 2000. Auto- 
matic Extraction of Subcategorization Frames 
for Czech. In Proceedings of the Inter- 
national Conference on Computational Lin- 
guistics, COLING-O0, Saarbrucken, Germany. 
691-697. 
Taylor, L. and Knowles, G. 1988. Manual 
of information to accompany the SEC cor- 
pus: the machine-readable corpus of spoken 
English. University of Lancaster, UK, Ms. 
Ushioda, A., Evans, D., Gibson, T. and 
Waibel, A. 1993. The automatic acquisition of 
frequencies of verb subcategorization frames 
from tagged corpora. In Boguraev, B. and 
Pustejovsky, J. eds. SIGLEX A CL Workshop 
on the Acquisition of Lexieal Knowledge .from 
Text. Columbus, Ohio: 95-106. 
