Combination of Symbolic and Statistical Approaches for 
Grammatical Knowledge Acquisition 
Masaki KIYONO* and Jun'ichi TSUJII 
Centre for Computational Linguistics 
University of Manchester Institute of Science and Technology 
PO Box 88, Manchester, M60 1QD, United Kingdom 
kiyono©ccl .umist. ac.uk, tsuj ii~ccl .umist. ac .uk 
Abstract 
The framework we adopted for customiz- 
ing linguistic knowledge to individual ap- 
plication domains is an integration of sym- 
bolic and statistical approaches. In or- 
der to acquire domain specific knowledge, 
we have previously proposed a rule-based 
mechanism to hypothesize missing knowl- 
edge from partial parsing results of unsuc- 
cessfully parsed sentences. In this paper, 
we focus on the statistical process which se- 
lects plausible knowledge from a set of hy- 
potheses generated from the whole corpus. 
In particular, we introduce two statistical 
measures of hypotheses, Local Plausibility 
and Global Plausibility, and describe how 
these measures are determined iteratively. 
The proposed method will be incorporated 
into the tool kit for linguistic knowledge 
acquisition which we are now developing. 
1 Introduction 
Current technologies in natural language process- 
ing are not so mature as to make general purpose 
systems applicable to any domains; therefore rapid 
customization of linguistic knowledge to the sub- 
language of an application domain is vital for the 
development of practical systems. In the currently 
working systems, such customization has been car- 
ried out manually by linguists or lexicographers with 
time-consuming effort. 
We have already proposed a mechanism which 
acquires sublanguage-specific linguistic knowledge 
from parsing failures and which can be used as a 
tool for linguistic knowledge customization (Kiyono 
and Tsujii, 1993; Kiyono and Tsujii, 1994). Our ap- 
proach is characterized by a mixture of symbolic and 
statistical approaches to grammatical knowledge ac- 
quisition. Unlike probabilistic parsing, proposed by 
(Fujisaki et al., 1989; Briscoe and Carroll, 1993), 
*also a staff member of Matsushita Electric Industrial 
Co.,Ltd., Shinagawa, Tokyo, JAPAN. 
72 
which assumes the prior existence of comprehensive 
linguistic knowledge, our system can suggest new 
pieces of knowledge including CFG rules, subcate- 
gorization frames, and other lexical features. It also 
differs from previous proposals on lexical acquisi- 
tion using statistical measures such as (Church et 
al., 1991; Brent, 1991; Brown et al., 1993) which ei- 
ther deny the prior existence of linguistic knowledge 
or use linguistic knowledge in ad hoc ways. 
Our system consists of two components: (1) the 
rule-based component, which detects incompleteness 
of the existing knowledge and generates a set of hy- 
potheses of new knowledge and (2) the corpus-based 
component which selects plausible hypotheses on the 
basis of their statistical behaviour. As the rule-based 
component has been explained in our previous pa- 
pers, in this paper we focus on the corpus-based 
component. 
After giving a brief explanation of the framework, 
we describe a data structure called Hypothesis Graph 
which plays a crucial role in the corpus-based pro- 
cess, and then introduce two statistical measures of 
hypotheses, Global Plausibility and Local Plausibil- 
ity, which are iteratively determined to select a set 
of plausible hypotheses. An experiment which shows 
the effectiveness of our method is also given. 
2 The System Organization 
2.1 Hypothesis Generation 
Figure 1 shows the framework of our system. When 
the parser fails to analyse a sentence, the Hypothe- 
sis Generator (HG) produces hypotheses of missing 
knowledge each of which could rectify the defects of 
the current grammar. As the parser is a sort of Chart 
Parser and maintains partial parsing results in the 
form of inactive and active edges, a parsing failure 
means that no inactive edge of category S spanning 
the whole sentence exists. 
The HG tries to introduce an inactive edge of S by 
making hypotheses of missing linguistic knowledge. 
It generates hypotheses of rewriting rules which col- 
lect existing sequences of inactive edges into an ex- 
pected category. It also calls itself recursively to in- 
Sentence 
I Parser 
( Parsing Result) 
Generator 
( Hypotheses 
-  Corpus ) 
--  Grammar ) 
Human t lnteraction 
Plausible 
Hypotheses ) ÷ 
Hypothesis 
Selector 
 Hypothesis DB) 
Rule-Based Component Corpus-Based Component 
Figure 1: Framework of Grammar Acquisition 
troduce necessary inactive edges for each rule of the 
expected category whose application is prevented 
due to the lack of necessary inactive edges. The 
simplest form of the algorithm is shown below. 
\[Algorithm\] An inactive edge \[ie(A) : xo, xn\] can 
be introduced, with label A, between word po- 
sitions x0 and xn by each of the hypotheses gen- 
erated from the following two steps. 
\[Step 1\] For each sequence of inactive edges, 
\[~e(Bl) : x0, Xl\], ..., \[ie(Bn): Xn-l,Xn\], 
spanning from x0 to Xn, generates a new 
rule. 
A ==~ B1,.-.,B, 
\[Step 2\] For each existing rule of form A ::V 
A1, • • -, An, finds an incomplete sequence 
of inactive edges, \[ie(A1) : xo, xl\], ..., 
\[ie(A~_l) : x~-2, xi-1\], \[ie(Ai+l) : xi, xi+l\], 
..., \[ie(An) : xn-1, xn\], and calls this algo- 
rithm for \[ie(Ai) : xi-1, xi\]. 
This algorithm has been further augmented in or- 
der to treat sentences which contain more than one 
construction not covered by the current version of 
the grammar and to generate hypotheses concern- 
ing complex features like subcategorization frames. 
2.2 Hypothesis Filtering 
The greater number of the hypotheses generated by 
the algorithm are linguistically unnatural, because 
the algorithm does not embody any linguistic prin- 
ciple to judge the appropriateness of hypotheses, and 
therefore we introduced a set of criteria to filter out 
unnatural hypotheses (Kiyono and Tsujii, 1993; Kiy- 
ono and Tsujii, 1994). This includes, for example, 
• The maximum number of daughter constituents 
of a rule is set to 3. 
• Supposing that the current version of the gram- 
mar contains all the category conversion rules, 
a unary rule with one daughter constituent is 
not generated. 
• Using generalizations embodied in the current 
version of the grammar, a rule containing a se- 
quence of constituents which can be collected 
into a larger constituent by the current version 
of grammar is not generated. 
• Distinguishing non-lexical categories from lexi- 
cal categories, a rule whose mother category is 
a lexical category is not generated. 
These criteria significantly reduce the number of 
hypotheses to be generated. 
2.3 Hypothesis Graph 
As the criteria which the HG uses to filter out un- 
natural hypotheses are solely based on the forms of 
hypotheses, they cannot identify the "correct" hy- 
potheses on their own. The correct ones are rather 
chosen by the Hypothesis Selector (HS), which re- 
sorts to examining the statistical behaviour of hy- 
potheses throughout a given corpus. 
A straightforward method is to count the fre- 
quency of hypotheses, but this simple method does 
not work, because hypotheses are not independent of 
each other. A hypothesis is either competing with or 
complemenlary to other hypotheses generated from 
the same sentence. A group of hypotheses generated 
for restoring the same inactive edge constitutes a set 
of competing hypotheses and only one of them con- 
tributes to the correct structure of the sentence. On 
the other hand, two groups of hypotheses which are 
generated to treat two different parts of the same 
sentence stand in complementary relationships. 
A hypothesis should be recognized as being cor- 
rect, only when no other competing hypothesis is 
more plausible. That is, even if a hypothesis is gen- 
erated frequently, it should not be chosen as the 
correct one, if more plausible competing hypothe- 
ses are always generated together with it. On the 
other hand, even if a hypothesis is generated only 
once, it should be chosen as the correct one, if there 
is no other competing hypothesis. 
In order to realize the above conception, the HS 
maintains mutual relationships among hypotheses 
as an AND-OR graph. In a graph, AND nodes 
and OR nodes express complementary relationships 
and competing relationships, respectively. A node is 
shared, when different recursion steps in the HG try 
to restore the same inactive edge. Figure 2 shows the 
AND-OR graph for the hypotheses generated from 
the sentence '~Failing students looked embarrassed" 
when the current version of grammar does not con- 
tain rules for participles. The top node is an AND 
node which has two groups of hypotheses that treat 
two different parts of the sentence, i.e. "failing stu- 
dents" and "looked embarrased". 
73 
Sentence: Failing students looked embarrassed. 
HPI: NP =~ VP, NP ("failing students") 
HP2: ADJ =¢, \[failing\] 
HP3: VP ~ VP, VP ("looked embarrassed") 
HP4: ADV ~ \[embarrassed\] 
HP5: N ~ \[embarrassed\] 
HP6: ADJ ~ \[embarrassed\] 
Figure 2: AND-OR Graph of Hypotheses 
3 Statistical Analysis 
3.1 Two Measures of Plausibility 
The HS uses two measures of plausibility of hypothe- 
ses. One is computed for an instance hypothesis and 
the other is for a generic hypothesis. (See 3.3 for the 
relationship between the two types of hypotheses.) 
(1) Local Plausibility: This value shows how 
plausible an instance hypothesis is as grammat- 
ical knowledge to contribute to the correct anal- 
ysis of a unsuccessfully parsed sentence. 
(2) Global Plausibility: This value shows how 
plausible the hypothesis of the generic form is 
as grammatical knowledge to be acquired. 
As we describe in the following section, the Lo- 
cal Plausibility (LP) of an instance hypothesis is 
computed on the basis of the values of the Global 
Plausibility (GP) of the generic hyoptheses which 
are linked to instance hypotheses in the same hy- 
pothesis graph. On the other hand, the GP of a 
generic hypothesis is computed from the LP values 
of its instance hypotheses across the whole corpus. 
Intuitively speaking, the GP of a generic hypoth- 
esis is high if its instances are frequently generated 
and if they receive high LP values, while the LP of 
a instance hypothesis is high if the GP of the cor- 
responding generic hypothesis is high and if the GP 
values of the generic hypotheses corresponding to its 
competing hypotheses are low. Because of this mu- 
tual dependence between LP and GP, they cannot 
be computed in a single step but rather computed 
iteratively by repeating the following steps until the 
halt condition is satisfied. 
\[Step 1\] Estimates the initial values of LP. 
\[Step 2\] Calculates GP values from LP values. 
\[Step 3\] Checks the halt condition. 
\[Step 4\] Calculates LP values from GP values and 
GOT0 \[Step 2\]. 
3.2 Initial Estimation of Local Plausibility 
If the current version of the grammar is reason- 
ably comprehensive, pieces of linguistic knowledge 
which have to be acquired are likely to be lex- 
ical or idiosyncratic. That is, we assume that 
sublanguage-specificity tends to be manifested by 
unknown words, new usages of existing words, and 
syntactic constructions idiosyncratic to the sublan- 
guage. In order to quantify such plausibility, the 
following value is given to each hypothesis. 
W(Hypoi) × H(Hypoi) 
iP(Hypoi) = 1- W(S) x H(S) 
This value shows the proportion of the syntactic 
structure in the whole sentence which is not covered 
by the hypothesis. It ranges from 0 to 1 and gets 
larger if the hypothesis rectifies a smaller part of 
the sentence. W(Hypol), the width of the hypothe- 
sis, is defined as the word count of the subtree and 
H(Hypoi), the height, is defined as the shortest path 
from lexical nodes to the top node of the subtree. 
3.3 Generic Hypothesis and Global 
Plausibility 
The GP of a hypothesis is computed based on the 
LP values of its instance hypotheses, but the rela- 
tionship between a generic hypothesis and its in- 
stances is not straightforward because we adopted 
a unification-based grammar formalism. For exam- 
ple, the instance hypothesis of NP =:~ VP, NP in 
Figure 2 contains not only this CFG skeleton but 
also further feature descriptions of the three con- 
stituents which include specific surface words like 
"failing" and "students". Unless we generalize them, 
we cannot obtain the generic form of this instance 
hypothesis, and therefore cannot judge whether the 
hypotheses generated from different sentences are 
identical. 
Such generalization of instance hypotheses re- 
quires an inductive mechanism for judging which 
parts of the feature specification are common to all 
instance hypotheses and should be included in a hy- 
pothesis of the generic form. This kind of induc- 
tion is beyond the scope of the current framework, 
because such induction may need a lot of time and 
space if it is carried out from scratch. We first gather 
a set of instance hypotheses which are likely to be 
instances of the same generic hypothesis which, in 
turn, is likely to be "correct" linguistic knowledge. 
Our current framework uses a simple definition of 
generic hypotheses and their instances. That is, if 
two rule hypotheses have the same CFG skeleton, 
then they are judged to be instances of the same 
generic hypotheses. As for lexical hypotheses, we 
use a set of fixed templates of lexical entries in or- 
der to acquire detailed knowledge like subcategoriza- 
tion frames. Features which are not included in the 
templates are ignored in the judgement of whether 
generic hypotheses are identical. 
74 
TOP 
AN 
(HP2J (HP3J (.HP 
0.05 0.1 0.9 
(a) Collection of Global Plausibility 
,) 
0.45 
TOP 
AN: 
(HP2J (HP3) v.oo (HP 
0.06 
)D 
0.38 
0.38 
0.94 
(b) Distribution of Local Plausibility 
Figure 3: Calculation of Local Plausibility 
The GP of a generic hypotheses is defined as being 
the probability of the event that at least one instance 
hypothesis recovers the true cause of a parsing fail- 
ure, and it is computed by the following formula 
when a set of its instance hypotheses is identified. 
In the formula, HP is a generic hypothesis and HPi 
are its instances. 
n 
GP(HP) = 1 - H(1 - LP(HPi)) 
i=1 
The more instance hypotheses are generated, the 
closer to 1 GP(HP) becomes. If one of the instances 
is regarded to be recovering the true cause of a pars- 
ing failure, the GP of the generic hypothesis is as- 
signed 1, because the hypothesis is indispensable to 
the analysis of the corpus. 
3.4 Local Plausibility 
The calculation of LP is carried out on each hypoth- 
esis graph based on the assumption that an instance 
hypothesis or a set of instance hypotheses which re- 
covers the true cause(s) of the parsing failure should 
exist in the graph. This assumption means that the 
top node of a hypothesis graph is assigned 1 as its 
LP value. 
The LP value assigned to a node is to be dis- 
tributed to its daughter nodes by considering the 
GP values of the corresponding generic hypotheses. 
For example, the daughter nodes of an OR node, 
which constitute a set of competing hypotheses, re- 
ceive their LP values which are dividents of the LP 
value of the mother node proportional to their GP 
values. 
However, as GP is defined only for hypotheses, 
we first determine the GP values of all nodes in a 
hypothesis graph in a bottom-up manner, starting 
from the tip nodes of the graph to which instance 
hypotheses are attached. Therefore, \[Step 2\] in the 
statistical analysis is further divided into the follow- 
ing three steps. 
\[Step 2-1\] Bottom-up Calculation of GP 
The GP value of an intermediate node is deter- 
mined as follows (See Figure 3(a)). 
• The GP value of an OR node is computed by the 
following formula based on the GP values of the 
daughter nodes, which corresponds to the prob- 
ability that at least one of the daughter nodes 
represents "correct" grammatical knowledge. 
m 
GP(OR) = 1 - H(1 - GP(Nodei)) 
i=1 
• The GP value of an AND node is computed by 
the following formula, which corresponds to the 
probability that all the daughter nodes repre- 
sent "correct" grammatical knowledge. 
m 
GP(AND) = H GP(Nodei) 
i=1 
\[Step 2-2\] Deletion of Hypotheses 
The nodes which have significantly smaller GP 
values than the highest one among the daughter 
nodes of the same mother OR node (less than one 
tenth, in our current implementation) will be re- 
moved from the hypothesis graph. For example, HP2 
in Figure 3 was considered to be much less plau- 
sible than HP4 and removed from the graph. 
As a node in a hypothesis graph could have more 
than one mother nodes, the hypothesis deletion is 
realized by removing the link between the node rep- 
resenting the hypothesis and one of its mother OR 
nodes (not removing the node itself). For example, 
in Figure 3, when HP4 is removed in comparison 
with HP2 or HP3, the link between HP4 and the 
OR node is removed, while the link between HP4 
and the AND node still remains. 
The deletion of less viable nodes accelerates the 
convergence of the iterative process of computing 
GP and LP. 
75 
\[Step 2-3\] Top-down Calculation of LP 
This step distributes the LP assigned to the top 
node (that is, 1) to the nodes below in a top- 
down way according to the following rules (See Fig- 
ure 3(b)). 
• The LP value of an OR node is distributed to its 
daughter nodes proportional to their GP values 
so that the sum of their LP values is the same 
as that of the OR node because the daughter 
nodes of the same OR node represent mutually 
exclusive hypotheses. 
GP(Nodei) L " LP(gode,) = ~.~.---~l ~ej) P(OR) 
• The LP value of an AND node is distributed to 
its daughter nodes with the same values. 
LP(godei) = LP(AND) 
If a hypothesis has more than one mother nodes 
and its LP can be calculated through several paths, 
the sum of those is given to the hypothesis. For 
example, the value for HP4 in Figure 3 is 0.56 + 
0.38 = 0.94. 
As we discussed before, these newly computed LP 
values are used to compute the GP values at \[Step 
2\] in the next cycle of iteration. 
3.5 Halt Condition 
The iterative calculation process is regarded to have 
converged if the GP values of all the generic hypothe- 
ses do not change in comparison with the previous 
cycle, but as it possibly takes a lot of time for the 
process to reach such a situation, we use an easier 
condition to stop the process. That is, we count 
the number of deleted instance hypotheses at each 
cycle and terminate the iteration when no instance 
hypothesis is deleted in a number of consecutive it- 
erations. Actually, the process halts after 5 zero- 
deletion cycles in our current implementation. 
When the interative process terminates, the hy- 
potheses with high GP values are presented as the 
final candidates of new knowledge to be added to 
the current version of grammar. 
4 Preliminary Experiment 
In order to demonstrate how the HS works, we car- 
ried out a preliminary experiment with 1,000 sen- 
tences in the UNIX on-line manual (approximately 
one fifth of the whole manual). As the initial knowl- 
edge for the experiment, we prepared a grammar set 
which contains 120 rules covering English basic ex- 
pressions and deliberately removed rules for partici- 
ples in order to check whether the HS can discover 
adequate rules. The input data to the statistical pro- 
cess is a set of 5,906 instance hypotheses generated 
from 282 unsuccessfully parsed sentences. 
GP 
1.000000 
1.000000 
1.000000 
1.000000 
1.000000 
1.000000 
1.000000 
1.000000 
1.000000 
1.000000 
1.000000 
1.000000 
1.000000 
1.000000 
1.000000 
1.000000 
1.000000 
0.925960 
0.913933 
0.750000 
0.683594 
0.500000 
0.336694 
0.000000 
0.000000 
Generic Hypothesis 
vp => vp,p. 
np => vp,np. 
n => \['double-quote'\]. 
n => \[filename\]. 
v => \[archived\]. 
n => \[directory\]. 
n => \['EOF'\]. 
adj => \['non-printing'\]. 
n => \[pathnames\]. 
n => \[cpp\]. 
n => \['NEWLINE'\]. 
n => \['.cshrc'\]. 
n--> \[backslash\]. 
n--> \[aliases\]. 
adj--> \[nonseekable\]. 
n => \[~ordlist\]. 
n => \[login\]. 
n => \['TERM'\]. 
n => \[cmdtool\]. 
n => \['command-line'\]. 
n => \[filenames\]. 
adj => \[backquoted\]. 
np => np,np. 
adj => \[blocking\]. 
#: Rank, 
adj => \[invisible\]. 
N: Number of instance hypotheses 
Table h List of "Correct" Hypotheses 
The statistical process removed 4,034 instance hy- 
potheses and stopped after 63 cycles of the iterative 
computation of GP and LP. The instance hypotheses 
were grouped into 2,876 generic hypotheses and the 
GP values of 2,331 generic hypotheses were reduced 
to 0 by the hypothesis deletion. 
Table 1 is the list of "correct" hypotheses picked 
up from the whole list of generic hypotheses sorted 
by GP values. The hypothesis for participles, np => 
vp,np, is one of the 128 hypotheses whose GP val- 
ues are 1. This table also shows that quite a few 
"correct" lexical hypotheses are in higher positions 
because lexical knowledge for unknown words is in- 
dispensable to the successful parsing of the corpus. 
The distribution of "correct" hypotheses within 
the whole list is shown in Table 2. The fact that 
"correct" hypotheses exist more in higher ranges 
supports our mechanism. Although some of the 
"correct" ones have zero GP values, they do not di- 
minish our framework because most of them are the 
hypotheses treating participles as adjectives, which 
are the alternative hypotheses of np => vp,np. 
The parameter which we can adjust to select 
more plausible hypotheses is the threshold for the 
hypothesis deletion. Generally speaking, giving a 
higher threshold causes an increase of the number 
of deleted hypotheses and therefore accelerates the 
convergence of the iterative process. In the experi- 
ment, however, the use of one fifth as the threshold 
instead of one tenth did not bring a major difference. 
76 
Range Rule Lexical Correct 
Hypothesis Hypothesis Hypothesis 
1- 100 
101- 200 
201- 300 
301- 400 
401- 500 
501-1000 
1001-2000 
2001-2876 
19 
41 
55 
76 
95 
473 
770 
653 
81 
59 
45 
24 
5 
27 
230 
223 
35 
25 
13 
5 
0 
9 
12 
9 
Total 2182 694 108 
Table 2: Distribution of "Correct" Hypotheses 
5 Conclusion 
The statistical analysis discussed in this paper is 
based on the assumption that types of linguistic 
knowledge to be acquired are: 
\[1\] Knowledge for syntactic constructions which is 
used frequently in the given sublanguage. 
\[2\] Lexical knowledge such as subcategorization 
frames and number properties, which is often 
idiosyncratic to the given sublanguage. 
\[3\] Knowledge which belongs neither to \[1\] nor to 
\[2\], but is indispensable to the given corpus. 
\[1\] implies that knowledge for less frequent con- 
structions can be ignored at the initial stage of lin- 
guistic knowledge customization. Such knowledge 
will be discovered after major defects of the current 
grammar are rectified, because the GP of a generic 
hypothesis is defined as being sensitive to the fre- 
quency of the hypothesis. 
\[2\] means that we assume that the set of initially 
provided grammar rules has a comprehensive cov- 
erage of English basic expressions. This assump- 
tion is reflected in the way of the initial estimation 
of LP values. Also note that only when this as- 
sumption is satisfied, can the HG produce a reason- 
able set of hypotheses. On the other hand, because 
of this assumption, our framework can learn struc- 
turally complex and linguistically meaningful lexical 
descriptions, like a subcategorization frame. 
\[3\] is reflected in the way of the computation of 
GP values. A generic hypothesis one of whose in- 
stances occurs as a single possible hypothesis that 
can recover a parsing failure will have the GP value 
of 1, even though its frequency is very low. 
The computation mechanism of GP and LP bears 
a resemblance to the EM algorithm(Dempster et al., 
1977; Brown et al., 1993), which iteratively com- 
putes maximum likelihood estimates from incom- 
plete data. As the purpose of our statistical analysis 
is to choose "correct" hypotheses from a hypothe- 
sis set which contains unnatural hypotheses as well, 
our motivation is different from that of the EM algo- 
rithm. However, if we consider that the hypothesis 
deletion is maxmizing the plausibility of "correct" 
hypotheses, the computation procedures of both al- 
gorithms have a strong similarity. 
The grammatical knowledge acquisition method 
proposed in this paper will be incorporated into the 
tool kit for linguistic knowledge customization which 
we are now developing. In the practical use of our 
method, a grammar maintainer will be shown a list 
of hypotheses with high GP values and renew the 
current version of grammatical knowledge. The re- 
newed knowledge will be used in the next cycle of 
hypothesis generation and selection to achieve the 
gradual enlargement of linguistic knowledge. 
Acknowledgements 
We would like to thank our colleagues in UMIST 
who gave us many usuful comments. We also want to 
thank Mr Tsumura and Dr Kawakami of Matsushita, 
who allowed the first author to study at UMIST. 

References 

Michael R. Brent. 1991. Automatic Acquisition 
of Subcategorization Frames from Untagged Text. 
In Proc. of the 29st ACL meeting, pages 209-214. 

Ted Briscoe and John Carroll. 1993. Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars. Computational Linguistics, 19(1):25-59. 

Peter F. Brown, Stephen A. Della Pietra, Vincent J. 
Della Pietra, and Robert L. Mercer. 1993. The 
Mathematics of Statistical Machine Translation: 
Parameter Estimation. Computational Linguistics, 19(2):263-311. 

Kenneth Church, William Gale, Patrick Hanks, and 
Donald Hindle. 1991. Using Statistics in Lexi- 
cal Analysis. In Uri Zernik, editor, Lexical Acqui- 
sition: Exploiting On-Line Resources to Build a 
Lexicon, chapter 6, pages 115-164. Lawrence Erl- 
baum Associates. 

A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. 
Maximum Likelihood from Incomplete Data via 
the EM Algorithm. Journal of the Royal Statisti- 
cal Society, 39(B):1-38. 

T. Fujisaki, F. Jelinek, J. Cocke, E. Black, and T. 
Nishino. 1989. A Probabilistic Parsing Method 
for Sentence Disambiguation. In Proc. of the Int. 
Workshop on Parsing Technologies, pages 105- 
114. Carnegie-Mellon University. 

Masaki Kiyono and Jun'ichi Tsujii. 1993. Linguistic 
Knowledge Acquisition from Parsing Failures. In 
Proc. of EACL-93, pages 222-231. 

Masaki Kiyono and Jun'ichi Tsujii. 1994. Hypothesis Selection in Grammar Acquisition. In Proc. of 
COLING-94
