Exploiting Contextual Information in Hypothesis Selection for 
Grammar Refinement 
Thanaruk Theeramunkong Yasunobu Kawaguchi Manabu Okumura 
Japan Advanced Institute of Japan Advanced Institute of Japan Advanced Institute of 
Science and Technology Science and Technology Science and Technology 
1-1 Asahidai Tatsunokuchi 1-1 Asahidai Tatsunokuchi 1-1 Asahidai Tatsunokuchi 
Nomi Ishikawa Japan Nomi Ishikawa Japan Nomi Ishikawa Japan 
ping~j aist. ac. j p kawagut i©j aist. ac. j p oku~j aist. ac. j p 
Abstract 
In this paper, we propose a new frame- 
work of grammar development and some 
techniques for exploiting contextual infor- 
mation in a process of grammar refine- 
ment. The proposed framework involves 
two processes, partial grammar acquisition 
and grammar refinement. In the former 
process, a rough grammar is constructed 
from a bracketed corpus. The grammar is 
later refined by the latter process where 
a combination of rule-based and corpus- 
based approaches is applied. Since there 
may be more than one rules introduced as 
alternative hypotheses to recover the anal- 
ysis of sentences which cannot be parsed by 
the current grammar, we propose a method 
to give priority to these hypotheses based 
on local contextual information. By experi- 
ments, our hypothesis selection is evaluated 
and its effectiveness is shown. 
1 Introduction 
One of the essential tasks to realize an efficient 
natural language processing system is to construct 
a broad-coverage and high-accurate grammar. In 
most of the currently working systems, such gram- 
mars have been derived manually by linguists 
or lexicographers. Unfortunately, this task re- 
quires time-consuming skilled effort and, in most 
cases, the obtained grammars may not be com- 
pletely satisfactory and frequently fail to cover 
many unseen sentences. Toward these problems, 
there were several attempts developed for automat- 
ically learning grammars based on rule-based ap- 
proach(Ootani and Nakagawa, 1995), corpus-based 
approach(Srill, 1992)(Mori and Nagao, 1995) or hy- 
brid approach(Kiyono and Tsujii, 1994b)(Kiyono 
and Tsujii, 1994a). 
Unlike previous works, we have introduced a 
new framework for grammar development, which is 
a combination of rule-based and corpus-based ap- 
proaches where contextual information can be ex- 
ploited. In this framework, a whole grammar is not 
acquired from scratch(Mori and Nagao, 1995) or an 
initial grammar does not need to be assumed(Kiyono 
and Tsujii, 1994a). Instead, a rough but effective 
grammar is learned, in the first place, from a large 
corpus based on a corpus-based method and then 
later refined by the way of the combination of rule- 
based and corpus-based methods. We call the former 
step of the framework partial grammar acquisition 
and the latter grammar refinement. For the partial 
grammar acquisition, in our previous works, we have 
proposed a mechanism to acquire a partial gram- 
mar automatically from a bracketed corpus based on 
local contextual information(Theeramunkong and 
Okumura, 1996) and have shown the effectiveness 
of the derived grammar(Theeramunkong and Oku- 
mura, 1997). Through some preliminary experi- 
ments, we found out that it seems difficult to learn 
grammar rules which are seldom used in the corpus. 
This causes by the fact that rarely used rules oc- 
cupy too few events for us to catch their properties. 
Therefore in the first step, only grammar rules with 
relatively high occurrence are first learned. 
In this paper, we focus on the second step, gram- 
mar refinement, where some new rules can be added 
to the current grammar in order to accept un- 
parsable sentences. This task is achieved by two 
components: (1) the rule-based component, which 
detects incompleteness of the current grammar and 
generates a set of hypotheses of new rules and (2) 
the corpus-based component, which selects plausible 
hypotheses based on local contextual information. 
In addition, this paper also describes a stochastic 
parsing model which finds the most likely parse of a 
sentence and then evaluates the hypothesis selection 
based on the plausible parse. 
In the rest, we give an explanation of our frame- 
work and then describe the grammar refinement pro- 
cess and hypothesis selection based on local contex- 
tual information. Next, a stochastic parsing model 
which exploits contextual information is described. 
Finally, the effectiveness of our approach is shown 
through some experiments investigating the correct- 
ness of selected hypotheses and parsing accuracy. 
78 
2 The Framework of Grammar 
Development 
The proposed framework is composed of two phases: 
partial grammar acquisition and grammar refine- 
ment. The graphical representation of the frame- 
work is shown in Figure 1. In the process of grammar 
development, a partial grammar is automatically ac- 
quired in the first phase and then it is refined in 
the second phase. In the latter phase, the system 
generates new rules and ranks them in the order of 
priority before displaying a user a list of plausible 
rules as candidates for refining the grammar. Then 
the user can select the best one among these rules. 
Currently, the corpus used for grammar development 
in the framework is EDR corpus(EDR, 1994) where 
lexical tags and bracketings are assigned for words 
and phrase structures of sentences in the corpus re- 
spectively but no nonterminal labels are given. 
Partial Grammar 
\[ P ,l 1 l Grammar I ~"~l'-z'-'-'-'-~ ..... 
\] Acquisition I \[ t'-" I -I====:- 
°":"':! ...... 1 .,o,.,°°o.o 
...................... \[ Grammar 
.................... D I Refinement 
.................. L Phase 
:::::::7 I 
| 
New Rule 
Hypotheses 
Do,,olomr 
Figure 1: The overview of our grammar development 
framework 
2.1 Partial Grammar Acquisition 
In this section, we give a brief explanation for par- 
tial grammar acquisition. More detail can be found 
in (Theeramunkong and Okumura, 1996). In par- 
tial grammar acquisition, a rough grammar is con- 
structed from the corpus based on clustering anal- 
ysis. As mentioned above, the corpus used is a 
tagged corpus with phrase structures marked with 
brackets. At the first place, brackets covering a 
same sequence of categories, are assumed to have 
a same nonterminal label. We say they have the 
same bracket type. The basic idea is to group brack- 
ets (bracket types) in a corpus into a number of 
similar bracket groups. Then the corpus is auto- 
matically labeled with some nonterminal labels, and 
consequently a grammar is acquired. The similarity 
between any two bracket types is calculated based 
on divergencel(Harris, 1951) by utilizing local con- 
textual information which is defined as a pair of 
categories of words immediately before and after a 
bracket type. This approach was evaluated through 
some experiments and the obtained result was al- 
most consistent with that given by human evalua- 
tots. However, in this approach, when the number 
of occurrences of a bracket type is low, the simi- 
larity between this bracket type and other bracket 
types is not so reliable. Due to this, only bracket 
types with relatively frequent occurrence are taken 
into account. To deal with rarely occurred bracket 
types, we develop the second phase where the sys- 
tem shows some candidates to grammar developers 
and then they can determine the best one among 
these candidates, as shown in the next section. 
2.2 Grammar Refinement with Additional 
Hypothesis Rule 
The grammar acquired in the previous phase is a 
partial one. It is insufficient for analyzing all sen- 
tences in the corpus and then the parser fails to 
produce any complete parse for some sentences. In 
order to deal with these unparsable sentences, we 
modify the conventional chart parser to keep record 
of all inactive edges as partial parsing results. Two 
processes are provoked to find the possible plausible 
interpretations of an unparsable sentence by hypoth- 
esizing some new rules and later to add them to the 
current grammar. These processes are (1) the rule- 
based process, which detects incompleteness of the 
current grammar and generates a set of hypothe- 
ses of new rules and (2) the corpus-based process, 
which selects plausible hypotheses based on local 
contextual information. In the rule-based process, 
the parser generates partial parses of a sentence as 
much as possible in bottom-up style under the gram- 
mar constraints. Utilizing these parses, the process 
detects a complete parse of a sentence by starting 
at top category (i.e., sentence) covering the sentence 
and then searching down, in top-down manner, to 
the part of the sentence that cannot form any parse. 
At this point, a rule is hypothesized. In many cases, 
there may be several possibilities for hypothesized 
rules. The corpus-based process, as the second pro- 
cess, uses the probability information from parsable 
sentences to rank these hypotheses. In this research, 
local contextual information is taken into account for 
this task. 
1 The effectiveness of divergence for detecting phrase 
structures in a sentence is also shown in (Brill, 1992). 
79 
3 Hypothesis Generation 
When the parser fails to parse a sentence, there ex- 
ists no inactive edge of category S (sentence) span- 
ning the whole sentence in the parsing result. Then 
the hypothesis generation process is provoked to find 
all possible hypotheses in top-down manner by start- 
ing at a single hypothesis of the category S covering 
the whole sentence. This process uses the partial 
chart constructed during parsing the sentence. This 
hypothesis generation is similar to one applied in 
(Kiyono and Tsujii, 1994a). 
\[Hypothesis generation\] 
An inactive edge \[is(A) : xo, xn\] can be in- 
troduced from x0 to x,, with label A, for 
each of the hypotheses generated by the fol- 
lowing two steps. 
1. For each sequence of inactive edges, \[is(B1) : 
xo, x l \] , ..., \[ ie( Bn ) : Xn- l , agn \] , spanning from x0 
to xn, generate a new rule, A ---, Bz, ..., Bn, and 
propose a new inactive edge as a hypothesis, 
\[hypo(A) : xo, xn\]. (Figure 2(1)) 
2. For each existing rule A --+ A1, ..., An, find an 
incomplete sequence of inactive edges, \[ie(A1) : 
xo, xl\],...,\[ie(ai-1) : xi-2, zi-1\], \[ie(Ai+l) : 
xi, xi+l\], ..., \[ie(An) : xn-z, xn\], and call this al- 
gorithm for \[ie(Ai): xl-z, xl\].(Figure 2(2)) 
(1) B1 Bn 
9~D- e- e- (~-~ .... .... .... 
xO X l Xn. f xn # 
Assumeerule: A ..~ BI ..... Bn 
(2) An existing rule : A.-~ A1,...,Ai-I, Ai, Ai+ I,...An 
A1 Ai-1 Ai + l An 
.... .... 
.~ xl xl,~ xl-1 ~ xl. 1 x~1 Xn # 
Find Ai between xi-1 and Xi 
Figure 2: Hypothesis Rule Generation 
By this process, all of possible single hypotheses 
(rules) which enable the parsing process to succeed, 
are generated. In general, among these rules, most of 
them may be linguistically unnatural. To filter out 
such unnatural hypotheses, some syntactical crite- 
ria are introduced. For example, (1) the maximum 
number of daughter constituents of a rule is limited 
to three, (2) a rule with one daughter is not pre- 
ferred, (3) non-lexical categories are distinguished 
from lexical categories and then a rule with lexical 
categories as its mother is not generated. By these 
simple syntactical constraints, a lot of useless rules 
can be discarded. 
4 Hypothesis Selection with Local 
Contextual Information 
Hypothesis selection utilizes information from local 
context to rank the rule hypotheses generated in the 
previous phase. In the hypothesis generation, al- 
though we use some syntactical constraints to reduce 
the number of hypotheses of new rules that should 
be registered into the current grammar, there may 
still be several candidates remaining. At this point, 
a scoring mechanism is needed for ranking these can- 
didates and then one can select the best one as the 
most plausible hypothesis. 
This section describes a scoring mechanism which 
local contextual information can be exploited for this 
purpose. As mentioned in the previous section, lo- 
cal contextual information referred here is defined 
as a pair of categories of words immediately before 
and after the brackets. This information can be used 
as an environment for characterizing a nonterminal 
category. The basic idea in hypothesis selection is 
that the rules with a same nonterminal category as 
their mother tend to have similar environments. Lo- 
cal contextual information is gathered beforehand 
from the sentences in the corpus which the current 
grammar is enough for parsing. 
When the parser faces with a sentence which can- 
not be analyzed by the current grammar, some new 
rule hypotheses are proposed by the hypothesis gen- 
erator. Then the mother categories of these rules 
will be compared by checking the similarity with the 
local contextual information of categories gathered 
from the parsable sentences. Here, the most likely 
category is selected and that rule will be the most 
plausible candidate. The scoring function (probabil- 
ity p) for a rule hypothesis Cat ---* a is defined as 
follows. 
p(Cat --* ~) -" p(Cat\[l, r) - g(Cat, l, r) N(l, r) (1) 
where N(Cat, l, r) is the number of times that Cat 
is occurred in the environment (l, r). I is the cat- 
egory immediately before Cat and r is the lexical 
category of the word immediately after Cat. N(l, r) 
is the number of times that i and r are occurred 
immediately before and after any categories. Note 
that because it is not possible for us to calculate the 
probability of Cat ---+ ot in the environment of (l, r), 
we estimate this number by the probability that Cat 
occurs in the environment of (l, r). That is, how easy 
the category Cat appears under a certain environ- 
ment (l, r). 
80 
5 The Stochastic Model 
This section describes a statistical parsing model 
which finds the most plausible interpretation of a 
sentence when a hypothesis is introduced for recov- 
ering the parsing process of the sentence. In this 
problem, there are two components taken into ac- 
count: a statistical model and parsing process. The 
model assigns a probability to every candidate parse 
tree for a sentence. Formally, given a sentence S 
and a tree T, the model estimates the conditional 
probability P(TIS). The most likely parse under the 
model is argrnaxT P(TIS ) and the parsing process is 
a method to find this parse. In general, a model of 
a simple probabilistic context free grammar (CFG) 
applies the probability of a parse which is defined as 
the multiplication of the probability of all applied 
rules. However, for the purposes of our model where 
left and right contexts of a constituent are taken into 
account, the model can be defined as follows. 
P(T\]S) = H p(rli, li, ri) (2) 
(rl, ,I,,ri)ET 
where rli is an application rule in the tree and l~ and ri 
are respectively the left and right contexts at the 
place the rule is applied. In a parsing tree, there is 
a hypothesis rule for which we cannot calculate the 
probability because it does not exist in the current 
grammar. Thus we estimate its probability by using 
the formula (1) in section 4. 
Similar to most probabilistic models, there is 
a problem of low-frequency events in this model. 
Although some statistical NL applications apply 
backing-off estimation techniques to handle low- 
frequency events, our model uses a simple interpola- 
tion estimation by adding a uniform probability to 
every events. Moreover, we make use of the geomet- 
ric mean of the probability instead of the original 
probability in order to eliminate the effect of the 
number of rule applications as done in (Magerman 
and Marcus, 1991). The modified model is: 
P(T\]S) = 
( H (~*p(rl"l"r')+(1-c~)*N-'~))ff'I 
(rll ,l, ,r~)ET 
(3) 
Here, o~ is a balancing weight between the observed 
distribution and the uniform distribution. It is as- 
signed with 0.8 in our experiments. Nrl is the num- 
ber of rules and Nc is the number of possible con- 
texts, i.e., the left and right categories. The applied 
parsing algorithm is a simple bottom-up chart parser 
whose scoring function is based on this model. A 
dynamic programming algorithm is used to find the 
Viterbi parse: if there are two proposed constituents 
which span the same set of words and have the same 
label, then the lower probability constituent can be 
safely discarded. 
81 
6 Experimental Evaluation 
Some evaluation experiments and their results are 
described. For the experiments, we use texts from 
the EDR corpus, where bracketings are given. The 
subject is 48,100 sentences including around 510,000 
words. Figure 3 shows some example sentences in 
the EDR corpus 
(((ART," a" )((ADJ ," large" )(NOUN ,"festival" ))) 
((VT,"held")(ADV,"biennially"))) 
((AOV,"again")((PRON,"he")((VT,"says") 
((P RON," he")((A DV," completely") ((VT,"forgot")((PaEe,"about") 
((eaOi," his" )(NOUN," homework" ))))))))) 
Figure 3: Some example sentences in the EDR cor- 
pus 
The initial grammar is acquired from the same 
corpus using divergence shown in section 2.1. The 
number of rules is 272, the maximum length of rules 
is 4, and the numbers of terminal and nonterminal 
categories are 18 and 55 respectively. A part of the 
initial grammar is enumerated in Figure 4. In the 
grammar, llnl is expected to be noun phrase with an 
article, lln2 is expected to be noun phrase without 
an article, and iln3 is expected to be verb phrase. 
Moreover, among 48,100 sentences, 5,083 sentences 
cannot be parsed by the grammar. We use these 
sentences for evaluating our hypothesis selection ap- 
proach. 
llnl ---. adv, noun 
llnl ---* adv, llnl 
llnl ~ adv, lln2 
llnl ---* art, noun 
lln2 ---* adj, noun 
lln2 ---* adj, llnl 
lln2 ---* adj, lln2 
lln2 ~ adj, lln8 
lln3 -* adv, lln3 
lln3 ---* aux, vt 
lln3 ---* aux, llnl3 
lln3 ---* llnl2, vt 
Figure 4: A part of initial grammar 
6.1 The Criterion 
In the experiments, we use bracket crossing as a cri- 
terion for checking the correctness of the generated 
hypothesis. Each result hypothesis is compared with 
the brackets given in the EDR corpus. The correct- 
ness of a hypothesis is defined as follows. 
Ranking A/all • At least one of the derivations inside the hy- pothesis include the brackets which do not cross 
with those given in the corpus 
• When the hypothesis is applied, it can be used 
to form a tree whose brackets do not cross with 
those given in the corpus. 
6.2 Hypothesis Level Evaluation 
From 5,083 unparsable sentences, the hypothesis 
generator can produce some hypotheses for 4,730 
sentences (93.1%). After comparing them with the 
parses in the EDR corpus, the hypothesis sets of 
3,127 sentences (61.5 %) include correct hypothe- 
ses. Then we consider the sentences for which some 
correct hypotheses can be generated (i.e., 3,127 sen- 
tences) and evaluate our scoring function in selecting 
the most plausible hypothesis. For each sentence, 
we rank the generated hypotheses by their prefer- 
ence score according to our scoring function. The 
result is shown in Table 1. From the table, even 
though only 12.3 % of the whole generated hypothe- 
ses are correct, our hypothesis selection can choose 
the correct hypothesis for 41.6 % of the whole sen- 
tences when the most plausible hypothesis is selected 
for each sentence. Moreover, 29.8 % of correct hy- 
potheses are ordered at the ranks of 2-5, 24.3 % at 
the ranks of 6-10 and just only 6.2 % at the ranks of 
more than 50. This indicates that the hypothesis se- 
lection is influential for placing the correct hypothe- 
ses at the higher ranks. However, when we consider 
the top 10 hypotheses, we found out that the accu- 
racy is (1362+3368+3134)/(3217+11288+12846)= 
28.8 %. This indicates that there are a lot of hy- 
potheses generated for a sentence. This suggests us 
to consider the correct hypothesis for each sentence 
instead of all hypotheses. 
Ranking 
1 
2-5 
6-10 
11-20 
21-30 
31-50 
51- 
all 
whole (A) 
hypotheses 
3217 
11288 
12846 
22105 
17743 
27001 
102015 
196214 
correct (B) A/B 
hypotheses 
1340 41.6% 
3368 29.8% 
3134 24.3% 
4300 19.4% 
2673 i5.0% 
3033 11.2% 
6315 6.2% 
24203 12.3% 
Table 1: Hypothesis Level Evaluation 
6.3 Sentence Level Evaluation 
In this section, we consider the accuracy of our hy- 
pothesis selection for each sentence. Table 2 displays 
the accuracy of hypothesis selection by changing the 
number of selected hypotheses. 
From the table, the number of sentences whose 
best hypothesis is correct, is 1,340 (41.6%) and we 
1 
2-5 
6-10 
11-20 
21-30 
31-50 
51- 
all 
sentences with 
correct hypo.(A) 
1340 
1006 
277 
225 
111 
121 
136 
3217 
41.6 % 
31.2 % 
8.6 % 
7.0 % 
3.5 % 
3.8 % 
4.2 % 
100.0 % 
Table 2: Sentence Level Evaluation 
can get up to 2,623 (81.5%) accuracy when the top 
10 of the ordered hypotheses are considered. The 
result shows that our hypothesis selection is effective 
enough to place the correct hypothesis at the higher 
ranks. 
6.4 Parsing Evaluation 
Another experiment is also done for evaluating the 
parsing accuracy. The parsing model we consider 
here is one described in section 5. The chart parser 
outputs the best parse of the sentence. This parse 
is formed by using grammar rules and a single rule 
hypothesis. The result is shown in Table 3. In this 
evaluation, the PARSEVAL measures as defined in 
(Black and et al., 1991) are used: 
Precision : 
number of correct brackets in proposed parses 
Recall = 
number of brackets in proposed parses 
number of correct brackets in proposed parses 
number of brackets in corpus parses 
From this result, we found out that the parser can 
succeed 57.3 % recall and 65.2 % precision for the 
short sentences (3-9 words). In this case, the aver- 
aged crossings are 1.87 per sentence and the number 
of sentences with less than 2 crossings is 69.2 % of 
the comparisons. For long sentences not so much ad- 
vantage is obtained. However, our parser can achieve 
51.4 % recall and 56.3 % precision for all unparsable 
sentences. 
7 Discussion and Conclusion 
In this paper, we proposed a framework for exploit- 
ing contextual information in a process of grammar 
refinement. In this framework, a rough grammar 
is first learned from a bracketed corpus and then 
the grammar is refined by the combination of rule- 
based and corpus-based methods. Unlike stochastic 
parsing such as (Magerman, 1995)(Collins, 1996), 
our approach can parse sentences which fall out the 
current grammar and suggest the plausible hypoth- 
esis rules and the best parses. The grammar is not 
acquired from scratch like the approaches shown in 
82 
Sent. Length 
Comparisons 
Avg. Sent. Len. 
Corpus Parses 
System's Parses 
Crossings/Sent. 
Sent. cross.= 0 
Sent. cross.< 1 
Sent. cross.< 2 
Recall 
Precision 
3-9 
1980 
6.9 
5.15 
5.78 
1.87 
20.1% 
43.9% 
69.2% 
57.3% 
65.2% 
3-15 
3864 
9.5 
7.65 
8.27 
3.32 
10.6% 
25.0% 
41.7% 
53.2% 
58.7% 
10-19 
2491 
13.4 
11.47 
12.07 
5.69 
0.4% 
3.9% 
9.7% 
47.3% 
50.0% 
all length 
4730 
10.8 
8.95 
9.57 
4.18 
8.9% 
21.1% 
35.1% 
51.4% 
56.3% 
Table 3: Parsing Accuracy 
(Pereira and Schabes, 1992)(Mort and Nagao, 1995). 
Through some experiments, our method can achieve 
effective hypothesis selection and parsing accuracy 
to some extent. As our further work, we are on the 
way to consider the correctness of the selected hy- 
pothesis of the most plausible parses proposed by 
the parser. Some improvements are needed to grade 
up the parsing accuracy. Another work is to use 
an existing grammar, instead of an automatically 
learned one, to investigate the effectiveness of con- 
textual information. By providing a user interface, 
this method will be useful for grammar developers. 
Acknowledgements 
We would like to thank the EDR organization for 
permitting us to access the EDR corpus. Special 
thanks go to Dr. Ratana Rujiravanit, who helps me 
to keenly proofread a draft of this paper. We also 
wish to thank the members in Okumura laboratory 
at JAIST for their useful comments and their tech- 
nical supports. 

References 
Black, E. and et al. 1991. A procedure for quantita- 
tively comparing the syntactic coverage of English 
grammars. In Proc. of the 1991 DARPA Speech 
and Natural Language Workshop, pages 306-311. 
Brill, Eric. 1992. Automatically acquiring phrase 
structure using distributional analysis. In Proc. 
of Speech and Natural Language Workshop, pages 
155-159. 
Collins, Michael John. 1996. A new statistical 
parser based on bigram lexical dependencies. In 
Proc. of the 3~th Annual Meeting of the ACL, 
pages 184-191. 
EDR: Japan Electronic Dictionary Research Insti- 
tute, 1994. EDR Electric Dictionary User's Man- 
ual (in Japanese), 2.1 edition. 
Harris, Zellig. 1951. Structural Linguistics. 
Chicago: University of Chicago Press. 
Kiyono, Masaki and Jun'iehi Tsujii. 1994a. Combi- 
nation of symbolic and statistical approaches for 
grammatical knowledge acquisition. In Proc. of 
4th Conference on Applied Natural Language Pro- 
cessing (ANLP'94), pages 72-77. 
Kiyono, Masaki and Jun'ichi Tsujii. 1994b. Hy- 
pothesis selection in grammar acquisition. In 
COLING-94, pages 837-841. 
Magerman, D. M. and M. P. Marcus. 1991. Pearl: 
A probabilistic chart parser. In Proceedings of the 
European A CL Conference. 
Magerman, David M. 1995. Statistical decision-tree 
models for parsing. In Proceeding of 33rd Annual 
Meeting of the ACL, pages 276-283. 
Mort, Shinsuke and Makoto Nagao. 1995. Parsing 
without grammar. In Proc. of the 4th Interna- 
tional Workshop on Parsing Technologies, pages 
174-185. 
Ootani, K. and S Nakagawa. 1995. A semi- 
automatic learning method of grammar rules for 
spontaneous speech. In Proc. of Natural Language 
Processing Pacific Rim Symposium'95, pages 514- 
519. 
Pereira, F. and Y. Schabes. 1992. Inside-outside 
reestimation from partially bracketed corpora. In 
Proceedings of 30th Annual Meeting of the ACL, 
pages 128-135. 
Theeramunkong, Thanaruk and Manabu Okumura. 
1996. Towards automatic grammar acquisition 
from a bracketed corpus. In Proc. of the 4th Inter- 
national Workshop on Very Large Corpora, pages 
168-177. 
Theeramunkong, Thanaruk and Manabu Okumura. 
1997. Statistical parsing with a grammar acquired 
from a bracketed corpus based on clustering anal- 
ysis. In International Joint Conference on Artifi- 
cial Intelligence (IJCAI-97), Poster Session. 
