II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
A Method of Incorporating Bigram Constraints into an LR 
Table and Its Effectiveness in Natural Language Processing 
i 
Hiroki Imai and Hozumi Tanaka 
Graduate School of Information Science and Technology 
Tokyo Institute of Technology 
2-12-10-okayama, Meguro, Tokyo 152 Japan 
{imai, tanaka}@cs, t itech, ac. jp 
Abstract 
In this paper, we propose a method for con- 
structing bigram LR tables by way of incor- 
porating bigram constraints into an LR table. 
Using a bigram LR table, it is possible for a 
GLR parser to make use of both big'ram and 
CFG constraints in natural language process- 
ing. Applying bigram LR tables to our GLR 
method has the following advantages: 
(1) Language models utilizing bigzam LR ta- 
bles have lower perplexity than simple bigram 
language models, since local constraints (hi- 
gram) and global constraints (CFG) are com- 
bined in a single bigram LR table. 
(2) Bigram constraints are easily acquired from 
a given corpus. Therefore data sparseness is 
not likely to arise. 
(3) Separation of local and global constraints 
keeps down the number of CFG rules. 
The first advantage leads to a reduction in 
complexity, and as the result, better perfor- 
mance in GLR parsing. 
Our experiments demonstrate the effectiveness 
of our method. 
1 Introduction 
In natural language processing, stochastic language 
models are commonly used for lexical and syntactic 
disambiguation (Fujisaki et al., 1991; Franz, 1996). 
Stochastic language models are also helpful in re- 
ducing the complexity of speech and language pro- 
cessing by way of providing probabilistic linguistic 
constraints (Lee, 1989). 
N-gram language models (Jelinek, 1990), includ- 
ing bigram and trigram models, are the most com- 
monly used method of applying local probabilis- 
tic constraints. However, context-free grammars 
(CFGs) produce more global linguistic constraints 
than N-gram models. It seems better to combine 
both local and global constraints and use them both 
concurrently in natural language processing. The 
reason why N-gram models are preferred over CFGs 
is that N-gram constraints are easily acquired from 
a given corpus. However, the larger N is, the more 
serious the problem of data sparseness becomes. 
CFGs are commonly employed in syntactic pars- 
ing as global linguistic constraints, since many eifi- 
cient parsing algorithms are available. GLR (Gen- 
eralized LR) is one such parsing algorithm that uses 
an LR table, into which CFG constraints are pre- 
compiled in advance (Knuth, 1965; Tomita, 1986). 
Therefore if we can incorporate N-gram constraints 
into an LR table, we can make concurrent use of 
both local and global linguistic constraints in GLR 
parsing. 
In the following section, we will propose a method 
that incorporates bigram constraints into an LR ta- 
ble. The advantages of the method are summarized 
as follows: 
First, it is expected that this method produces a 
lower perplexity than that for a simple bigram lan- 
guage model, since it is possible to utilize both local 
(bigram) and global (CFG) constraints in the LR 
table. We will evidence this reduction in perplexity 
by considering states in an LR table for the case of 
GLR parsing. 
Second, bigram constraints are easily acquired 
from smaller-sized corpora. Accordingly, data 
sparseness is not likely to arise. 
Third, the separation of local and global con- 
straints makes it easy to describe CFG rules, since 
CFG writers need not take into'account tedious de- 
scriptions of local connection constraints within th 
CFG I . 
2 CFG, Connection Matrix and LR 
table 
2.1 Relation between CFG and Connection 
Constraints 
Figure 1 represents a situation in which ai and bj are 
adjacent to each other, where a~ belongs to Setl (i = 
1,..-,I) and bj belongs to Setj (j = 1,..., J). Set~ 
l(Tana~ et al., 1997) reported that the separate de- 
scription of local and global constraints reduced the CFG 
rules to one sixth of their original number. 
Imai and Tanaka 225 A Method of Incorporating Bigram Constraints 
Hirold Ima/ and Hozumi Tanaka (1998) A Method of Incorporating Bigram Constraints into an LR Table and Its 
Effectiveness in Natural Language Processing. In D.M.W. Powers (ed.) NeMLaP3/CoNLL98: New Methods in Language 
Processing and Computational Natural Language Learning, ACL, pp 225-233. 
X 
Y Z 
bj 
Figure 1: Connection check by CFG 
and Sets are defined by last1 (Y) and first1 ( Z) (Aho 
et al., 1986), respectively. If a E Setz and b E Sets 
happen not to be able to occur in this order, it be- 
comes a non-trivial task to express this adjacency 
restriction within the framework of a CFG. 
One solution to this problem is to introduce a new 
nonterminal symbol Ai for each a~ and a nonterminal 
symbol Bj for each hi. We then add rules of the form 
A --* Ai and Ai "* ai, and B ~ Bj and B i --* bj. 
As a result of this rule expansion, the order of the 
number of rules will become I x J in the worst case. 
The introduction of such new nonterminal symbols 
leads to an increase in grammar rules, which not 
only makes the LR table very large in size, but also 
diminishes efficiency of the GLR parsing method. 
The second solution is to augment X ~ Y Z with 
a procedure that checks the connection between a~ 
and bj. This solution can avoid the problem of the 
expansion of CFG rules, but we have to take care of 
the information flow from the bottom leaves to the 
upper nodes in the tree, Y, Z, and X. 
Neither the first nor the second solution are prefer- 
able, in terms of both efficiency of GLR parsing and 
description of CFG rules. Additionally, it is a much 
easier task to describe local connection constraints 
between two adjacent terminal symbols by way of 
a connection matrix such as in Figure 2, than to 
express these constraints within the CFG. 
The connection matrix in Figure 2 is defined as: 
= { ~ if bj can follow ai Connect( a~, bj ) 
otherwise (1) 
The best solution seems to be to develop a method 
that can combine both a CFG and a connection ma- 
trix, avoiding the expansion of CFG rules. Conse- 
quently, the size of the LR table will become smaller 
and we will get better GLR parsing performance. 
In the following section, we will propose one such 
method. Note that we are considering connections 
between preterminals rather than words. Thus, we 
will have Connect(ai, bj) = 0 in the preterminal con- 
nection matrix similarly to the case of words. 
a2 
a~ 
I 
• "-- .... bj .-- 
I a 
I I 
I i 
I .... I .... 
! | 
o | 
e I 
--1 .... 0 .... 
| | 
| | 
I i 
o | 
i e 
o j 
i e 
f I | | 
I i 
| | 
= | 
I s 
t | 
| I 
| | 
| I 
| i 
Figure 2: Connection matrix 
2.2 Relation between the LR Table and 
Connection Matrix 
First we discuss the relation between the LR table 
and a connection matrix. The action part of an LR 
table consists of lookahead symbols and states. Let 
a shift action sh m be in state l with the lookahead 
symbol a. After the GLR parser executes action 
sh m, the symbol a is pushed onto the top of the 
stack and the GLR parser shifts to state m. Sup- 
pose there is an action A in state m with looka- 
head b (see Figure 3). The action A is executable 
if Connect(a,b) ~ 0 (b can follow a), whereas if 
Connect(a, b) = 0 (b cannot follow a), the action 
A in state m with lookahead b is not executable and 
we can remove it from the LR table as an invalid 
action. Removing such invalid actions enables us to 
incorporate connection constraints into the LR table 
in addition to the implicit CFG constraints. 
In section 3.2, we will propose a method that in- 
tegrates both bigram and CFG constraints into an 
LR table. After this integration process, we obtain 
a table called a bigram LR table. 
3 Integration of Bigram and CFG 
Constraints into an LR Table 
3.1 The Definition of a Probabilistic 
Connection Matrix 
A close relation exists between bigrams and connec- 
tion matrices, in that the bigram probability P(bla ) 
corresponds to the matrix dement of Connect(a, b). 
A connection matrix incorporating bigram probabil- 
ities is called a probabilistic connection matrix, in 
which Connect(a, b) = 0 still means b cannot follow 
a, but instead of connection matrix entries having 
a binary value of 0 or 1, a probability is associated 
with each element. This is then used to construct a 
probabilistic LR table. 
lmai and Tanaka 226 A Method of Incorporating Bigram Constraints 
I 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
m m 
m 
LR table 
• °.o°°° , 
I | ...... ' 
m ........ 
a 
sh m \ 
' =°°==" ' b , .... ~ ...... 
Acl ............ 
Act is removed if Connect (a,b) = 0 
Stack Input symbols 
Figure 3: LR table and Connection Constraints 
The N-gram model is the most commonly used 
probabilistic language model, and it assumes that a 
symbol sequence can be described by a higher order 
Markov process. The simplest N-gram model with 
N = 2 is called a bigram model, and approximates 
the probability of a string X = xzx2xa...x,~ as the 
product of conditional probabilities: 
P(X) = e(xzl#)P(x2lxl)... P(x=lx=-l)P($1x,) (2) 
In the above expression, "#" indicates the sen- 
tence beginning marker and "$" indicates the sen- 
tence ending marker. The above big-ram model can 
be represented in a probabilistic connection matrix 
defined as follows. 
DEFINITION 1 (probabilistic connection matriz) 
Let G = (V~v, Vr, P, S) be a context-free gram- 
mar. For Va, b E VT (the set of terminal symbols), 
the probabilistic connection matrix named PConnect 
is defined as follows. 
PConnect(a, b) = P(bla ) (3) 
where P(bJa ) is a conditional probability and 
P(bla) = 1. 
PConnect(a,b) = 0 means that a and b 
cannot occur consecutively in the given order. 
PConnect(a, b) ~ 0 means b can follow a with prob- 
ability P(b\[a). 
3.2 An algorithm to construct a bigram LR 
table 
An algorithm to construct a probabilistic LR ta- 
ble, combining both bigram and CFG constraints, 
is given in Algorithm I: 
Algorithm 1 
Input: A CFG G = (Vjv, VT, P, S) and a probabilis- 
tic connection matrix PConnect. 
Output: An LR table T with CFG and big-ram con- 
straints. 
Method: 
Step 1 Generate an LR table To from the given 
CFG G. 
Step 2 Removal of actions: 
For each shift action shm with lookahead a in 
the LR table To, delete actions in the state m 
with lookalaead b if PConnect(a, b) = O. 
Step 3 Constraint Propagation (Tanaka et al., 
1994): 
Repeat the following two procedures until no 
further actions can be removed: 
1. Remove actions which have no succeeding 
action, 
2. Remove actions which have no preceding 
action. 
Step 4 Compact the LR table if possible. 
Step 5 Incorporation of big-ram constraints into the 
LR table: 
For each shift action shm with lookahead a in 
the LR table To, let 
N 
P = ~ PConnect(a, bi) 
i=1 
where {hi : i = 1,-.-,N} is the set of looka- 
heads for state m. For each action Aj in state 
rn with lookahead bi, assign a probability p to 
action Aj: 
P(bila ) _ PConnect( a, b~ ) . 
P= Pxn Pxn 
where n is the number of conflict actions in 
state m with lookahead bi. The denominator 
is dearly a normalization factor. 
Step 6 For each shift action A with lookahead a 
in state 0, assign A a probability p = p(al#), 
where "#" is the sentence beginning marker. 
Step 7 Assign a probability p = 1/n to each ac- 
tion. A in state m with lookahead symbol a that 
has not been assigned a probability, where n is 
the number of conflict actions in state m with 
lookahead symbol a. 
Step 8 Return the LR table T produced at the 
completion of Step 7 as the Bi#ram LR table. 
As explained above, the removal of actions at Step 
2 corresponds to the operation of incorporating con- 
nection constraints into an LR table. We call Step 
3 Constraint Propagation, by which the size of the 
LR table is reduced (Tanaka et al., 1994). As many 
lmai and Tanaka 227 A Method of Incorporating Bigram Constraints 
(1) S--*XY (6) A--*al 
(2) X ~ A (7) A--, ae 
(3) X-~AB (S) n-~bl (4) 
Y--,A (9) B--,b2 
(5) Y--*blA 
Figure 4: Grammar G1 
al a2 bl b2 $ 
# 0.6 0.4 0.0 0.0 0.0 
al 0.0 0.0 0.0 1.0 0.0 
a2 0.0 0.0 0.3 0.0 0.7 
bl 0.0 0.1 0.9 0.0 0.0 
b2 0.0 0.0 1.0 0.0 0.0 
Figure 5: Probabilistic connection matrix Mz 
actions are removed from the LR table during Steps 
2 and 3, it becomes possible to compress the LR ta- 
ble in Step 4. We will demonstrate one example of 
this process in the following section. 
It should be noted that the above algorithm can be 
applied to any type of LR table, that is a canonical 
LR table, an LALR table, or an SLR table. 
4 An Example 
4.1 Generating a Big-ram LR Table 
In this section, we will provide a simple example of 
the generation of a bigram LR table by way of ap- 
plying Algorithm 1 to both a CFG and a probabilis- 
tic connection matrix, to create a big'ram LR table. 
Figure 4 and Figure 5 give a sample CFG Gz and a 
probabilistic connection matrix M1, respectively. 
Note that grammar G1 in Figure 4 does not ex- 
plicitly express local connection constraints between 
terminal symbols. Such local connection constraints 
are easily expressed by a matrix M1 as shown in 
Figure 5. 
From the CFG given in Figure 4, we can generate 
an LR table, Table 1, in Step 1 using the conven- 
tional LR table generation algorithm. 
Table 2 is the resultant LR table at the comple- 
tion of Step 2 and Step 3, produced based on Table 1. 
Actions numbered (2) and (3) in Table 2 are those 
which are removed by Step 2 and Step 3, respec- 
tively. 
In state 1 with a lookahead symbol bl, re6 is car- 
ried out after executing action shl in state 0, push- 
ing al onto the stack. Note that al and bl are 
now consecutive, in this order. However, the proba- 
bilistic connection matrix (see Figure 5) does not 
allow such a sequence of terminal symbols, since 
PConnect( al , bl ) = O. Therefore, the action re6 
in state 1 with lookahead bl is removed from Ta- 
ble 1 in Step 2, and thus marked as (2) in Table 2. 
For this same reason, the other re6s in state 1 with 
lookahead symbols al and a$ are also removed from 
Table 1. 
On the other hand, in the case of re6 in state 1 
with lookahead symbol b$, as al can be followed by 
b2 (PConnect(al, b~) ~ 0), action re6 cannot be 
removed. The remaining actions marked as (2} in 
Table 2 should be self-evident to readers. 
Next, we would like to consider the reason why 
action sh9 in state 4 with lookahead al is removed 
from Table 1. In state 9, re6 with lookahead symbol 
$ has already been removed in Step 2, and there is 
no succeeding action for shg. Therefore, action sh9 
in state 3 is removed in Step 3, and hence marked 
as(3). 
Let us consider action re3 in state 8 with looka- 
head al. After this action is carried out, the GLR 
parser goes to state 4 after pushing X onto the stack. 
However, sh9 in state 4 with lookahead al has al- 
ready been removed, and there is no succeeding ac- 
tion for re3. As a result, re3 in state 8 with looka- 
head symbol al is removed in Step 3. Similarly, re9 
in state 7 with lookahead symbol al is also removed 
in Step 3. In this way, the removal of actions prop- 
agates to other removals. This chain of removals is 
called Constraint Propagation, and occurs in Step 
3. Actions removed in Step 3 are marked as (3) in 
Table 2. 
Careful readers will notice that there is now no 
action in state 9 and that it is possible to delete this 
state in Step 4. Table 3 shows the LR table after 
Step 4. 
As a final step, we would like to assign big-ram 
constraints to each action in Table 3. Let us consider 
the two tess in state 6, reached after executing sh6 
in state 4 by pushing a lookahead of bl onto the 
stack. In state 6, P is calculated at Step 5 as shown 
below: 
P = PConnect(bl,a2) +PConnect(bl,bl) 
= 0.1+0.9 
= 1 
We can assign the following probabilities p to each 
re8 in state 6 by way of Step 5: 
PCon,~ect(bl,ae) for re8 with 
p×n = ~ = 0.I lookahead a2 
P = PConnect(bl,bl ) for re8 with 
P×n = ~ = 0.9 lookahead bl 
After assigning a probability to each action in the 
LR table at Step 5, there remain actions without 
probabilities. For example, the two conflict actions 
(re2/sh6) in state 3 with lookahead bl are not as- 
signed a probability. Therefore, each of these ac- 
tions is assigned the same probability, 0.5, in Step 7. 
A probability of 1 is assigned to remaining actions, 
since there is no conflict among them. 
Table 4 shows the final results of applying Algo- 
rithm 1 to G, and M,. 
lmai and Tanaka 228 A Method of Incorporating Bigram Constraints 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
II 
state 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14. 
al 
shl 
re6 
re7 
re2 
sh9 
re8 
re9 
re3 
sh9 
a2 
she 
re6 
re7 
re2 
shlO 
re8 
re9 
re3 
shlO 
action 
bl 
re6 
re7 
ree/sh6 
sh11 
re8 
re9 
re3 
b2 
re6 
re7 
sh7 
i 
$ A 
3 
12 
ace 
re6 
re7 
~4 
re1 
re5 
Table 1: Initial LR table for G1 
goto 
B X 
4 
8 
Y 
13 
S 
5 
state 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14. 
al 
shl 
re6(2) 
reT(e) 
tee(3) 
shg(3) 
re8(2) 
reg(3) 
re3(3) 
shg(3) 
a2 
she 
re6(2) 
re7(2) 
re2 
shlO 
re8 
re9(2) 
re3 
shlO 
action 
bl 
re7 
re2/sh6 
sh11 
re8 
re9 
re3 
b2 $ A 
3 
re6 
re7(2) 
sh7 
12 
ace 
re7 
14 
re4 
re1 
re5 
Table 2: LR table after Steps 2 and 3 
goto 
B X Y S 
4 5 
8 
13 
lmai and Tanaka 229 A Method of Incorporating Bigram Constraints 
state 
0 
1 
2 
3 
4 
5 
6 
7 
8 
10 
11 
12 
i3 
14 
al 
sh'l 
a2 
sh2 
re2 
shlO 
re8 
re3 
sh10 
action 
bl 
re7 
re2/sh6 
shll 
re8 
re9 
re3 
b2 
re6 
sh7 
$ A 
3 
12 
acc 
re7 
14 
re4 
re1 
re5 
Table 3: LR table after Step 4 
goto 
B X 
4 
Y 
13 
S 
5 
state 
0 
1 
.2 
3 
4 
5 
6 
7 
8 
10 
11 
12 
13 
14 
al 
shl 
0.6 
a2 
sh2 
0.4 
re2 
1.0 
shlO 
1.0 
re8 
0.1 
re3 
1.0 
shlO 
1.0 
action 
bl b2 $ 
re6 
1.0 
re7 
1.0 
re2/sh6 Sh 7 
0.5/0.5 1.o 
shll 
1.0 
re8 
0.9 
re9 
1.0 
re3 
1.0 
acc 
1.0 
re7 
1.0 
re4 
1.0 
tel 
1.0 
re5 
1.0 
A 
3 
goto 
B X Y 
8 
12 13 
14 
Table 4: The Bigram LR table constructed by Algorithm 1 
II 
II 
II 
II 
II 
II 
I! 
II 
k 
lmai and Tanaka 230 A Method of Incorporating Bigram Constraints 
k 
4.2 Comparison of Language Models 
Using the bigram LR table as shown in Table 4, the 
probability P1 of the string "a2 bl ag' is calculated 
as: 
P1 = P(a2 bl ae) 
T 
= E P(Tree0 
i=l 
= P(O, ae, she) x P(2, bl,re7) 
xP(3, bI, re._~2) x P(4, bl, sh11) 
xP(ll, a2, shlO) x P(10, $, re7) 
xP(14, $, re5) x P(13, $, re1 ) 
×P(5, $, acc) 
+ P(O, a2, she) x P(2, bl, re7) 
xP(3, bl, sh_66) x P(6, a2, reS) 
xP(8, ae, re3) x P(4, ae, shlO) 
xP(10, $, re7) x P(12, $, re4 ) 
xP(13,$, re1) x P(5, $, acc) 
= 0.4 x 1.0 x 0.5 x 1.0 x 1.0 
x 1.0 x 1.0 x 1.0 x 1.0 
+ 0.4 x 1.0 x 0.5 x 0.1 x 1.0 
x 1.0 x 1.0 x 1.0 x 1.0 x 1.0 
= 0.2 + 0.02 
= 0.22 
where P(Treei) means the probability of the i- 
th parsing tree generated by the GLR parser and 
P(S,L,A) means the probability of an action A in 
state S with Iookahead L. 
On the other hand, using only bigram constraints, 
the probability P2 of the string "ae b1 a,~' is calcu- 
lated as: 
P2 = P(a2 bl a2) 
= P(ael#) × P(bllae) x P(aelbl) × P($1ae) 
= x0.3 x0.1 x0.7 
= 0.0084 
The reason why P1 > P2 can be explained as 
follows. Consider the beginning symbol a2 of a sen- 
tence. In the case of the bigram model, a2 can only 
be followed by either of the two symbols bl and $ 
(see Figure 5). However, consulting the bigram LR 
table reveals that in state 0 with lookahead ae, she 
is carried out, entering state 2. State 2 has only 
one action re7 with lookahead symbol bl. In other 
words, in state 2, $ is not predicted as a succeeding 
symbol of al. The exclusion of an ungrammatical 
prediction in $ makes P1 larger than P2. 
Perplexity is a measure of the complexity of a lan- 
guage model. The larger the probability of the lan- 
guage model, the smaller the perplexity of the lan- 
guage model. The above result (P1 > P2) indicates 
I Language model I Perplexity I 
Bigram 6.50 
BigraJfi LR table 5.99 
Trigram 4.92 
Table 5: Perplexity of language models 
that the bigram LR table model gives smaller per- 
plexity than the bigram model. In the next section, 
we will demonstrate this fact. 
5 Evaluation of Perplexity 
Perplexity is a measure of the constraint imposed 
by the language model. Test-set perplexity (Jelinek, 
1990) is commonly used to measure the perplexity of 
a language model from a test-set. Test-set perplexity 
for a language model L is simply the geometric mean 
of probabilities defined by: 
where 
Q(L) = 2//(L) 
1 M 
S(L) =  logP(S,) 
i=l 
Here N is the number of terminal symbols in the test 
set, M is the number of test sentences and P(S,) is 
the probability of generating i-th test sentence Si. 
In the case of the bigram model, P~i(Si) is: 
Pb~(&) = P(zl,ze,...,z.) 
= P(zII#)P(zeIzl)...P(x~Iz._I)P($1..) 
And in the case of the trigram model, P~(Si) is: 
Pt~i(S~) = P(xl,ze,...,z,) 
= P(xll#)P(xel#,xl)... 
P(x.Ix.-2, x.-1)P($lx.-a, x.) 
Table 5 shows the test-set perplexity of pretermi- 
rials for each language model. Here the preterminal 
bigram models were trained on a corpus with 20663 
sentences, containing 230927 preterminals. The test- 
set consists of 1320 sentences, which contain 13311 
preterminals. The CFG used is a phrase context- 
free grammar used in speech recognition tasks, and 
the number of rules and preterminals is 777 and 407, 
respectively. 
As is evident from Table 5, the use of a bigram 
LR table decreases the test-set perplexity from 6.50 
to 5.99. Note that in this experiment, we used 
the LALR table generation algorithm 2 to construct 
the bigram LR table. Despite the disadvantages of 
2In the case of LALR tables, the sum of the proba- 
bihties of all the possible parsing trees generated by a 
given CFG may be less than 1 (Inui et al., 1997). 
mai and Tanaka 231 A Method of Incorporating Bigram Constraints 
LALR tables, the bigram LR table has better per- 
formance than the simple bigram language model, 
showing the effectiveness of a bigram LR table. 
On the other hand, the perplexity of the trigram 
language model is smaller than that of the bigram 
LR table. However, with regard to data sparseness, 
the bigram LR table is better than the trigram lan- 
guage model because bigram constraints are more 
easily acquired from a given corpus than trigram 
constraints. 
Although the experiment described above is 
concerned with natural language processing, our 
method is also applicable to speech recognition. 
6 Conclusions 
In this paper, we described a method to construct a 
bigram LR table, and then discussed the advantage 
of our method, comparing our method to the bigram 
and trigram language models. The principle advan- 
tage over the bigram language model is that, in using 
a bigram LR table, we can combine both local prob- 
abilistic connection constraints (bigram constraints) 
and global constraints (CFG). 
Our method is applicable not only to natural lan- 
guage processing but also speech recognition. We 
are currently testing our method using a large- 
sized grammar containing dictionary rules for speech 
recognition. 
Su et al. (Suet al., 1991) and Chiang et al. (Chi- 
ang et al., 1995) have proposed a very interesting 
corpus-based natural language processing method 
that takes account not only of lexical, syntactic, 
and semantic scores concurrently, but also context- 
sensitivity in the language model. However, their 
method seems to suffer from difficulty in acquiring 
probabilities from a given corpus. 
Wright (Wright, 1990) developed a method of dis- 
tributing the probability of each PCFG rule to each 
action in an LR table. However, this method only 
calculates syntactic scores of parsing trees based on 
a context-free framework. 
Briscoe and Carroll (Briscoe and Carroll., 1993) 
attempt to incorporate probabilities into an LR ta- 
ble. They insist that the resultant probabilistic 
LR table can include probabilities with context- 
sensitivity. Inui et. al. (Inni et al., 1997)reported 
that the resultant probabilistic LR table has a defect 
in terms of the process used to normalize probabili- 
ties associated with each action in the LR table. 
• Finally, we would like to mention that Klavans 
and Resnik (Klavaus and Resnik, 1996) have ad- 
vocated a similar approach to ours which combines 
symbolic and statistical constraints, CFG and bi- 
gram constraints. 
Acknowledgements 
We would like to thank Mr. Toshiyuki Takezawa and 
Mr. Junji Etoh for providing us the dialog corpus 
and the grammar for our experiments. We would 
also like to thank Mr. Timothy Baldwin for his help 
in writing this paper. 

References 
A.V. Aho, S. Ravi, and J.D. UUman. 1986. Com- 
pilers: Principle, Techniques, and Tools. Addison 
Wesley. 
T. Briscoe and J. Carroll. 1993. Generalized proba- 
bilistic LR parsing of natural language (corpora) 
with unification-based grammars. Computational 
Linguistics, 19(1):25-59. 
T.H. Chiang, Y.C. Lin, and K.Y. Su. 1995. Robust 
learning, smoothing, and parameter tying on syn- 
tactic ambiguity resolution. Computational Lin- guistics, 
21(3):321-349. 
A. Franz. 1996. Automatic Ambiguity Resolution in 
Natural Language Processing. Springer. 
T. Fujisaki, F. Jelinek, J. Cocke, E. Black, and 
T. Nishino. 1991. A probabilistic parsing method 
for sentence disambiguation. In M. Tomita, edi- 
tor, Current Issues in Parsing Technologies, pages 
139-152. Kluwer Academic Publishers. 
K. Inui, V. Sornlertlamvanich, H. Tanaka, and 
T. Tokunaga. 1997. A new formalization of prob- 
abilistic GLR parsing. In International Workshop 
on Parsing Technologies. 
F. Jelinek. 1990. Self-organized language mod- 
eling for speech recognition. In A. Waibel and 
K.F. Lee, editors, Readings in Speech Recognition, 
pages 450-506. Morgan Kauhnann. 
J.L. Klavans and P. Resnik. 1996. The Balanc- 
ing Act: Combining Symbolic and S~atistial Ap- 
proaches to Language. The MIT Press. 
D.E. Knuth. 1965. On the translation of languages 
left to right. Information and Control, 8(6):607- 
639. 
K.F. Lee. 1989. Automatic Speech Recognition: 
The Development of the SPHINX System. Kluwer 
Academic Publishers. 
K.Y. Su, J.N. Wang, M.H. Su, and J.S. Chang. 1991. 
GLR parsing with scoring. In M. Tomita, editor, 
Generalized LR Parsing. Kluwer Academic Pub- 
fishers. 
H. Tanaka, H. Li, and T. Tokunaga. 1994. Incor- 
poration of phoneme-context-dependence into LR 
table through constraints propagation method. In 
Workshop on Integration of Natural Language and 
Speech Processing, pages 15-22. 
H. Tanaka, T. Takezawa, and J. Etoh. 1997. 
Japanese grammar for speech recognition consid- 
ering the MSLR method. In Information Process- 
ing Society of Japan, SIG-SLP-15, pages 145-150. 
(in Japanese). 
M. Tomita. 1986. Efficient Parsing for Natural Lan- 
guage: A Fast Algorithm for Practical Systems. 
Kluwer' Academic Publishers. 
J.I-I. Wright. 1990. LR parsing of probabilistic 
grammars with input uncertainty for speech recog- 
nition. Computer Speech and Language, 4(4):297- 
323. 
