SEMANTIC AND SYNTACTIC ASPECTS OF SCORE FUNCTION
Keh-Yih SU* and Jing-Shin CHANG** 
*Department of Electrical Engineering 
National Tsing Hua University, Hsinchu, Taiwan, R.O.C.
**BTC R&D Center, R&D Road II, No. 28, 2nd Floor
Hsinchu Science-Based Industrial Park, Hsinchu, Taiwan, R.O.C. 
Abstract 
In a Machine Translation System (MTS), the large number
of possible analyses for a given sentence is mostly
due to the ambiguous characteristics of the source
language. In this paper, a mechanism called the "Score
Function" is proposed for measuring the "quality" of
the ambiguous syntax trees, so that the one that best
fits human interpretation is selected. It
combines the objectiveness of
probability theory with the subjective expertise of
linguists. The underlying uncertainty that is fundamental
to linguistic knowledge can also be
incorporated into this system. This mechanism offers
an easy way to select the best syntax tree and
provides some strategic advantages for scored parsing.
Linguists are also relieved of the necessity to
describe the language in strictly "correct" linguistic
rules, which is a very hard task, if not an impossible one.
Motivation 
In a Machine Translation System (MTS), where the
underlying grammar is large, there are many sources 
which may cause the system to become highly ambiguous. 
The system must choose a better syntax tree among all 
the possible ones to reduce the load of the post- 
editor. Some systems attack this problem by arranging 
the grammar rules in a descending order of their relative
frequency, following the parsing paths in a
depth-first manner, and selecting the first syntax 
tree successfully parsed as the desired one. However, 
rule ordering is just a locally preferred static scor- 
ing of the rule usage. Therefore, the possibility is 
small that the first tree selected is the correct one. 
Several MT systems based on the ATN formalism [Wood
70] adopt another approach. They impose condition
checks to prevent the parser from trying all possible
states allowed by the underlying grammar. This 
approach has been widely accepted and is useful in 
eliminating the unnecessary trials. However, there are 
times when legal paths are blocked inadvertently by 
condition checks. Therefore, the system must be tuned 
frequently to achieve an equilibrium between the
over-generative grammar and the over-restrictive con- 
dition checks. This kind of "hard rejection" is obvi- 
ously too variant and too restrictive. 
A better solution is to adopt the "Truncation 
Strategy" (proposed by [Su 87a, 87b] for the MT system) to
restrict the number of parsing paths to be tried 
according to the relative preference of all the possi- 
ble paths. The measuring mechanism of preference for 
the truncation strategy is called the "Score Function".
It bears similarity to the select-by-preference mechanism
found in other scored MT systems, such as the DIAGRAM
grammar system [Robi 82] and the METAL system [Benn 85].
Under a scoring mechanism, the parsing paths are not 
rejected because of the over-restrictive condition 
checks but rather for their low scores. This kind of
"soft-rejection" prevents legal paths from being
blocked too early because of unsuitable condition
checks. Different scoring mechanisms may be required
at lexicon, syntax and semantics levels, and the score can
be computed during parsing or after parsing. In this 
paper, we propose an approach to the semantic and syn- 
tactic aspects of the score function. 
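The idea of soft rejection can be illustrated with a minimal
sketch. This is not the truncation algorithm of [Su 87a, 87b],
whose details are not given here; it only assumes that every
partial parsing path carries a score and that a fixed number of
the best paths survive. The function name truncate_paths, the
path labels and the scores are all hypothetical.

```python
from typing import List, Tuple

Path = Tuple[str, float]  # (hypothetical path label, preference score)


def truncate_paths(paths: List[Path], keep: int) -> List[Path]:
    """Keep only the `keep` highest-scoring paths.

    No path is rejected by a hard condition check; a path is
    dropped only because enough other paths score higher.
    """
    ranked = sorted(paths, key=lambda p: p[1], reverse=True)
    return ranked[:keep]


# Made-up scores for four competing partial parses.
candidates = [("path-a", 0.42), ("path-b", 0.07),
              ("path-c", 0.31), ("path-d", 0.20)]
survivors = truncate_paths(candidates, keep=2)
print(survivors)
```

A low-scoring but legal path such as path-d is dropped here only
because of the chosen width; with a larger value of keep it
would survive, which is the difference from a hard condition
check.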
Criteria for Score Function 
In order to define a reasonable score function, 
it is essential to set up some criteria first. Eight 
basic criteria are listed here. 
[1] The score function should reflect the absolute
degree of preference of two ambiguous (sub)trees
as well as their relative preferences.
[2] A good score function should be applicable either
locally to a subtree or globally to a complete
tree.
[3] The score function should be compositional. This
means the score of a tree should be directly
evaluated from the scores of its constituent subtrees.
[4] Relative rule application frequency should be
included in the score function. The rule that is
used most frequently should receive a higher
preference.
[5] The score function should also include the semantic
information embedded in the sentence, so that
the semantic preference can be involved in the
score function. (Since our present translation
unit is a single sentence, no discourse information
needs to be included.)
[6] The implementation of the score function should
not be too complicated. In our case, it should be
practical for a large-scale MT system.
[7] The database for score computation should be easy
to build and easy to maintain.
[8] The preference order of ambiguous trees assigned
by the score function should match that assigned
by humans. In addition, the way the scores are
given should match the way that people give
their preference to the ambiguous trees (i.e. how
people recognize the true meaning of a given sentence
from several different interpretations).
Keeping these criteria in mind, we define a score 
function as follows. The score function for a subtree 
X0, with derivation sequence D of X0(i,j) =D=>
X1(i,j1), X2(j1+1,j2), ..., Xn(j(n-1)+1,j), is:
SCORE(X0)
= SCsyn[X1 ... Xn]
* SCsem[(X1,KI(X1),KC(X1)) ... (Xn,KI(Xn),KC(Xn))]
In the above, X0(i,j) is a subtree made up of
terminals X1 to Xn; i to j are the word indices in the
sentence; and SCORE is the score of the subtree X0.
SCsyn is the unweighted syntax score. SCsem is the
semantic weighting. KI is defined as the knowledge
about the inherent properties of the nodes, and KC is
the well-formedness condition, either syntactic or
semantic, of the Xi under the given syntactic construction.
To decrease the computational complexity,
we can convert this multiplication equation into an
addition equation with logarithmic entries.
log(SCORE(X0)) = log(SCsyn) + log(SCsem)
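The equivalence of the multiplicative and the logarithmic form
can be checked with a small sketch. It assumes that both
factors are strictly positive, since the logarithm is undefined
otherwise; the function names and the numeric values are made
up for illustration and do not come from the simulation.

```python
import math


def score(sc_syn: float, sc_sem: float) -> float:
    """Multiplicative form: SCORE = SCsyn * SCsem."""
    return sc_syn * sc_sem


def log_score(sc_syn: float, sc_sem: float) -> float:
    """Additive form: log(SCORE) = log(SCsyn) + log(SCsem).

    Assumes both factors are strictly positive.
    """
    return math.log(sc_syn) + math.log(sc_sem)


# Two hypothetical ambiguous trees as (SCsyn, SCsem) pairs.
tree_a = (0.020, 0.9)
tree_b = (0.035, 0.4)

# The logarithm is monotonic, so both forms rank the trees alike.
prefers_a = score(*tree_a) > score(*tree_b)
prefers_a_log = log_score(*tree_a) > log_score(*tree_b)
print(prefers_a, prefers_a_log)
```

In this made-up case tree_b has the higher syntax score, but the
semantic weighting reverses the preference in both forms.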
In order to obtain the score without excessive
computation and complicated algorithms, the probability
model is probably one of the most common and promising
approaches. Under this approach, the preference measurement
in a scoring mechanism can be seen as a probability
assignment. The best syntax tree should be the
one with the highest preference probability assigned to
it. This probability model can be divided into two
parts. One is the syntactic score model, which is
SCsyn, and the other is the semantic score model,
which is SCsem. The syntactic score model uses the
syntax probability as the base to generate an
unweighted syntactic score for each syntax tree. The
semantic score model then supplements the unweighted
score with weights derived from the semantic
knowledge. Incorporation of semantic information is
essential for a good score function because pure syntax
probability can only provide partial information
for sentence preference.
Syntactic Score Model
For a syntax tree such as the one given below, we define a phrase
level as a sequence of terminals and nonterminals that
are being reduced at a single step of the derivation, or
"reduction sequence". The following example shows the
reduction sequence of a bottom-up parse. The
sequence is indicated by the time series t1 ... t7.
[Figure: syntax tree for A, with the reduction steps labeled t1 ... t7]
X8 = { A }
X7 = { B, C }
X6 = { B, F, G }
X5 = { B, F, w4 }
X4 = { B, w3, w4 }
X3 = { D, E, w3, w4 }
X2 = { D, w2, w3, w4 }
X1 = { w1, w2, w3, w4 }
The unweighted score for this tree A is modeled as the
following conditional probability.
SCsyn(A)
= P(X8|X7,...,X1) * P(X7|X6,...,X1) * ... * P(X2|X1)
≈ P(X8|X7) * P(X7|X6) * ... * P(X2|X1)
= P(A|B,C) * P(B,C|B,F,G) * ... * P(D,w2,w3,w4|w1,w2,w3,w4)
= P(A|B,C) * P(C|B,F,G) * ... * P(D|w1,w2,w3,w4)
An assumption was made in the above equation. We
assumed that terms like P(Xi|Xi-1, Xi-2, ..., X1) can be
simplified into P(Xi|Xi-1). This is reasonable because
phrase level Xi-1 contains most of the
information percolated from lower levels and needed by
Xi, so little extra information is needed by Xi from Xi-2.
We completed a simulation for testing this
model and also conducted several tests on the context
sensitivity of this probability model. First, we
checked whether a left context (i.e. L) is relevant to
the probability assignment. Using
P(X3|X2) = P(E|D,w2,w3,w4) as an example, with D as the
left context of the current derivation symbol w2, we
checked whether P(X3|X2) = P(E|D,w2) holds. We also checked
whether a right context (i.e. R) has influence on the
assignment, that is, whether P(X3|X2) = P(E|w2,w3) holds. Other
test cases are LL, LR, RR, LRR, LLR, LLL, RRR, LLRR
and LLLR.
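The context tests can be made concrete with a short sketch that
rebuilds the phrase levels X1 ... X8 of the example tree and
extracts, for each reduction step, the new symbol together with
the reduced symbols and a chosen number of left and right
context symbols. The representation of a step as (phrase level,
start, length, new symbol), all function names, and the
probability table are assumptions made for illustration only;
they are not taken from the system described here.

```python
import math
from typing import Dict, List, Tuple

Level = Tuple[str, ...]
Step = Tuple[Level, int, int, str]  # (phrase level, start, length, new symbol)
Event = Tuple[str, Level]           # (new symbol, conditioning symbols)


def event(step: Step, n_left: int, n_right: int) -> Event:
    """Return the new symbol and the symbols it is conditioned on:
    the reduced symbols plus n_left / n_right context symbols."""
    level, start, length, new_symbol = step
    lo = max(0, start - n_left)
    hi = min(len(level), start + length + n_right)
    return new_symbol, level[lo:hi]


def apply_step(step: Step) -> Level:
    """Replace the reduced symbols by the new symbol."""
    level, start, length, new_symbol = step
    return level[:start] + (new_symbol,) + level[start + length:]


def log_sc_syn(steps: List[Step], prob: Dict[Event, float],
               n_left: int, n_right: int) -> float:
    """Sum of log P(new symbol | reduced symbols and context)."""
    return sum(math.log(prob[event(s, n_left, n_right)]) for s in steps)


# Reductions t1 ... t7 of the example tree: (start, length, new symbol).
reductions = [(0, 1, "D"), (1, 1, "E"), (0, 2, "B"), (1, 1, "F"),
              (2, 1, "G"), (1, 2, "C"), (0, 2, "A")]
level: Level = ("w1", "w2", "w3", "w4")  # X1
steps: List[Step] = []
for start, length, symbol in reductions:
    step = (level, start, length, symbol)
    steps.append(step)
    level = apply_step(step)

print(event(steps[1], 1, 0))  # L test for step t2
print(event(steps[1], 0, 1))  # R test for step t2

# Made-up table: every event gets probability 0.5.
toy_prob = {event(s, 1, 1): 0.5 for s in steps}
print(log_sc_syn(steps, toy_prob, 1, 1))
```

For step t2 the L test conditions E on (D, w2) and the R test
conditions it on (w2, w3), matching the two examples in the
text. Longer contexts such as LRR or LLRR correspond to larger
values of n_left and n_right.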
Semantic Score Model
The weight-assigning process of the semantic
score can be seen as an expert task where the linguist
is giving the syntax tree a diagnosis. The linguist
will assign a preference to a tree according to some
linguistic knowledge or heuristic rules. Very often
these linguistic rules are not very precise. Therefore,
a good semantic score model must allow this type
of inexact knowledge. Now, the problem is transformed
into building a rule-based expert system that can calculate
semantic scores (weightings) and handle inexact
knowledge encountered during calculation. We propose
a model similar to the CF model (certainty factor
model) in the MYCIN system [Buch 85]. It has a
knowledge-rule base where each rule has a certainty
factor based on the degree of belief and disbelief.
The confirmation of a hypothesis is then calculated
from the applicable rules and from other pieces of
evidence. The CF of a hypothesis is accumulated
gradually with each additional piece of evidence.
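How a CF can accumulate with each additional piece of evidence
is sketched below. The text does not state which combining
function is used, so the sketch assumes the combining function
commonly given for MYCIN-style certainty factors in the range
-1 to 1. The function names and the CF values are illustrative
only.

```python
from functools import reduce
from typing import Iterable


def combine_cf(x: float, y: float) -> float:
    """Combine two certainty factors in [-1, 1].

    Assumed MYCIN-style combining function. It is undefined when
    one value is exactly 1 and the other exactly -1.
    """
    if x >= 0 and y >= 0:
        return x + y * (1 - x)   # two confirming pieces of evidence
    if x <= 0 and y <= 0:
        return x + y * (1 + x)   # two disconfirming pieces of evidence
    return (x + y) / (1 - min(abs(x), abs(y)))  # conflicting evidence


def accumulate(cfs: Iterable[float]) -> float:
    """Fold the evidence in one piece at a time, starting from 0."""
    return reduce(combine_cf, cfs, 0.0)


# Two hypothetical rules support a derivation, one argues against it.
print(accumulate([0.6, 0.5]))
print(accumulate([0.6, 0.5, -0.3]))
```

The disconfirming rule lowers the accumulated value without
forcing it to zero, which is the behaviour that soft rejection
needs.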
Each tree node will have a well-formedness factor
(WFF), which is the CF for the derivation of this
node, associated with it. As the knowledge of leaf nodes,
which may contain the word sense, syntactic category, attribute,
etc., propagates up along the syntax
structure, every node's WFF will be calculated according
to the rules stored in the knowledge rule-base.
This WFF then becomes the semantic score of the subtree.
WFF(X0) = SCsem[(X1,KI(X1),KC(X1)) ... (Xn,KI(Xn),KC(Xn))]
where the derivation sequence is D : X0 =D=> X1, ..., Xn.
There are three major advantages of this scheme.
First, linguists do not have to write a single exact
rule to include all possible exceptions, because CFs
are given in accordance with the degree of confirmation
or disconfirmation. When an exception appears,
all that needs to be done is to add the necessary rules
and alter the CFs of certain existing rules. Second, the
CF model simplifies the implementation of "soft-rejection"
for inexact knowledge. For example, conditions
(like those in ATN) can be included for disambiguation
even if they are not absolute in their generality.
Third, we can combine various traditional techniques
for analyzing semantics with the CF model to construct a
uniform and flexible control strategy. This allows
the inclusion of uncertain factors like the semantic
markers of the lexicon, the assignment of case roles (from case
grammar), and the restriction of case fillers. Under this
control strategy, word sense disambiguation and structure
disambiguation are also possible. The relative
preference will be given according to the CFs associated
with different word senses and by the linguistic
rules from the knowledge base.
All in all, the score function defined
above satisfies all eight criteria we set initially,
and it is a good systematic approach for
assigning preferences to a set of ambiguous trees.
Simulation Result
A simulation, based on 1468 source sentences, was
conducted to test the syntactic score model. The probability
assigned to the entries, e.g. P(E|w2,w3), in
the SCsyn equation is estimated with the relative frequency
of these entries. That is, we approximate
P(E|w2,w3) by the ratio of the number of events
{E,w2,w3} in the database to the number of events
{w2,w3}. Several tests were conducted to check the
influence of the context on the probability assignment.
These tests include L, R, LL, LR, RR, LLL, LLR,
LRR, RRR, LLRR and LLLR. Table 1 shows some of the
results of the simulation using sentences in the database
as the test inputs.
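The relative-frequency estimate can be sketched as follows. The
event representation (new symbol, conditioning symbols), the
function name estimate and the tiny list of observed events are
assumptions made for illustration; they do not reproduce the
actual database.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[str, Tuple[str, ...]]  # (new symbol, conditioning symbols)


def estimate(events: Iterable[Event]) -> Dict[Event, float]:
    """Estimate P(symbol | context) as
    count(symbol, context) / count(context)."""
    events = list(events)
    joint = Counter(events)
    context = Counter(ctx for _, ctx in events)
    return {ev: n / context[ev[1]] for ev, n in joint.items()}


# Made-up observations: the context (w2, w3) occurs three times,
# twice reduced to E and once reduced to F.
observed = [("E", ("w2", "w3")), ("E", ("w2", "w3")),
            ("F", ("w2", "w3")), ("D", ("w1", "w2"))]
probs = estimate(observed)
print(probs[("E", ("w2", "w3"))])
```

An event that never occurs in the database has no entry in the
table at all, which is one reason why the size of the database
matters for the quality of the estimate.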
Table 1: Some results of the syntactic score simulation.

Size of database (sentences): 820
No. of sample test sentences: 52

Context  No. of entries  Rank  Count  Accumulative percentage
LL       2966            1     42      80.77%
                         2      8      96.15%
                         6      2     100.00%
LRR      4574            1     45      86.54%
                         2      6      98.08%
                         4      1     100.00%
LLRR     6285            1     45      86.54%
                         2      6      98.08%
                         4      1     100.00%

Size of database (sentences): 1468
No. of sample test sentences: 97

Context  No. of entries  Rank  Count  Accumulative percentage
LL       4187            1     76      78.35%
                         2     15      93.81%
                         3      3      96.91%
                         4      1      97.94%
                         6      2     100.00%
LRR      6560            1     83      85.57%
                         2     11      96.91%
                         3      2      98.97%
                         4      1     100.00%
LLRR     9224            1     85      87.63%
                         2      9      96.91%
                         3      2      98.97%
                         4      1     100.00%

The number of entries in the table is the number
of different conditional probabilities, e.g. P(E|w2,w3),
in the database. Each entry is assigned a probability
according to its usage frequency, as we explained
before. The preference of a tree is the parameter that
we want to estimate from these entries. If the
database is not large enough, these probabilities
cannot be approximated by the relative frequency. In
general, as the size of a database increases, so does the
accuracy of the approximation. But how big the
database should be is difficult to determine. This led us
to build two databases, one having 1468 source sentences
and the other having 820 sentences. If the
simulation results from the different databases are close, then we
may assume that the database size is large enough.
Comparing the results from these two databases, it
is apparent that the size is adequate for the present
simulation. Furthermore, it is also apparent that a
context-sensitive scoring function must be adopted for
a good preference estimation.
Two conclusions can be drawn from this simulation
result. First, we should adopt three constituents in
calculating the probability. The reason is that
although the result of the LLRR case is better than that
of the LRR case, the number of entries required by LLRR is
considerably greater. Second, approximately 85% of
syntax trees are accurately selected with only syntactic
information available. Therefore, if we want to
improve this result further, we must include the semantic
information.
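The accumulative percentages in Table 1 follow directly from
the rank counts, as the short check below illustrates. It uses
the counts reported for the LRR context with the 820-sentence
database (45, 6 and 1 test sentences at ranks 1, 2 and 4, out
of 52); the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def cumulative_percentages(counts_by_rank: Dict[int, int],
                           total: int) -> List[Tuple[int, float]]:
    """Running share of test sentences whose preferred tree is
    found at the given rank or better, in percent."""
    running = 0
    result = []
    for rank in sorted(counts_by_rank):
        running += counts_by_rank[rank]
        result.append((rank, round(100 * running / total, 2)))
    return result


lrr_820 = {1: 45, 2: 6, 4: 1}
print(cumulative_percentages(lrr_820, total=52))
# [(1, 86.54), (2, 98.08), (4, 100.0)]
```

The first entry, about 86.5%, is the share of test sentences
whose preferred tree is ranked first, which is the figure
summarized as approximately 85% in the text.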
Conclusion and Perspective 
In a Machine Translation System, to reduce the 
load of the post-editor we must select the best syntax 
tree from a set of ambiguous trees and pass it to the 
post-editor. There are systems that rely on a set of 
ordered grammar rules or on a set of restrictive condition
checks to achieve this. Unfortunately, they
all have some drawbacks: one being too uncertain and
the other being too restrictive. In this paper we 
have proposed a score mechanism for the truncation 
strategy to perform disambiguation during parsing. The 
score function, with the adoption of three context 
symbols, gives the power of context-sensitive grammar 
to an efficient context-free parser. From our simulation,
the score function with just syntactic information
achieves an accuracy rate of about 85%. In the
near future when the semantic information is included, 
this accuracy rate is expected to increase. Currently, 
two databases, one for unweighted score computation 
and the other for linguistic rule base (for weighting 
assignment), are under development at the BTC R&D
center. After completion they will be incorporated 
into the truncation parsing algorithm for our third 
generation parser. 
Acknowledgment 
We would like to express our deepest appreciation 
to Wen-Hueh Li and Hsue-Hueh Hsu for their work on the
simulations, to the whole linguistic group at BTC R&D
center for their work on the database, and to Mei-Hui Su
for her editing. Special thanks are given to Behavior 
Tech. Computer Co. for their full financial support of 
this project. 

References 

[Benn 85] Bennett, W.S. and J. Slocum, "The LRC
Machine Translation System," Computational
Linguistics, vol. 11, No. 2-3, pp. 111-119,
ACL, Apr.-Sep. 1985.

[Buch 85] Buchanan, B.G. and E.H. Shortliffe (eds.),
RULE-BASED EXPERT SYSTEMS. Reading, MA:
Addison-Wesley, 1984.

[Robi 82] Robinson, J.J., "DIAGRAM: A Grammar for
Dialogues," CACM, vol. 25, No. 1, pp. 27-47,
ACM, Jan. 1982.

[Su 87a] Su, K.Y., J.S. Chang, and H.H. Hsu, "A
Powerful Language Processing System for
English-Chinese Machine Translation," 1987
Int. Conf. on Chinese and Oriental Language
Computing, pp. 260-264, Chicago, Ill., 1987.

[Su 87b] Su, K.Y., J.N. Wang, W.H. Li, and J.S.
Chang, "A New Parsing Strategy in Natural
Language Processing Based on the Truncation
Algorithm," Proc. of Natl. Computer
Symposium (NCS) 1987, pp. 580-586, Taipei, R.O.C.

[Wood 70] Woods, W.A., "Transition Network Grammars
for Natural Language Analysis," CACM,
vol. 13, No. 10, pp. 591-606, ACM, Oct. 1970.
