An Empirical Study on Rule Granularity and 
Unification Interleaving Toward an Efficient 
Unification-Based Parsing System 
Masaaki NAGATA
ATR Interpreting Telephony Research Laboratories
2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, JAPAN
nagata@atr-la.atr.co.jp
Abstract 
This paper describes an empirical study on the optimal granularity of
the phrase structure rules and the optimal strategy for interleaving
CFG parsing with unification in order to implement an efficient
unification-based parsing system. We claim that using "medium-grained"
CFG phrase structure rules, which balance the computational cost of CFG
parsing and unification, is a cost-effective solution for making
unification-based grammar both efficient and easy to maintain. We also
claim that "late unification", which delays unification until a
complete CFG parse is found, saves unnecessary copies of DAGs for
irrelevant subparses and improves performance significantly. The
effectiveness of these methods was demonstrated in an extensive
experiment. The results show that, on average, the proposed system
parses 3.5 times faster than our previous one. The grammar and the
parser described in this paper are fully implemented and used as the
Japanese analysis module in SL-TRANS, the speech-to-speech translation
system of ATR.
1 Introduction 
The unification-based framework has been an area of active research in
natural language processing. Unification, which is the primary
operation of this framework, provides a kind of constraint-checking
mechanism for merging various information sources, such as syntax,
semantics, and pragmatics. The computational inefficiency of
unification, however, precludes the development of large practical NLP
systems, although the framework has many attractive theoretical
properties.
The efforts made to improve the efficiency of a unification-based
parsing system can be classified into four categories.
• CFG parsing algorithm
• Graph unification algorithm
• Grammar representation and organization
• Interaction between CFG parsing and unification
There have been well-known efficient CFG parsing algorithms such as
CKY \[Aho and Ullman, 77\], Earley \[Earley, 70\], CHART \[Kay, 80\],
and LR \[Aho and Ullman, 77\] \[Tomita, 86\]. There have also been
several recent in-depth studies into efficient graph unification
algorithms, whose main concerns have been either avoiding irrelevant
copies of DAGs \[Karttunen and Kay, 85\] \[Pereira, 85\]
\[Karttunen, 86\] \[Wroblewski, 87\] \[Godden, 90\] \[Kogure, 90\]
\[Tomabechi, 91\] \[Emele, 91\], or the exhaustive expansion of
disjunctions into their disjunctive normal forms \[Kasper, 87\]
\[Eisele and Dörre, 88\] \[Maxwell and Kaplan, 89\]
\[Dörre and Eisele, 90\] \[Carter, 90\] \[Nakano, 91\].
There has, however, been little discussion regarding the optimal
representation of a grammar, or linguistic knowledge, in the
unification-based framework, from the engineering point of view.
Grammar organization is highly flexible, as the unification-based
framework uses two different forms of knowledge representation: atomic
phrase structure rules and feature structure descriptions. The choice
between them greatly affects both the computational efficiency and the
maintenance cost of the system. There has also been little discussion
regarding optimal interaction between the CFG parsing process and the
unification process in unification-based parsing, which also greatly
affects overall performance.
Here we introduce the notion of granularity, and suggest
medium-grained phrase structure rules, in which morpho-syntactic
specifications in the feature descriptions are expanded into phrase
structure rules. We claim that they reduce the computational load of
unification without intractably increasing the number of rules, and
that they are optimal in the sense that they satisfy both efficiency
and maintainability. We also suggest late unification as another
solution to the copying problem, as it avoids unnecessary copies for
irrelevant subparses by delaying unification until a complete CFG
parse is found.
In the following sections, the design and implementation of the
medium-grained phrase structure rules are explained, then the
implementation of the late unification is illustrated, and finally the
effectiveness of the proposed methods is shown in experiments.
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992    PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
Granularity of Phrase     Constraints in Phrase  Constraints in        Number of Phrase
Structure Rules           Structure Rules        Feature Descriptions  Structure Rules
Extremely-Coarse-Grained  weak                   very strong           1 ~ 10
Coarse-Grained            medium                 strong                10 ~ 100
Medium-Grained            strong                 medium                100 ~ 1000
Fine-Grained              very strong            weak                  1000
Table 1: Granularity of phrase structure rules characterized by the number of rules and the strength of linguistic
constraints in the phrase structure rules and the feature descriptions
2 The Granularity of Phrase 
Structure Rules 
2.1 Granularity 
We use the term phrase structure rule granularity to refer to the
amount of linguistic constraints specified in the atomic CFG phrase
structure rules without annotations. The rule granularity spectrum is
classified into four categories as shown in Table 1, using the number
of grammar rules as a measure.
Unification-based grammars, in general, are characterized by a few
general annotated phrase structure rules, and a lexicon with specific
linguistic descriptions. This is especially true for HPSG
\[Pollard and Sag, 87\] and JPSG \[Gunji, 87\], which are to be
categorized as extremely-coarse-grained, as they drastically reduce
the number of phrase structure rules to two for English and one for
Japanese, respectively. In these frameworks, the only role of the
phrase structure rules is to provide a device for combining a head
with its complement. Most linguistic constraints are stored in the
feature descriptions.
Coarse-grained rules are characterized as a grammar consisting of
atomic phrase structure rules with medium constraints, and feature
descriptions with strong constraints. Medium-grained rules are
characterized as a grammar consisting of atomic phrase structure rules
with strong constraints, and feature descriptions with medium
constraints. Medium-grained rules differ from coarse-grained rules in
that they include morpho-syntax in the phrase structure rules, while
coarse-grained rules include it in the feature descriptions. This
means that medium-grained rules are strong enough to derive syntactic
structures from atomic phrase structure rules without feature
descriptions.
Grammars for conventional NLP systems using simple or augmented CFG
fall into the category of fine-grained rules, which represent most
linguistic constraints as CFG phrase structure rules. The number of
rules usually amounts to an intractable several thousand for practical
applications.
2.2 Maintainability and Efficiency 
In the unification-based framework, a linguistic constraint can be
described either as atomic context-free phrase structure rules, or as
feature descriptions in annotations and lexical entries. As the number
of atomic phrase structure rules decreases, the number of feature
descriptions increases.
It is true that the lexico-syntactic approach makes the grammar
modular and improves its maintainability by reducing the number of
rules. However, it must be noted that the computational cost of
disjunctive feature structure unification, in the worst case, is
exponential in the number of disjunctions \[Kasper, 87\], whereas the
cost of CFG parsing is O(N^3) in the input length N. Therefore,
extreme rule reduction results in inefficiency. This inefficiency
outweighs the maintainability benefits of the reduced number of rules,
since grammar development is essentially a trial-and-error process and
requires a short turn-around time. However, the cost of CFG parsing
also increases as the number of rules increases. Therefore, we must
choose the granularity so that the reduction in unification cost
outweighs the increase in CFG parsing cost, in order to gain overall
efficiency.
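The worst-case contrast above can be made concrete with a small Python sketch. It assumes a flat, hypothetical encoding in which each disjunction is an independent list of alternative values (real feature structures are DAGs), and it simply counts how many alternatives an exhaustive expansion into disjunctive normal form produces. The function names and the example features are illustrative assumptions, not part of the system described in this paper.

```python
from itertools import product

def expand_to_dnf(disjunctions):
    """Expand independent disjunctions into disjunctive normal form.

    `disjunctions` maps a feature name to its list of alternative values
    (a hypothetical flat encoding). Each returned dict is one fully
    specified alternative.
    """
    names = list(disjunctions)
    return [dict(zip(names, values))
            for values in product(*(disjunctions[n] for n in names))]

def cfg_cost_bound(n):
    """Cubic term for CFG parsing of an input of length n."""
    return n ** 3

# Three binary disjunctions already yield 2**3 = 8 alternatives.
example = {"case": ["ga", "wo"],
           "voice": ["active", "passive"],
           "tense": ["past", "nonpast"]}
alternatives = expand_to_dnf(example)
```

With k binary disjunctions the expansion has 2^k members, so each disjunction moved from the feature descriptions into the atomic rules halves this factor, at the price of more rules for the CFG parser to consider.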
3 The HPSG-Based Japanese 
Grammars 
In this section, we illustrate the difference between "coarse-grained"
rules and "medium-grained" rules using our HPSG-based spoken-style
Japanese grammars as an example.
We have developed two unification-based grammars with different
granularity (see footnote 1), which are essentially based on HPSG and
its application to Japanese (JPSG), for the analysis module
\[Nagata and Kogure, 90\] of an experimental Japanese-to-English
speech-to-speech translation system (SL-TRANS) \[Morimoto et al., 90\].
Footnote 1: Historically speaking, we first developed coarse-grained
rules and then we manually transformed them into medium-grained rules
for efficiency.
We have selected the "secretarial service of an international
conference registration" as our task domain, in which a conversation
between a secretary and a questioner is carried out. The Japanese
grammars, however, are not task-specific, but rather general-purpose
ones, which cover a wide range of phenomena at many linguistic levels
from syntax and semantics to pragmatics using typed feature structure
descriptions. The linguistic phenomena covered in these grammars
include:
• Fundamental Constructions: causative, passive, benefactive, negation, interrogative, etc.,
• Control and Gaps: subject/object control,
• Unbounded Dependencies: topic, relative,
• Word Order Variation and Ellipsis.
3.1 Coarse-Grained Rules vs. 
Medium-Grained Rules 
The coarse-grained HPSG-based Japanese grammar has about 20
generalized phrase structure rules, while the medium-grained grammar
has about 200 phrase structure rules. Both grammars use the same
lexicon with a vocabulary of about 400 (see footnote 2).
In the coarse-grained grammar, phrase structure rules only refer to
the relative position between the five basic syntactic categories for
Japanese: verb (V), noun (N), adverb (ADV), postposition (P), and
attributive (ATT). Most of the specific linguistic information is
encoded as feature descriptions in either the annotation of the phrase
structure rules or the lexical entries. In principle, there is no
distinction as to whether a constituent is lexical or phrasal, and
there are no subcategories of the five basic categories. This
contributes greatly to the reduction in the number of phrase structure
rules, which results in better grammar maintainability. We present all
the phrase structure rules of the coarse-grained Japanese grammar in
Appendix A.
We have found that the extensive use of disjunctions in feature
descriptions, which results from the reduction of the number of phrase
structure rules, is the main cause of inefficiency in the
coarse-grained version of the grammar. The three major sources of
disjunctions are morpho-syntactic specifications for diverse
expressions in the final part of the sentence, free word order and
ellipsis of verb complements (subcat slash scrambling), and semantic
interpretation of deep case and aspect. The first two are particularly
problematic in spoken-style Japanese.
We have manually converted the coarse-grained phrase structure rules
into medium-grained rules to reduce the computational cost of
unification. First, we divided each of the basic categories into
several subcategories. Then, we divided the coarse-grained phrase
structure rules according to the subcategories. To keep the grammar
readable, however, we chose to leave the subcat slash scrambling and
the semantic interpretation unexpanded, and made extensive efforts to
expand the morpho-syntactic specifications.
Footnote 2: We also have another version of the grammar for the same
subcorpus, which is used for the continuous speech recognition module
\[Takezawa et al., 91\]. It only uses atomic CFG rules, and the number
of rules amounts to more than 2,000. It is, therefore, categorized as
a fine-grained grammar in our definition.
3.2 Example: Medium-Grained Rules 
for Predicate Verb Phrases 
In this section, we illustrate the process of transformation using a
predicate verb phrase production rule as an example. Japanese
predicate phrases consist of a main verb followed by a sequence of
auxiliaries and sentence final particles. There is an almost
one-dimensional order of verbal constituents, as shown in Figure 1,
which reflects the basic hierarchy of the Japanese sentence structure.
Kernel verbs occur first in a predicate phrase sequence. Voice
auxiliaries precede all other auxiliaries, and within this category,
the causative auxiliary (sa)seru precedes the passive auxiliary
(ra)reru. Aspect auxiliaries, such as the progressive auxiliary
(te)iru, precede modal auxiliaries and follow voice auxiliaries. Modal
auxiliaries are classified into two groups with respect to the
relative order of negative and tense auxiliaries. Mood1 includes the
optative auxiliaries, such as tai (want), beki (should/must), etc.
Mood2 includes the evidential or inferential auxiliaries such as
rashii (seem/look), kamoshirenai (may), etc. Negative auxiliaries nai,
n (not) follow voice, aspect, and mood1 auxiliaries, and precede tense
and mood2 auxiliaries. Tense auxiliaries ta, da (-ed) show irregular
behavior. They follow the voice, aspect, mood1, and negative
auxiliaries, and precede the mood2 auxiliaries. They can also follow
the mood2 auxiliaries.
In the coarse-grained grammar, we provide a single phrase structure
rule for the phenomena.
V -> (V AUXV)    (1)
The order constraints between auxiliaries are specified in the
annotation of rule (1) and each lexical entry by the combination of
the syntactic features, such as syn|head|subcat for preceding
constituents, syn|head|coh for following constituents, and
syn|head|modl for the position of the constituents in the verb phrase
hierarchy. For example, the causative auxiliary verb seru has the
following feature bundle in its syn|head|modl feature.
\[\[CAUS +\] \[DEAC -\] \[ASPC -\] \[DONT -\] \[OPTT -\] \[NEGT -\] \[PAST -\]
 \[EVID -\] \[TENT -\] \[PONT -\] \[POLT-AUX -\] \[INT~ -\]
 \[SFP-1 -\] \[SFP-2 -\] \[SFP-3 -\]\]
In converting the rule, we first classified the verbal phrasal
categories according to the hierarchy, e.g. V-kernel, V-aspect,
V-mood1, V-negt, V-mood2, and V-tense, then we subcategorized the
auxiliaries as shown in Table 2. Thus, the coarse-grained phrase
structure rule (1) is converted to the 32 medium-grained grammar rules
in Appendix B.
kernel < voice < aspect < mood1 < negate < tense < mood2 < tense
(sa)seru (te)iru tai nai ta rasii ta
(ra)reru (te)morau tagaru n da desu u
masu darou
Figure 1: The predicate hierarchy of Japanese
AUXV-caus   causative auxiliary: (sa)seru
AUXV-deac   passive auxiliary: (ra)reru
AUXV-aspc   aspect auxiliary: (te)iru, (te)aru
AUXV-dont   benefactive auxiliary: (te)morau
AUXV-optt   optative auxiliary: tai, beki
AUXV-negt   negative auxiliary: nai, n
AUXV-tense  tense auxiliary: ta, da
AUXV-evid   evidential auxiliary: rashii, darou
AUXV-copl   copulative auxiliary: da, desu
Table 2: Subcategories of auxiliaries in the medium-grained grammar
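
The conversion of rule (1) is described above as a manual process. As a hedged illustration only, the following Python sketch regenerates the mood1, negation, tense, and mood2 rules of Appendix B from the ordering in Figure 1: every category lower in the hierarchy may combine with the auxiliary of a higher level, and tense may additionally follow mood2. The table names `HIERARCHY` and `AUX_FOR` and the function `expand_rules` are our own assumptions; the voice and aspect rules are left out because they do not follow this simple pattern.

```python
# Phrasal categories in hierarchy order (Figure 1), lowest first.
HIERARCHY = ["V-kernel", "V-voice", "V-aspect",
             "V-mood1", "V-negt", "V-tense", "V-mood2"]

# Auxiliary subcategory (Table 2) that builds each phrasal category.
AUX_FOR = {"V-mood1": "AUXV-optt",
           "V-negt": "AUXV-negt",
           "V-tense": "AUXV-tense",
           "V-mood2": "AUXV-evid"}

def expand_rules():
    """Return (lhs, (left_daughter, auxiliary)) pairs for the modal part."""
    rules = []
    for target, aux in AUX_FOR.items():
        # Any category below the target in the hierarchy may be extended.
        for lower in HIERARCHY[:HIERARCHY.index(target)]:
            rules.append((target, (lower, aux)))
    # Irregular case: tense auxiliaries can also follow mood2.
    rules.append(("V-tense", ("V-mood2", "AUXV-tense")))
    return rules

rules = expand_rules()
for lhs, rhs in rules:
    print(f"{lhs} -> ({' '.join(rhs)})")
```

Under these assumptions the sketch yields 19 rules, which matches the number of mood1, negation, tense, and mood2 rules listed in Appendix B.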
4 Interleaving CFG Parsing 
and Unification 
4.1 Strategies for Evaluating Feature 
Descriptions 
Unification is an expensive operation, so the point at which feature
descriptions are evaluated during CFG parsing has serious effects on
the overall performance. We have implemented two strategies for
feature description evaluation:
Early Unification (Step-by-step Strategy)
Feature descriptions are evaluated step-by-step,
at each rule invocation in the CFG parsing.
Late Unification (Pipeline Strategy)
Feature descriptions are evaluated when a complete CFG
parse is found. The "well-formedness" of a parse
derived from atomic CFG rules is verified by
evaluating associated feature descriptions.
The granularity of the phrase structure rules is closely related to
the proper selection of the evaluation strategy. Since the atomic
phrase structure rules in the coarse-grained grammar are not strong
enough to constrain syntactic structures, we have to employ early
unification to avoid a large number of irrelevant subparses which
should have been eliminated by the evaluation of annotations. However,
since the atomic rules in the medium-grained grammar have detailed
morpho-syntactic specifications, they should be able to avoid
irrelevant copies by using late unification.
4.2 Implementing the Evaluation 
Strategies 
We have implemented the various evaluation strategies by doing
additional housekeeping in the underlying parser. The parser used here
is called the Typed Feature Structure Propagation Parser (TFSP Parser)
\[Kogure, 89\], which is based on the active chart parsing algorithm
\[Kay, 80\] and typed feature structure unification \[Ait-Kaci, 86\].
procedure AcpContinue(chart)
begin
  if SatisfySuspendingCondition?(chart)
    then return chart
  else
    AcpContinue(AcpOneStep(chart))
end
procedure AcpOneStep(chart)
begin
  pendingedge = GetPendingEdge(chart)
  AddEdge(pendingedge)
  if EdgeActive?(pendingedge) then
    TryToContinueActiveEdge(pendingedge, chart)
  else
    TryToContinueInactiveEdge(pendingedge, chart)
    ProposeProductions(pendingedge)
  return chart
end
Figure 2: Iterative Rule Invocation in an Active Chart Parsing Algorithm
The active chart parser and the unification algorithm are implemented
in C on a Sun4, which is a 10-MIPS workstation. The unification
algorithm is based on nondestructive graph unification
\[Wroblewski, 87\], which we extend to treat negation, loops, type
symbol subsumption relationships, and disjunction. Successive
approximation \[Kasper, 87\] is used for disjunctive feature structure
unification.
The active chart parsing algorithm basically consists of chart
initialization and iterative rule invocation. The basic part of the
iterative rule invocation is shown in Figure 2. AcpContinue checks the
suspending condition and calls rule invocation recursively. AcpOneStep
carries out a cycle of rule invocation which consists of getting a new
pending edge (GetPendingEdge), adding it to the chart (AddEdge),
combining active and inactive edges
(TryToContinueActiveEdge/TryToContinueInactiveEdge), and proposing new
edges (ProposeProductions). The parser stops
(SatisfySuspendingCondition?) when it finds an inactive edge whose
starting and ending vertices are the left-most and right-most vertices
of the chart, respectively, and whose label is the start symbol of the
grammar.
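
The rule invocation cycle can be sketched in Python as follows. This is a minimal bottom-up recognizer over atomic CFG rules only, written as an illustration rather than a port of the TFSP Parser: the `Edge` tuple, the single-category toy lexicon, and the four rules (taken from the coarse-grained rules of Appendix A) are assumptions, and the recursion of AcpContinue is replaced by a loop.

```python
from collections import namedtuple

# An edge spans [start, end); `needed` lists categories still to be found.
Edge = namedtuple("Edge", "start end label needed")

# Toy grammar and lexicon (illustrative assumptions).
RULES = [("start", ("v",)), ("v", ("p", "v")),
         ("p", ("n", "postp")), ("v", ("v", "auxv"))]
LEXICON = {"kaigi": "n", "ni": "postp", "sankashi": "v", "tai": "auxv"}

def acp_parse(words, lexicon, rules, start_symbol):
    """Return True if the atomic CFG rules derive the whole input."""
    chart = set()
    # Chart initialization: one inactive lexical edge per word.
    pending = [Edge(i, i + 1, lexicon[w], ()) for i, w in enumerate(words)]
    while pending:
        edge = pending.pop()                       # GetPendingEdge
        if edge in chart:
            continue
        chart.add(edge)                            # AddEdge
        if edge.needed:                            # TryToContinueActiveEdge
            for other in chart:
                if (not other.needed and other.start == edge.end
                        and other.label == edge.needed[0]):
                    pending.append(Edge(edge.start, other.end,
                                        edge.label, edge.needed[1:]))
        else:
            for other in chart:                    # TryToContinueInactiveEdge
                if (other.needed and other.end == edge.start
                        and other.needed[0] == edge.label):
                    pending.append(Edge(other.start, edge.end,
                                        other.label, other.needed[1:]))
            for lhs, rhs in rules:                 # ProposeProductions
                if rhs[0] == edge.label:
                    pending.append(Edge(edge.start, edge.end,
                                        lhs, tuple(rhs[1:])))
            # Suspending condition: a spanning inactive start-symbol edge.
            if (edge.start == 0 and edge.end == len(words)
                    and edge.label == start_symbol):
                return True
    return False
```

For example, `acp_parse(["kaigi", "ni", "sankashi", "tai"], LEXICON, RULES, "start")` succeeds, while the reversed fragment `["ni", "kaigi"]` is rejected. No feature structure is touched in this loop, which is the part of the work that late unification runs first.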
In early unification, the feature descriptions are evaluated when the
edges are combined, while in late unification, they are evaluated in
the chart suspending condition check, and only if the chart suspending
condition holds. Delaying unification is implemented by adding a slot
edge.parse to the edge structure, which keeps a list of the pairs of
active and inactive edges constructing the edge. If either or both of
the argument feature structures of the unification have not been
evaluated, they are recursively evaluated to get the target feature
structure.
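
A minimal Python sketch of this deferred evaluation is given below. It is a drastic simplification under stated assumptions: feature structures are flat dictionaries rather than typed DAGs, `unify` is a toy stand-in for graph unification, each edge records a single (active, inactive) pair instead of a list of pairs, and the names `Edge`, `UNEVALUATED`, and `evaluate` are hypothetical.

```python
UNEVALUATED = object()  # marks an edge whose feature structure is pending

class Edge:
    """Chart edge carrying a `parse` slot for late unification."""
    def __init__(self, label, fs=UNEVALUATED, parse=None):
        self.label = label
        self.fs = fs        # feature dict, or None after a failed unification
        self.parse = parse  # (active_edge, inactive_edge) that built this edge

def unify(a, b):
    """Unify two flat feature dicts; return None on a value clash."""
    result = dict(a)
    for feature, value in b.items():
        if result.setdefault(feature, value) != value:
            return None
    return result

def evaluate(edge):
    """Compute an edge's feature structure on demand, recursing into parts."""
    if edge.fs is UNEVALUATED:
        left, right = (evaluate(part) for part in edge.parse)
        edge.fs = None if left is None or right is None else unify(left, right)
    return edge.fs

# Lexical edges are evaluated; the combined edge stays pending until asked.
p = Edge("p", fs={"case": "ni"})
v = Edge("v", fs={"case": "ni", "tense": "past"})
vp = Edge("v", parse=(p, v))
```

Building `vp` costs no unification at all. Only a call such as `evaluate(vp)`, made once a complete CFG parse has been found, walks the recorded pairs and caches the result in `fs`, so edges that never take part in a complete parse are never copied or unified.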
It has to be noted that some derivations that terminate when feature
descriptions are evaluated may not terminate if they are ignored. For
example, it is possible to write a rule for unbounded dependency like
(2), in which an element in the subcat feature is moved to the slash
feature, to introduce slashed categories dynamically (see footnote 3).
--~ (~) (2) 
Ignoring feature descriptions in the rule may cause an infinite loop.
Therefore, feature descriptions are forced to be evaluated when rules
that cause a loop are encountered in late unification.
5 Experiment 
The effectiveness of the strategies proposed in this paper can be
judged by observing their behavior in practice. We have tested the
time behavior of parsing with respect to rule granularity and
interleaving strategy of CFG parsing and unification. 85 sample
sentences are used. These are selected from the sample subcorpus of
ATR's dialogue corpus whose task domain is the "secretarial service of
an international conference". The average length of the sample
sentences is 11.0 characters, and their maximum and minimum lengths
are 28 and 2 characters, respectively.
We have developed two Japanese grammars of different granularity with
almost the same coverage. The coarse-grained rules consist of 22
generalized phrase structure rules with detailed feature descriptions
in their annotations, while the medium-grained rules consist of 164
detailed phrase structure rules with less detailed feature
descriptions. Both grammars use the same lexicon with about 400
lexical entries. We have also implemented two different feature
description evaluation modes in the active chart parser. The early
unification evaluation mode evaluates the feature descriptions at each
rule application (the step-by-step strategy). The late unification
evaluation mode, on the other hand, delays unification until a
complete syntactic structure is found by using the atomic phrase
structure rules only (the pipeline strategy).
The average parsing time is shown in Table 3. It shows that, on
average, the medium-grained grammar rules are 1.7 times more efficient
than the coarse-grained rules in the early unification mode, and that
the late unification mode is 2.0 times more efficient than the early
unification mode with the medium-grained grammar. Moreover, when the
medium-grained grammar rules and the late unification mode are
combined, the new parser runs 3.5 times faster than the previous one
using the coarse-grained grammar rules and the early unification (see
footnote 4).
Table 3: Average parsing time with respect to granularity and unification mode
(rows: Rule Granularity, Unification Mode, Average Runtime, Relative Speed)
Figure 3: Comparison of Coarse-Grained Rules and Medium-Grained Rules
(legend: Coarse, Early; Medium, Early)
Footnote 3: In our implementation, for efficiency reasons, we generate
all the appropriate combinations of subcat and slash in advance, and
keep them as a disjunctive feature structure.
6 Discussion 
The relationship between input length and parsing time with respect to
grammar granularity is shown in Figure 3. In general, the
medium-grained rules performed better than the coarse-grained rules.
This tendency becomes clearer as the sentence length increases. This
results from the reduction of disjunctive feature descriptions, whose
computational cost increases exponentially in the number of
disjunctions. However, we occasionally encounter sentences which are
parsed faster using coarse-grained rules rather than medium-grained
rules. This is because the increase in the atomic CFG parsing cost
exceeds the reduction of the unification cost.
Footnote 4: The approach of saving unnecessary copies for irrelevant
subparses in a parsing process by late unification is orthogonal to
the approaches of saving unnecessary copies within a unification
process, such as \[Tomabechi, 91\]. Therefore, the effects of speed-up
can be multiplied. We have already implemented his quasi-destructive
graph unification, and the preliminary experiment result shows that
the parser with the new unifier runs almost twice as fast as the one
reported in this paper.
Figure 4: Comparison of Early Unification and Late Unification
(legend: Medium, Early; Medium, Late)
The relationship between input length and parsing time with respect to
unification evaluation mode is shown in Figure 4. This shows that the
late unification mode is significantly more efficient than the early
unification mode. It also shows that the parsing time in the late
unification mode seems to be polynomial (not exponential) in the input
length, while that in the early unification mode varies widely and
irregularly. This is because, by delaying unification, the parsing
time in the late unification mode is dominated by the cost of atomic
CFG parsing, whereas the parsing time in the early unification mode is
dominated by the cost of unification.
We have demonstrated the effectiveness of combining medium-grained
phrase structure rules with late unification. The experimental results
suggest three further prospective techniques for speeding up
unification-based parsing.
The first is automatic transformation of phrase structure rules, which
converts disjunctions in the feature descriptions into atomic phrase
structure rules. Some disjunctions, such as subcat slash scrambling,
are so regular that it seems possible to expand them into a set of CFG
rules using formal procedures. If the grammar compiler can perform
this kind of transformation automatically, we can gain efficiency
without losing grammar maintainability.
The second is feature-sensitive lazy unification. Unification is used
both for building up a structure using information-propagation and for
blocking rule application using constraint-checking. If the grammar
compiler can separately output those features for constraint-checking,
such as syntactic features, and those for information-propagation,
such as semantic representations, irrelevant subparses can be pruned
efficiently by evaluating the constraint-checking features first.
Unification is an associative and commutative operation, so the
feature-sensitive lazy unification is assured to give the same
results.
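
The idea can be sketched in Python under strong simplifying assumptions: feature structures are flat dictionaries, `unify` is a toy stand-in for graph unification, and the set of constraint-checking features is given by hand rather than produced by a grammar compiler. The names `unify`, `split`, and `lazy_unify` are hypothetical.

```python
def unify(a, b):
    """Unify two flat feature dicts; return None on a value clash."""
    result = dict(a)
    for feature, value in b.items():
        if result.setdefault(feature, value) != value:
            return None
    return result

def split(fs, checking_features):
    """Separate constraint-checking features from the remaining ones."""
    check = {f: v for f, v in fs.items() if f in checking_features}
    rest = {f: v for f, v in fs.items() if f not in checking_features}
    return check, rest

def lazy_unify(a, b, checking_features):
    """Unify the constraint-checking features first, the rest afterwards."""
    a_check, a_rest = split(a, checking_features)
    b_check, b_rest = split(b, checking_features)
    checked = unify(a_check, b_check)
    if checked is None:
        return None  # pruned before any propagation feature is touched
    propagated = unify(a_rest, b_rest)
    if propagated is None:
        return None
    return unify(checked, propagated)

verb = {"pos": "v", "form": "stem", "sem": "participate"}
aux = {"form": "stem", "tense": "past"}
clash = {"form": "infl", "tense": "past"}
```

Here `lazy_unify(verb, aux, {"pos", "form"})` returns the same dictionary as `unify(verb, aux)`, while `lazy_unify(verb, clash, {"pos", "form"})` fails on the `form` feature before the semantic feature is examined.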
The third is parallel implementation of a unification-based parser
based on late unification (pipeline strategy). In early unification
(step-by-step strategy), it is hard to perform parsing in parallel
because the CFG parsing process and the unification process depend
strongly on each other. However, both processes are completely
separated in the pipeline strategy. Therefore, it is easy to apply
existing parallel algorithms to both CFG parsing and unification. We
estimate that most feature descriptions can be evaluated in parallel,
at least at the lexical level, because unification-based grammars such
as HPSG derive phrase structure by iteratively propagating the local
constraints.
7 Conclusion 
In this paper, we have proposed two techniques for implementing an
efficient unification-based parsing system, which, when combined,
significantly improve the overall performance. The first is changing
the granularity of the context-free phrase structure rules into
medium-grained rules. This enables us to reduce the amount of
unification for feature descriptions without intractably increasing
the number of phrase structure rules. The second is late unification,
in which the unification for feature descriptions is delayed until a
complete CFG parse is found. This saves unnecessary copies of feature
structures which are wasted on irrelevant subparses.
We have tested the time behavior of the parsing system using two
grammars of different granularity (coarse/medium) and two different
strategies for invoking unification (early/late). The results show
that, on average, late unification using medium-grained rules parses
3.5 times faster than the previous early unification using
coarse-grained rules.
Acknowledgments 
The author would like to thank Dr. Kurematsu, and all the members of
ATR Interpreting Telephony Research Labs. for their constant help and
fruitful discussions.
Appendix 
A Coarse-grained Grammar
Rules for Japanese
The name of each rule is shown in the comment, where the suffix -ah,
-ch, -coord means adjunction, complementation, and coordination,
respectively.
;;; Start Symbol
start -> (v)                   ; start-rule
;;; Noun Phrase
n -> (att n)                   ; att-n-ah
n -> (p n)                     ; p-n-ah
n -> (v fn)                    ; v-fn-ch
n -> (v n)                     ; v-n-ah
n -> (p n)                     ; n-n-coord
;;; Complex and Compound Noun
n -> (nprefix n)               ; nprefix-n-ah
n -> (n n)                     ; n-n-ah
n -> (n nsuffix)               ; n-nsuffix-ch
;;; Postpositional Phrase
p -> (n postp)                 ; n-postp-ch
p -> (v postp)                 ; v-postp-ch
p -> (p postp)                 ; p-postp-ch
;;; Adverbial Phrase
adv -> (adv postp)             ; adv-postp-ch
adv -> (v fadv)                ; v-fadv-ch
adv -> (n fadv)                ; n-fadv-ch
;;; Verb Phrase
v -> (p v)                     ; p-v-any
v -> (adv v)                   ; adv-v-ah
;;; Predicate Verb Phrase
v -> (v auxv)                  ; v-auxv-ch
v -> (n auxv)                  ; n-auxv-ch
v -> (adv auxv)                ; adv-auxv-ch
;;; Verb Inflection
v -> (vstem vinfl)             ; vstem-infl
auxv -> (auxvstem vinfl)       ; auxvstem-infl
B Medium-grained Rules for
Japanese Verb Phrases
The coarse-grained grammar rule (1) is converted to the following 32
medium-grained grammar rules.
V-voice -> (V-kernel AUXV-voice)
V-aspect -> (ADV AUXV-aspc)
V-aspect -> (ADV AUXV-dont)
V-mood1 -> (V-kernel AUXV-optt)
V-mood1 -> (V-voice AUXV-optt)
V-mood1 -> (V-aspect AUXV-optt)
V-negt -> (V-kernel AUXV-negt)
V-negt -> (V-voice AUXV-negt)
V-negt -> (V-aspect AUXV-negt)
V-negt -> (V-mood1 AUXV-negt)
V-tense -> (V-kernel AUXV-tense)
V-tense -> (V-voice AUXV-tense)
V-tense -> (V-aspect AUXV-tense)
V-tense -> (V-mood1 AUXV-tense)
V-tense -> (V-negt AUXV-tense)
V-mood2 -> (V-kernel AUXV-evid)
V-mood2 -> (V-voice AUXV-evid)
V-mood2 -> (V-aspect AUXV-evid)
V-mood2 -> (V-mood1 AUXV-evid)
V-mood2 -> (V-negt AUXV-evid)
V-mood2 -> (V-tense AUXV-evid)
V-tense -> (V-mood2 AUXV-tense)
AUXV-voice -> (AUXV-caus)
AUXV-voice -> (AUXV-deac)
AUXV-voice -> (AUXV-caus AUXV-deac)
V -> (V-kernel)
V -> (V-voice)
V -> (V-aspect)
V -> (V-mood1)
V -> (V-negt)
V -> (V-tense)
V -> (V-mood2)

References 

\[Aho and Ullman, 77\] Aho, A. and Ullman, J., Principles of
Compiler Design, Addison-Wesley, 1977.

\[Ait-Kaci, 86\] Aït-Kaci, H., "An Algebraic Semantics Approach
to the Effective Resolution of Type Equations,"
Journal of Theoretical Computer Science 45, 1986.

\[Carter, 90\] Carter, D., "Efficient Disjunctive Unification for
Bottom-Up Parsing," Proc. of COLING-90, 1990.

\[Dörre and Eisele, 90\] Dörre and Eisele, "Feature Logic with
Disjunctive Unification," Proc. of COLING-90, 1990.

\[Earley, 70\] Earley, J., "An Efficient Context-Free Parsing
Algorithm," CACM, 13, 2, 1970.

\[Eisele and Dörre, 88\] Eisele, A. and Dörre, J., "Unification of
Disjunctive Feature Descriptions," Proc. of the 26th ACL,
1988.

\[Emele, 91\] Emele, M., "Unification with Lazy Non-Redundant
Copying," Proc. of the 29th ACL, 1991.

\[Godden, 90\] Godden, K., "Lazy Unification," Proc. of the
28th ACL, 1990.

\[Gunji, 87\] Gunji, T., Japanese Phrase Structure Grammar:
A Unification-Based Approach, Dordrecht, Reidel, 1987.

\[Kasper, 87\] Kasper, R., "A Unification Method for Disjunctive
Feature Descriptions," Proc. of the 25th ACL, 1987.

\[Karttunen, 86\] Karttunen, L., D-PATR: A Development
Environment for Unification-Based Grammars, CSLI-86-61,
CSLI, 1986.

\[Karttunen and Kay, 85\] Karttunen, L. and Kay, M., "Structure
Sharing with Binary Trees," Proc. of the 23rd ACL,
1985.

\[Kay, 80\] Kay, M., Algorithm Schemata and Data Structures
in Syntactic Processing, Technical Report CSL-80-12,
Xerox PARC, 1980.

\[Kogure, 89\] Kogure, K., "Parsing Japanese Spoken Sentences
based on HPSG," Proc. of the IWPT, 1989.

\[Kogure, 90\] Kogure, K., "Strategic Lazy Incremental Copy
Graph Unification," Proc. of COLING-90, 1990.

\[Maxwell and Kaplan, 89\] Maxwell, J. and Kaplan, R., "An
Overview of Disjunctive Constraint Satisfaction," Proc.
of the IWPT, 1989.

\[Morimoto et al., 90\] Morimoto, T. et al., "Integration of
Speech Recognition and Language Processing in Spoken
Language Translation System (SL-TRANS)," Proc. of the
ICSLP, 1990.

\[Nagata and Kogure, 90\] Nagata, M. and Kogure, K.,
"HPSG-Based Lattice Parser for Spoken Japanese in a
Spoken Language Translation System," Proc. of the 9th
ECAI, 1990.

\[Nakano, 91\] Nakano, M., "Constraint Projection: An Efficient
Treatment of Disjunctive Feature Descriptions,"
Proc. of 29th ACL, 1991.

\[Pereira, 85\] Pereira, F., "A Structure-Sharing Representation
for Unification-Based Grammar Formalisms," Proc. of the 23rd
ACL, 1985.

\[Pollard and Sag, 87\] Pollard, C. and Sag, I., An Information-
Based Syntax and Semantics, CSLI Lecture Notes No. 13,
CSLI, 1987.

\[Takezawa et al., 91\] Takezawa, T. et al., "Linguistic
Constraints for Continuous Speech Recognition in Goal-
Directed Dialogue," Proc. of ICASSP-91, 1991.

\[Tomabechi, 91\] Tomabechi, H., "Quasi-Destructive Graph
Unification," Proc. of 29th ACL, 1991.

\[Tomita, 86\] Tomita, M., Efficient Parsing for Natural
Language: A Fast Algorithm for Practical Systems, Kluwer
Academic Publishers, 1986.

\[Wroblewski, 87\] Wroblewski, D., "Nondestructive Graph
Unification," Proc. of the 6th AAAI, 1987.
