An HPSG-to-CFG Approximation of Japanese 
Bernd Kiefer, Hans-Ulrich Krieger, Melanie Siegel 
German Research Center for Artificial Intelligence (DFKI) 
Stuhlsatzenhausweg 3, D-66123 Saarbriicken 
{kiefer, krieger, siegel}@dfki, de 
Abstract 
We present a simple approximation method for turn- 
ing a Head-Driven Phrase Structure Grammar into a 
context-free grammar. The approximation method 
can be seen as the construction of the least fixpoint 
of a certain monotonic hmction. We discuss an ex- 
periment with a large HPSG for Japanese. 
1 Introduction 
This paper presents a simple approximation 
method for turning an HPSG (Pollard and Sag, 
1994) into a context-free grmnmar. The the- 
oretical underpinning is established through a 
least fixpoint construction over a certain mono- 
tonic function, similar to the instantiation of 
a rule in a bottom-up passive chart parser or 
to partial evaluation in logic programming; see 
(Kiefer and Krieger, 2000a). 
1.1 Basic Idea 
The intuitive idea underlying our approach is 
to generalize in a first step the set of all lexicon 
entries. The resulting structures form equiv- 
alence classes, since they abstract from word- 
specific information, such as FORN or STEM. The 
abstraction is specified by means of a restrictor 
(Shiet)er, 1985), the so-called lexicon rcstrictor. 
The grammar rules/schemata are then instan- 
tiated via unification, using the abstracted lexi- 
con entries, yielding derivation trees of depth 1. 
We apply the rule restrictor to each resulting 
feature structure, which removes all information 
contained only in the daughters of the rule. Due 
to the Locality Principle of HPSG, this deletion 
does not alter the set of derivable feature struc- 
tures. Since we are interested in a finite fixpoint 
from a practical point of view, the restriction 
also gets rid of information that will lead to in- 
finite growth of feature structures during deriva- 
tion. Additionally, we throw away information 
that will not restrict the search space (typically, 
parts of tile semantics). The restricted fea- 
ture structures (together with older ones) then 
serve as tile basis for the next instantiation step. 
Again, this gives us feature structures encoding 
a derivation, and again we are applying the rule 
restrictor. We proceed with the iteration, until 
we reach a fixpoint, meaning that further itera- 
tion steps will not add (or remove) new (o1" old) 
feature structures. 
Our goal, however, is to obtain a context-fl'ee 
grammar, trot since we have reached a fixpoint, 
we can use the entire feature structures as (com- 
plex) context-free symbols (e.g., by nlapping 
them to integers). By instantiating the HPSG 
rules a final time with feature structures from 
the fixpoint, applying the rule restrictor and 
finally classifying the resulting structure (i.e., 
find tile right structure from the fixpoint), one 
can easily obtain tile desired context-free grain- 
mar (CFG). 
1.2 Why is it Worth? 
Approximating an HPSG through a CFG ~ is 
interesting for the following practical reason: 
assuming that we have a CFG that comes close 
to an HPSG, we can use the CFG as a cheap fil- 
ter (running time complexity is O(IGI 2 x n 3) for 
an arbitrary sentence of length n). The main 
idea is to use the CFG first and then let the 
HPSG deterministically replay the derivations 
licensed by the CFG. The important point here 
is that one can find for every CF production 
exactly one and only one HPSG rule. (Kasper 
et al., 1996) describe such an approach for word 
graph parsing which employs only the relatively 
unspecific CF backbone of an HPSG-like grmn- 
mar. (Diagne et al., 1995) replaces the CF back- 
bone through a restriction of the original HPSG. 
This grammar, however, is still an unification- 
1046 
based grammar, since it employs coreference 
constraints. 
1..3 Content of Paper 
In tile next section, we describe the Japanese 
HPSG that is used in Verbmobil, a project that 
deals with the translation of spontaneously spo- 
ken dialogues between English, German, and 
Japanese speakers. After that, section 3 ex- 
plains a simplified, albeit correct version of the 
implemented algorithm. Section 4 then dis- 
cusses the outcome of the approximation pro- 
cess. 
2 Japanese Grammar 
The grammar was developed for machine trans- 
lation of spoken dialogues. It is capable of deal- 
ing with spoken  phenomena and un- 
grammatical or corrupted input. This leads on 
the one hand to the necessity of robustness and 
on the other hand to mnbiguitics that must be 
dealt with. Being used in an MT system for spo- 
ken , the grammar must firstly accept 
fragmentary input and bc able to deliver partial 
analyses, where no spanning analysis is awdl- 
able. A coinplete fragmentary utterance could, 
e.g., be: 
dai~oubu 
OKay 
This is an adjective without any noun or (cop- 
ula) verb. There is still an analysis available. 
If an utterance is corrupted by not being fully 
recognized~ the grammar delivers analyses for 
those parts that could be understood. An ex- 
ample would be the following transliteration of 
input to the MT system: 
son desu ne watakushi so COP TAG i 
no hou wa dai~oubu GEN side 'FOP okay 
desu da ga kono hi 
COP but this day 
wa kayoubi desu ~te 
TOP Tuesday COP TAG 
(lit.: Well, it is okay for my side, but 
this day is ~l~msday, isn't it?) 
Here, analyses for the following fragments arc 
delivered (where the parser found opera wa in 
the word lattice of the speech recognizer): 
sou dcsu nc watakushi so COP TAG I 
no hou wa dai{oubu 
GEN side TOP okay 
dCSlt 
COP 
(Well, it is okay for my side.) 
era TOP 
(The opera) 
hone hi wa kayoubi 
this day TOP Tlmsday 
desu nc COP TAG 
(This (lay is 3hmsday, isn't it?) 
Another necessity for partial analysis comes 
fl'om real-time restrictions imposed by the MT 
system. If tile parser is not allowed to produce 
a spanning analysis, it delivers best partial frag- 
ments. 
rl'tle grammar must also be applicable to phe- 
nomena of spoken . A typical problem 
is tile extensive use of topicalization and even 
omission of particles. Also serialization of parti- 
cles occur nlore often than in written , 
as described in (Siegel, 1999). A well-defined 
type hierarchy of Japanese particles is necessary 
here to describe their functions in the dialogues. 
Extensive use of honorification is another sig- 
nificance of spoken Japanese. A detailed de- 
scription is necessary for different purposes in 
an MT system: honorification is a syntactic 
restrictor in subject-verb agreement and com- 
plement sentences, l~lrthermore, it is a very 
useflfl source of information for the solution 
of zero pronominalization (Metzing and Siegel, 
1994). It is finally necessary for Japanese gener- 
ation in order to tind the appropriate honorific 
forms. The sign-based in%rmation structure of 
HPSG (Pollard and Sag, 1994) is predestined 
to describe honorification on the different levels 
of linguistics: on the syntactic level for agree- 
ment phenomena, on tile contextual level for 
anaphora resolution and connection to speaker 
and addressee reference, and via co-indexing on 
the semantic level. Connected to honorification 
is the extensive use of auxiliary and light verb 
constructions that require solutions in the areas 
of morphosyntax, semantics, and context (see 
(Siegel, 2000) for a more detailled description). 
Finally, a severe problem of tile Japanese 
grammar in the MT system is the high po- 
1047 
tential of ambiguity arising from the syntax of 
Japanese itself, and especially from the syntax 
of Japanese spoken . For example, the 
Japanese particle ga marks verbal argmnents in 
most cases. There are, however, occurrences of 
ga that are assigned to verbal adjuncts. Allow- 
ing g a in any case to mark arguments or ad- 
juncts would lead to a high potential of (spuri- 
ous) ambiguity. Thus, a restriction was set on 
the adjunctive g a, requiring the modified verb 
not to have any unsaturated ga arguments. 
The Japanese  allows many verbal 
arguments to be optional. For example, pro- 
nouns are very often not uttered. This phe- 
nomenon is basic for spoken Japanese, such that 
a syntax urgently needs a clear distinction be- 
tween optional and obligatory (and adjacent) 
arguments. We therefore used a description 
of subcategorization that differs from standard 
HPSG description in that it explicitly states the 
optionality of arguments. 
3 Basic Algorithm 
We stm't with the description of the top-level 
function HPSG2CFG which initiates the ap- 
proximation process (cf. section 1.1 for the 
main idea). Let 7~ be the set of all rules/rule 
schemata, 12 the set of all lexicon entries, R 
the rule restrictor, and L the lexicon restrictor. 
We begin the approximation by first abstract- 
ing from the lexicon entries /2 with the help of 
the lexicon restrictor L (line 5 of the algorithm). 
This constitutes our initial set To (line 6). Fi- 
nally, we start the fixpoint iteration calling It- 
crate with the necessary parameters. 
1 HPSG2CFG(T~, 12, R, L) :~==~ 
2 local To; 
3 T0 := (~; 
4 for each l E/2 
5 l:= L(1); 
6 To := To u {l}; 
7 Iterate(T~, R, To). 
After that, the instantiation of the rule 
schemata with rule/lexicon-restricted elements 
from the previous iteration Ti begins (line 11- 
14). Instantiation via unification is performed 
by Fill-Daughters which takes into account a 
single rule r and Ti, returning successful instan- 
tiations (line 12) to which we apply the rule 
restrictor (line 13). The outcome of this restric- 
tion is added to the actual set of rule-restricted 
feature structures Ti+l iff it is new (remember 
how set union works; line 14). In case that re- 
ally new feature structures have not been added 
during the current iteration (line 15), meaning 
that we have reached a fixpoint, we immediately 
exit with T/ (line 16) from which we generate 
the context-free rules as indicated in section 1.1. 
Otherwise, we proceed with the iteration (line 
17). 
8 Iterate(g, R, Ti) :¢==v 
9 local Ti+j; 
10 Ti+~ := Ti; 
11 for each r E T~ 
12 for each t C Fill-Daughters(r, Ti) do 
13 t := R(t); 
14 Ti+I := Ti+I U {t}; 
15 if Ti = T/+I 
16 then return Cornpute-CF-Rules(TG Ti) 
17 else Iterate(7~, R, Ti+l). 
We note here that the pseudo code above is 
only a naYve version of the implemented algo- 
rithm. It is still correct, but not computation- 
ally tractable when dealing with large HPSG 
grammars. Technical details and optimizations 
of the actual algorithm, together with a descrip- 
tion of the theoretical foundations are described 
in (Kiefer and Krieger, 2000a). Due to space 
limitations, we can only give a glimpse of the 
actual implementation. 
Firstly, the most obvious optimization applies 
to the function Fill-Daughters (line 12), where 
the number of unifications is reduced by avoid- 
ing recomputation of combinations of daugh- 
ters and rules that already have been checked. 
To do this in a simple way, we split the set Ti 
into Ti \ T/.-1 and T/_I and fill a rule with only 
those permutations of daughters which contain 
at least one element from T/\r/_ 1 . This guaran- 
tees checking of only those configurations which 
were enabled by the last iteration. 
Secondly, we use techniques developed in 
(Kiefer et al., 1999), namely the so-called rule 
filter and the quick-check method. The rule fil- 
ter precomputes the applicability of rules into 
each other and thus is able to predict a fail- 
ing unification using a simple and fast table 
lookup. The quick-check method exploits the 
1048 
flint that unification fails snore often at cer- 
tain points in feature structnres than at oth- 
ers. In an off line stage, we parse a test cor- 
pus, using a special unifier that records all fail- 
ures instead of bailing out after the first one 
in order to determine the most prominent fail- 
ure points/paths. These points constitute the 
so-called quick-check vector. When executing a 
unification during approximation, those points 
are efficiently accessed and checked using type 
unification prior to the rest of the structure. Ex- 
actly these quick-check points are used to build 
the lexicon and the rule restrictor as described 
earlier (see fig. 1). During ore: experinmnts, 
nearly 100% of all failing unifications in Fill- 
Daughters could be quickly detected using the 
above two techniques. 
Thirdly, instead of using set union we use 
tlhe more elaborate operation during the addi- 
tion of new feature structures to T/.+I. In fact, 
we add a new structure only if it is not sub- 
sumed by some structure already in tile set. To 
do this efficiently, tile quick-check vectors de- 
scribed above are employed here: before per- 
fl)rming full feature structure subsnmption, we 
pairwise check the elements of the vectors us- 
ing type subsumption and only if this succeeds 
do a full subsmnption test. If we add a new 
structure, we also remove all those structures in 
7)ql that are subsumed by the slew structure 
in order to keep the set small. This does not 
change the  of tile resulting CF gram- 
mar because a more general structure can be 
put into at least those daughter positions which 
can be fillcd by the more specific one. Conse- 
quently, fbr each production that employs the 
more specific structure, there will be a (pos- 
sibly) more general production employing the 
more general structure in the same daughter po- 
sitions. Extending feature structure subsump- 
lion by quick-check subsumption definitely pays 
off: more than 98% of all failing subsumptions 
could be detected early. 
Further optimizations to make the algorithm 
works in practice are described in (Kiefer and 
Krieger, 2000b). 
4 Evaluation 
The Japanese HPSG grammar used in our ex- 
periment consists of 43 rule sdmmata (28 unary, 
15 binary), 1,208 types and a test lexicon of 
2,781 highly diverse entries. The lexicon restric- 
tot, as introduced in section 1.1 and depicted in 
figure 1, maps these entries onto 849 lexical ab- 
stractions. This restrictor tells us which parts of 
a feature structure have to be deleted---it is the 
kind of restrictor which we are usually going to 
use. We call this a negative restrictor, contrary 
to tile positive restrictors used in the PATR- 
II system that specii\[y those parts of a feature 
structure which will survive after restricting it. 
Since a restrictor could have reentrance points, 
one can even define a reeursivc (or cyclic) re- 
strictor to foresee recursive embeddings as is the 
case in HPSG. 
The rule restrictor looks quite silnilar, cut- 
ling off additionally information contained only 
in the daughters. Since both restrictors remove 
the CONTENT feature (and hence the semantics 
which is a source of infinite growth), it hal> 
pened that two very productive head-adjunct 
schemata could be collapsed into a single rule. 
Tiffs has helped to keep the number of feature 
structures in the fixpoint relatively small. 
We reached the fixpoint after 5 iteration 
steps, obtaining 10,058 featnre structures. The 
comtmtation of the fixpoint took about 27.3 
CPU hours on a 400MHz SUN Ultrasparc 2 with 
t~¥anz Allegro Common Lisp under Solaris 2.5. 
Given tim feature structures from the fixpoint, 
the 43 rules might lead to 28 x 10,058-t- 15 x 
10,058 x 10,058 = 1, 51.7,732,084 CF produc- 
tions in the worst case. Our method produces 
19,198,592 productions, i.e., 1.26% of all pos- 
sible ones. We guess that the enormous set of 
productions is due tile fact that the grammar 
was developed for spoken Japanese (recall sec- 
tion 2 on the mnbiguity of Japanese). Likewise, 
the choice of a 'wrong' restrictor often leads to a 
dramatic increase of structures in the fixpoint, 
and hence of CF rules--we are not sure at this 
point whether our restrictor is a good compro- 
mise between tile specificity of the context-free 
 and the number of context-free rules. 
We are currently implementing a CF parser that 
can handle such an enormous set of CF rules. 
In (Kiefer and Krieger, 2000b), we report on 
a similar experiment that we carried out using 
the English Verbmobil grmnmar, developed at 
CSLI, Stanford. In this paper, we showed that 
the workload on the HPSG side can be drasti- 
cally reduced by using a CFG filter, obtained 
1049 
-PHON 
FORM 
SYNSEM LOCAL 
-CONTENT 
CONTEXT 
HEAD 
CAT 
SUBCAT 
3PEC ~\] 
"NONLOCAL 
-CONTENT 
CONTEXT 
LOCAL 
~AT 
M0D 
MARK \[~\] 
FORMAL 
MODUS 
P0S 
PTYPE 
10BJIII I 
I OBJ2 I i I I VAL ~ 
1 I SpR l lJ ! 
I SUBJ I 1 I I 
SUBCAT 
HEAD 
POS 11 SPEC FORMAl MARK 
MOD 
Figure 1: The lexicon restrictor used during the approximation of the Japanese grammar. In 
addition, the rule restrictor cuts off the DAUGHTERS feature. 
from the HPSG. Our hope is that these results 
can be carried over to the Japanese grammar. 
Acknowledgments 
This research was supported by the German 
Ministry for Education, Science, Research, and 
Technology under grant no. 01 IV 701 V0. 

References 

Abdel Kader Diagne, Walter Kasper, and Hans- 
Ulrich Krieger. 1995. Distributed parsing with 
HPSG grammars. In Proceedings of the ~th Inter- 
national Workshop on Parsing Technologies~ IW- 
PT'95, pages 79-86. 

Walter Kasper, Hans-Ulrich Krieger, JSrg Spilker, 
and Hans Webcr. 1996. From word hypotheses to 
logical form: An efficient interleaved approach. In 
D. Gibbon, editor, Natural Language Processing 
and Speech Technology, pages 77-88. Mouton de 
Gruyter, Berlin. 

Bernd Kiefer and Hans-Ulrich Krieger. 2000a. 
A context-free approximation of Head-Driven 
Phrase Structure Grmnmar. In Proceedings of the 
6th International Workshop on Parsing Technolo- 
gies, IWPT2000, pages 135-146. 

Bernd Kicfcr and Hans-UMch Kricger. 2000b. Ex- 
periments with an HPSG-to-CFG approximation. 
Research report. 

Bernd Kicfcr, Hans-Ulrich Kricger, John Carroll, 
and Rob Malouf. 1999. A bag of useful techniques 
for emcient and robust parsing. In Proceedings of 
the 37th Annual Meeting of the Association for 
Computational Linguistics, pages 473-480. 

Dieter Metzing and Melanin Siegel. 1994. Zero pro- 
noun processing: Some requirements tbr a Verb- 
mobil system. Verbmobil-Memo 46. 

Carl Pollard and Ivan A. Sag. 1994. Head-Driven 
Phrase Structure Grammar. Studies in Contem- 
porary Linguistics. University of Chicago Press, 
Chicago. 

Stuart M. Shiebcr. 1985. Using restriction to extend 
parsing algorithms for complex-feature-based for- 
malisms. In Proceedings of the 23rd Annual Meet- 
ing of the Association for Computational Linguis- 
tics, pages 145-152. 

Melanin Siegel. 1999. The syntactic processing of 
particles in Japanese spoken . In PTv- 
cecdings of the 13th Pacific Asia Confcrcncc on 
Language, Information and Computation, pages 
313-320. 

Melanic Siegel. 2000. Japanese honorification in an 
HPSG framework. In Proceedings of the l~th Pa- 
cific Asia Conference on Language, Information 
and Computation, pages 289-300. 
