SIMPLE PARSER FOR AN HPSG-STYLE GRAMMAR IMPLEMENTED IN PROLOG 
Karel oliva* 
Lingustic Modelling Laboratory, 
Coordination Centre 
for Computer Science and Computer Technology, 
Bulgarian Academy of Sciences, 
acad. G. Bonchev st. bl. 25A, 
BG - 1113 Sofia, 
Bulgaria 
Abstract: 
This paper describes basic ideas of a parser for 
HPSG style grammars without LP component. The parser 
works bottom-up using the left corner method and a 
chart for improving efficiency. Attention is paid to 
the format of grammar ru~es as required by the parser, 
to the possibilities of direct implementation of prin- 
ciples of the grammar as well as to solutions of prob- 
lems connected with storing partly specified catego- 
ries in the chart. 
i. Kapremmntation of Grammar Rulam for the Parser 
The Head-driven Phrase Structure Grammar (HPSG) 
blurrs the distinction between rules of the grammar 
and the structures they generate. Put shortly, the 
matter is that "structures" and "rules" in HPSG differ 
solely in the level of abstraction over the llnguistJc 
material they describe. A "structure" describes some 
very concrete piece of this material (e.g., a sen- 
tence) and, hence, embodies no abstraction; a "rule", 
on the other hand, presents by itself a prototype of a 
set of structures. Since in HPSG categories are under- 
stood as bundles of features ("attribute"="value" 
pairs) , the "structure"/"rule" dichotomy is reflected 
by the fact that the rules can contain variables as 
values of attributes of some features while the struc- 
tures must be always fully specified or that the rules 
can miss some (otherwise possibly obligatory) features 
altogether. Constraints restricting or binding to- 
gether permitted values of the attributes can be asso- 
ciated with the rules. Naturally, different levels of 
abstraction can be introduced among the rules as well, 
which allows for capturing different levels of genera- 
lization over the linguistic data described. 
On the highest level of abstraction, the parser 
can deal with two types of rules: in the first type, 
the values of variables occurring in the rules are 
bound by constraints, in the second type no con- 
straints occur. In order to support simultaneously an 
easily legible notation and a reasonable computer 
implementation of these two types of rules, two Prolog 
operators are defined, each describing one rule type. 
:- op(1200,xfx, is a rule if) . 
:- op(1200,xf, is a rule) 
The first of the two is an infix operator describing 
the rules containing additional constraints; the rule 
itself should stand in front of the operator, the con- 
straints should follow it, separated from each other 
by commas ",". The second one is a postfix operator 
describing the rules without any constraints. 
The inventory of types of rules may be arbitra- 
rily broadened. All that is necessary for this purpose 
is just adding operator declarations and, possibly, 
also implementing feature inheritance principles cor- 
responding to the newly introduced rule type(s). This 
is important because it provides for bounding the ap- 
plication of the principles to the whole rule types 
and makes thus obsolete the explicit stipulation of 
feature sharing among respective categories in each 
rule, which is still the case in many current parsers. 
TWO examples of the rule format for the parser 
are shown in the following: it is to be remembered 
that in HPSG, as well as in all other theories accept- 
ing the X-bar convention, a central role among the 
daughters in a rule is played by the head-daughter - 
because of this, the head-daughter is specially mark- 
ed, which provides, e.g., for application of the Head 
Feature Principle. 
Ex.l: - the standard "S ---> NP VP" rule can appear 
in the following form (with obvious meanings of the 
predicates "concatenation" and "agreement"): 
\[phonology=SPhonology, 
d trs=\[dtr=\[cat=n, 
bar=two, 
phonology=NP_Phenology, 
morphology=NP_Morphology \], 
head_d_tr=\[cat=v, 
bar=two~ 
phonology=VPPhcnology, 
morphology=VPMorphology\]i\] 
is a rule if 
coneatenation(NPPhonology,VPPhonology, S Phonology), 
agreement(NPMorphology,VPMorphology) 
Ex.2: - the rule "NP ---> Dot NP": note the fact that 
the phonology of the mother can be expressed without 
invocation of the "concatenation" predicate (since 
determiners eonsist of one word only) and the agree- 
ment is expressed directly in the rule by coinde×ing 
the features "number" in both daughters 
\[bar=two, 
phonology=\[Det_Phonology!NP_Phonology\], 
d trs=\[dtr=\[cat=det, 
bar=zero, 
phonology=\[DetPhono!ogy\], 
morphology=\[number=Number\]i, 
head d_tr=\[cat:n, 
bar=one, 
phonology=NPPhonoloqy, 
morphology=\[hum bar=Number I\]\] 
is a rule . 
2. Repreeantation of Categories in the Parser 
AS follows from the examples, the notation adopt Z 
ed for categories in rules is the one describing them 
as (Prolog) lists of features. Keeping such kind of 
representation also in the underlying mechanism of the 
parser would be, however, quite unfelicitous a deci- 
sion. The main problem consists in the fact that the 
parser working bottom-up may discover certain features 
of already parsed (sub)structures only later in the 
parsing process (so to say, only when it gets "higher 
in the tree", with regard to the way the parsing pro- 
ceeds). These features are to be, then, incorporated 
into the already parsed structures. An elegant solu- 
tion of this problem was proposed in (Eisele and 
D~rre,1986) and adopted in the parser described. 
Syntactic categories are represented in the parser in- 
ternally in a way slightly different from their repre- 
sentation in the grammar: all categories (including 
434 1 
those used as values of features of other categories) 
are represented as "open-ended lists": each internal 
representation of a category is a list having a cer- 
tain number of instantiated elements at its beginning. 
and an uninstantiated "tail". The main idea standing 
behind this kind of representation is that any feature 
to be dlscovered (and added to the category) only la- 
ter in the parsing process can be now added as the 
"first member" of the uninstantiated "tail", which 
task is easy to perform provided that the "tail" is 
still accessible (e.g., if the free "tails" of catego- 
ries subject to feature inheritance principles are 
shared logical variables). Converting categories from 
one kind of representation to the other one is 
performed by a two-argument predicate "perestroika" 
(used below). 
The representation described also supports a 
simple implementation of unification of categories 
(see Eisele and D~rre, op. cit. for more detail); in 
the folowing, unification of two categories is presup- 
posed to be performed by a two-argument predicate 
"unify". 
3. The ParsQr 
The main idea of the parsing method used in the 
BUP par~er (Matsumoto et ai,1983) being the starting 
point of the system described is that a rule is to be 
triggered only after its left corner has been found 
(i.e. it has been supplied by lexical scan, in the 
case of iexical categories, or it has been properly 
parsed). The left corner of a rewriting rule is the 
leftmost symbol on its rlght-hand side - the name 
stems from depicting the rule as a local tree it gene- 
rates. After a category is parsed or supplied by lexi- 
cal scan, one of the grammar rules having this ca- 
tegory as its left-corner is s@lected, the sisters of 
the left corner in this rule are tried, and if all of 
them are succesfuly parsed, the mother category of the 
rule is declared to be parsed and the whole process, 
using the mother as a left corner, is repeated "on a 
higher level". If any failure occurs, backtracking is 
invoked. Thus, the parsing process is data-driven - 
the rules of the grammar are selected in accordance 
with the symbols scanned in the input. From the view- 
point of efficiency, this is important mainly for the 
so-called "free-word-order" languages. Mentioning 
this, it should be further recalled that the perfor- 
mance of BUP is further improved by storing all the 
information about all subtasks that have been already 
tried (successfully or unsuccessfully), which avofds 
repetitive computations of the parses that have been 
performed or that have been proved impossible to per- 
form in the preceding steps of the analysis. 
For the purposes of the implementation of the 
parsing process, it is necessary to extend the notion 
of the "left corner" to its reflexive and transitive 
closure. The transitive closure inductively states 
that for all triples of categories X,Y,Z such that X 
is a left corner of Y and Y is a left corner of Z , X 
is also a left corner of Zo The reflexive closure fi- 
nishes the picture by saying that any category is a 
left corner of itself. 
Given the previously described basic philosophy 
of parsing, the process can be implemented in Prolog 
by means of two predicates performing the two tasks 
informally mentioned in the preceding paragraphs: 
- the predicate "parse", parsing a given (expected) 
category from (a prefix of) the input string 
- the predicate "isaleft_corner", linking the left 
corner category with the goal (expected) category in 
the parsing process. 
However, before these predicates can be explained 
in more detail, it is necessary to make several re- 
marks explaining the way the processing of complex ca- 
tegories has been built into the system. 
First, the usual equality ("=") of two categories 
was replaced by their unification, i.e. on all spots 
where equality of two categories - expressed either 
directly, in the form of an equation, or indirectly, 
by variable sharing or otherwise - occured in the ori- 
ginal BUP, it had to be replaced by a call of the pre- 
dicate "unify". 
Second, in the predicates storing or retrieving 
the information about the (un)successfully performed 
parsing subtasks, the categories must be "frozen" 
exactly in the state when this subtask was started: 
problems would occur if the "stored" categories 
include free variables ("\]n\[ormation holes") as values 
of some features, which variables might be matched by 
any real values in the moment of search for the infor- 
mation about previously performed parsing tasks - such 
a matching, however, would be incorrect, since what is 
required is a real identity of the subtasks. (The same 
holds also the other way round, i.e. problems of 
exactly the same nature would occur also if the stored 
value were instant!ated and the current one were a 
free variable.) 
The aforementioned identity of subtasks, however, 
requires the identity of (some of) the stored catego- 
ries only, not the identity of the lists representing 
them, i.e. what really matters is the identity of fea- 
tures, but not of their order. This identity of 
"frozen" categories (represented as "usual" Prolog 
lists) is checked by the predicate "identical_catego- 
ries". 
Now at last, the definitions of the predicates 
"parse" and "is a left corner" cao be given; the sup- ...... 
porting predicates are either elucidated in ti~e pre- 
ceding text or are given (hepefully) self-expla!ni:Ig 
names, which should hold also for the arguments. The 
difference between the "frozen" categories represented 
as usual Prolog lists and those represented as "open- 
ended" lists is reflected in the variable names stand- 
ing for the respective types: the "open-ended" ca- 
tegories are always marked as "ReaL" categories, the 
other ones never bear such marking. 
% PARSE( 
% "Frozen"_Goa l_Cat, 
% \[Real_Goal Cat,Structure\], 
% Input_String,Rest String ) 
/* Checking whether parsing the current Real Goal Ca- 
tegory from some prefix of the Input String has been 
tried (either successfully or not) in the preceding 
steps of the parsing process. */ 
parse(GoalCategory,\[RealGoal_Category,Structtlre\], 
Input_String, Rest_String ) 
( already_parsed(Asserted_Goal Category,, 
Input Str\]ng, ), 
identical_categories(GoaiCategory, 
AssertedGoaiCategory) ; 
cannot_beparsed(AssertedGeaiCategory, 
Input String ), 
identical_categories{GoaICategory, 
Asserted_Goal_Category) 
!, fail ), 
I 
, i 
already_parsed(AssertedGoal_Category, 
\[Real_Goal Category, Structure\], 
InputStrlng,Rest_String 
identical categorles(Goal Category, 
Asserted_Goal Category) . 
), 
2 435 
/* The following clause describes parsing of a cate- 
gory with no daughters (category immediately dominat- 
ing an empty string) */ 
parse(Goal_Category,\[Real_Goal_Category,dtrs=\[\]\], 
String,String ) 
/* rule having no daughters is to be found in the 
grammar */ 
find_rule to be used(Realgoal_Category, 
Constraints Of Rule), 
call(Constralnts Of Rule), 
assertz(already_parsed( 
Goal_Category, 
\[Real_Goal_Category,d_trs=\[\]\], 
String, String )) . 
/* The following clause describes parsing of a cate- 
gory dominating a non-empty terminal string */ 
parse(Goal_Category,\[Real_GoalCategory,Structure\], 
\[Word_FormIRest Input_Strlng\],Rest_String ) 
lexicon(WordForm,WordFormCategory), 
perestroika(Word_Form_Category, 
Real Word_Form_Category), 
is a left corner( 
\[Real_WordFormCategory, d trs=\[\]\], 
\[Real_Goal Category,Structure}, 
Rest_Input_String, Rest_String ), 
assertz(already_parsed( 
Goal Category, 
\[Real_Goal Category,Structure\], 
\[Word_FormlRest_Input_String\], 
Rest_String )) . 
/* Asserting information about the impossibility of 
parsing certain categories from certain strings */ 
parse(Goal_Category,,InputStrlng,) 
(already_parsed(GoalCategory,,Input_String,_) ; 
assertz(cannot_be_parsed( 
Goal Category,Input String)) }, 
! , fail . 
IS_A_LEFT_CORNER( 
\[Real_Left Corner Cat,Structure\], 
\[Real GoalCat,Structurel, 
Input String,RestString 
/* reflexive closure of the relation "being a left 
corne r" * / 
is_a_leftcorner( 
\[ReaiLeftCornerCategory, 
RealLeftCornerCategory_Structure\], 
\[Real GoalCategory,RealGoal_Category_Structure\], 
String,String ) 
unify(RealLeftCorner_Category, 
Real_Goal_Category ) . 
/* transitive closure of the relation "being a left 
corner" */ 
is_a_left_corner( 
\[Real Lef~_Corner_Category, 
RealLeft_CornerCategory_Structure\], 
\[Real Goal_Category,RealGoal_Category_Structure\], 
InputString, RestString ) 
:- 
/* a rule having the current left corner category 
as its left corner is to be found */ 
findruletobeused( 
Real_Left_Corner_Category\], 
Left_Daughter._Marking, 
Right_Sisters_ListFromCurrent_Rule, 
Mother_Category_From_Current Rule, 
Constraints Of Rule, 
Type Of Rule ), 
/* all the right sisters have to be parsed */ 
parse_right_sisters( 
Right Sisters_ListFromCurrentRule, 
Real_RightSisters_List, 
InputString, IntermediaryString, 
Type Of Rule ), 
call(Constraints Of Rule), 
/* the mother category from the rule must comply 
with the feature inheritance principles relevant for 
the Type Of Rule */ 
featureinheritanceprinclplesconcerning_mother( 
\[RealLeftCornerCategory 
IReal_RightSistersList\], 
Mother Category_From_CurrentRule, 
Real Mother_Category, 
Type Of Rule ), 
/* the mother itself (?herself?) is used as a left 
corner, which repeats the process on a higher level */ 
is alert corner( 
\[RealMother_Category, 
d trs={LeftDaughter__Marking = 
\[Real Left Corner Category, 
Real LeftCorner_Category_Structure\], 
IReal_Right_Slsters List \]\], 
\[Real GoalCategory,RealGoalCategory_Structure}, 
Intermediary String,RestStrlng ). 
The whole parsing process is started by asking the 
conjunction of goals 
?- parse(TOPMOSTCATEGORY, Intermediary Result, 
INPUT,\[\] ), 
perestroika(RESULT, Intermediary Result) . 
where the "RESULT" is the only output argument, na- 
mely, the resulting structure of the parse, the 
TOPMOST CATEGORY Is a skeleton category (represented 
as a "usual" Prolog llst) of the expected result (most 
often, something like "\[cat=sentence\] " or 
"\[cat=v,bar=two\]" etc.), i.e. a category which is ex- 
pected to unify with any result of the parse, the 
INPUT STRING is the input string represented as a Pro- 
log list of wordforms and the empty string "\[\]" is the 
expected rest of the input string after the parsing 
process finished. 
Wsince Ist April 1990: 
Lehrstuhl fur Computerlinguistlk 
Universit~t des Saarlandes 
Im Stadtwald 
D-6600 Saarbr~cken 
(West) Germany 
436 3 

References

Eilele A. and J. D6rre: A Lexical Functional Grammar 
System in Prolog, in: Proceedings of Coling ~86, Bonn 

M~tlumQto ¥. et al.: BUP - A Bottom-Up Parser Embedded 
in Prolog, in: New Generation Computing vol.l, 1983 

Pollard C. and I.Sag: Information Based Syntax and Se- 
mantics, vol.l: Fundamentals, CSLI Lecture Notes No. 
13, CSLI, Stanford, California 1987 
