Learning Lineal" Precedence Rules 
Vladimir Pericliev 
Mathema.tical Linguistics Department 
Institute of Mathematics ~md Computer Science, bl.8 
Bulg~rian Acade~ny of Sciences, 1 113 Sofia Bulgaria 
periQbgearn, acad. bg 
Abstract 
A system is descril)ed which learns fl'om 
examples the Linear Precedence rules in 
an Immediate Dominance/Linear Prece- 
dence grammar. Given a particular hn- 
mediate Dominance grammar and hier- 
archies of feature values potentially rel 
evant for linearization (=the systelu's 
bias), the leanler generates appropriate 
naturM language expressions to be ewd- 
uated as positive or negative by a teach- 
er, and produces as output IAnear Prece- 
dence rules which can be directly used 1)y 
the gralnmar. 
1 Introduction 
The lnanual cotnpilation of a. sizable grammar is 
a difficult and time-consuming task. An impor- 
tant subtask is the construction of word ordering 
rules in the grammar. '\['hough some languages 
are proclaimed as having simple ordering rules, 
e.g. either complete scrambling or strictly "fixed" 
order, most languages exhibit quite complex reg- 
ularities (Steele, 198l), and even the rigid word 
order languages (like 1,;nglish) and those with to 
tal scrambling (like Warlpiri; cf. (H ale, 1983) may 
show intricate rules (Kashket, 11981); hence the 
need for their automatic acquisition. 'Fhis I;ask 
however, to the best of our knowledge, has not 
heen previously addressed. 
This paper describes a prograln which, given 
a grammar with no ordering relations, l)roduces 
as outpu_t a set of linearization, or Linear Prece- 
dence, rules which can be directly employed by 
that grammar. The learning step uses the ver- 
sion space algorithm, a familiar techlfique from 
ma.chine learning for learning from examples. In 
contrast to most previous uses of the algorithnl for 
various learning tasks, which rely on priorly giv- 
en classified examples, our learner generates itself 
tile training instances olte at a tinie, and they are 
then classed as positive or negative by a teach- 
er. A selective generation of training instances is 
employe, d which facilitates the learning by mini- 
mizing the nu,nl)er of evaluations that the teacher 
needs to make. 
The next section (h.'scribes the hmnediate Dom- 
inance/Linear I)recedence grammar format. In 
section 3, tlle lask of learning l,inear Precedence 
rules is interpreted as a task of learning from ex- 
amples, and section 4 introduces the version space 
\]nethod. Section 5 is a system overview, and sec- 
tion 6 focuses on implementation. Finally, some 
limitations of the system a.re discussed as well as 
some directions for future research. 
2 hnmediate Dominance/Linear 
Precedence Grammars 
A standard way of expressing tile ordering of 
nodes in a grammar is I)y means of l,inear 
Precedence rules in hnme(liate l)ominance/IAnear 
I','ocede,,ce. (ID/LI') ,~ra.,,,,ai's. The m/U' for:- 
mat was first introduced by (Gazdar and Pullmn, 
1981) and (Gazdar el.. al., 1985) and is usually 
associated with GPS(\], but is also used by IIPSG 
(l'ollard and Sag, 1987) and, under different guis- 
es, by other formMisins a.s well. 
In all ll)/I,P grammar, the two types of in- 
formation, constituency (or, immediate domi- 
nance) and linear order, are separated. Thus, 
for instance, an immediate dominance rule, say, 
A--~I3 C D, with no linear Precedence rules de- 
clared, stands for the mother node A expanded 
into its siblings occurring in any order (six Con- 
text Free Grammar rules as result of the permuta- 
tions). If l;he I,\[' rule D < C is added, the ID rule 
can be expanded in the following three CFG rules: 
A---~ /3 D C; d--, D I/ C; d -+ D CB. ID/LP 
grammars capture an important ordering general- 
ization, missed by usual CFGs, by means of the so 
calle(/ "l'3xhaustive Partial Ordering Constraint", 
stating that the l)artial ordering of any two sister 
nodes is constant throughout the whole grammar. 
That is, just one of the Mlowing three situations 
is valid for the ordering of any two nodes A and 11: 
either A < B (A precedes B) or d > B (A follows 
B) or A <> 11 (A occurs in either position with 
respect to B). (The last. <> situation is normal- 
ly state(l in ll)/LI' grammars by ~lot stating an 
883 
LP rule, but we shall use it here, as we need an 
explicit ~hference to it.) 
3 The Task of LP Rules 
Acquisition Viewed As Learning 
from Examples 
A program which learns from examples usual- 
ly reasons from very specific, low-level, instances 
(positive or both positive and negative) to more 
general, high-level, rules that adequately describe 
those instances. Upon a common understanding 
(Lea and Simon, 1974), learning from examples is 
a cooperative search in two spaces, the instance 
space, i.e. the space of all possible training in- 
stances, and the rule (=h.ypotheses) space, i.e. the 
space of all possible general rules. Besides these 
two spaces, two additional processes are needed, 
intermediating them: interpretatioT~ and instance 
selection. The interpretation process is needed, in 
moving from the instance space to the rule space, 
to interpret the raw instances, which may be far 
removed in form froln the form of the rules, so that 
instances can guide the search in the rule space. 
Analogously, the instance selection rules serve to 
transform the high-level hypotheses (rules) to a 
representation useflfl for guiding the search in the 
instance space. 
A general description of our task is as follows: 
Given a specific ID grammar with no LP rules, 
find those LP rules. 1 In this task we also need 
to reason from very specific instances of LP rules 
(language phrases like small childreu, *children s- 
malt) to rnore general LP rules (adjective < noun), 
therefore it can be interpreted in terms of tile two- 
space model, described above. 
Our instance space will consist of all strings gen- 
erable by the given ID grammar (the size of this 
instance space for any non-toy grammar will be 
very large). The LP rules space will be an un- 
ordered set, whose elements are pairs ot' nodes, 
connected by one of the relations <, > or <>, 
e. a LP set = \[\[A < B\], \[B < l';\], \[E > C\], ... \]. 
(The size of the LP rules space will deperid upon 
the size of the specific ID grammar whose LP rnles 
are to be learned.) 
We also need to define the interpretation and 
instance-selection processes. In the learning sys- 
tem to be described, for both purposes serves 
(basically) a mete-interpreter for ll)/LP gram- 
mars, which can parse tile concrete grammar, giv- 
en at the outset, for both analysis and generation. 
In an interpretation phase, the mete-interpreter 
will parse a natural language expression out- 
putting an LP-rules-space approriate representa- 
tion, whereas in the instance-selection phase the 
1Though indeed this is the usual way of looking at 
the task, sometimes we may need to start with some 
LP rules already known; the program we shMl describe 
supports both regimes. 
meta-interpreter, given an LP space representa- 
tion as input, will generate a language expression 
to be classified as positive (i.e. not violating word 
order rules) or negative (i.e. violating those rules) 
by a teacher. 
4 The Version Space Method 
There are a variety of methods in the AI litera- 
ture for learning from exarnples. For handling our 
task, we have chosen tile so called "version space" 
method (also known as the "candidate elimination 
algorithm"), cf. (Mitchell, 1982). So we need to 
have a look at this method. 
Tile basic idea is, that ill all representation lan- 
guages for the rule space, there is a partial order- 
ing of expressions according to their generality. 
This fact allows a compact representation of the 
set of plausible rules (=hypotheses) in the rule s- 
pace, since the set of points in a partially ordered 
set can be represented by its most general and its 
most specific elements. Tile set of most general 
rules is called the G-set, and tile set of most spe- 
cific rules tile S-set. 
Figure 1 illustrates the LP rules space of a de- 
terminer of some grammatical nulnber (singular 
or l)lura.l) and an adjective, expressed in predicate 
logic. 
Viewed top-down, the hierarchy is in descending 
order of generality (arrows point from specific to 
general). The topmost LP rule is most general and 
covers all the other rules, since det(Num), where 
Nnm is a variable, covers both det(sg) and det(pl), 
and <> covers both < and >. Each of the rules at 
level 2 are neither more general nor more specific 
than each other, but are more general than the 
most specific rules at the bottorn. 
The learT~ing method assumes a set of positive 
and negative examples, and its aim is to induce 
a rule which covers all the positive examples and 
none of the counterexamples. Tile basic algorithm 
is as follows: 
(1) The G-set is instantiated to the most gener- 
al rule, and the S-set to the first positive example 
(i.e. a positive is needed to start the learning pro- 
cess). 
(2) The next training instance is accepted. If it 
is positive, frorn the G-set are removed the rules 
which do not cover the example, and the elements 
of S-set are generalized as little as possible, so that 
they cover the new instance. If the next instance is 
negative, then fl'om the S-set are removed the rules 
that cover the counterexample, and the elements 
of the G-set are specialized as little as possible so 
that the counterexample is no longer covered by 
any of the elements of the G-set. 
(3) The learning process terminates when the 
G-set and the S-set are both singleton sets which 
are identical. If they are diflhrent singleton sets, 
the training instances were inconsistent. Other- 
wise a new training instance is accepted. 
884 
deL(Nun,) <> a~ti 
det(sg) <> adj det(Num) < adj det(Nuni) > adj de.t(pl) <> adj 
det(sg) < adj det(pl) < adj do.t(sg) > adj det(pl) > adj 
Figure 1: A Generalization hierarchy 
Now, let us see how this works with the l,P rules 
version space in Figure 1, asslllning further the 
following classitied exaanplcs ((+) nteans l)ositiw~, 
and (-) negative instance): 
(+) det(sg) < a,ti 
(--) det(sg) > adj 
(+) dct(pl) < adj 
The algorithni will instanLiate the {i-set to the 
n:iost general rule iii tile version space, and tim 
,5'-set to the first positive, obl, aining: 
G-set: \[\[dot(Nu,n) <> ad.i}\] 
s-set: \[\[det(sg) < add\]\] 
'/'hen the next exmnple will lie accepted, which 
is negative. The current ~g-sel does not cover it, 
so it relnains the sanle; the G-set is specialized as 
little as possible to exchlde tile negative, which 
yields: 
G-sot: \[\[det(Num) < adj\]\] 
S-set: < adj\]\] 
The last example is positive. \]'lie (;-sol reliiaill- 
s l, he same since it covers the positive. The ,5'-set 
however does not, so it has to be uiilliinally gen- 
eralized to cover it, obtaining: 
G-set: \[\[de.t(Nuui) < a~ti\]\] 
S-sot: \[\[det(Num)< acid\]\] 
These are singleton sets which are identical, and 
the resultant (consistent) generalization is there-- 
fore: \[det(Num) < adj\]. That is, a determiner of 
any grarrniiatical lunnber niust precede a,n adjec-- 
rive. 
5 Overview of the Learner 
Our learning program has two basic modules: IAm 
version space learner which performs the olenmn-. 
tary learning Step (as descril)ed in 1.he previous 
section), and a nmta-hiterpretcr for ll)/I,P grain- 
iiiars which serves the processes of interpretai.ion 
alld instance selection (as described in section 3). 
The learning proceeds in a dialog forni with tile 
teacher: for the learning of each individual l,P 
rule, the system produces natural language phras- 
es to be classitied by the teacher mttil it can con- 
verge to a single concept (rule). The whole process 
ends when all LP rules are learned. 
At tile outset, the prograrn is supplied with the 
specific H) grallHl-lar whose l,P rules are to be ac- 
quired, and the user-provided bias of the. system. 
The latl;er implies an explicit statement Oil the 
part of tlw user of what featm'es and values are 
relevant to the task, by ilqmtting the correspond- 
ing generalization hierarchies (the precedence gen- 
eralization hierarchy is taken for granted). 
In the particular implementation, the accept- 
able 11) grammar format is essentially that of a 
logic gt'ammar (Pereira and Warren, 1980), (l)ahl 
and Abramson, 1990). We only use a double arrow 
(to avoid mixing up with the often built-in Deft- 
nite Clause ()irammar notation), and besides emp- 
ty productions and sisters ha.ving the very same 
nallm are not allowed, since they interfere with 
1,1 > rules statmnenl.s, of. e.g. (Sag, 1987), (Saint- 
l)izier, 1988). 
6 Tile implementation 
Below we discuss the basic aspects of tile imple- 
mentation, illustrating it; with the ll) grannnar 
wil, h no LP restrictions, given on Figure 2. 
The grammar will generate simple declarative 
and interrogative sentences like The Jonses read 
this thick book, The 3onses read these thick books, 
Do the Jonses smile, etc. as well as all their (un- 
granuimtical) permutations Read this thick book 
the ,lo,lscs, 7'he Jonses read thick this book do, 
ct, c. 
The progranl knows at the outset that the val- 
ues "sg" and "pl" are hoth more specitic than l, he 
varial)lc "N ran", mal;ching any mmtber (this is tilt 
bias of the system). 
Step I. The prograni determines the siblings 
885 
(1) s ==> name, vp. 
(2) sq ~ aux, name, vp. 
(3) vp ==> vtr, np. 
(4) vp --> vintr. 
(5) np =:=> det(Num),adj,n(Nm,,). 
(6) , arno \[the-jonses\]. 
(7) n(sg) ~ \[book\]. 
(8) n(pl) =:~ \[books\]. 
(9) det(sg) ~ \[this\]. 
(10) det(p,) \[these\]. 
(11) act(_) \[the\]. 
(12) adj ==> \[thick\]. 
(la) vtr \[read\]. 
(14) vintr ~ \[smile\]. 
(15) aux \[do\]. 
Figure 2: A simple i1) grammar with no LP constraints 
(=the right-hand sides of ID rules) that will later 
have to be linearized, by collecting them in a par- 
tially ordered list. Singleton right-hand sides (rule 
(4) above and all dictionary rules) are therefore 
left out, and so are cuts, and "escapes to Pro- 
log" in curly brackets, since they are not used to 
represent tree nodes, but are rather coustraints 
on such nodes. Also, if some right-hand side is a 
set which (properly) includes another right-hand 
side (as in rule (2) and rule (1) abowe), the latter 
is not added to the sibling list, since we do not 
want to learn twice the linearization of some two 
nodes ("name" and "vp" in our case). The sib- 
ling list then, after the hierarchical sorting frollt 
lower-level to higher-level nodes, becomes: 
\[\[aux,name,vp\] ,\[vtr,np\],\[det (Nurn),adj ,n(Num)\]\] 
Now, despite the fact that the set of LP rules we 
need to learn is itself unordered, the order in which 
the program learns each individual LP rule rnay 
be very essential to the acquisition process. Thus, 
starting Dom the first, element of the above sib- 
ling list, viz. \[aux, name, vp\], we will be in trouble 
when attempting to locate the misorderings in any 
negative example. Considering just a single nega- 
tive instance, say The Jonses read thick this book 
do: What is(are) the misplacenmnt(s) and where 
do they occur? In the higher-level tree nodes \[aux, 
name, vp\] or in the lower-level nodes \[vtr, np\] or 
in the still lower \[det(Num),adj,n(Num)\] ? 
Our program solves this problem by exploiting 
the fact, peculiar to our application, that the n- 
odes in a grammar are hierarchically structured, 
therefore we may try to linearize a set of nodes 
A and B higher up in a tree o~lly after all lower- 
level nodcs dominated by both A and ft have al- 
ready been ordered. Knowing these lower-lewq LP 
rules, our rneta-interpreter would never generate 
instances like The Jonses read thick this book do, 
but only some repositionings of the nodes \[aux, 
name, vp\], their internal ordering being gua.ran- 
teed to be correct. The sibling list then, after hi- 
erarchical sorting from lower-level to higher-level 
nodes, becomes: 
\[\[det(Num),adj,n(Sum)\],\[vtr,np\],\[aux,,mme,vp\]\] 
and the lirsl; element of this list, is first passed to 
the learning engine. 
Slep ~. The. program now needs to produce a 
first positive example, as required by the version 
sl)ace method. Taking as input the first elelneu- 
t of tim sibling list, the. II)/LP meta-interpret.er 
generates a phrase conforming to this description 
and asks the teacher to re-order it correctly (if 
needed). In our case, tile first positiw', example 
would be this thick book. The phrase will be re- 
parsed in order to determine the linearization of 
constituents. 
A word about the I1)/I,P parser/generator. Its 
analysis role is needed in processing the firs|; pos- 
itive example, and the generation role in the pro- 
duction of language examples for all intermediate 
stages of the learning process which are then e- 
valuated by the teacher. The predicate observes 
two types of LP constraints: tile globally valid LP 
rules tha.t have been acquired by the system so far, 
" and the "transitory" LP constraints, serving to 
produce an ordering, as required by an intermedi- 
ate stage of the learning process. 
l)isposing of the ordering of constituents in the 
positive example, the tra~silive closure of these 
partial orderings is computed (in our case, from 
\[\[det(Num) < adj\],\[adj < n(Num)\]\] we get \[\[de- 
t(Num) < adj\], \[ad.i < n(Num\]), \[det(Num) < 
n(Num)\]\]). This result is the. east into a rop- 
resentation that SUl)ports our learning process. :3 
20r are priorly known, in the case when the system 
starts with some LP rules declared by the user. 
3The concept we learn is actually a conjunction of 
individual I,P rules, when tile right-hand side of a rule 
consists of three or more constituents. 
886 
Dialog with, user 
this thick book (+) 
(?) 
(-) 
this book thick (-) 
thick this book (-) 
these thick books (+/ 
LP rules space ~,al representation) i 
,,;x.: \[dot,~ ;a~(~j,~,,l,_V-- 
(;: \[\[det,N,< >,a~0,#,ad.i,< >,,~,_,#,det,_,< >,n,j\] 
S: \[\[det,sg, <,adj,#,adj,<,n,_,#,det,_,<,n,_\]\] 
Ex.: \]'(tet,sg,<,adj,#,adj,<,n,_,#,(tet,_,>,n,_\]--- 
G: \[\[det,N,< >,ad.i,#,adj,< >,,l,_,#,det,_, <,n,_\]\] 
s: \[\[det,sg,<,adj,#,a4i,<,n,_,#,det,_, <,,*,all 
F-EST.: \[det,sg,<,adj,#,adj,>,n,_,},det,_,<,n,_\] 
G: \[\[(let, N,<> ,a dj,#,ad i,< ,n,_, # ,d0t,_, < ,.,_\]\] 
S: \[\[det,sg,<,adj,#,adj,<,n,_,#,det,_,<,n,_\]\] 
t;;x. : \[det,sg, >,adj,#,adj, <,n,_,#,det,_,< ,n,_\] 
G: \[\[det, N,<,ad, i,#,adj, <,n,_, #,det,_,<,,~,_\]\] 
S: \[\[det,sg, <,adi,#,adj, <,,~,_,#,det,_, <,n,_\]\] ~det,l,l,<,adj,#,ad.i,<,n,_,#,do<,<,,,,_I---- 
G: \[\[tier,N, <,~dj,#,ad.i,<,,,,_,#,det,_,<,,l,_\]\] 
S: \[\[det,sg, < ,adj, # ,adj, <,n,_, # ,det,_, <,n,_\]\] 
-A c~,,-777 
adj < n(sg) 
det(sg) < n(sg) 
det(sg) < adj 
adj < n(sg) 
det(sg) > n(sg) 
~et(sg) < adj 
adj > n(sg) 
,let(sg) < ,,(sg) 
--~let(sg) > ~dj 
adj < n(sg) 
det(sg) < .(sg) 
adj < n(pl) 
det(pl) < n(pl) 
Consistent generalization: -~det(N) < adj 
\[det,N,<,adj,#,adj,<,,,,_,#,det,_,<,n,_\] ad.i < ,I(N) 
det(N) < n(N) 
Figure 3: A sample session 
Step 3. q'he version space method is applie(I 
and the individual 1,1 ) rules, resulting fron, lind- 
ing a consistent generalization, are asserted in the 
II)/bP granmm.r datal)ase to lie resl)eete,l by any 4 
further generation process. 
Figure 3 gives a learning cycle starting \['rein the 
sibling list ele,nent \[det(Nun,),adj,n(Nu,,,)\]. The 
first column gives the dialog with the teacher, the 
second the program's internal representation of 
the l,P rules space., and the third those |'ules a.re 
expressed in their nnore familiar, and final, form 
that can be utilized directly by the 11) gra.nmmr. 
Afl,er processing the lirst 1)ositive (tirst row), 
tile system generalizes by varying a paraluet(:r 
(imnd)er or l)recedenec), verbalizes 1,he general- 
ization, the generated phrase is class|tied by the 
teacher, then another generalization is made, de- 
pending on the classiiication, it is verbalized, eval- 
uated and so on. The 1)rocess results in the three 
l,P rules: det(Num) < adj; adj < ,,(Num); and 
det(N.,n) < ,~(N.,,,). 
A remark on notation: # delimits individual l,P 
rules, allowing their recovery in terms of Prolog 
structures. The underbars, _, are nmrely place- 
holders for bound variables (in our case. those 
bound to "N"). Clearly, mutually depemhmt flea- 
ture values need to be eonsidcre(l (i.e. wu'ie(I I)y 
the program) only once, and so they occtu" just 
once in the expressions. 
Severa.1 additional points regarding the learning 
process need to be lnadc. 
4Assertions are actually made after (:he(:king for 
consistency with LP's already present in the database. 
Though no contradictions may ~u'isc with acquired 
rules, they may come from LP's declared by the user 
it, the case when the system is started with some such 
LP's. 
The firsl i~ that after couw.q'ging to a. single l,p 
rule, it is tesl.e(l wlmther 1.his rule covers allluost 
specific instnnces. For doing this, the stated gen- 
¢~ralization hierarchies are t.aken into account a- 
longside with the fact that in an II)/LP format a 
rule of tile type d > 13 logically implies the nega- 
lion of its "inverse rule" A < IL Thus, the rule 
det(Num) < adj covers all potential most specif~ 
ie instances since the rule itself and its inverse 
rule det(Nmn) > adj cover them, which is clearly 
seen on the generalization hierarchy in Figure 1. 
\[\[" SO\]fie lllOSt sl)eci\[i(: illsta, l|('es rettlaill /lllcovere(.l~ 
theu they are fed again to the version space algo~ 
ril.hlzl for a second pass. 
The second point is that when it is impossible 
\['or SOll|e structur(" to be verbalized due to cont, ra- 
dictory LP statelnellts (as ill the second row), the 
system itself evaluates this exa.ml)le as negative 
and l)roceeds fln'ther. 
We also nee(I to emphasize that the program 
selectively, rather than randomly, wries the po- 
tentially relevant parameters (munber and prece- 
dence, in this particular case), ~ttempting to 
converge the. generalization process most quick- 
ly. This is done in order to minilnize the nunnber 
of training iustanees that need to be generated, 
and hence (,o nfinimize the number of evMuations 
that tim teach(~r lice(Is to Ina.ke. In other words, 
being generalization-driven, the generator never 
produces training instances which arc superfluous 
to the generalization pro('ess. '\]'his, in particu- 
lar, allows the program to avoid outputting all 
strings generable by the grammar whose LP rules 
are being acquired {notice, for instance, in the 
lh'st colunm of Figure 3 that no language expres- 
sion involving the dictionary rule (11) det(_) 
\[the\] from Figure 2 is displayed to the user). 
887 
In this respect our approach is in sharp contrast 
to a learning process whose training examples are 
given en bloc, and hence the teacher would, of 
necessity, make a great lot of assessions that the 
learner would never use. 
Step ~. The learning terminates successfully 
when all LP rules are found (i.e. all elements of 
the sibling list are processed) and fails when no 
consistent generalization may be found for some 
data. The latter fact needs to be interpreted in the 
sense that these data are not correctly describable 
within the ID/LP format. 
7 Conclusion and Future Work 
We have described a program that learns the LP 
rules of an ID/LP logic grammar in a form that 
can be directly utilized by that grammar. This 
task has not been addressed in previous work. 
We conclude by mentioning some limitations of 
the system suggesting future directions for inves- 
tigation. 
It is known that the version space method mis- 
behaves on encountering noisy data: an instance 
mistakenly classed as negative e.g. may lead to 
premature pruning of a search branch where the 
solution may actually lie. This may be a prob- 
lem in our task (and perhaps in many other lin- 
guistic tasks) since our assessments of grammati- 
cM/ungrammaticM word order are in some cases 
far from definite yes/no's. So haudling uncertain 
input is one way our research may evolve. 
Another direction for filture research is address- 
ing the learning of word order expressed in more 
complex formalisms than I1)/LP grammars. It has 
been proposed in the (computational) linguistics 
literature (e.g. (Zwicky, 1986), Ojeda, 11988, Per- 
icliev and Grigorov, 1994) that LP rules of the 
standard format may be insufficient in some ca.s- 
es, and need to be augmented with other ordering 
relations like "immediate precedence" <<, "fist", 
"last", etc., and more generally, that linearization 
needs to be stated in complex logic expressions 
connected by conjunction, disjunction and nega- 
tion. We can trivially add the relation << to the 
present learner, but the other parts of such pro- 
posals seem beyond its immediate capacity, as it 
stands. From our previous work on word order 
we despose of a parser/generator that can ham 
dle complex expressions, however we shall need 
to modify (or perhaps, even replace) our learning 
method with one which is better suited to handle 
logic constructions like disjunction and negation. 
Acknowledgements. The research reported in 
this paper was partly supported by contract 1- 526/95. 
References 
v. Dahl and H. Abramson. 1990. 
mars, Springer. 
G. Gazdar and G. Pullum. 1981. 
Logic Gram- 
Subcatego- 
rization, constituent order and the notion of 
"head". In M. Moortgat et. al. (eds). The S- 
cope of Lexical Rules, Dordrecht, llolland, pages 
107-123. 
G. Gazdar, E. Klein, G. Pullum and I. Sag. 1985. 
Generalized Phrase Structure Grammar, liar- 
yard, Cambr. Mass. 
K. Hale. 1983. Warlpiri and the grammar of non- 
configurational languages. Natural Language 
and Linguistic Th.eory, v. 1, pages 5-49. 
M. Kashket. 1987. A GB-based parser for 
Warlpiri, a free-word order language. MIT AI 
Laboratory. 
G. Lea and I1. Simon. 1974. Problem solving and 
rule induction: A unified view. In L. Gregg 
(ed). l(nowledge and Co;inition, Lawrence Erl- 
baum Associates. 
T. Mitchell. 1982. Generalization as search. Ar- 
tificial Intellige,ee, 18, pages 203-226. 
A. Qieda. 1988. A linear precedence account of 
cross-serial dependencies. Linguistics aand Phi- 
losophy, ll, pages 457-492. 
F.C.N. Pereira and D.H.D Warren. 1980. Definite 
Clause Grammars for Natural Language Anal- 
ysis. Artificial Intelligence, 13, pages 231-278. 
V. Pericliev and A. Grigorov. 1994. Parsing a 
flexible word order language. COLING'g/t, Ky- 
oto, pages 391-395. 
C. Pollard and I. Sag. 1987. b~formation-Based 
Syntax and Semanties,vol.l: Fnndamentals. C- 
SLI Lecture Notes No. 13, Stanford, CA. 
P. Saiut-1)izier. 1988. Contextual discontinuous 
grammars. In Natural Language Understand- 
ing and Logic Programming, lI, North-Holland, 
pages 29-43. 
I. Sag. 1987. Grammatical hierarchy and lin- 
ear precedence. In S?lulax and Semantics, v.12. 
Discontinuous Constituency, Academic Press, 
pages 303-340. 
Richard Steele. 1981. Word order variation. In 
J. Greenberg (ed). In Universals of Language, 
v./t, Stanford. 
A. Zwicky. 1986. hnnaediate precedence in GPSG. 
OSU WPL, pages 133-138. 
888 
