SYNTACTIC PREFERENCES FOR ROBUST I)ARSING WIT\[t SEMANTIC PREFERENCES 
JIN WANG 
Computing Research Laboratory 
New Mexico State University 
Las Cruces, NM 88003 
E-mail: wjin@nmsu.edu 
Abstract 
Using constraints in robust parsing seems to have 
what we call "robust parsing paradox". Preference 
Sem,'ultics and Coimecfionisln both offered a promis- 
ing approach to fins problem. However, Prefelence 
Semantics has not addressed the problem of how to 
make flfll use of syntactic constraints, and Connec- 
tiouism has some inherent difficulties of its OWll 
which prevent it producing a practical system. In this 
paper we are proposing a method to add syntactic 
preferences to the Preferealce Semantics paradigm 
while nmiutaining its fundamental philosophy. It will 
be shown ttmt syntactic preferences can be coded as 
a set of weights associated with the set of symboli- 
cally manipulatable rules of a new grammar formal- 
ism. ~121e syntactic preferences such codezt can lye 
easily used to compute with semantic preferences. 
With the help of some tectmiques borrowed from 
Connectioeisln, these weights can be adjusted 
through tralrting. 
1. Introduction 
Robust parsing faces what seems to be a "para- 
dox". On one hand, it needs to tolerate input 
sentences which more or less break syntactic 
and semantic constraints; that is, given m~ ill- 
formed sentence, the parser should attempt to 
interpret it somehow rather thml simply reject it 
because it violates some constraints. In order to 
do this it allows syntactic and semantic con- 
stralnts to be relaxed. On the other hand, robust 
parsing still need to be able to disambiguate. For 
the sake of disambiguation, stronger syntactic 
and ,semantic coilstraints are required, since the 
stronger these constraints are, the greater the 
number of unlikely readings that can be rejected, 
and the more likely the correct reading is 
selected. 
A lot of work related to this problem has been 
done. The most commonly used snategy is to 
use the strongest constraints first to parse the 
input. If the parser fails, the violated constraints 
are relaxed in order to recover the failed process 
m~d arrive at a successful parse (CarboneU & 
Hayes, 1983) (Huang, 1988) (Kwasny & Son- 
dheimer, 1981) (Weischedel & Sondheimer, 
1983). The major problem with this approach is 
that it is difficult to tell which constraint is actu- 
ally being violated and, therefore, needs to be 
relaxed when the parser fails. Tiffs l)roblem is 
more serious with the parser using backtracking. 
Another strategy is based on Preference Seman- 
tics (Wilks, 1975, 1978) (Fass & Wilks, 1983). 
The idea is that all the possible sentence read- 
ings can (though not necessarily) be produced. 
All the "readings are scored according to how 
mm~y preference satisfactions they contmn", and 
"the best reading (that is, the one with the most 
preference satisfactions) is taken, even if it con- 
tains preference violations." Selfridge (1986) 
and Slator (1988) 'also use this strategy. One 
important advantage of this approach is that 
within a uniform mechanism, the semantic con- 
straints can both be used maximally for the sake 
of disambiguation m~d be gracefully relaxed 
when nece,ssmy for the sake of robusmess. How- 
ever, how to extend the preference philosophy to 
syntactic constraints have not been addressed. 
There are two li-equently used approaches to 
incorporate syntactic constraints in systems 
using semantic preferences. The first ,~s in (Slao 
tot, 1988), is to use a weak version of a rather 
typical syntactic module. The problem with this 
approach is that the syntactic constraints here 
still suffer from the problem of "robust parsing 
paradox". Another problem with this approach is 
that it shifts more bin-dens of disambiguation to 
semm~tic preferences because the syntactic con- 
stralnts need to be weak enough in order not to 
reject too many noisy inputs. The second 
approach, as in (Selfridge, 1986), is to try all the 
possible combinations without any structural 
preference. Tfie problein hexe is computational 
complexity especially with long sentences which 
have conlplex or recnrsive slJllctares in them. 
Cormectionism has also shown some appealing 
results (Dyer. 1991) (Waltz & Pollack. 1985) 
(l,ehnelt, 1991) on robusmess. However, there 
are some very difficult problems which they 
pose, such as the difficulty with complex struc- 
tures (especially recursive structures), the 
difficulty with variables, variable fiindings and 
ratification, and the difficulty with schemes and 
their instantiations. These problems need to be 
solved before such an approach can be used. in 
practical natural language processing systems 
ACRES DE COLING-92, NA~rt~s, 23-28 AO~r 1992 2 3 9 Pl~oc. ot: COLING-92, NAbrrEs, AuG. 23-28. 1992 
especially those which involve syntactic pro- 
cessing (Dyer, 1991). 
In this paper we will propose a framework of 
representing syntactic preferences 1 that will 
keep all the virtues of robustness Preference 
Semantics suggests. Furthermore, such prefer- 
ences can be learned. 
2. Sketch of Our Approach 
As we seen the idea of Preference Semantics 
about robustness is to enable the system to 1) 
generate (potentially) all the possible readings 
for all the possible inputs, 2) find out among all 
the possible readings which one is the most 
appropriate one and generate it first. To meet 
this goal for syntactic constraints we want to 
accomplish the following: 
1. A formalism with a practically feasible 
amount of symbolic mampulatable rules which 
will have enough generative power to accept any 
input and produce all the possible structures of 
it (section 3.1). 
2. A scheme to associate weights with each of 
these rules such that the weights of all the rules 
that are used to produce a structure will reflect 
how preferable (syntactically speaking) the 
whole structure is (section 2.1). 
3. A algorithm that will incorporate these rules 
and weights with semantic preferences so that 
the overall best reading will be generated first. 
(section 2.2). 
4. A method to train the weights (section 2.1 
and 3.3). 
2.1. Coding Syntactic Constraints 
The most popular method to encode syntactic 
constraints is probably phrase structure gram- 
mar. One way to make it robust is to assign for 
all the grammar rules a cost. So, a rule designed 
for well formed inputs will have less cost than 
the rules designed for less well formed inputs; 
that is to say that a higher cost grammar rule 
will be less preferred than a lower cost grammar 
rule. However there are two serious problems 
with this naive method. First, if all the ill- 
formedness will have a set of special grammar 
rules designed for them, to capture a reasonable 
I We u~ syntactic preferences in a broader sense 
which not only includes syntactic feature agreements 
but also the order of the constituents in the sentence 
and the way in which different constituents are com- 
bined. On the other hand, is yOU will ram, the absence 
of nonterminals and syntactic derivation tree~ will 
also di.staoce us from a strong syntax theory. 
variety of ill-fornledness, we need an unreason- 
ably amount of grammar rules. Second, it is not 
clear how we can find the costs for all these 
rules. Next we will show how we will solve 
these two problems. 
To solve the first problem, our approach aban- 
dons the use of phrase structure grammar, and 
instead a new grammar formalism is used. The 
idea of this new formalism is to reduce syntactic 
constructions to a relative small set of more 
primitive rides (P-rules), so that each syntactic 
construction can be produced by the application 
of a particular sequence of these P-rules. Section 
3.1 will give more details about this formalism, 
and we win just list some of its relevant charac- 
teristics here: 1. There are no nonterminals in 
this formalism. 2. Each P-ntle take only three 
parameters to specify a pattern on which its 
action will be fired. All these three parameters 
will be one of the parts of speech (or word 
types), such as noun, verb, adverb, etc. For 
example: (noun, verb, noun) ~> action3 is a 
P-rule. 3. Each rule has one action. There are 
only three possible actions. 
Since the number of parts of speech is generally 
limited 2, the total rule space (the number of all 
the possible P-rules) is rather small 3. So, it is 
possible for us to keep all of the possible P-rules 
in the grammar rule base of the parser. The out- 
put of the parser is a parsing tree in which each 
node corresponds to a word in the input sen- 
tence and the edge represents the dependency or 
the attachment relation between the words in the 
sentence. One important property of this formal- 
ism is that given any input sentence, for all of 
the normal parsing trees that can be built on it 
(the definition of normal parsing tree is given in 
next section), there exists a sequence of P-rules, 
such that the tree will be derived by applying 
this rule sequence. This means that we can now 
use this small P-rule set to guide the parsing 4 
without pre-excluding any kinds of ill-formed 
inputs for which a normal parsing tree does 
exist. 
2 For example in Longnmn Dictionary of Contem- 
porary English (1978) there are only 14 different parts 
of speech. Given that interjection can be taken care by 
Jt preprocessor and the complex parts like v adv, v adv 
prep, v adv;prep, v prep can be represented as v plus 
anme syntactic feature.s, this number can be further m- 
duce, d to 9. 
3 If there ere 10 different parts of speech, the total 
P-rule sl~ee is smaller than lOxlOxlOx3 = 3000. 
4 Because of the searching algorithm we used, a 
lot of searching paths sanctioned by p-rules will be 
pruned by semantic prefurences. In this sense the 
AcrEs DE COLING-92, NANTES, 23-28 AOt3X 1992 2 4 0 PRec, OF COLING-92, NANTES, AUG. 23-28, 1992 
To solve the second problem, we will use some 
techniques bon'owed from the connectionist 
cmnp. As we have mentioned before each rule 
takes three parameters to specify a pattern. 
Each of these parameters will associate with a 
vector of syntactic feature 5 roles. So, every P- 
rule will have a feature rote vector: 
(FFlt ...... Fln, F21 ...... F2n, F31 ...... 
Each feature role is in turn associated with a 
weight. So each P-rule also has a vector of 
weights: 
(W W W 11 ...... W .ln' W21 ...... W2n' 
31 ...... 3n ) 
If a rule is applicable at a certain situation, the 
feature roles will be filled by the corresponding 
feature values given by the language objects 
matched out by the pattern of the rule. The 
value of each feature can be either 1 or 0, Let 
F' i . denotes the value filling role F i • at one par- 
tic~ar application. The cost of al~plying that 
rule at that situation will be: 
-Y. Wij • F'ij (*) 
The weights can be trained by a training algo- 
rithm similar to that used in the single layer per- 
ceptron network. 
To summarize, the syntactic constraints are 
encoded partly as P-rules and partly as the 
weights associated with each p-ride. The P-rnles 
are used to guide the parsing and tell which con- 
stituent should combined with which co~tsti- 
tuent. There are two types of syntactic prefer- 
ences encoded in the weights: the preferences of 
the language to different P-rules and the prefer- 
ences of each P-rule to different syntactic 
features. "l~e P-rule is still symbolic and work- 
ing on a stack. So, the ability of recursion is 
kept. 
2.2. Organization of the Parser 
At the topmost level, the parser is a standard 
searching algorithm. Due to its efficiency, A* 
algorithm (Nilsson, 1980) is used. However, it is 
worth noting that any other standard searching 
algorithm can also be used. For any searching 
parsing is actually guided by both syntactic prefer- 
enccs and semantic preferences. 
5 The use of term "syntactic feature" is only to 
make it distinct from semantic preference and seman- 
tic type . We, by no means, try to exclude th~ use of 
those features which are generally considered as so- 
mantle feantres but which will help to make the right 
syntactic choice. 
algorithm, the following things need to be 
specilied: 1. what the states hi the searching 
space are: 2. what the initial state is; 3. what the 
action roles are aed taking one state how they 
create new states; 4. what the cost tff creating a 
new node is (please note that the cost is actually 
charged to the edge of the searching graph); 5. 
what the final states are. 
In the pmser, tile searching states are pairs like: 
(<partial resutt>,<unparsed sen- 
tence>) 
The pml:ial re.suit is represented a.s one or more 
parsing trees which arc kept on a working stack. 
Details of this representation are given in next 
section. The initial state is a pair of an empty 
stack and a sentence to be parsed. The action of 
the searching process is to look at the next 
read-in word and the roots of the top two taees 
on the wolking stack, which represent the two 
most recent /ound active constituents. The 
searching will then search for all the applicable 
P-rules based on what it sees. All P-rules it has 
found will then be fired. The action part of these 
P-rules will decide bow to manipulate the trees 
on the stack and the current read-in. At will also 
decide whether it needs to read in next word. 
Therefore, for each of these P-rules being fired, 
a new state is created. The cost of creating this 
new state is following: the cost of applying the 
P-rule on its father state; tile degree of violation 
of any new local, preference just being found in 
the newly built trees; the cost of contextual 
preference 6 associated with file new consumed 
read in sense. All these costs are tmrmalized and 
added to yield the total cost of creating this new 
state. The reason for normalization is that, for 
example, the cost of applying P-rule (see section 
2.1) needs to be norntalized to be positive as it 
is required by A* algorithm. Tile relative magni- 
tudes of different types of costs need also be 
balanced, so that one kind of cosls will not 
overwhelm the others. The final states of the 
searching are the states where the unparsed sen- 
tence is nil and the working stack has only one 
complete parsing tree on it. Obviously the out- 
put of this searching algorithm is the reading 
(represented as the n'ee) of dte input sentence 
which violates the least amount of Sylltactic and 
semantic preferences. 
So far the heuristics used in the A* searching is 
simply taken to be: 
6 For the idea of contextual prefe.rence~ see (Slator, 
1988). 
AcrEs DE COLING-92, NANTES, 23-28 AOl'Yr 1992 2 4 1 t'l~nc. Ov COIANG-92, NANTES, AUG. 23-28, 1992 
tX/C 
where ct is a constant. I is the length of the 
unparsed sentence, c is the average cost for pars- 
ing each word. 
It is needed to mention that the input sentence 
we talked above is actually a list of lexical 
frames. Each lexical flame contains the follow- 
ing information about the word it stands for: the 
part of speech, syntactic features, local prefer- 
ences, contextual preferences, etc. Since words 
can have more than one senses and one lexical 
frame will only represent one sense, for each 
read-in word, the parser will have to work for 
each of its senses and produce all of their chil- 
dren. 
3. Some Details 
3.1. P-rules, Parsing Trees, and P-parsers. 
Given an input string a on a vocabulary E 7, a 
parsing tree of the input is a trees with all its 
nodes corresponding to one of the words in the 
input string. For example, one of the parsing 
trees for input abcde is: 
(e (a) (c (b) (d))) 
Here the head of each list is the root of the tree 
or the subtree, and the tail of the list are its chil- 
dren. Intuitively the parenthood relation reflects 
the attachment or dependency relation in the 
input expression. For example, given an input 
expression a small dog, the dog will be the 
father of both a and small. The parsing tree is 
more formally defined as follows: 
Definition (Parsing Tree): Given an input string 
a, the parsing tree on a is recursively defined as 
following: 
I. (a) is a parsing tree on a, if a is in 
2. (a T 1 T 2 ...... Ti) is a parsing tree 
on a, if a is in a and T k (l~..<i) is 
also a parsing tree on a. 
Definition (Complete Parsing Tree): Suppose T 
is a parsing tree on a. If for all the a in a, a is 
also in T, then T is a complete parsing tree on 
To reduce the computational complexity, we 
will limit our attentions to only one special type 
of parsing trees, namely the normal parsing tree. 
7 in tho actual pars~ I~ is tho ~t of all tho parts of 
Ipeech. 
S The order of children for any node is significant 
hera, and wo will usa LIsP convention to rapre.lasnt 
the par~ing ~a¢ in this paper. 
It should not be difficult to see and prove from 
the following definition that normal parsing 
trees are simply the trees which do not have 
"crossing dependencies" (Maling & Zaenen, 
1982). 
Definition: (Normal Parsing Tree): Suppose T is 
a parsing tree on a. T is a normal parsing tree 
on a iff for all the nodes in T, if they have chil- 
dren, say T.. T, then all the nodes in T 1'" "" ' . . I 
must be appeared ~fore all nodes m Tjm the 
input stnng a, where l<i<j~,k. 
The P-parser has an input tape, and the reading 
head is always moving forward. The P-parser 
also keeps a stack of parsing trees which will 
represent the parsing result of the part of the 
input which has already been read. Besides there 
is a working register in the P-parser to store the 
current read in word. As you will see later, the 
read-in word is actually transformed into a U'ee 
before it being stored in the working register. 
The configurauon of the P-parser is thus defined 
as a triple \[<the stack>,<content of working 
register>,<unparsed input>\]. The P-rule is used 
to specify how the P-parser works. If the mput 
is of a vocabulary E, the P-rules on it can be 
defined as follows: 
Definition (P-rule): A P-rule on E is a member 
of set E' xE' xE' xA, where E' isE to \[nil} 
and A is the set of actions defined below. 
Definition (Action): Actions are defined as func- 
tions of type E ~ E, where E is the set of 
configurations. There are total three different 
acUons used in P-rules, and they are listed 
below: 
5~\[S,C,R\]= \[(cons C S), (list (car 
R)), (cAr R)\] 
5?\[S,C,R\]= \[(cdr S), (cons (car C) 
(Cons (car S) (cdr C))), R\] 
5~s\[S,C,R\]= \[(cons (append (car S) 
t (cadr S))) (cddr S)), C, R\] 
Action 1 simply pushes the current read-in on 
the stack and then reads the next word from the 
input. Action 2 attaches top of the stack as the 
first child of the current read-in stored in the 
working register. Action 3 pops the top of the 
stack first and then attaches it to the second top 
of the stack as its last child. 
The initial configuration for the P-parser is \[nil, 
(list (car a)), (cdr ct)\], where the a is the input 
string to be parsed. There is a set of P-rules in 
each P-parser to specify its behavior. The P- 
parser will work non-deterministically. A P-rule 
can be fired iff its three parameters march the 
roots of the top two parsing trees on the stack 
and the root of the tree in the working register 
ACRES DE COLING-92, NANTES, 23-28 Aotrr 1992 2 4 2 PROC. O1: COLING-92, NANTES, AUG. 23-28, 1992 
respectively. Note the P-rule does not care about 
the unparsed input part in the configuration tri- 
ple. A cmtfiguratiml is a final configuration iff 
the unparsed input string and the working regis- 
ter are all empty mid the stack only has one 
parsing tree on it. A input is grammatical if and 
only if the P-parser can reach from the initial 
configuration to a final configuration by apply- 
ing a sequence of P-rules taken from its gram- 
mar set. If there are more than one possible final 
states for a given input sWing and the parsing 
txees produced at these states are different, then 
we say that the input swing is ambiguous. Here 
is a simple example to show how the P-parser 
works. You may need a pen and a piece of 
paper to work it out. Given input abcde (a good 
friend of mine, for example), the parsing tree (c 
(a) (b) (d (e))) can be constructed fly applying 
the following sequence of rules: 
(nil,nil,a)=> 1 (a,nil,b)=> 1 (b,a,c)=>2 
(a,nil,c)=>2 (nil,nil,c)=>l (c,nil,d)=>l 
(d,e,c)=>l (c,d,nil)~.>3 (d,c,nll)=>3 
Most properties of this formalism will not be of 
the interests of this paper, except this theorem: 
Theorem: A P-parser with all the P-rules on E 
as its grammar set cm~ produce all the complete 
normal parsing trees for any input sWing on Z. 
PROOF: It is easy to prove this by induction on 
the length of the input string, r2 
The theorem tells that a P-parser with all the 
possible P-rules as its rule set can produce for 
any input string all of its possible parsing trees 
which have no crossing dependencies. This 
alone may not be very useful since this P-parser 
will also accept all the strings over E. However, 
as we have shown in the last section, with a 
proper weighting scheme, this P-parser offers a 
suitable framework for coding syntactic prefer- 
ences. 
3.2. Syntactic Preferences in the Weights of 
P-rules 
Each lexical item will have a vector (of a fixed 
length) containing syntactic features. The value 
of the each feature is either 1 or 0. Each of the 
three parameters in the P-rule is associated with 
a vector (of the same length) of syntactic feature 
roles. Each of these syntactic feature roles is 
associated with a weighL Each weight thus 
encodes a preference of the P-rule toward the 
particular syntactic feature it corresponds to. A 
higher weight means the P-rule will be more 
sensitive to the appearance of this syntactic 
feature, and the result of applying this rule 
therefore will be more competitive when it does 
appeaa'. The preferences of the language to dilC 
ferent P-rules arc 'also ~ellected in weight,s. 
Instead of being reflected in each individual 
weight, they are reflected in the distribution of 
weights in P-rules. A P-rule with a higher aver- 
age weight is generally more favored than a P- 
ntle with a lower average weight, since the 
higher tile average weight is, tile lower the cost 
of applying the P-rule tends to be. It is also 
necessary to emphasize that these two types of 
preferences are closely integrated. 
3.3. Weight Learning 
The weights of ,all the P-rules will be trained. 
The training is supervised. Tllere are two dif- 
ferent methods to train the weights, The first 
method takes an input string and a correct pars~ 
ing tree for that stnng. We will use an algoritlmt 
to computer a sequence of P-rules that will pro- 
duce the given parsing tree from the given input 
string. "\[qfis algorithm is not difficult to design, 
and due to the space limits we will not present it 
here. After the P-rule sequence is produced, 
each of P-rule in the sexluence will get a bonus 
5. The bonus will farther be distributed to the 
weights in the P-rule in the same fashion as that 
in the single layer perceptron network, that is: 
AWij = ~llSF'ij 
where r I is the learning factor. 
The second method requires less interventions. 
The parser is let to work on its own. 'llae user 
tufty need to tell the parser whether the output it 
gives is correct. If the result is correct, all the 
P-rules used to construct this outpnt will get a 
bonus and all the rules used during the parsing 
which did not contribute to tim output will get a 
punishinent. If the answer is wrong, all the Po 
rules used to construct this artswer will get a 
punishment, and the parser will continue to 
search for a second best guess. The Ixmus and 
punishment will be diswibuted to the weights of 
the P-rule in the san~e maimer as that in the first 
method. 
The first method is more appropriate at the early 
stage of the training when most of the rules 
have about the stone weights. It will take much 
longer for the parser to find the correct result on 
its own at this time. On the other hand, the 
second method needs less interventions. So, it 
will be more appropriate to be used whenever it 
can work reasonably well. 
It is well known that the single layea" perception 
net can not be trained to deal with exclusive-or. 
The same situation will also happea here. Since 
exclusive-or relations do exist between syntactic 
AcrEs DE COLING-92, NAm'.ES, 23-28 AO~" 1992 2 4 3 I'ROC. OF COLING-92, NANTES, AU(L 23-28, 1992 
features, we need to solve this problem. The 
simplest solution is to make multiple copies for 
the P-role, and, hopefully, each copy will con- 
verge to each side of the exclusive-or relation. 
3.4. Measuring Preference Violations 
The syntactic preference violation is measured 
by formula (*) in section 2. Both action 2 and 
action 3 of P-rule actions make an aaachrnent, 
and some semantic preferences may be either 
violated or satisfied by the attachment. So, after 
each application of such p-ride actions, the 
parser needs to check whether there are some 
new preferences being violated or satisfied. If 
there are, it will computer the cost of these vio- 
lations and report it to the searching process. 
Similarly, eacli action 1 will cause a new con- 
textual preference being reported. The measure- 
ment for the violation degree of both local 
preferences and contextual preferences is basi- 
cally taken from PREMO (Slator, 1988), which 
we will not repeat here. 
4. Conclusion and Comparisons 
Preference Semantics offers an excellent frame- 
work for robust parsing. However, how to make 
full use of syntactic constraints has not been 
addressed. Using weights to code syntactic con- 
straints on a relatively small set of P-rules (from 
which all the possible syntactic structures can be 
derived) enables us to expand the philosophy of 
Preference Semantics from semantic constraints 
to syntactic constraints. The weighting system 
not only reflects the preferences of the language, 
say English, to different P-rnles but also reflects 
the preferences of each P-rule to each syntactic 
feature. Besides of this, it also offers a nice 
interface so that we can integrate the application 
of these syntactic preferences nicely with the 
application of both local semantic preferences 
and contextual preferences by using a highly 
efficient searching algorithm. 
This project has also shown that some of the 
techniques commonly associated with the con- 
nectionism, such as coding information as 
weights, training the weights, and so on, can 
also be used to benefit symbolic computing. The 
result is gaining the robustness and adaptability 
while not losing the advantages of symbolic 
computing such as recursion, variable binding, 
etc. 
The notion 'syntactic preference' has been used 
in (Pereira, 1985) (Frazier & Fodor, 1978) 
(Kimball. 1973) to describe the preference 
between Right Association and Minimal Attach- 
ment. Our approach shares some similarities 
with (Pereira, 1985), in that MA and RA simply 
"corresponds to two precise roles on how to 
choose between alternative parsing actions" at 
certain parsing configurations. However, he did 
not offer a framework for how one of them will 
be preferred. According to our model the prefer- 
ence between them will be based on the weights 
associated with the two rules, the syntactic 
features of the words involved and the semantic 
preferences found between these words. Besides, 
the idea of syntactic preferences in this paper is 
more general than the one used in their work, 
since it includes not only the preference between 
MA or RA but other kinds of syntactic prefer- 
ences as well. 
Wilks, Huang and Fass (1985) showed that 
prepositmnal phrase attachments are possible 
with only semantic information. In their 
approach syntactical preferences are limited to 
the order of matchings and the default attaching. 
Their atxaching algorithm can be seen as a spe- 
cial case of the model we proposed here, in that, 
if they are correct, the preferences between the 
sequences of rules used for RA and MA would 
turn out to be very similar so that the semantic 
preferences involved would generally over sha- 
dow their effects. 
There are some significant differences between 
our approach and some hybrid systems (Kwasny 
& FaJsal, 1989) (Simmons & Yu, 1990). First, 
our approach is not a hybrid approach. Every- 
thing in our approach is still symbolic comput- 
ing. Second, in our approach the costs of the 
applications of syntactic constraints are passed 
to a global search process. The search process 
will consider these costs along with the costs 
given by other types of constraints and make a 
decision globally. In a hybrid system, the syn- 
tactic decision is made by the network locally, 
and there is no intervention from the semantic 
processing. Third, our parser is non- 
deterministic while the parsers in hybrid systems 
are deterministic since there is no easy way to 
do backtracking. It is also easy to see that 
without the intervention of semantic processing, 
lacking the ability of backtracking is hardly an 
acceptable s~ategy for a practical natural 
language parser. Finally, in our approach each 
P-rule has its own set of weights, while in the 
hybrid systems all the grammar rules share a 
common network, and it is quite likely that this 
net will be overloaded with information when a 
reasonably large grammar is used. 
ACTES DE COIANG-92, NANTES, 23-28 ^Ot~T 1992 2 4 4 PROC. OF COLING-92, NANTES, Aun. 23-28, 1992 
Acknowledgment 
All the experiments for this project are carried 
on the platform given by PREMO which was 
designed and implemented by Brian Slator when 
he was here in CRL. The author "also wants to 
thank Dr Yorick WiNs, Dr David Farwell, Dr 
John Barnden and one referee for their com- 
ments. Of course, the author is solely responsi~ 
ble for all the mistakes in the paper. 
Actas DE COLING-92, NANTES, 23-28 AoOr 1992 2 4 5 I'R(){:. OF COLING.92, NAN-rES, AUU. 23-28, 1992 

References 

1. J.G. Carbonell and P. J. Hayes, "Recovery 
Strategies for Parsing Extragrammtical 
Language," American Journal of Computa- 
tional Linguistics, vol. 9(3-4), pp. 123-146, 
1983. 

2, D. Fass and Y. Wilks, "Preference Semantics, 
Ill-Formedness, and Metaphor," American 
Journal of Computational Linguistics, voL 
9(3-4), pp. 178-187, 1983. 

3. M.G. Dyer, "Symbolic NeuroEngmeelmg md 
natural language processing: a multilevel 
research approach.," in Advances in Conaec- 
tionist and Neural Computation Theory, Vol. 
1., ed. J.A. Bmnden and J.B. Pollack, pp. 32-- 
86, Ablex Pnblishing Corp. , Norwood, N.J., 
1991. 

4. L. Frazter and J. D. Fodor, "The Sausage 
Machine: A New Two-Stage Parsing Model," 
Cognition, vol. 6, pp. 291-325, 1978. 

5. X. Huang, "XTRA: 'Ihe design and Implemen- 
tation of A Fully Automatic Machine Transla- 
tton System (Doctoral Dissertation)," 
Memoranda in Computer and Cognitive Sci- 
ence, vol. MCCS-88-121, Computing Research 
Laboratory, New Mexico State University, Las 
Cruces, NM 88003, USA, 1988. 

J. Kimball, "Seven Principles of Surface 
Structure Parsing in Natural Language," Cog- 
nition, vol. 2, pp. 15-47, 1973. 

S. C. Kwasny and N. K. Sondheimer, "Relaxa- 
tion theories for parsing ill-forared input," 
American Journal of Computational Linguis- 
tics, vol. 7, no. 2, pp. 99-108, 1981. 

S. C. Kwasny aruJ K. A. Faisal, "Competition 
and Learmng in a Coanectiohist Deterministic 
Parser," Procs. 11 th Annual Conf. of the Cog- 
nitive Science Society, pp. 635-642, Lawrea~ce 
Erlbaum, Hillsdale, N.J., 1989. 

W. G. Lehnert, "Symbolic/Sabsymbolic Sen- 
tence Analysis: Exploiting the best of two 
worlds.," in Advances in Com~ectionist and 
Neural Computation Theory, Vol. 1., ed. J.A. 
Banlden and J.B. Pollack, pp. 32--86, Ablex 
Publishing Corp. , Norwood, N.J., 1991. 

J. Maling and A. Zaeslen, "A Phrase Structure 
Account of Scandinavian Extraction 
Phenomeala," in The Nature of Syntactical 
Representation, ed. P. Jacobsun and G. K. Pul- 
lure, pp. 229--282, Reidel Publishing Com- 
pany, Holland, 1982. 

11. N. Nilsson, Principles of A/, Tioga pulishing 
co., Mtaflo Park, 1980. 

12. F.C.N. Pereira, "A New Characterization of 
Attachment Preference," in Natural Language 
Pursing, exl. D. R. Dowry, L. Kalltunen & A. 
M. Zwicky, pp. 307-319, Cmnhiidge Univer- 
sity Press, C,'unb\[idge, 1985. 

13. M. Selfridge, "Integrated Pr~essing Produces 
Robust Understanding," Computational 
Linguistics, wfl. 12(2), pp. 89-106, 1986. 

14. R. 17 . Sinanons mad Y. II. Yu, "Training a 
Neural Network to be a Context Sensitive 
Grmnmar," Proceedings of the 5th Rocky 
Moant<¢in Co@rence on AI, pp. 25t-256, Las 
Cruces, NM., 1990. 

15. B.M. Slator, "Lexical Semantics and Prefer- 
ence Semantics Atmlysis (Doctoral Disserta- 
tion)," Memorandn in Computer and Cogni- 
tive Science, vol MCCS-88-1, Compnting 
Research Laboratory, New Mexico State 
University, L~s Cruces, NM 88003, USA, 
1988. 

16. D.L. Waltz and J. 13. Pollack, "Massive Paral- 
lel Prosing: A Strongly h~teractive Model of 
Natural Language hlterpretation," Cognitive 
Science, vol. 9, pp. 51-74, 1985. 

17. R. M. Weischedel and N. K. Soodheitner, 
"Meta-rules as a basis for processing ill- 
formed output," Americcul Journal of Compu- 
tational Linguistics, vol. 9, no. 3-4, pp. 161- 
177, 1983. 

18. Y. Wilks, "A Prefereaaial Pattern-Seeking 
Semamtcs for Natural Language lnferalce," 
Artificial Intelligence, vol. 6, pp. 53-74, 1975. 

19. Y. Wi|ks, "Making Preferences More Active," 
Artificial Intelligence, vol. 11, pp. 197-223, 
1978. 

20. Y.A. Wilks, X Huang and D. Fass, "Syntax, 
plefercalcc and nght artaclmmnt," IJCA\[-85, 
pp. 635-642. 1985 
