SYNTACTIC GRAPHS: A REPRESENTATION FOR
THE UNION OF ALL AMBIGUOUS PARSE TREES
Jungyun Seo 
Robert F. Simmons 
Artificial Intelligence Laboratory 
University of Texas at Austin 
Austin, TX 78712-1188 
In this paper, we present a new method of representing the surface syntactic structure of a sentence. 
Trees have usually been used in linguistics and natural language processing to represent syntactic 
structures of a sentence. A tree structure shows only one possible syntactic parse of a sentence, but in 
order to choose a correct parse, we need to examine all possible tree structures one by one. Syntactic 
graph representation makes it possible to represent all possible surface syntactic relations in one directed 
graph (DG). Since a syntactic graph is expressed in terms of a set of triples, higher level semantic 
processes can access any part of the graph directly without navigating the whole structure. Further- 
more, since a syntactic graph represents the union of all possible syntactic readings of a sentence, it is 
fairly easy to focus on the syntactically ambiguous points. In this paper, we introduce the basic idea of 
syntactic graph representation and discuss its various properties. We claim that a syntactic graph 
carries the complete syntactic information provided by a parse forest, the set of all possible parse trees. 
1 INTRODUCTION 
In natural language processing, we use several rules and 
various items of knowledge to understand a sentence. 
Syntactic processing, which analyzes the syntactic
relations among constituents, is widely used to determine
the surface structure of a sentence, because it is
effective in showing the functional relations between
constituents and is based on well-developed linguistic theory.
Tree structures, called parse trees, represent syntactic 
structures of sentences. 
In a natural language understanding system in which 
syntactic and semantic processes are separated, the 
semantic processor usually takes the surface syntactic 
structure of a sentence from the syntactic analyzer as 
input and processes it for further understanding. 1 Since 
there are many ambiguities in natural language parsing, 
syntactic processing usually generates more than one 
parse tree. Therefore, the higher level semantic proces- 
sor should examine the parse trees one by one to choose 
a correct one. 2 Since possible parse trees of sentences 
in ordinary expository text often number in the hun- 
dreds, it is impractical to check parse trees one by one 
without knowing where the ambiguous points are. We 
have tried to reduce this problem by introducing a new 
structure, the syntactic graph, that can represent all 
possible parse trees effectively in a compact form for 
further processing. As we will show in the rest of this 
paper, since all syntactically ambiguous points are kept 
in a syntactic graph, we can easily focus on those points 
for further disambiguation. 
Furthermore, syntactic graph representation can be
naturally implemented in efficient, parallel, all-path
parsers. One-path parsing algorithms, like the DCG
(Pereira and Warren 1980), which enumerates all possible
parse trees one by one with backtracking, usually
have exponential complexity. All-path parsing algorithms
explore all possible paths in parallel without
backtracking (Earley 1970; Kay 1980; Chester 1980;
Tomita 1985). With these algorithms, all possible parse
trees can be generated efficiently. This kind of algorithm
has complexity $O(N^3)$ (Aho and Ullman 1972; Tomita
1985).
We use an all-path parsing algorithm to parse a 
sentence. Triples, each of which consists of two nodes 
and an arc name, are generated while parsing a sen- 
tence. The parser collects all correct triples and con- 
structs an exclusion matrix, which shows co-occurrence 
constraints among arcs, by navigating all possible parse 
Copyright 1989 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided 
that the copies are not made for direct commercial advantage and the CL reference and this copyright notice are included on the first page. To copy otherwise, or to republish, requires a fee and/or specific permission. 
0362-613X/89/010019-32503.00 
Computational Linguistics, Volume 15, Number 1, March 1989 19 
Jungyun Seo and Robert F. Simmons Syntactic Graphs: A Representation for the Union of All Ambiguous Parse Trees 
Sentence: I saw a man on the hill with a telescope.
Figure 1: Syntactic Graph of the Example Sentence.
trees in a shared, packed-parse forest. 3 We claim that a 
syntactic graph represented by the triples and an exclu- 
sion matrix contains all important syntactic information 
in the parse forest. 
In the next section, we motivate this work with an 
example. Then we briefly introduce $\bar{X}$ (X-bar) theory 
with head projection, which provides the basis of the 
graph representation, and the notation of graph repre- 
sentation in Section 3. The properties of a syntactic 
graph are detailed in Section 4. In Section 5, we 
introduce the idea of an exclusion matrix to limit 
possible tree interpretations of a graph representation. 
In Section 6, we will present the definition of complete- 
ness and soundness of the syntactic graph representa- 
tion compared to parse trees by showing an algorithm 
that enumerates all syntactic readings using the exclu- 
sion matrix from a syntactic graph. We claim that those 
readings include all the possible syntactic readings of 
the corresponding parse forest. Finally, after discussing 
related work, we will suggest future research and draw 
some conclusions. 
2 MOTIVATIONAL EXAMPLE 
We are currently investigating a model of natural lan- 
guage text understanding in which syntactic and seman- 
tic processors are separated. 4 Ordinarily, in this model, 
a syntactic processor constructs a surface syntactic 
structure of an input sentence, and then a higher level 
semantic processor processes it to understand the
sentence; i.e., the syntactic and semantic processors are
pipelined. If the semantic processor fails to understand the
sentence with a given parse tree, the semantic processor 
should ask the syntactic processor for another possible 
parse tree. This cycle of processing will continue until 
the semantic processor finds the correct parse tree with 
which it succeeds in understanding the sentence. 
Let us consider the following sentences, from Waltz 
(1982): 
I saw a man on the hill with a telescope. 
I cleaned the lens to get a better view. 
When we read the first sentence, we cannot determine 
whether the man has a telescope or the telescope is used 
to see the man. This is known as the PP-attachment 
problem, and many researchers have proposed various 
ways to solve it (Frazier and Fodor 1979; Schubert 1984,
1986; Wilks et al. 1985). In this sentence, however, it is
impossible to choose a correct syntactic reading in
syntactic processing, even with commonsense knowledge.
The ambiguities must remain until the system
extracts more contextual knowledge from other input 
sentences. 
The problems of tree structure representation in the 
pipelined, natural language processing model are the 
following: 
First, since the number of parse trees of a typical 
sentence in real text easily grows to several hundreds, 
and it is impossible to resolve syntactic ambiguities 
by the syntactic processor itself, a semantic processor 
must check all possible parse trees one by one until it 
is satisfied by some parse tree. 5 
Second, since there is no information about where the 
ambiguous points are in a parse tree, the semantic 
processor should check all possibilities before accept- 
ing the parse tree. 
Third, although the semantic processor might be 
satisfied with a parse tree, the system should keep the 
status of the syntactic processor for a while, because 
there is a fair chance that the parse tree may become 
unsatisfactory after the system processes several 
more sentences. For example, attaching the preposi- 
tional phrase (PP) "with a telescope" to "hill" or 
"man" would be fine for the semantic processor, 
since there is nothing semantically wrong with these 
attachments. However, these attachments become 
unsatisfactory after the system understands the next 
sentence. Then, the semantic processor would have 
to backtrack and request from the syntactic processor 
another possible parse tree for the earlier sentence. 
We propose the syntactic graph as the output structure 
of a syntactic processor. The syntactic graph of the first 
sentence in the previous example is shown in Figure 1. 
In this graph, nodes consist of the positions, the root 
forms, and the categories of words in the sentence. 
Each node represents a constituent whose head word is 
the word in the node. Each arc shows a dominator- 
modifier relationship between two nodes. The name of 
each arc is uniquely determined according to the grammar
rule used to generate the arc. For example, the snp
arc is generated from the grammar rule SNT → NP VP,
vpp is from the rule VP → VP PP, and ppn is from the
rule PP → Prep NP.
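The correspondence between grammar rules and arc names can be pictured as a simple lookup table. The following Python sketch is only an illustration: the dictionary `ARC_FOR_RULE` and the helper `arc_name` are hypothetical names, and the head and modifier columns are taken from the rule table in Figure 2.

```python
# Arc names keyed by (left-hand side, right-hand side) of a grammar rule.
# Each value is (arc name, head constituent, modifier constituent).
# The table layout and the names used here are assumptions for illustration.
ARC_FOR_RULE = {
    ("SNT", ("NP", "VP")): ("snp", "VP", "NP"),
    ("VP", ("VP", "PP")): ("vpp", "VP", "PP"),
    ("PP", ("Prep", "NP")): ("ppn", "Prep", "NP"),
}


def arc_name(lhs, rhs):
    """Return the arc name generated by a rule, or None if the rule has no arc."""
    entry = ARC_FOR_RULE.get((lhs, tuple(rhs)))
    return entry[0] if entry else None


print(arc_name("SNT", ["NP", "VP"]))   # snp
print(arc_name("VP", ["VP", "PP"]))    # vpp
print(arc_name("VP", ["Verb"]))        # None: a unary rule generates no arc
```

Because the arc name is a function of the rule alone, a triple records which rule licensed the relation without storing the rule itself.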
As we can see in Figure 1, all syntactic readings are
represented in a directed graph in which every ambiguity
is kept: lexical ambiguities from words with multiple
syntactic categories and structural ambiguities from the
ambiguous grammar. The nodes which are
pointed to by more than one arc show the ambiguous 
points in the sentence, so the semantic processor can 
focus on those points to resolve the ambiguities. Fur- 
thermore, since a syntactic graph is represented by a set 
of triples, a semantic processor can directly access any 
part of a graph without traversing the whole. Finally, 
syntactic graph representation is compact enough to be 
kept in memory for a while. 6 
3 X-BAR THEORY AND SYNTACTIC GRAPHS
X-bar theory was proposed by Chomsky (1970) to explain
various linguistic structural properties, and has been
widely accepted in linguistic theories. In this notation,
the head of any phrase is termed $X$, the phrasal category
containing $X$ is termed $\bar{X}$, and the phrasal category
containing $\bar{X}$ is termed $\bar{\bar{X}}$. For example, the head of a
noun phrase is $N$ (noun), $\bar{N}$ is an intermediate category,
and $\bar{\bar{N}}$ corresponds to noun phrase (NP). The general
form of the phrase structure rules for X-bar theory is
roughly as follows:
• $\bar{\bar{X}} \rightarrow Y^{*}\,\bar{X}$
• $\bar{X} \rightarrow X\,Z^{*}$, where * is a Kleene star.
$Y$ is the phrase that specifies $X$, and $Z$ is the phrase that
modifies $X$. 7 The properties of the head word of a
phrase are projected onto the properties of the phrase.
We can express a grammar with X-bar conventions to cover
a wide range of English.
Since, in X-bar theory, a syntactic phrase consists of the
head of the phrase and the specifiers and modifiers of
the head, if there are two or more constituents in the
right-hand side of a grammar rule, then there are
dominator-modifier (DM) relationships between the 
head word and the specifier or modifier words in the 
phrase. Tsukada (1987) discovered that the DM rela- 
tionship is effective for keeping all the syntactic ambi- 
guities in a compact and handy structure without enu- 
merating all possible syntactic parse trees. His 
representation, however, is too simple to maintain some 
important information about syntactic structure that 
will be discussed in detail in this paper, and hence fails 
to take full advantage of the DM-relationship represen- 
tation. 
We use a slightly different representation to maintain 
more information in head-modifier relations. Each 
head-modifier relation is kept in a triple that is equiva- 
lent to an arc between two nodes (i.e., words) in a 
syntactic graph. The first element of a triple is the arc 
name, which represents the relation between the head 
and modifier nodes. The second element is the lexical 
information of the head node, and the third element is 
that of the modifier node. The direction of an arc is 
always from a head to a modifier node. For example, 
the triple [snp, [1,see,v], [0,I,n]] represents the arc
snp between the two nodes [1,see,v] and [0,I,n] in
Figure 1.
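As an informal illustration of this triple format, the sketch below stores a few arcs of Figure 1 as (arc name, head, modifier) records and indexes them by node, so that any part of the graph can be reached without traversing the whole structure. The names `Node`, `Triple`, `by_head`, and `by_modifier` are assumptions made for this example and are not part of the parser described here.

```python
from collections import defaultdict, namedtuple

# A node is [position, root form, category]; a triple is [arc, head, modifier].
Node = namedtuple("Node", ["position", "root", "category"])
Triple = namedtuple("Triple", ["arc", "head", "modifier"])

i = Node(0, "I", "n")
see = Node(1, "see", "v")
a = Node(2, "a", "art")
man = Node(3, "man", "n")

# Three of the arcs in Figure 1; each arc points from head to modifier.
triples = [
    Triple("snp", see, i),
    Triple("vnp", see, man),
    Triple("det", man, a),
]

# Index the triples by head and by modifier for direct access.
by_head = defaultdict(list)
by_modifier = defaultdict(list)
for t in triples:
    by_head[t.head].append(t)
    by_modifier[t.modifier].append(t)

print([t.arc for t in by_head[see]])    # ['snp', 'vnp']
print(by_modifier[a][0].head.root)      # man
```

Keeping the position and category inside each node is what lets two occurrences of the same word, or two categories of one word, remain distinct keys in these indexes.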
Since many words have more than one lexical entry, 
we have to keep the lexical information of each word in 
a triple so that we can distinguish different usages of a 
word in higher level processing. The triples correspond- 
ing to some common grammar rules are as follows: 
1. N → Det N ⟺ [det, [[n1,R1]|L1], [[n2,R2]|L2]]
2. N → Adj N ⟺ [mod, [[n3,R3]|L3], [[n4,R4]|L4]]
3. N → N Prep ⟺ [npp, [[n5,R5]|L5], [[n6,R6]|L6]]
Each ni represents the position, each Ri represents the 
root form, and each Li represents a list of the lexical 
information including the syntactic category of each 
word in a sentence. Parentheses signify optionality and 
the asterisk (*) allows repetition. 
Figure 2 shows the set of triples representing the 
syntactic graph in Figure 1 and the grammar rules used 
to parse the sentence. The sentence in Figure 2 has five 
possible parse trees in accordance with the grammar 
rules. All of the dependency information in those five 
parses is represented in the 12 triples. Those 12 triples 
represent all possible syntactic readings of the sentence 
with the grammar rules. Not all triples can co-occur in 
one syntactic reading in the case of an ambiguous 
sentence. 
The pointers of each triple are the indices that
point to that triple. For
example, Triple 2 in Figure 2 has a list of three indices 
as the pointers. Each of those indices can be used as a 
pointer to access the triple. These indices are actually 
used as the names of the triple. One triple may have 
more than one index. The issues of why and how to 
produce indices of triples will be discussed later in this 
section. 
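The relation between indices and triples can be sketched as a many-to-one mapping. In the hypothetical Python fragment below, `chart` plays the role of the chart of triples (index to triple), and `pointers` is the inverse view listed in the Pointers column of Figure 2. Only Triples 1 to 3 are included, and the variable names are assumptions for illustration.

```python
from collections import defaultdict

# Triples written as (arc, head, modifier); nodes as (position, root form).
SNP_SEE = ("snp", (1, "see"), (0, "i"))      # Triple 1 in Figure 2
DET_MAN = ("det", (3, "man"), (2, "a"))      # Triple 2 in Figure 2
VNP_SEE = ("vnp", (1, "see"), (3, "man"))    # Triple 3 in Figure 2

# Chart of triples: every index names exactly one triple,
# but several indices may name the same triple.
chart = {
    "22": SNP_SEE,
    "02": DET_MAN, "09": DET_MAN, "20": DET_MAN,
    "03": VNP_SEE, "10": VNP_SEE, "21": VNP_SEE,
}

# Inverse view: the list of pointers of each triple.
pointers = defaultdict(list)
for index, triple in chart.items():
    pointers[triple].append(index)

print(chart["09"] == chart["20"])    # True: two indices, one triple
print(sorted(pointers[DET_MAN]))     # ['02', '09', '20']
print(sorted(pointers[SNP_SEE]))     # ['22']
```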
Triple 3 in Figure 2 represents the vnp arc in Figure
1 between two nodes, [1,see,v] and [3,man,n]. The
node [1,see,v] represents a VP with head word
GRAMMAR RULES AND CORRESPONDING TRIPLES:
     Grammar rules      arc-name   head          modifier
  1. SNT → NP VP        snp        head of VP    head of NP
  2. NP → art NP        det        head of NP    art
  3. NP → N'                       head of N'
  4. N' → N' PP         npp        head of N'    head of PP
  5. N' → noun                     noun
  6. PP → prep NP       ppn        prep          head of NP
  7. VP → V'                       head of V'
  8. V' → V' NP         vnp        head of V'    head of NP
  9. V' → V' PP         vpp        head of V'    head of PP
 10. V' → verb                     verb

SENTENCE: I SAW A MAN ON THE HILL WITH A TELESCOPE

     Triples for the Input Sentence                     Pointers
  1. [snp, [[1,see],categ,verb,tns,past],
           [[0,i],categ,noun,nbr,sing]]                 [22]
  2. [det, [[3,man],categ,noun,nbr,sing],
           [[2,a],categ,art,ty,indef]]                  [02, 09, 20]
  3. [vnp, [[1,see],categ,verb,tns,past],
           [[3,man],categ,noun,nbr,sing]]               [03, 10, 21]
  4. [vpp, [[1,see],categ,verb,tns,past],
           [[4,on],categ,prep]]                         [13, 24]
  5. [npp, [[3,man],categ,noun,nbr,sing],
           [[4,on],categ,prep]]                         [08, 19]
  6. [det, [[6,hill],categ,noun,nbr,sing],
           [[5,the],categ,art,ty,def]]                  [06, 17]
  7. [ppn, [[4,on],categ,prep],
           [[6,hill],categ,noun,nbr,sing]]              [07, 18]
  8. [vpp, [[1,see],categ,verb,tns,past],
           [[7,with],categ,prep]]                       [26]
  9. [npp, [[6,hill],categ,noun,nbr,sing],
           [[7,with],categ,prep]]                       [16]
 10. [npp, [[3,man],categ,noun,nbr,sing],
           [[7,with],categ,prep]]                       [25]
 11. [det, [[9,telescope],categ,noun,nbr,sing],
           [[8,a],categ,art,ty,indef]]                  [14]
 12. [ppn, [[7,with],categ,prep],
           [[9,telescope],categ,noun,nbr,sing]]         [15]

Figure 2 Grammar Rules and Example Triples.
[1,see,v], and the node [3,man,n] represents an NP
with head word [3,man,n]. [1,see,v] becomes the head
word, and [3,man,n] becomes the modifier word, of
this triple. The number 1 in [1,see,v] is the position of
the word "see" in the sentence, and v (verb) is the 
syntactic category of the word. Since a word may 
appear in several positions in a sentence, and one word 
may have multiple categories, the position and the 
category of a word must be recorded to distinguish the 
same word in different positions or with different cate- 
gories. 
A meaningful relation name is assigned to each pair 
of head and modifier constituents in a grammar rule. 
Some of these are shown at the top of Figure 2. Rules 
for generating triples augment each corresponding 
grammar rule. Some grammar rules in Prolog syntax 
used to build syntactic graphs are shown in Figure 3. 
An informal description of the algorithm for generat- 
ing triples of a syntactic graph using the grammar rules 
in Figure 3 is the following: The basic algorithm of the 
parser is an all-path, bottom-up, chart parser that con- 
structs a shared, packed-parse forest. Unlike an ordi- 
nary chart parser, the parser uses two charts, one for 
% 1. snt --> np + vp
gr([snt, Vhd],                  % category and head of LHS of rule.
   [[np, Nhd], [vp, Vhd]],      % categories and heads of RHS.
   ( true ),                    % constraints, in this case, none
   [[snp, Vhd, Nhd]]).          % list of triples generated
                                % Vhd is head word, Nhd is modifier.
% 2. np --> article + npl
gr([np, Nhd],                   % Nhd, the head of npl, becomes new head
   [[art, Det], [npl, Nhd]],
   ( true ),
   [[det, Nhd, Det]]).          % Nhd is head and Det is modifier.
% 3. np --> npl
gr([np, Nhd],
   [[npl, Nhd]],                % since there is only one constituent
   ( true ),                    % in here
   [ ]).                        % no triple will be generated in this rule
% 4. vp --> be_aux + vp
gr([vp, Aux],                   % (be + vp) either passive or progressive
   [[be_aux, Aux], [vp, Vhd]],
   ( mempr([inflection, INFL], Vhd),
     ( INFL = paprt             % if inflection of vp is passive
       ->                       % participle, then
         Triples = [[be_aux, Aux, Vhd], [voice, Vhd, passive]]
       ;                        % otherwise,
       ( INFL = prprt           % if inflection is present participle
         ->                     % then,
           Triples = [[be_aux, Aux, Vhd], [progressive, Vhd, yes]]
         ;                      % otherwise,
           fail ) ) ),          % this rule cannot be applied.
   Triples).
Figure 3 Augmented grammar rules for triple generation.
constituents and the other for triples. Whenever the 
parser builds a constituent and its triple, the parser 
generates an index for the triple, 9 and records the triple 
on the chart of triples using the index. Then it records 
the constituent with the index of the triple on the chart 
of constituents. 
We use Rule 4 in Figure 3 to illustrate the parser. 
Rule 4 states that if there are two adjacent constituents, 
a be-aux followed by a vp, execute the procedure in the 
third argument position of the rule. The procedure 
contains the constraints that must be satisfied for
the rule to fire. If the procedure succeeds, the
parser records a new constituent [vp,Vhd] (the first
argument of the rule) on the chart. Before the parser
records the constituent, it must check the triples for the 
constituent. The procedure in the third argument posi- 
tion also contains the processes to produce the triples 
for the constituent. 
The fourth argument of a grammar rule is a list of 
triples produced by executing the augmenting proce- 
dure at the third argument position of the rule. If the 
constraints in the procedure are satisfied, the triples are 
also produced. The parser generates a unique index for 
each triple, records the triples on the chart of triples, 
and adds the indices of the new triples to the new
constituent. Then, the new constituent is recorded on the
chart of constituents. In this example, the head of the 
new constituent is the same as that of be-aux; i.e., the 
be-aux dominates the vp. 
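The bookkeeping described above for Rule 4 can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the parser itself: the function names `be_aux_vp` and `apply_rule4`, the two chart variables, and the example words are hypothetical, and the inflection is passed in directly instead of being looked up in the lexical information of the head (the role played by mempr in Figure 3).

```python
import itertools

_next_index = itertools.count(1)   # source of unique triple indices (assumed scheme)
triple_chart = {}                  # chart of triples: index -> triple
constituent_chart = []             # chart of constituents: (category, head, indices)


def be_aux_vp(aux, vhd, inflection):
    """Constraint and triple generation for vp --> be_aux + vp."""
    if inflection == "paprt":      # passive participle
        return [("be_aux", aux, vhd), ("voice", vhd, "passive")]
    if inflection == "prprt":      # present participle
        return [("be_aux", aux, vhd), ("progressive", vhd, "yes")]
    return None                    # the rule cannot be applied


def apply_rule4(aux, vhd, inflection):
    """Record the triples first, then the new constituent with their indices."""
    triples = be_aux_vp(aux, vhd, inflection)
    if triples is None:
        return None
    indices = []
    for triple in triples:
        index = next(_next_index)
        triple_chart[index] = triple
        indices.append(index)
    constituent = ("vp", aux, indices)   # the be-aux becomes the head
    constituent_chart.append(constituent)
    return constituent


# Hypothetical input: "was seen", with "be" at position 1 and "see" at position 2.
print(apply_rule4((1, "be"), (2, "see"), "paprt"))   # ('vp', (1, 'be'), [1, 2])
print(apply_rule4((1, "be"), (2, "see"), "base"))    # None
```

The second triple in each branch is not a head-modifier relation; it carries the voice or progressiveness of the verb, which is why a rule may yield more than one triple.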
After finishing the construction of the shared, 
packed-parse forest of an input sentence, the parser 
navigates the parse forest to collect the triples that 
pointer  [category, head,           list of child nodes and index of triple]
1047     [snt,      [1,see],        [[1002,1046],22]]
1002     [np,       [0,i],          [[1001],notriple]]
1001     [npl,      [0,i],          [[1000],notriple]]
1000     [n,        [0,i],          []]
1046     [vp,       [1,see],        [[1045],notriple]]
1045     [vpl,      [1,see],        [[1004,1044],21],
                                    [[1013,1041],24],
                                    [[1027,1037],26]]
1027     [vpl,      [1,see],        [[1004,1026],10],
                                    [[1013,1023],13]]
1013     [vpl,      [1,see],        [[1004,1012],03]]
1004     [vpl,      [1,see],        [[1003],notriple]]
1003     [verb,     [1,see],        []]
1012     [np,       [3,man],        [[1008,1011],02]]
1008     [art,      [2,a],          []]
1011     [np,       [3,man],        [[1010],notriple]]
1010     [npl,      [3,man],        [[1009],notriple]]
1009     [noun,     [3,man],        []]
1023     [pp,       [4,on],         [[1017,1022],07]]
1017     [prep,     [4,on],         []]
1022     [np,       [6,hill],       [[1018,1021],06]]
1018     [art,      [5,the],        []]
1021     [np,       [6,hill],       [[1020],notriple]]
1020     [npl,      [6,hill],       [[1019],notriple]]
1019     [noun,     [6,hill],       []]
1026     [np,       [3,man],        [[1008,1025],09]]
1025     [np,       [3,man],        [[1024],notriple]]
1024     [npl,      [3,man],        [[1010,1023],08]]
1037     [pp,       [7,with],       [[1031,1036],15]]
1031     [prep,     [7,with],       []]
1036     [np,       [9,telescope],  [[1032,1035],14]]
1032     [art,      [8,a],          []]
1035     [np,       [9,telescope],  [[1034],notriple]]
1034     [npl,      [9,telescope],  [[1033],notriple]]
1033     [noun,     [9,telescope],  []]
1041     [pp,       [4,on],         [[1017,1040],18]]
1040     [np,       [6,hill],       [[1018,1039],17]]
1039     [np,       [6,hill],       [[1038],notriple]]
1038     [npl,      [6,hill],       [[1020,1037],16]]
1044     [np,       [3,man],        [[1008,1043],20]]
1043     [np,       [3,man],        [[1042],notriple]]
1042     [npl,      [3,man],        [[1010,1041],19],
                                    [[1024,1037],25]]
Figure 4 Shared, Packed-Parse Forest.
participate in each correct syntactic analysis of the
sentence. The collecting algorithm is explained in Section
5.2 in detail.
The representation of the shared, packed-parse forest
for the example in Figure 2 is in Figures 4 and 5. 10 It
is important to notice that the shared, packed-parse
forest generated in this parser is different from that of
other parsers. In the shared, packed-parse forest defined
by Tomita (1985), any constituents that have the
same category and span the same terminal nodes are
regarded as the same constituent and packed into one
node. In the parser for syntactic graphs, the packing
condition is slightly different in that each constituent is
identified by the head word of the constituent as well as
the category and the terminals it spans. Therefore,
although two nodes might have the same category and
span the same terminals, if the nodes have different
head words, then they cannot be packed together.
A packed node contains several nodes, each of which
contains the category of the node, its head word, and
the list of the pointers to its child nodes and the indices
of the triples of the node. Node 1045 in Figure 4 is a
packed node in which three different constituents are
packed. Those three constituents have the same category,
vpl, span the same terminals (from [1,see,v] to
[9,telescope,n]), with the same head word ([1,see,v]),
but with different internal substructures.
Note that several constituents may have different
indices that point to the same triple. For example, in
Figure 4, the first vpl in the packed node 1045 has the
index 21, the first vpl in the packed node 1027 has the
index 10, and the vpl node 1013 has
the index 03 as the indices of their triples. Actually,
these three indices represent the same triple, Triple 3 in
Figure 2. Those three constituents have the same category,
vpl, the same head, [1,see,v], and the same
modifier, [3,man,n], but have different inside structures
of the modifying constituent, np, whose head is
[3,man,n]. The modifying constituent, np, may span
from [2,a] to [3,man], from [2,a] to [6,hill], or from
[2,a] to [9,telescope].
There are different types of triples that do not have
head-modifier relations. These types of triples are for
syntactic characteristics of a sentence such as mood and
voice of verbs. For example, grammar rule 4 in Figure
3 generates not only triples of head-modifier relations,
but also triples that have the information about the
voice or progressiveness of the head word of the VP,
depending on the type of inflection of the word. This
kind of information can be determined in syntactic
processing and is used effectively in higher level semantic
processing.
4 PROPERTIES OF SYNTACTIC GRAPHS
We first define several terms used frequently in the rest 
of the paper. 
Definition 1: An in-arc of a node in a syntactic graph 
is an arc which points to the node, and an out-arc of 
a node points away from the node. 
Since, in the syntactic graph representation, arcs point 
from dominator to modifier nodes, a node with an in-arc 
is the modifier node of the arc, and a node with an 
out-arc is the dominator node of the arc. 
Definition 2: A reading of the syntactic graph of a 
sentence is one syntactic interpretation of the sen- 
tence in the syntactic graph. 
Since a syntactic graph is a union of syntactic analyses 
of a sentence, one reading of a syntactic graph is 
analogous to one parse tree of a parse forest. 
Definition 3: A root node of one reading of a syntactic 
graph is a node which has no in-arc in the reading. 
In most cases, the root node of a reading of the syntactic
graph of a sentence is the head verb of the sentence in
that syntactic interpretation. In a syntactically ambiguous
sentence, different syntactic analyses of the sentence
may have different head verbs; thus there may be
more than one root node in a syntactic graph. For
example, in the syntactic graph of one famous and
highly ambiguous sentence, "Time flies like an
arrow," shown in Figure 6, there are three different
root nodes. These roots are [0,time,v], [1,fly,v], and
[2,like,v]. 11
Figure 5 Shared, Packed-Parse Forest: A Diagram.
Definition 4: The position of a node is the position of 
the word which is represented by the node, in a 
sentence. 
Since a word may have several syntactic categories, 
there may be more than one node with the same position 
in a syntactic graph. For example, since the word
"time" in Figure 6, which appears as the first word in
the sentence, has two syntactic categories, noun and
verb, there are two nodes, [0,time,n] and [0,time,v], in
the syntactic graph, and the position of both nodes is
0.
One of the most noticeable features of a syntactic 
graph is that ambiguities are explicit, and can be easily 
detected by semantic routines that may use further 
knowledge to resolve them. The following property 
explains how syntactically ambiguous points can be 
easily determined in a syntactic graph. 
Property 1: In a syntactic tree, each constituent 
except the root must by definition be dominated by a 
single constituent. Since a syntactic graph is the 
union of all syntactic trees that the grammar derives 
from a sentence, some graph nodes may be domi- 
nated by more than one node; such nodes with 
multiple dominators have multiple in-arcs in the 
syntactic graph and show points at which the node 
participates in more than one syntactic tree interpre- 
tation of a sentence. In a graph resulting from a 
syntactically unambiguous sentence, no node has 
more than a single in-arc, and the graph is a tree with 
the head verb as its root. 
According to Property 1, no pair of arcs which point to
the same node can co-occur in any one syntactic
reading, because each node can be a modifier node only
once in one reading. Therefore, we can focus on the
arcs pointing to the same node as ambiguous points. In
terms of triples, any two triples with identical modifier
terms reveal a point of ambiguity, where a modifier term
is dominated by more than one node.
Sentence: Time flies like an arrow.
Figure 6 Graph Representation and Parse Trees of a Highly Ambiguous Sentence.
In the example in Figure 1, the syntactic ambiguities
are found in two arcs pointing to [4,on,p] and in three
arcs pointing to [7,with,p]. The PP with head [4,on]
modifies the VP whose head is [1,see] and it also
modifies the NP with head [3,man]. Similarly, three
different in-arcs to the node [7,with] show that there
are three possible choices to which Node 7 can be
attached. The semantic processor can focus on these
three possibilities (or on the earlier set of two possibilities),
using semantic information, to choose one dominator.
Lacking semantic information, the ambiguities will remain
in the graph until they can be resolved by additional
knowledge from the context.
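Property 1 suggests a direct way of locating the ambiguous points: group the triples by modifier node and keep the modifiers with more than one in-arc. The Python sketch below applies this to the twelve triples of Figure 2. The node abbreviations and the function name `ambiguous_points` are assumptions introduced for the example.

```python
from collections import defaultdict

# Nodes of Figure 1 as (position, root form, category).
I, SEE, A2, MAN = (0, "I", "n"), (1, "see", "v"), (2, "a", "art"), (3, "man", "n")
ON, THE, HILL = (4, "on", "p"), (5, "the", "art"), (6, "hill", "n")
WITH, A8, TEL = (7, "with", "p"), (8, "a", "art"), (9, "telescope", "n")

# The twelve triples of Figure 2 as (arc, head, modifier).
TRIPLES = [
    ("snp", SEE, I), ("det", MAN, A2), ("vnp", SEE, MAN),
    ("vpp", SEE, ON), ("npp", MAN, ON), ("det", HILL, THE),
    ("ppn", ON, HILL), ("vpp", SEE, WITH), ("npp", HILL, WITH),
    ("npp", MAN, WITH), ("det", TEL, A8), ("ppn", WITH, TEL),
]


def ambiguous_points(triples):
    """Map each modifier node with several in-arcs to its competing (arc, head) pairs."""
    in_arcs = defaultdict(list)
    for arc, head, modifier in triples:
        in_arcs[modifier].append((arc, head))
    return {m: arcs for m, arcs in in_arcs.items() if len(arcs) > 1}


for modifier, arcs in sorted(ambiguous_points(TRIPLES).items()):
    print(modifier, [head[1] for _, head in arcs])
# (4, 'on', 'p') ['see', 'man']
# (7, 'with', 'p') ['see', 'hill', 'man']
```

The two entries returned are exactly the attachment decisions a semantic processor would need to examine for this sentence; all other arcs are left alone.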
Property 2: Since all words in a sentence must be 
used in every syntactic interpretation of the sentence 
and no word can have multiple categories in one 
interpretation, one and only one node from each 
position must participate in every reading of a syn- 
tactic graph. In other words, each syntactic reading 
derived from a syntactic graph must contain one and 
only one node from every position. 
Since every node, except the root node, must be 
attached to another node as a modifier node, we can 
conclude the following property from properties 1 and 
2. 
Property 3: In any one reading of a syntactic graph, 
the following facts must hold: 
1. No two triples with the same modifier node can 
co-occur. 
2. One and only one node from each position, 
except the root node of the reading, must appear 
as a modifier node. 
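The two conditions of Property 3, together with Property 2, can be checked mechanically for a candidate set of triples. The sketch below is an illustration only, and the function name `satisfies_property_3` is hypothetical. It considers head-modifier triples exclusively, and it tests necessary conditions: a set of triples that passes is not thereby guaranteed to correspond to a parse tree in the forest.

```python
def satisfies_property_3(reading, n_positions):
    """Check Properties 2 and 3 for a list of (arc, head, modifier) triples.

    Nodes are (position, root form, category) tuples, positions start at 0.
    """
    modifiers = [modifier for _, _, modifier in reading]
    # Fact 1: no two triples may share a modifier node.
    if len(set(modifiers)) != len(modifiers):
        return False
    # Property 2: one and only one node from each position takes part.
    nodes = {head for _, head, _ in reading} | set(modifiers)
    if sorted(node[0] for node in nodes) != list(range(n_positions)):
        return False
    # Fact 2: every node except a single root appears as a modifier.
    roots = nodes - set(modifiers)
    return len(roots) == 1


I, SEE, A2, MAN = (0, "I", "n"), (1, "see", "v"), (2, "a", "art"), (3, "man", "n")
ON, THE, HILL = (4, "on", "p"), (5, "the", "art"), (6, "hill", "n")
WITH, A8, TEL = (7, "with", "p"), (8, "a", "art"), (9, "telescope", "n")

# One candidate reading of Figure 1: both PPs attached to the verb.
reading = [
    ("snp", SEE, I), ("det", MAN, A2), ("vnp", SEE, MAN),
    ("vpp", SEE, ON), ("det", HILL, THE), ("ppn", ON, HILL),
    ("vpp", SEE, WITH), ("det", TEL, A8), ("ppn", WITH, TEL),
]

print(satisfies_property_3(reading, 10))                        # True
print(satisfies_property_3(reading + [("npp", MAN, ON)], 10))   # False
```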
Another advantage of the syntactic graph representa- 
tion is that we can easily extract the intersection of all 
possible syntactic readings from it. Since one node from 
each position must participate in every syntactic read- 
ing of a syntactic graph, every node which is not a root 
node and has only one in-arc, must always be included 
in every syntactic reading. Such unambiguous nodes are 
common to the intersections of all possible readings. 
When we know the exact locations of several pieces in 
a jigsaw puzzle, it is much easier to place the other 
pieces. Similarly, if a semantic processor knows which 
arcs must hold in every reading, it can use these arcs to 
constrain inferences to understand and disambiguate. 
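The intersection of all readings can be approximated in the same spirit by keeping the triples whose modifier node has exactly one in-arc. The sketch below assumes, as in Figure 1, that every position has a single node (no lexical ambiguity); with several nodes per position the test would have to be made per position. The name `common_triples` is hypothetical.

```python
from collections import Counter

I, SEE, A2, MAN = (0, "I", "n"), (1, "see", "v"), (2, "a", "art"), (3, "man", "n")
ON, THE, HILL = (4, "on", "p"), (5, "the", "art"), (6, "hill", "n")
WITH, A8, TEL = (7, "with", "p"), (8, "a", "art"), (9, "telescope", "n")

# The twelve triples of Figure 2 as (arc, head, modifier).
TRIPLES = [
    ("snp", SEE, I), ("det", MAN, A2), ("vnp", SEE, MAN),
    ("vpp", SEE, ON), ("npp", MAN, ON), ("det", HILL, THE),
    ("ppn", ON, HILL), ("vpp", SEE, WITH), ("npp", HILL, WITH),
    ("npp", MAN, WITH), ("det", TEL, A8), ("ppn", WITH, TEL),
]


def common_triples(triples):
    """Return the triples whose modifier has a single in-arc in the graph."""
    in_degree = Counter(modifier for _, _, modifier in triples)
    return [t for t in triples if in_degree[t[2]] == 1]


common = common_triples(TRIPLES)
print(len(common))                    # 7
print([arc for arc, _, _ in common])
# ['snp', 'det', 'vnp', 'det', 'ppn', 'det', 'ppn']
```

Seven of the twelve arcs are thus fixed in every reading of the example sentence, which leaves only the five PP-attachment arcs to be decided.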
Property 4: There is no information in a syntactic 
graph about the range of terminals spanned by each 
triple, so one triple may represent several constitu- 
ents which have the same head and modifying terms, 
with the same relation name, but which span differ- 
ent ranges of terminals. 
The compactness and handiness of a graph representa- 
tion is based on this property. One arc between two 
nodes in a syntactic graph can replace several compli- 
cated structures in the tree representation, and multiple 
dominating arcs can replace a parse forest. 
For example, the arc vnp from [1,see,v] to
[3,man,n] in Figure 1 represents three different constituents.
Those constituents have the same category,
vpl, the same head, [1,see,v], and the same modifier,
[3,man,n], but have different inside structures of the
modifying constituent, np, whose head is [3,man,n].
The modifying constituent, np, may span from [2,a] to
[3,man], from [2,a] to [6,hill], or from [2,a] to
[9,telescope]. Actually, in the exclusion matrix described
below, each triple with differing constituent
structure is represented by multiple subscripts to avoid
the generation of trees that did not occur in the parse
forest.
Another characteristic of a syntactic graph is that the 
number of nodes in a graph is not always the same as 
that of the words in a sentence. Since some words may 
have several syntactic categories, and each category 
may lead to a syntactically correct parse, one word may 
require several nodes. For example, there are eight 
Computational Linguistics, Volume 15, Number 1, March 1989
nodes in the syntactic graph in Figure 6, while there are 
only five words in the sentence. 
5 EXCLUSION MATRIX 
A syntactic graph is clearly more compact than a parse 
forest and provides a good way of representing all 
possible syntactic readings with an efficient focusing 
mechanism for ambiguous points. However, since one 
triple may represent several constituents, and there is 
no information about the relationships between triples, 
it is possible to lose some important syntactic informa- 
tion. 
This section consists of two parts. In Section 5.1, we
investigate a co-occurrence problem of arcs in a syntactic
graph and suggest the exclusion matrix to avoid the
problem. The algorithms to collect triples of a syntactic
graph and to construct an exclusion matrix are presented
in Section 5.2.
5.1 CO-OCCURRENCE PROBLEM BETWEEN ARCS 
One of the most important syntactic displays in a tree 
structured parse, but not in a syntactic graph, is the 
co-occurrence relationship between constituents. Since 
one parse tree represents one possible syntactic reading 
of a sentence, we can see whether any two constituents 
can co-occur in some reading by checking all parse trees 
one by one. However, since the syntactic graph keeps 
all possible constituents as a set of triples, it is some- 
times difficult to determine whether two triples can 
co-occur. 
If a syntactic graph does not carry the information 
about exclusive arcs, its representation of all possible 
syntactic structures may include interpretations not 
allowed by the grammar and cause extra overhead. For 
example, after a syntactic processor generates the tri- 
ples, a semantic processor will focus on the ambiguous 
points such as triples 4 and 5, and triples 8, 9, and 10 in 
Figure 2 to resolve the ambiguities. In this case, if the 
semantic processor has a strong clue to choose Triple 4 
over Triple 5, it should not consider Triple 10 as a 
competing triple with triples 8 and 9, since Triple 10 is
exclusive with Triple 4.
Some of the co-occurrence problems can be detected 
easily. For example, due to Property 1, since there can 
be only one in-arc to any node in any one reading of a 
syntactic graph, the arcs that point to the same node 
cannot co-occur in any reading. Triples including these 
arcs are called exclusive triples. The following properties 
of the syntactic graph representation show several cases 
when arcs cannot co-occur. These cases, however, are 
not exhaustive. 
Property 5: No two crossing arcs can co-occur. More
formally, if an arc has the n_1-th and n_2-th words as a
head and a modifier, and another arc has the m_1-th and
m_2-th words as a head and a modifier, then, if
n_1 < m_1 < n_2 < m_2 or m_1 < n_1 < m_2 < n_2, the two arcs
cannot co-occur.
[Figure 7 Illegal Parse Tree from Exclusive Arcs. Terminal nodes shown: [3,W3,prep], [5,W1,n], [9,W2,conj].]
In the syntactic graph in Figure 1, the arcs vpp from
[1,see,v] to [4,on,p] and npp from [3,man,n] to
[7,with,p] cannot co-occur in any legal parse tree
because they violate the rule that branches in a parse
tree cannot cross each other.
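The crossing test of Property 5 can be sketched in Python. The function name arcs_cross is hypothetical, and the sketch sorts each arc's end points so that the test does not depend on whether the head lies to the left or to the right of its modifier; that generalization is an assumption of this example rather than part of the property as stated.

```python
def arcs_cross(arc_a, arc_b):
    """Each arc is a pair (head_position, modifier_position)."""
    a_low, a_high = sorted(arc_a)
    b_low, b_high = sorted(arc_b)
    # Two arcs cross when exactly one end of one arc lies strictly inside the other.
    return a_low < b_low < a_high < b_high or b_low < a_low < b_high < a_high

vpp = (1, 4)   # see -> on
npp = (3, 7)   # man -> with
crossing = arcs_cross(vpp, npp)
nested = arcs_cross((1, 7), (3, 4))
```

Nested arcs, such as (1, 7) and (3, 4), are not reported as crossing, so they remain candidates for co-occurrence.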
The following property shows another case of exclu- 
sive arcs which cross each other. 
Property 6: In a syntactic graph, any modifier word
which is on the right side of its head word cannot be
modified by any word which is on the left side of the
head word in a sentence. More formally, let an arc
have a head word W_1 and a modifier word W_2 whose
positions are n_1 and n_2 respectively, and n_1 < n_2. Then
if another arc has W_2 as a head word and a modifier
word with position n_3 where n_3 <= n_1, then those two
arcs cannot co-occur.
Assume that there are two arcs: one is [npp,
[5,W1,noun], [9,W2,conj]], and the other is [conjpp,
[9,W2,conj], [3,W3,prep]]. The first arc says that the
phrase with head word W2 is attached to the noun in
position 5. The other arc says that the phrase with
head word W3 is attached to the conjunction. This
attachment causes crossing branches. The corresponding
parse tree for these two triples is in Figure 7. As we
can see, since there is a crossing branch, these two arcs
cannot co-occur in any parse tree.
The following property shows the symmetric case of 
Property 6. 
Property 7: In a syntactic graph, any modifier word 
which is on the left side of its head word cannot be 
modified by any word which is on the right side of the 
head word in a sentence. 
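Properties 6 and 7 can be checked together with a short Python sketch. The function name and the representation of an arc as a pair of positions are assumptions made for illustration; the bound for the symmetric case (n_3 >= n_1) is inferred by mirroring Property 6.

```python
def violates_property_6_or_7(arc1, arc2):
    """Each arc is (head_position, modifier_position).

    Returns True when arc2's head is arc1's modifier and arc2's modifier
    lies at or beyond arc1's head, on the far side of it.
    """
    head1, mod1 = arc1
    head2, mod2 = arc2
    if head2 != mod1:
        return False
    if head1 < mod1:          # Property 6: the modifier is to the right of its head
        return mod2 <= head1
    return mod2 >= head1      # Property 7: the modifier is to the left of its head

npp = (5, 9)      # [npp, [5,W1,noun], [9,W2,conj]]
conjpp = (9, 3)   # [conjpp, [9,W2,conj], [3,W3,prep]]
excluded = violates_property_6_or_7(npp, conjpp)
```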
Other exclusive arcs are due to lexical ambiguity. 
Definition 5: If two nodes, W_i and W_j, in a syntactic
graph have the same word and the same position but
different categories, W_i is in conflict with W_j,
and we say the two nodes are conflicting nodes.
Property 8: Since words cannot have more than one
syntactic category in one reading, any two arcs
which have conflicting nodes as either a head or a
modifier cannot co-occur.
An example of exclusive arcs involves the vpp arc from
[1,fly,v] to [2,like,p] and the vnp arc from [0,time,v] to
[1,fly,n] in the graph in Figure 6. Since the two arcs
share a word at the same position, but with
different categories, they cannot co-occur in any syntactic
reading. By examination of Figure 6, we can
determine that there are 25 pairwise combinations of
exclusive arcs in the syntactic graph of that five-word
sentence.
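Definition 5 and Property 8 translate directly into a small Python check. The tuple layout (position, word, category) for nodes and (relation, head, modifier) for triples, as well as the function names, are assumptions of this sketch.

```python
def conflicting(node_a, node_b):
    """Same position and word, different category (Definition 5)."""
    return node_a[0] == node_b[0] and node_a[1] == node_b[1] and node_a[2] != node_b[2]

def exclusive_by_category(triple_a, triple_b):
    """True if any node of one triple conflicts with any node of the other (Property 8)."""
    _, head_a, mod_a = triple_a
    _, head_b, mod_b = triple_b
    return any(conflicting(x, y) for x in (head_a, mod_a) for y in (head_b, mod_b))

vpp = ("vpp", (1, "fly", "v"), (2, "like", "p"))
vnp = ("vnp", (0, "time", "v"), (1, "fly", "n"))
cannot_cooccur = exclusive_by_category(vpp, vnp)
```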
The above properties show cases of exclusive arcs 
but are not exhaustive. Since the number of pairs of 
exclusive arcs is often very large in real text (syntacti- 
cally ambiguous sentences), if we ignore the co-occur- 
rence information among triples, the overhead cost to 
the semantic processor may outweigh the advantage 
gained from syntactic graph representation. Therefore 
we have to constrain the syntactic graph representation 
to include co-occurrence information. 
We introduce the exclusion matrix for triples (arcs) 
to record constraints so that any two triples which 
cannot co-occur in any syntactic tree, cannot co-occur 
in any reading of a syntactic graph. The exclusion 
matrix provides an efficient tool to decide which triples 
should be discarded when higher level processes choose 
one among ambiguous triples. For an exclusion matrix
(Ematrix), we make an N x N matrix, where N is the
number of indices of triples. If Ematrix(i,j) = 1, then the
triples with the indices i and j cannot co-occur in any
syntactic reading. If Ematrix(i,j) = 0, then the triples
with the indices i and j can co-occur in some syntactic
reading.
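As a rough illustration of how a higher level process might consult the matrix, the following Python sketch stores the Ematrix as a list of lists and discards the candidates that are exclusive with a chosen triple. The helper names and the particular set of exclusive pairs are hypothetical; only the exclusion of Triple 10 by Triple 4 comes from the example in Section 5.1. For brevity the sketch starts from an all-zero matrix and marks exclusive pairs, whereas the construction algorithm described in Section 5.2 works in the opposite direction.

```python
def make_ematrix(size, exclusive_pairs):
    """Build a symmetric size x size matrix with 1 for each exclusive pair."""
    ematrix = [[0] * size for _ in range(size)]
    for i, j in exclusive_pairs:
        ematrix[i][j] = ematrix[j][i] = 1
    return ematrix

def surviving(ematrix, chosen, candidates):
    """Keep only the candidates that can co-occur with the chosen triple."""
    return [c for c in candidates if ematrix[chosen][c] == 0]

# Indices 1..10 are used; row and column 0 are unused padding.
ematrix = make_ematrix(11, [(4, 5), (4, 10), (8, 9), (8, 10), (9, 10)])
remaining = surviving(ematrix, chosen=4, candidates=[8, 9, 10])
```

Once Triple 4 is chosen, only triples 8 and 9 remain as competitors.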
5.2 AN ALGORITHM TO CONSTRUCT THE EXCLUSION 
MATRIX 
Since the several cases of exclusive arcs shown in the 
previous section are not exhaustive, they are not suffi- 
cient to construct a complete exclusion matrix from a 
syntactic graph. A complete exclusion matrix can be 
guaranteed by navigating the parse forest when the 
syntactic processor collects the triples in the forest to 
construct a syntactic graph. 
As we have briefly described in Section 3, when the 
parser constructs a shared, packed forest, triples are 
also produced, and their indices are kept in the corre- 
sponding nonterminal nodes in the forest. 12 The parser 
navigates the parse forest to collect the triples--in fact, 
pointers pointing to the triples--and to build an exclu- 
sion matrix. 
As we can see in the parse forest in Figure 5, there 
may be several nonterminal nodes in one packed node. 
For each packed node, the parser collects all indices of 
triples in the subforests whose root nodes are the 
nonterminal nodes in the packed node, and then records 
those indices to the packed node. After the parser 
finishes collecting the indices of the triples in the parse 
forest, each packed node in the forest has a pointer to 
the list of collected indices from its subforest. Therefore,
the root node of a parse forest has a pointer to the
list of all indices of all possible triples in the whole
forest, and those triples represent the syntactic graph of
the forest.
[Figure 8 Parse Forest Augmented with Triples. The legend marks packed nodes, the triples of each node, the list of all triples below each node, and pointers to the lists of pointers to triples.]
Figure 8 shows the upper part of the parse forest in 
Figure 5 after collecting triples. A hooked arrow of each 
nonterminal node points to the list of the indices of the 
triples that were added to the node in parsing. For 
example, pointer 2 contains the indices of the triples 
added to the node snt by the grammar rule: 
snt -> np + vp
A simple arrow for each packed node points to the list 
of all indices of the triples in the forest of which it is the 
root. This list is generated and recorded after the 
processor collects all indices of triples in its subnodes. 
Therefore the arrow of the root node of the whole 
forest, Pointer 1, contains the list of all indices of the 
triples in the whole forest. 
Since several indices may represent the same triple,
after collecting all the indices of the triples in the parse
forest, the parser removes duplicate triples in the final
representation of the syntactic graph of a sentence.
Collecting pointers to triples in the subforest of a
packed node and constructing the Ematrix is done
recursively as follows: First, Ematrix(i,j) is initialized to
1, which means all arcs are marked exclusive of each
other. Later, if any two arcs indexed with i and j
co-occur in some parse tree, then the value of Ematrix(i,j)
is set to 0. For each nonterminal node in a packed
node, the parser collects every index appearing below
the nonterminal node, i.e., the indices of the triples of its
subnodes. If a subnode of the nonterminal node was
previously visited, and its indices were already collected,
then the subnode already has the pointer to the
list of collected indices. Therefore the parser does not
need to navigate the same subforests again, but it takes
the indices using the pointer. The algorithm in pseudo-PASCAL
code is in Figure 9.

    function collect_triple(Packed_node)
        if Packed_node.Collected                  % if the indices of triples are already collected
        then return(Packed_node.TripleIndex)      % then return the collected indices
        else Packed_node.TripleIndex := collect1(Packed_node)   % else collect them
             Packed_node.Collected := true        % set flag Collected.
             return(Packed_node.TripleIndex)      % return collected indices.

    function collect1(Packed_node)
        Triple_Indices := { }
        for each Node in Packed_node do
            Triple_Indices := merge(Node.TripleIndex, Triple_Indices)
            case Node.Child_node_num
                0: (do nothing)
                1: Temp := collect_triple(Node.Child_node)
                   Triple_Indices := merge(Temp, Triple_Indices)
                   cooccur1(Node.TripleIndex, Temp)
                2: Temp1 := collect_triple(Node.Left_child)
                   Temp2 := collect_triple(Node.Right_child)
                   Triple_Indices := merge(Temp1, Temp2, Triple_Indices)
                   cooccur2(Node.TripleIndex, Temp1, Temp2)
        end-do
        return(Triple_Indices)

Figure 9 An Algorithm to Collect Triples.

    function cooccur1(Trip1, Trip2)
        fully_cooccur(Trip1)
        cooccur3(Trip1, Trip2)

    function cooccur2(Trip1, Trip2, Trip3)
        fully_cooccur(Trip1)
        cooccur3(Trip1, Trip2)
        cooccur3(Trip1, Trip3)
        cooccur3(Trip2, Trip3)

    function cooccur3(Trip1, Trip2)
        for each index i in Trip1 do
            for each index j in Trip2 do
                Ematrix(i, j) := 0
                Ematrix(j, i) := 0        /* Ematrix is symmetric */

    function fully_cooccur(Triples)
        for each pair of indices i and j in Triples do
            Ematrix(i, j) := 0

Figure 10 An Algorithm to Construct the Exclusion Matrix.
After the parser collects the indices of the triples 
from the subnodes of the nonterminal node, it adjusts 
the values in the exclusion matrix according to the 
following cases: 
1. If the nonterminal node has one child node, its
own triples can co-occur with each other, and
with every collected triple from its subforest.
2. If the nonterminal node has two child nodes, its
own triples can co-occur with each other and
with the triples collected from both left and right
child nodes, and the triples from the left child
node can co-occur with the triples from the right
one.
This algorithm is described in Figure 10.
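The two cases above, combined with the memoized collection pass, can be sketched in Python as follows. This is an illustrative reading of the pseudo-code, not the authors' program: the class names Node and PackedNode, the field names, and the tiny forest are assumptions. As in the pseudo-code, a nonterminal node without children leaves the matrix unchanged.

```python
from dataclasses import dataclass, field
from itertools import combinations, product
from typing import Optional

@dataclass
class Node:
    own: list                                       # indices of triples added at this node
    children: list = field(default_factory=list)    # 0, 1, or 2 PackedNode objects

@dataclass
class PackedNode:
    nodes: list                                     # alternative nonterminal nodes
    collected: Optional[list] = None                # memoized indices of the subforest

def mark(ematrix, left, right):
    """Record that every triple in left can co-occur with every triple in right."""
    for i, j in product(left, right):
        ematrix[i][j] = ematrix[j][i] = 0

def collect(packed, ematrix):
    if packed.collected is not None:                # already visited: reuse the list
        return packed.collected
    indices = []
    for node in packed.nodes:
        below = [collect(child, ematrix) for child in node.children]
        if below:
            mark(ematrix, node.own, node.own)       # own triples co-occur with each other
            for part in below:
                mark(ematrix, node.own, part)       # own triples with each child's triples
            for left, right in combinations(below, 2):
                mark(ematrix, left, right)          # left child's triples with right child's
        for part in [node.own] + below:
            indices.extend(i for i in part if i not in indices)
    packed.collected = indices
    return indices

# A toy forest: triples 2 and 3 belong to two alternative analyses of the same span.
leaf_np = PackedNode([Node(own=[1])])
shared = PackedNode([Node(own=[4])])
vp = PackedNode([Node(own=[2], children=[shared]), Node(own=[3], children=[shared])])
root = PackedNode([Node(own=[0], children=[leaf_np, vp])])

ematrix = [[1] * 5 for _ in range(5)]               # everything starts out exclusive
all_indices = collect(root, ematrix)
```

After the pass, triples 2 and 3 are still marked exclusive because they never appear under the same nonterminal node, while both are compatible with triples 0, 1, and 4.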
For example, the process starts to collect the indices 
of the triples from SNT node in Figure 8. Then, it 
collects the indices in the left subforest whose root is 
np. After all indices of triples in the subforest of np are 
collected, those indices and the indices of the triples of 
the node in 6 are recorded in 5. Similarly all indices in 7 
and 4 are recorded in 3 as the indices of the triples in the 
right subforest of the snt node. The indices in 5 and 3 
and the indices in 2 are recorded in 1 as the indices of
the triples of the whole parse forest. In packed nodes
with more than one nonterminal node, like vpl, all
indices of the triples in the three subforests of vpl and
the indices in 8, 9, and 10 are collected and recorded in
7.
By the first case in the above rule, every triple 
represented by the indices in 4 can co-occur with each 
other, and every triple represented by the indices in 4 
can co-occur with every triple represented by the indi- 
ces in 7. One example of the second case is that every 
triple represented by the indices in 2 can co-occur with 
each other, and every triple represented by the indices 
in 2 can co-occur with every triple represented by the 
indices in 5 and 3. Every triple represented by the 
indices in 5 can co-occur with the triples represented by 
the indices in 3. Whenever the process finds a pair of 
co-occurring triples it adjusts the value of Ematrix 
appropriately. 
6 COMPLETENESS AND SOUNDNESS OF THE SYNTACTIC 
GRAPH 
In this section, we will discuss completeness and sound- 
ness of a syntactic graph with an exclusion matrix as an 
alternative for tree representation of syntactic informa- 
tion of a sentence. 
Definition 6: A syntactic graph of a sentence is 
complete and sound compared to the parse forest of 
the sentence iff there is an algorithm that enumerates 
syntactic readings from the syntactic graph of the 
sentence and satisfies the following conditions: 
1. For every parse tree in the forest, there is a 
syntactic reading from the syntactic graph that is 
structurally equivalent to that parse tree. (complete- 
ness) 
2. For every syntactic reading from the syntactic 
graph, there is a parse tree in the forest that is 
structurally equivalent to that syntactic reading. 
(soundness) 
To show the completeness and soundness of the syn- 
tactic graph representation, we present the algorithm 
that enumerates all possible syntactic readings from a 
syntactic graph using an exclusion matrix. This algorithm
constructs subgraphs of the syntactic graph, one
at a time. Each of these subgraphs is equivalent to one
reading of the syntactic graph. Since no node can
modify itself, each of these subgraphs is a directed
acyclic graph (DAG). Furthermore, since every node in
each of these subgraphs can have no more than one
in-arc, the DAG subgraph is actually a tree.

    The following data are initial input.
    Partition I = a list of triples which have the I-th word as a modifier.
    Sen_length = the position of the last word in a sentence.
    RootList = a list of root triples.

    gen_subgraph(RootList, Sen_length, Graphs, All_readings) :-
        ( RootList = []                  % if all root triples in RootList have been tried
        -> All_readings = Graphs         % then return Graphs as all readings,
                                         % otherwise, find all readings with a RootTriple
        ; RootList = [RootTriple|RootList1],
          gen_subgraph1(RootTriple, Sen_length, Sub_graphs),
          append(Graphs, Sub_graphs, Graphs1),
          gen_subgraph(RootList1, Sen_length, Graphs1, All_readings) ).

    gen_subgraph1(RootTriple, Sen_length, Sub_graphs) :-
        Rh = Position of the head node in RootTriple,       % i.e., position of the root node
        Rm = Position of the modifier node in RootTriple,
        Wlist = [RootTriple],
        setof(Graph, gen_sub1(Rh, Rm, Sen_length, Wlist, Graph, 0), Sub_graphs).

    gen_sub1(Rh, Rm, Sen_length, Wlist, Graph, N) :-
        ( N > Sen_length                 % if it has taken a triple from all partitions
        -> Graph = Wlist                 % then return Wlist as one reading of a syntactic graph,
                                         % otherwise, pick one triple from partition N.
        ; ( ( Rh = N                     % Don't pick up any triple from root node position.
            ; Rm = N )                   % A triple from partition Rm is already picked in Wlist.
          -> N1 is N + 1,                % skip this partition and go to the next one.
             gen_sub1(Rh, Rm, Sen_length, Wlist, Graph, N1)
          ; get_triple(N, Triple),       % take a triple (in fact, an index of the triple) from partition N.
            not_exclusive(Wlist, Triple),    % check exclusiveness of Triple with other triples in Wlist.
            N1 is N + 1,                 % go to the next partition.
            gen_sub1(Rh, Rm, Sen_length, [Triple|Wlist], Graph, N1) ) ).

Figure 11 Algorithm that Generates All and Only Readings from an SG.
Before going into detail, we give an intuitive descrip- 
tion of the algorithm. The algorithm has two lists of 
triples as input: a list of triples of a syntactic graph and 
a list of root triples. A root triple is a triple that 
represents the highest level constituent in a parse, i.e.,
snt (sentence) in the grammar in Figure 2. The head
node of a root triple is usually the head verb of a 
sentence reading. 
According to Property 3 in Section 4, one reading of 
a syntactic graph must include one and only one node 
from every position, except the position of the root 
node, as a modifier node. This is a necessary require- 
ment for any subgraph of a syntactic graph to be one 
reading of the graph. One of the simplest ways to make 
a subgraph of a syntactic graph that satisfies this 
requirement is: 
Make partitions among triples according to the 
position of the modifier node of the triples, e.g., 
triples in Partition 0 have the first word in a sentence 
as the modifier nodes. Then take one triple from each 
partition. Here, the algorithm must know the position 
of the root node so that it can exclude the partition in 
which triples have the root node as a modifier. When 
it chooses a triple, it also must check the exclusion 
matrix. If a triple from a partition is exclusive with 
any of the triples already chosen, the triple cannot be 
included in that reading. The algorithm must try 
another triple in that partition. Since the exclusion 
matrix is based on the indices of the triples, when it 
chooses a triple, it actually chooses an index in the list 
of indices of the triple. 
Note that any subgraphs produced in this way satisfy 
Property 3, and all triples in each subgraph are inclusive 
with each other according to the exclusion matrix. The 
top level procedures of the algorithm in Prolog are 
shown in Figure 11.13 
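The enumeration procedure can also be sketched in Python. This is a simplified rendering under stated assumptions, not a transcription of the Prolog code: triples are represented only by their indices, partitions is a dictionary from modifier position to the indices in that partition, and each root triple is given as (index, head position, modifier position). All of these names are hypothetical.

```python
def readings(partitions, root_triples, sen_length, ematrix):
    """Enumerate sets of triple indices that take one triple per partition
    (other than the root's positions) and are pairwise allowed by ematrix."""
    results = []

    def extend(chosen, positions):
        if not positions:                       # a triple was taken from every partition
            results.append(sorted(chosen))
            return
        position, rest = positions[0], positions[1:]
        for triple in partitions.get(position, []):
            # Keep the triple only if it is not exclusive with any chosen triple.
            if all(ematrix[triple][other] == 0 for other in chosen):
                extend(chosen + [triple], rest)

    for index, head_position, modifier_position in root_triples:
        todo = [p for p in range(sen_length + 1)
                if p not in (head_position, modifier_position)]
        extend([index], todo)
    return results

# Four words (positions 0..3). Triple 0 is the root triple: head at 1, modifier at 0.
partitions = {0: [0], 2: [1], 3: [2, 3]}
ematrix = [[0] * 4 for _ in range(4)]
for i, j in [(2, 3), (1, 3)]:                   # hypothetical exclusive pairs
    ematrix[i][j] = ematrix[j][i] = 1

all_readings = readings(partitions, [(0, 1, 0)], sen_length=3, ematrix=ematrix)
```

With triple 3 excluded by triple 1, the only reading produced is the one containing triples 0, 1, and 2.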
We do not have a rigorous proof of the correctness of 
the algorithm, but we present an informal discussion 
about how this algorithm can generate all and only the 
correct syntactic readings from a syntactic graph. 
Since the syntactic graph of a sentence is explicitly 
constructed as a union of all parse trees in the parse 
forest of the sentence, the triples of the syntactic graph 
imply all the parse trees. This fact is due to the 
algorithm that constructs a syntactic graph from a parse 
forest. Therefore, if we can extract all possible syntactic 
readings from the graph, these readings will include all 
possible (and more) parse trees in the forest. Intuitively, 
the set of all subgraphs of a syntactic graph includes all 
syntactic readings of a syntactic graph. 
In fact, this algorithm generates all possible subgraphs
of a syntactic graph that meet the necessary
conditions imposed by Property 3. The predicate
gen_sub1 generates one reading with a given root
triple. All readings with a root triple are exhaustively
collected by the predicate gen_subgraph1 using
the setof predicate, a meta-predicate in Prolog. All
readings of a syntactic graph are produced by the
predicate gen_subgraph, which calls the predicate
gen_subgraph1 for each root triple in RootList. Therefore,
this algorithm generates all subgraphs of a syntactic
graph that satisfy Property 3 and that are consistent
with the exclusion matrix. Hence, the set of subgraphs
generated by the algorithm includes all parse trees in the
forest.
The above algorithm checks the exclusion matrix 
when it generates subgraphs from the syntactic graph, 
so all triples in each subgraph generated by the algo- 
rithm are guaranteed to co-occur with each other in the 
exclusion matrix. Unfortunately, it does not appear 
possible to prove that if triples, say T1 and T2, T2 and 
T3, and T1 and T3, all co-occur in pairs, that they must 
all three co-occur in the same tree! So, although empir- 
ically all of our experiments have generated only trees 
from the forest, the exclusion matrix does not provide 
mathematical assurance of soundness. 
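The gap between pairwise and joint co-occurrence can be made concrete with an abstract Python example. The three "trees" below are just sets of labels chosen for illustration; they are not derived from any grammar, and nothing here shows that such a configuration arises for real parse forests.

```python
from itertools import combinations

def pairwise_cooccur(trees):
    """Collect every pair of triples that appears together in some tree."""
    pairs = set()
    for tree in trees:
        pairs.update(combinations(sorted(tree), 2))
    return pairs

# Hypothetical trees: each pair among T1, T2, T3 shares a tree, but no tree has all three.
trees = [{"T1", "T2", "A"}, {"T2", "T3", "B"}, {"T1", "T3", "C"}]
pairs = pairwise_cooccur(trees)

candidate = {"T1", "T2", "T3"}
pairwise_ok = all(p in pairs for p in combinations(sorted(candidate), 2))
in_some_tree = any(candidate <= tree for tree in trees)
```

In this toy case pairwise_ok holds while in_some_tree does not, which is exactly the situation a pairwise exclusion matrix cannot rule out.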
If subsequent experience with our present statisti- 
cally satisfactory, but unsound exclusion matrix re- 
quires it, we can produce, instead, an inclusion matrix 
that guarantees soundness. The columns of this matrix 
are I.D. numbers for each parse tree; the rows are 
triples. The following procedure constructs the matrix. 
1. Navigate the parse forest to extract a parse tree, 
I, and collect triples appearing in that parse tree. 
2. Mark matrix(Tindex, I) = 1, for each triple with 
the index Tindex appearing in the I-th parse tree. 
Backtrack to step 1 to extract another possible 
parse tree until all parse trees are exhausted. 
Then, given a column number i, all triples marked in 
that column co-occur in the i-th parse tree. Since this 
algorithm must navigate all possible parse trees one by 
one, it is less efficient than the algorithm for construct- 
ing the exclusion matrix. But if our present system 
eventually proves unsound, this inclusion matrix guar- 
antees that we can test any set of constituents to 
determine unequivocally if they occur in a single parse 
tree from the forest. 
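A minimal Python sketch of the inclusion matrix follows. It assumes the parse trees have already been extracted and are given as lists of triple indices; the function names and the sample data are hypothetical.

```python
def build_inclusion_matrix(parse_trees, num_triples):
    """Rows are triple indices, columns are parse tree numbers."""
    matrix = [[0] * len(parse_trees) for _ in range(num_triples)]
    for tree_id, triples in enumerate(parse_trees):
        for triple_index in triples:
            matrix[triple_index][tree_id] = 1
    return matrix

def occur_together(matrix, triples):
    """True if some column (parse tree) contains every given triple."""
    tree_count = len(matrix[0]) if matrix else 0
    return any(all(matrix[t][column] for t in triples) for column in range(tree_count))

parse_trees = [[0, 1, 3], [1, 2, 4], [0, 2, 5]]   # hypothetical triple indices per tree
matrix = build_inclusion_matrix(parse_trees, num_triples=6)
pair_ok = occur_together(matrix, [0, 1])
triple_ok = occur_together(matrix, [0, 1, 2])
```

Unlike the pairwise check, this test answers the question for a set of any size, at the cost of one column per parse tree.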
Therefore, we claim that syntactic graphs enable us 
to enumerate all and only the syntactic readings given in 
a parse forest, and that syntactic graph representation is 
complete and sound compared to tree representations of 
the syntactic structure of a sentence. 
7 RELATED WORKS 
Several researchers have proposed variant representa- 
tions for syntactic structure. Most of them, however, 
concentrated on how to use the new structure in the 
parsing process. Syntactic graph representation in this
work does not affect any parsing strategy, but is constructed
after the syntactic processor finishes generating
a parse forest using any all-path parser.
Marcus et al. (1983) propose a parsing representation
that is also different from tree representation. They
use the new representation for a syntactic structure of a 
sentence to preserve information, while modifying the 
structure during parsing, so that they can solve the 
problems of a deterministic parser (Marcus 1980)--i.e., 
parsing garden path sentences. Marcus's representation 
consists of dominator-modifier relationships between 
two nodes. It is, however, doubtful that a correct parse 
tree can be derived from the final structure, which 
consists of only domination relationships. They do not 
represent all possible syntactic readings in one struc- 
ture. 
Barton and Berwick (1985) also discuss the possibil- 
ity of a different representation, an "assertion set", as 
an alternative for trees, and show various advantages 
expected from the new structure. As in Marcus's work, 
they use the assertion set to preserve information as 
parsing progresses, so that they can make a deterministic
parser partially noncommittal when the
parser handles ambiguous phrases. Their representation
consists of sets of assertions. Each assertion that rep- 
resents a constituent is a triple that has the category 
name and the range of terminals that the constituent 
spans. It is unclear how to represent dominance rela- 
tionships between constituents with assertion sets, and 
whether the final structure represents all possible parses 
or parts of the parses. 
Rich et al. (1987) also propose a syntactic representation
in which all syntactic ambiguities are kept. In this
work, the ambiguous points are represented as one
modifier with many possible dominators. Since, however,
this work also does not consider possible problems
of exclusive attachments, their representation
loses some information present in a parse forest.
Tomita (1985) also suggests a disambiguation pro- 
cess, using his shared, packed-parse forest, in which all 
possible syntactic ambiguities are stored. The disambig- 
uation process navigates a parse forest, and asks a user 
whenever it meets an ambiguous packed node. It does a 
"shaving-a-forest" operation, which traverses the parse 
forest to delete ambiguous branches. Deleting one arc 
accomplishes the "shave" in the syntactic graph repre- 
sentation. Furthermore, in a parse forest, the ambigu- 
ous points can be checked only by navigating the forest 
and are not explicit. 
Since a parse forest does not allow direct access to its 
internal structure, a semantic processor would have to 
traverse the forest whenever it needed to check internal 
relations to generate case relations and disambiguate 
without a user's guidance. Syntactic graph representa- 
tion provides a more concise and efficient structure for 
higher level processes. 
8 CONCLUSION 
In this paper, we propose the syntactic graph with an 
exclusion matrix as a new representation of the surface 
syntactic structure of a sentence. Several properties of 
syntactic graphs are examined. An algorithm that enumerates
all and only the correct syntactic readings from
a syntactic graph is also presented. Therefore, we claim
that syntactic graph representation provides a concise 
way to represent all possible syntactic readings in one 
structure without losing any useful information con- 
tained in the tree structured representation. 
To further justify that syntactic graph representation 
is a suitable formalism for an output format of syntactic 
processes, we need to investigate methods for using 
syntactic graphs to make correct decisions in higher 
level processes. The exclusion matrix is an efficient tool 
to help semantic processes make correct choices. 
Because of its conciseness, the syntactic graph 
makes it possible to store temporarily the syntactic 
structure of sentences that already have been proc- 
essed. A text understanding process is very likely to 
find contradicting evidence between a current sentence 
and the context of the previous sentences. If we did not
keep alternative analyses of previous sentences, the only
thing we could do would be to backtrack, which is computationally
too expensive. Furthermore, since the search
space of the syntactic processor is different from that of 
the semantic processor, it is very important for the 
syntactic process to commit to a final result. We are 
currently investigating how to use syntactic graphs of 
previous sentences to maintain a continuous context 
whose ambiguity is successively reduced by additional 
incoming sentences. 
ACKNOWLEDGMENTS 
This work is sponsored by the Army Research Office under contract
DAAG29-84-K-0060. The authors are grateful to Olivier Winghart for 
his critical review of an earlier draft of this paper. 

REFERENCES 
Aho, A. V. and Ullman, J. D. 1972 The Theory of Parsing, Translation 
and Compiling 1. Prentice-Hall, Englewood Cliffs, NJ. 
Barton, G. E. and Berwick, R. C. 1985 "Parsing with Assertion Sets 
and Information Monotonicity." In Proceedings of International 
Joint Conference on Artificial Intelligence-85 (IJCAI-85): 769- 
771. 
Birnbaum, L. and Selfridge, M. 1981 "Conceptual analysis of natural 
language." In R. Schank and C. Riesbeck, eds., Inside Computer 
Understanding. Lawrence Erlbaum, Hillsdale, NJ. 
Chester, D. 1980 "A Parsing Algorithm that extends Phrases." 
American Journal of Computational Linguistics 6 (2): 87-96. 
Chomsky, N. 1970 "Remarks on nominalization." In R. Jacobs and 
P. S. Rosenbaum, eds., Readings in English Transformational
Grammar. Ginn & Co., Waltham, MA.
Chomsky, N. 1981 Lectures on Government and Binding. Foris, 
Dordrecht, Holland. 
Earley, J. 1970 "An Efficient Context-free Parsing Algorithm." Comm.
ACM 13 (2): 94-102.
Frazier, L. and Fodor, J. 1979 "The Sausage Machine: A New 
Two-Stage Parsing Model." Cognition 6: 41-58. 
Kay, M. 1980 "Algorithm Schemata and Data Structures in Syntactic 
Processing." Xerox Corporation, Technical Report Number CSL- 
80-12, Palo Alto, CA. 
Lytinen, S. L. 1986 "Dynamically Combining syntax and semantics in 
natural language processing." In Proceedings of The American 
Association for Artificial Intelligence-86 (AAAI-86): 574-578.
Marcus, M. P. 1980 A Theory of Syntactic Recognition for Natural 
Language. MIT Press, Cambridge, MA. 
Marcus, M. P.; Hindle, D.; and Fleck, M. M. 1983 "D-Theory: 
Talking about Talking about Trees." In Proceedings of 21st 
Annual Meeting of the Association for Computational Linguistics: 
129--136. 
Pereira, F. C. N. and Warren, D. H. 1980 "Definite Clause Grammars 
-- A survey of the formalism and a Comparison with Augmented 
Transition Network." Artificial Intelligence, 13:231-278. 
Rich, A.; Barnett, J.; Wittenburg, K.; and Wroblewski, D. 1987 
"Ambiguity Procrastination." In Proceedings of AAAI--87: 571- 
576. 
Schubert, L. K. 1984 "On Parsing Preferences." In Proceedings of the
Conference on Computational Linguistics 84 Stanford, CA: 247- 
250. 
Schubert, L. K. 1986 "Are There Preference Trade-Offs in Attachment
Decision?" In Proceedings of AAAI-86: 601-605. 
Tomita, M. 1985 Efficient Parsing for Natural Language. Kluwer 
Academic Publishers, Boston, MA. 
Tsukada, D. 1987 "Using Dominator-Modifier Relations to Disam- 
biguate a Sentence" (master's thesis), Department of Computer 
Sciences, University of Texas at Austin. 
Waltz, D. L. 1982 "The State of the Art in Natural Language 
Understanding." In W. Lehnert and M. Ringle (eds.), Strategies 
for Natural Language Processing, Lawrence Erlbaum Associates, 
Inc., Hillsdale, NJ. 
Wilks, Y.; Huang, X.; and Fass, D. 1985 "Syntax, Preference and
Right Attachment." In Proceedings of International Joint Conference
on Artificial Intelligence-85 (IJCAI-85): 779-784.
Winghart, O. J. 1986 "A Processing Model for Recognition of 
Discourse Coherence Relations" (unpublished Ph.D proposal), 
Department of Computer Sciences, University of Texas at Austin. 
