LIST AUTOMATA WITH SYNTACTICALLY STRUCTURED OUTPUT
Karel OLIVA and Martin PLÁTEK
Faculty of Mathematics and Physics 
Charles University 
Malostranské náměstí 25
CS-118 00 Praha 1 - Malá Strana
Czechoslovakia 
Abstract: 
A new type of abstract automaton 
is introduced, and both formal and linguistic 
implications are discussed, most importantly 
a new possibility of proving certain formal 
properties of (natural) languages and their 
grammars (such as context-freeness) and of 
refinement of the Chomsky hierarchy. 
1. Introduction
In this article we want to propose a new 
type of (abstract) nondeterministic automa- 
ton; its most distinguishing feature is that 
its input data is a linear doubly linked list
and its output is a syntactic structure, on
condition that the computation was success- 
ful, i.e. the word represented by the input 
list was in the language defined by the auto- 
maton, all nondeterministic decisions of the 
automaton were correct (the automaton "gues- 
sed" what to do) and, hence, the computation 
finished in an accepting configuration. 
Apart from other features, this automa- 
ton gives a uniform formal environment for 
the formulation of formal syntax of natural 
language(s), regardless of the intuitions 
standing behind the linguistic theory in 
question; here, we have in mind first of all
the dependency or immediate constituent ap- 
proach to language description. 
The intuition standing behind the dependency
approach is based on erasing words from the
sentence and studying whether the resulting
string is grammatical: by this means, the
relative mutual importance of words (i.e.,
dependency, as the relation between the
syntactically "more important" word, the
governor, and the "less important" word, the
dependent) can be determined stepwise and then
expressed, e.g., in a dependency tree of the
sentence. Clearly, in more complex cases it is
impossible to subsume all the relations in a
sentence purely under dependency, since there
are also other relations to be found between
words (such as coordination or apposition).
Likewise, it is impossible to express all
possible dependency relations in the form of a
tree, because in certain cases a single
dependent word might have more than one
governor (e.g., in cases of words depending on
a coordination of governors).
On the other hand, the intuition standing
behind the immediate constituent approach is
that of replacing certain groups of words by
others and, again, studying the
grammaticality of the result. By means of this
process, sentences can be split stepwise into
the smaller and smaller parts from which they
are built, and the structure thus obtained
can then be expressed in an IC-tree.
In fact, we believe that both these
intuitions are extremely insightful and that
it is a regrettable misunderstanding that
many linguists still regard them as opposed
rather than complementary; though there have
been several attempts to merge them into a
single theory (X-syntax is surely the most
notable case), we are still convinced that
the results are not fully sufficient. The type of
automaton ("acceptor") we propose is in fact
able to simulate elegantly either of the two
approaches during the process of computation
and to reflect them also in the structure of
its output. Thus, it makes no distinction
between these two linguistic approaches and
allows for the formulation of theories based
on one or the other approach, or even on any
mixture of them.
2. Description of the Automaton
The list automaton consists of a finite
control unit attached to a (finite) linear
list by a head. The head is always able to
read symbols from and write symbols to the
item of the list on which it stands (the
current item) and, in addition, it is able to
read (but not to write) the symbol on the item
immediately to its left in the list. Every
item of the list (and, generally, any node in
the resulting syntactic structure) consists
of a set of pointers L, R, C, CH, O, H, ZL and
ZP and information parts Cat and Lex (see
fig. 1).
The pointers serve the following purposes:
C.......serves for "simple" coordination (such
        as "Peter and Paul")
CH......serves for embedded coordination (such
        as "Paul and Mary and John and Eve
        played tennis.")
L.......at the beginning, this pointer
        (together with R) serves for
        connecting the items of the list (see
        fig. 2 for clarification); after the
        computation is successfully finished,
        L points to the item on the left edge
        of the interval of items on which the
        current item depends (as in "John and
        Paul who...", see fig. 3)
R.......is a pointer analogous to L; it serves
        for connecting the items of the list
        initially, and after the computation R
        points to the item on the right edge
        of the interval on which the current
        item depends (see fig. 3)
O.......at the beginning, the value of this
        pointer equals that of L; however, it
        does not change during the whole
        computation and, hence, keeps the
        information about the input order of
        the items in the input list
H.......during and after the computation, the
        value of this pointer is the "head"
        (in the sense used in X-syntax) of the
        phrase represented by the daughter
        nodes of the current node (current
        item) in the syntactic (sub)tree
ZL, ZP..serve as auxiliary pointers in
        processing complicated syntactic
        constructions (coordinations,
        non-projective constructions)
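
The item layout and the initial wiring of L, R and O described above can be
sketched in Python as follows. This is only a minimal illustration, not the
authors' implementation: the names Item, POINTERS and build_input_list, the
use of None for an empty pointer, and the category labels "N" and "Conj" are
assumptions made for the example.

```python
from typing import List, Tuple

# The eight pointers carried by every item.
POINTERS = ("C", "CH", "L", "R", "O", "H", "ZL", "ZP")


class Item:
    """One list item: eight pointers plus the information parts Cat and Lex."""

    def __init__(self, cat: str, lex: str) -> None:
        self.cat = cat
        self.lex = lex
        for name in POINTERS:
            setattr(self, name, None)  # None stands for an empty pointer (assumption)


def build_input_list(words: List[Tuple[str, str]]) -> List[Item]:
    """Link (cat, lex) pairs into a doubly linked input list."""
    items = [Item(cat, lex) for cat, lex in words]
    for left, right in zip(items, items[1:]):
        left.R = right
        right.L = left
        right.O = left  # O starts equal to L and is not changed afterwards
    return items


items = build_input_list([("N", "Peter"), ("Conj", "and"), ("N", "Paul")])
print([item.lex for item in items])
print(items[1].L.lex, items[1].R.lex)
```

Only L, R and O are set initially; the remaining pointers stay empty until
the automaton changes them during the computation.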
Fig. 2 
Fig. 3 
Further, let P be the set of pointers
{C, CH, L, R, O, H, ZL, ZP} and, in addition,
let THIS be a special pointer the value of
which is always the current item. Let NIL be a
special "empty" value of a pointer. Then, we
define the following basic operations of the
automaton:
DEL(x,y)...for x ∈ P, y ∈ P ∪ {THIS}; this
    operation takes the item which is the value
    of y and sets the value of its pointer x to
    NIL
CON(x,y,z)...for x ∈ P; y, z ∈ P ∪ {THIS};
    performing this operation means setting the
    pointer x of the item y to the value z
GO(x)...for x ∈ P; the head of the automaton
    moves from the current item to the item
    which is the value of x. If x = THIS or
    x = NIL, the state of the control unit
    becomes undefined (i.e., an error occurs)
NEW(x)...for x ∈ P; a new item is created and
    the value of x is set to this new item.
    All pointer values of the new item are set
    to NIL; the information part of the item
    is copied from the current item
WRITE(i)...i is a symbol from the alphabet of
    the automaton; the value of Cat of the
    current item is set to i
All operations are performed relative to the
current item (i.e., "x" in their description
means "x of the current item"). Their
intuitive sense is reshaping the input list
into a more complex structure by means of
setting and changing the values of pointers.
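
The five basic operations can be sketched in Python as follows. This is a
sketch under stated assumptions rather than a definitive implementation: the
class names Item and ListAutomaton are hypothetical, None plays the role of
NIL, and treating DEL or CON on an item that is NIL as an error is an
assumption (the text only specifies the error case for GO).

```python
from typing import Optional

POINTERS = ("C", "CH", "L", "R", "O", "H", "ZL", "ZP")


class Item:
    def __init__(self, cat: str, lex: str) -> None:
        self.cat, self.lex = cat, lex
        for name in POINTERS:
            setattr(self, name, None)  # None plays the role of NIL


class ListAutomaton:
    """Head of the automaton with the five basic operations."""

    def __init__(self, current: Item) -> None:
        self.current = current

    def _value(self, name: str) -> Optional[Item]:
        # Pointer names are always read relative to the current item.
        return self.current if name == "THIS" else getattr(self.current, name)

    def _item(self, name: str) -> Item:
        item = self._value(name)
        if item is None:  # assumption: operating on a NIL item is an error
            raise RuntimeError(f"{name} is NIL")
        return item

    def DEL(self, x: str, y: str) -> None:
        setattr(self._item(y), x, None)

    def CON(self, x: str, y: str, z: str) -> None:
        target, value = self._item(y), self._value(z)
        setattr(target, x, value)

    def GO(self, x: str) -> None:
        if x == "THIS":
            raise RuntimeError("GO(THIS) is undefined")
        self.current = self._item(x)

    def NEW(self, x: str) -> None:
        # New item: all pointers NIL, information part copied from the current item.
        setattr(self.current, x, Item(self.current.cat, self.current.lex))

    def WRITE(self, symbol: str) -> None:
        self.current.cat = symbol


a, b, c = Item("A", "a"), Item("B", "b"), Item("C", "c")
a.R, b.L, b.R, c.L = b, a, c, b
m = ListAutomaton(b)
m.WRITE("X")               # Cat of b becomes "X"
m.NEW("H")                 # b.H is a fresh item carrying b's information part
m.CON("ZL", "THIS", "L")   # b.ZL now points to a
m.DEL("R", "L")            # a.R becomes NIL
m.GO("R")                  # the head moves to c
```

Both arguments of CON are resolved before the pointer is written, so an
operation such as CON(R,L,R) reads the old value of R of the current item.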
Further, these basic operations can be
combined into complex operations. For the
purposes of describing Czech syntax, we
defined complex operations of the following
types (again, the pointers are those of the
current item):
I.   GO(R)
II.  WRITE(a)
III. CON(R,L,R)
     CON(L,R,L)
     DEL(R,THIS)
     GO(L)
IV.  GO(L)
     CON(R,L,R)
     CON(L,R,L)
     DEL(L,THIS)
     GO(R)
V.   NEW(H)
     CON(R,L,H)
     CON(R,H,R)
     CON(L,H,L)
     CON(L,R,H)
     DEL(R,THIS)
     DEL(H,THIS)
     GO(L)
and eight other complex operations serving
for processing coordination, non-projective 
constructions etc. The automaton performs 
each of these operations in one step of the 
computation; the next operation to be perfor- 
med is chosen according to the current inter- 
nal state of the control unit and the infor- 
mation read by the head (i.e., information 
contained in the current item and its left 
neighbour in the list). Performing one step
of the computation means performing one of 
the complex operations and, possibly, chan- 
ging the internal state of the control unit, 
both according to the transition function of 
the automaton. 
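
One computation step, with complex operations I and III written as sequences
of basic operations, might look as follows in Python. This is a hedged
sketch: the names Machine, COMPLEX, step and delta, the states "q0" and "q1",
and the category labels are hypothetical, and the transition table is shown
as a deterministic dictionary for brevity although the automaton itself is
nondeterministic.

```python
from typing import Dict, List, Optional, Tuple


class Item:
    def __init__(self, cat: str, lex: str) -> None:
        self.cat, self.lex = cat, lex
        self.L: Optional["Item"] = None  # only the pointers used below
        self.R: Optional["Item"] = None


class Machine:
    def __init__(self, current: Item, state: str) -> None:
        self.current, self.state = current, state

    def value(self, name: str) -> Optional[Item]:
        return self.current if name == "THIS" else getattr(self.current, name)

    def CON(self, x: str, y: str, z: str) -> None:
        # Fails with AttributeError if item y is NIL (None).
        setattr(self.value(y), x, self.value(z))

    def DEL(self, x: str, y: str) -> None:
        setattr(self.value(y), x, None)

    def GO(self, x: str) -> None:
        target = self.value(x)
        if x == "THIS" or target is None:
            raise RuntimeError("undefined move")
        self.current = target


# Complex operations I and III as sequences of basic operations.
COMPLEX: Dict[str, List[Tuple[str, ...]]] = {
    "I": [("GO", "R")],
    "III": [("CON", "R", "L", "R"), ("CON", "L", "R", "L"),
            ("DEL", "R", "THIS"), ("GO", "L")],
}

Key = Tuple[str, str, Optional[str]]


def step(m: Machine, delta: Dict[Key, Tuple[str, str]]) -> None:
    """One step: choose a complex operation from the state, the current item
    and its left neighbour, perform it, then change the state."""
    left = m.current.L
    op, new_state = delta[(m.state, m.current.cat, left.cat if left else None)]
    for name, *args in COMPLEX[op]:
        getattr(m, name)(*args)
    m.state = new_state


john, sleeps, soundly, stop = (Item(cat, lex) for cat, lex in
                               [("N", "John"), ("V", "sleeps"),
                                ("Adv", "soundly"), ("Punct", ".")])
chain = [john, sleeps, soundly, stop]
for left, right in zip(chain, chain[1:]):
    left.R, right.L = right, left

delta = {("q0", "Adv", "V"): ("III", "q1")}
m = Machine(soundly, "q0")
step(m, delta)
print(m.current.lex, m.state, sleeps.R.lex, soundly.L.lex, soundly.R)
```

In this run operation III removes "soundly" from the list, leaves its L
pointer on "sleeps", and moves the head to "sleeps"; the list itself now
links "sleeps" directly to the final item.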
The set of complex operations introduced
has two important features. First, we are
convinced that with the help of this set it is
possible to give a sufficiently complete
description of the surface syntax of Czech.
Second, the set of complex operations of the
automaton we use for the description of Czech
syntax guarantees that any language accepted
by the automaton with these operations is
context-free. This point probably deserves
further discussion: by changing the set of
basic operations (i.e., by adding some new
basic operations and/or by removing current
ones) and/or by limiting the choice and
ordering of basic operations in an appropriate
way and/or by limiting the number of "visits"
of the head on an item of the list, it is
possible to characterize the explicative power
of different subtypes of the automaton and,
hence, to characterize different types of
grammars strongly equivalent with the
automaton in question. Thus, e.g., categorial
grammars can be shown to be strongly
equivalent with automata with operations I-IV
and with the number of "visits" limited by a
constant; context-free grammars are strongly
equivalent with automata with complex
operations I-III and V and a constant number
of visits; generalized dependency grammars
(this term suspiciously resembles the title of
(Gazdar, Klein, Pullum and Sag, 85), but was in
fact introduced as early as in (Gladkij, 73))
are strongly equivalent with the automaton
with operations I-IV, etc. For automata using
complex operations different from I-V we have
not found any strongly equivalent type of
grammar in the literature. But probably the
most important point concerns weak
equivalence: any automaton using the complex
operations defined is weakly equivalent to
some context-free grammar. (And extending this
weak generative capacity will be possible only
on condition of adding some new complex
operation(s).)
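
The restriction on the number of "visits" of the head on an item can be
illustrated by a small Python sketch. It rests on assumptions not fixed by
the text: a "visit" is counted each time the head arrives at an item
(including its initial placement), and the names BoundedHead,
VisitLimitExceeded, max_visits and the visits field are hypothetical.

```python
from typing import Optional


class Item:
    def __init__(self, lex: str) -> None:
        self.lex = lex
        self.L: Optional["Item"] = None
        self.R: Optional["Item"] = None
        self.visits = 0  # hypothetical bookkeeping field, not part of the item


class VisitLimitExceeded(Exception):
    """Raised when the head arrives at an item more often than allowed."""


class BoundedHead:
    def __init__(self, start: Item, max_visits: int) -> None:
        self.max_visits = max_visits
        self.current = start
        self._arrive(start)

    def _arrive(self, item: Item) -> None:
        item.visits += 1
        if item.visits > self.max_visits:
            raise VisitLimitExceeded(item.lex)
        self.current = item

    def GO(self, x: str) -> None:
        target = getattr(self.current, x)
        if target is None:
            raise RuntimeError("undefined move")
        self._arrive(target)


a, b = Item("a"), Item("b")
a.R, b.L = b, a
head = BoundedHead(a, max_visits=2)
rejected = False
try:
    for move in ["R", "L", "R", "L"]:
        head.GO(move)
except VisitLimitExceeded:
    rejected = True  # the third arrival at "a" exceeds the bound
print(rejected, a.visits, b.visits)
```

A computation that exceeds the bound is simply abandoned here; with a
constant bound, the head cannot keep returning to the same item
indefinitely.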
3. Conclusions
The type of automaton introduced is, in
our opinion, important for several reasons.
First, it allows for stepwise refinement
of the set of its complex operations: first,
only an acceptor might be constructed, and
only later can its operations be augmented
into a real parser. Of course, the
augmentation of the primary acceptor and
turning it into the parser might be performed
in many different ways, which allows for
embodying various linguistic theories over
the initial acceptor. Generally, we can start
the process of creating the automaton by
constructing the complex operations from the
basic operations DEL and GO only, applying
these two basic operations solely to the
pointers L, R and THIS (i.e., only pointers
from the input list). During its computation,
such an acceptor will simulate the derivation
of the input string (the string represented
by the input list). In the second step, i.e.
in building the parser, we augment these
primitive complex operations by adding other
basic operations and/or using other pointers,
to get, eventually, the intended parser.
Second, from the linguistic viewpoint,
it makes it possible to construct a
recognizing automaton that is also a full
syntactic parser (i.e., an automaton which
gives a syntactic structure as its output)
and which, in addition, allows the
context-freeness of the processed languages
to be proved, but on grounds profoundly
different from those of (Gazdar, Klein, Pullum
and Sag, 85).
Third, from the formal viewpoint, it
allows the whole Chomsky hierarchy of
languages to be described by a single abstract
automaton with differently limited sets of
operations rather than by a whole set of
relatively unrelated types of machines (Turing
machine, linear bounded automaton, pushdown
automaton, finite automaton): this is because
the operations of the proposed automaton are
in fact just refined operations of the list
automaton proposed in (Chytil, Plátek and
Vogel, 86).

References

Chytil M.P., Plátek M. and Vogel J.:
A note on the Chomsky hierarchy, 
Bulletin of EATCS 28, 1986 

Gazdar G., Klein E., Pullum G. and Sag I.: 
Generalized Phrase Structure Grammar, 
Basil Blackwell, Oxford, 1985 

Gladkij A.V.: Formal'nyje grammatiki i jazyki,
Mir, Moscow, 1973 

Plátek M. and Vogel J.: Deterministic list
automata and erasing graphs, in The Prague 
Bulletin of Mathematical Linguistics 45,
Prague, 1986 
