Lexicalized Tree Automata-based Grammars for 
Translating Conversational Texts 
Kiyoshi YAMABANA Shinichi ANDO Kiyomi MIMURA 
Computer & Communication Media Research, NEC Corporation 
4-1-1, Miyazaki, Miyamae-ku, Kawasaki 216-8555, JAPAN 
k-yamabana@ct.jp.nec.com s-ando@cw.jp.nec.com k-lnimura@dad p.nec.com 
Abstract 
We propose a new lexicalizcd grammar 
formalism called Lexicalized Tree 
Automata-based Grammar, which lcxicalizes 
tree acccptors instead of trees themselves. We 
discuss the properties of the grammar and 
present a chart parsing algorithm. Wc have 
implemented a translation module for 
conversational texts using this formalism, and 
applied it to an experimental automatic 
interpretation system (speech translation 
system). 
1 Introduction 
Achieving both broad coverage for general texts 
and better quality for texts from a restricted domain 
has been an important issue in practical natural 
 processing. Conversational  is a 
typical domain this problem has been notable, since 
they often include idioms, colloquial expressions 
and/or extra-grammatical expressions while a 
majority of utterances still obey a standard 
grammar. 
Furusc and Iida (1994) proposed an approach to 
spoken- translation based on pattern 
matching on the surface form, combined with an 
cxalnple-based disambiguation method. Since the 
grammar rules are simple patterns containing 
surface expressions or constituent boundaries, they 
are easy to write, and domain-specific knowledge 
can be easily accumulated in the grammar. On the 
other hand, relationships between two trees arc not 
easy to describe, especially when they are separated 
apart on a larger tree. This might become an 
obstacle in expanding a domain-specific grammar 
into a general gralnlnar with a wide coverage. 
Brown (1996) approached to this problem 
employing a nmlti-engine architecture, where 
outputs from Transfer Machine Translation (MT), 
Knowledge-based MT and Example-based MT are 
combined on the chart during parsing. Ruland et al. 
(1998) employs a multi-parser multi-strategy 
architecture for robust parsing of the spoken 
, where the results fi'om different engines 
are combined on the chart using probability-based 
scores. A difficult part with these hybrid 
architectures is that it is not easy to properly 
compare and combine the results fi'om differcnt 
engines designed on different principles. In addition, 
these methods will require much computational 
power, since multiple parsers have to be run 
simultaneously. 
A third approach, such as Takeda (1996), is 
grammar-based. In this approach, a method is 
provided to associate a grammar rule to a word or a 
set of words in order to encode their idiosyncratic 
syntactic behaviour. An associated grammar rule 
can be sccn as a kind of example if it is described 
mostly by the surface level information. As is 
apparent fl'om this description, this approach is an 
application of strong lexicalization of a grammar 
(Schabes, Abeill6 and Joshi, 1988). 
This approach allows coexistence of general 
rules and surface-level patterns in a uniform 
framework. Combination of both types of rules is 
naturally defined. These advantages arc a good 
reason to employ strongly lexicalized grammars as 
the basic grammar formalism. However, wc feel 
there are some points to be improved in the current 
strongly lcxicalized grammar formalislns. 
The first point is the existence of globally 
defined special tree operation, which requires a 
special parsing algorithm. In a strongly lexicalized 
grammar formalism, each word is associated with a 
finite set of trees anchored by that word. The tree 
operations usually include substitution of a leaf 
node by another tree, corresponding to expansion of 
a nonterminal symbol by a rewriting rule in CFG. 
However, if the tree operation is limited to 
substitution, the resulting grammar, namely 
Lexicalized Tree Substitution Grammar (LTSG), 
cannot even reproduce the trees obtained fi'om 
non-lexicalized context free grammars. This will be 
obvious from the fact that for any LTSG, there is a 
926 
constant such that, in any trees built by the 
grammar, the distance of the root node and the 
nearest lexical item is less than that constant, while 
this property does not always hold for CFG. Tree 
Insertion Grammar (TIG), introduced by Schabes et 
al. (1995), had to be equipped with the insertion 
operation in addition to substitution, so that it can 
be strongly equivalent to an arbitrary CFG. The 
insertion operation is a restricted form of the 
adjoining operation in the Lexicalized Tree 
Adjoining Grammar (LTAG) (Joshi and Schabes, 
1992). 
Thus, a special tree operation other than 
substitution is inevitable to strongly lexicatized 
grammars. It is needed to grow an infinite number 
of trees from a finitely ambiguous set of initial trees 
representing the extended domain of locality 
(EDOL) of the word. 
However, such special tree operation requires a 
specially devised parsing algorithm. In addition, the 
algorithm will be operation-specific and we have to 
devise a new algorithm if we want to add or modify 
the operation at all. Our first motivation was to 
eliminate the need for globally defined special tree 
operations other than substitution whenever 
possible, without losing the existence of EDOL. 
Another point is the fact that lexicalization is 
applied only to trees, not to the tree operations. For 
example, in LTAG, initial tree sets anchored to a 
word is not enough to describe the whole set of 
trees anchored by that word, since initial trees are 
grown by adjunction of auxiliary trees. Since an 
auxiliary tree is in the EDOL of another word, the 
former word has limited direct control over which 
auxiliary tree can be adjoined to certain node. For 
detailed control, the grammar writer has to give 
additional adjoining restrictions to the node, and/or 
detailed attribute-values to the nodes that can 
control adjunction through node operations such as 
unification. 
In short, we would like to define a lexicalized 
grammar such that 1) tree operation is substitution 
only, 2) it has extended domain of locality, and 3) 
tree operations as well as trees are lexicalized 
whenever possible. In the next section, we propose 
a grammar formalism that has these properties. 
2 Lexicalized Tree Automata-based 
Grammars 
In this section we introduce Lexicalized Tree 
Automata-based Grammar (LTA-based Grammar) 
and present its parsing algorithm. 
First, we define some basic terminologies. A 
grammar is strongly lexicalized if it consists of 1) a 
finite set of structures each associated with a lexical 
item; each lexical item will be called the anchor of 
the corresponding structure, and 2) an operation or 
operations for composing the structures (Sehabes, 
Abeilld and Joshi, 1988). 
In the following, the word "tree automaton" 
(TA) will be used as a generic term for an 
automaton that accepts trees as input. It can be a 
finite tree automaton, a pushdown tree automaton, 
or any tree-accepting automaton having a state set° 
state transitions, initial and final states, and optional 
memories associated with states. Although our 
argument below does not necessarily require 
understanding of these general TAs, definitions and 
properties of finite and pushdown TAs can be found 
in G6eseg and Steinby (1997) for example~ 
2.1 Definition of LTA-based Grammars 
The basic idea of an LTA-based grammar is to 
associate a tree automaton to each word that defines 
the set of local trees anchored to the word, instead 
of associating the trees themselves. The lexicalized 
tree automaton (LTA) provides a finite 
representation of a possibly non-finite set of local 
trees. This differs from other lexicalized grammars 
as LTAG, where non-finiteness of local trees is 
introduced through a global tree operation such as 
adjunction of auxiliary trees. 
We define a lexicalized tree automata-based 
grammar as follows. LEt X be a set of terminal 
symbols (words), and NT be the set of nonterminal 
symbols disjoint fi'om 27. Let Tw be a set of trees 
(elementary trees) associated with a word w in 2:.. A 
tree in Tw has nodes either from 27 or from N1, and 
its root )and one of its leaves are marked by a 
distinguished symbol self in NT. Let A,v be the tree 
automaton lcxicalized to the word w, which accepts 
a subset of trees obtained by repeatedly joining two 
trees in Tw at the special nodes labelled selfi one at 
the root of a tree and the other at a foot of another 
tree. From this definition~ A1, can be identified with 
a string automaton; its alphabets are the trees in Tw, 
and a string of the elementary trees are identified 
with the tree obtained by joining the elementary 
trees in a bottom-up manner. Sw is a set of 
nonterminal symbols associated with the word Wo 
They are assigned to the root of a tree when the tree 
is accepted by A,~ 
f For each word w, the set A,, l T,,, A,v, Sw} is the 
set of local trees associated with w. The structure is 
described by Aw and 1'w~ the symbol at the root node 
927 
is fl-om Sw, and se./fin the loot is identified with w. 
We denote the family of Aw as A = {A,,,} for w in Z. 
A lexicalized tree automata-based grammar G is 
defined to be the tree algebra whose trees are the 
set union of Aw for all w in Z', and the basic tree 
operation being the substitution, that is, joining two 
trees at the root node and a foot node when they 
have the same nonterminal in NT other than self. 
2,2 Some Remarks 
1 Strictly speaking, the definition above does 
not satisfy the definition of strongly lexicalized 
grammars that the structures associated to a word 
must form a finite set, since the tree set accepted by 
the automaton may be an infinite set. However, 
since a finite device, namely an automaton, 
describes this possibly infinite set, we will classify 
the proposed formalism as a strongly lexicalized 
grallllTlar. 
2 We defined the lexicalized tree automata 
using string automata where the alphabets are trees. 
The latter is obtained by linearizing the constituent 
trees along the spine of the tree. Because the I/I'A 
can be any tree automaton as long as it accepts all 
and only the (possibly infinite) tree set beaded by a 
word, LTA are essentially tree automata. These 
equivalent two pictures (the tree automata picture 
of a tree grammar and the string automata picture 
employed in the definition) will be used 
interchangeably in this paper. 
3 The grarnmar G can also be defined by a tree 
automaton T that accepts all and only the trees of 
the grammar as follows: First we regard NT as the 
set of states of T. Its initial states are .Z', and the 
final states are also NT. Sw is regarded as the set of 
final states of A .... The set of initial states ofAw are 
the set of nonterminal symbols that appear in T,,. 
and w. The LTAs are combined into T through the 
common state set NT. The recognition era tree t 
proceeds in a bottom-up manner, beginning at the 
leaf nodes that are initial states for G and for some 
Aw. When a subtree of T has been recognized by an 
LTA Aw, its root node is in a state s fl'om S,,. Ifs is 
XP 
- / \ Adjunction tO X' can Specifier X" 
/ \occur arbitrary times 
\ X' 
/ ~ adjunct 
X complenletlt 
Figure I. General fOllll o1" X-bar theory 
an initial state of another LTA A,., the recognition 
can proceed. The tree t has been successfully 
recognized if the recognition step arrives at the root 
node. 
2.3 Examples 
Adjunetion in the X-bar Theory 
We demonstrate how the proposed formalism 
handles the simplest case of an infinite set of the 
local trees. The example is repeated adjunction at 
the bar level 1 of the X-bar theory. Figure I shows 
a general scheme of the X-bar theory. X' at the bar 
level 1 can be paired with some adjunct arbitrary 
times betbre it grows to the phrase level, XP. 
Figure 2 shows how this scheme is realized in the 
LTA-based grammar tbrmalism. Figure 2 (a) shows 
the tree set associated with the word. It consists of 
three trees, corresponding to the bar levels. r3 is for 
the complement, "r2 for adjunction, and T3 for the 
specifier. (b) shows the tree automaton associated 
with this word in the (tree-alphabet) string 
automaton representation. It first accepts "I'~, then 
"I'2 arbitrary times, finally T3 to arrive at the final 
state. This sequence is identified with the trees in 
(d), obtained by concatenating "Fi through T3 in a 
bottom-ut) manner. When the ETA arrives at the 
final state, the root node is given a nonternlhml 
symbol lYom the set in (c), which is XP. 
Tree Adjoining Language 
Figure 3 shows a I~I'AG that generates a strictly 
context sensitive  anb"ec'~d n. The unique 
initial tree 7" in (a) is adjoined repeatedly by the 
unique auxiliary tree A in (b) at the root node 
labeled S. The root and foot of A is labeled S, but 
adjunction to them is inhibited by the index NA. (c) 
shows a tree obtained by adjoinhlg A once to T. 
Generally a string a"b"ec"d" is obtained as the yield 
Tz 
sell" TiI\ 
sell" colnplement 
sell" sell" 
sell" adjunct spccil\]cr self 
sell `=. XP /\ 
specifier sell" re/waled 
sell" \\ a4iuncI 
self ~ complcn~enl 
(a) Tree set T,,, (d) accepted trees 
(b) -rree {ltitoln\[ll()n a,,, in Jls TI . T.,n . .r3 
tree sequence representation 
(c) set of start s) mbol S,,, {Xl': 
Figure 2. I/fA°representation of the tree in Figure 1. 
928 
S S,,A S 'x 
e a S c 
b S,',:\ c 
a) initial tree a 
b) auxil iaD' tree e 
c) "1" adjoined once 
Figure 3: LTAG tbr a"b'~ecIkt '' 
Ca) '.voi'd 
(b) tree set 
sell" 
a sell" d 
F I 
self" /?\ 
h self c 
• I 2 
(C) tree autonlaton(Te).(Ti). 
(d) slari symbols ~,<,j 
sell- F, 
a~d 
b set f c 
b self c I 
sell ":~ e 
(c) trec till i! :: 2 
Figure 4: l,TA-based grallllllar lilt a'%"ec"d" 
era tree produced by adjoining :I n-times to T. 
The same  can be expressed by an 
IM'A-based grammar shown in l:igure 4. "lhe word 
is e Ca). The tree set associated \rich e consists of 
two trees TI and 'I', as shown in (b). The local 
automaton is a pushdown automaton that accepts 
tree sequence (Tx)n(Tl)n, and accepted trees are 
given the nonterminal symbol S as in (d). (e) shows 
a Wee with n = 2. From this setting, it is apparent 
that ibis l,l'A-based grallllnar generates the same 
 as the TAI, in the figure 3. 
By extending this construction, it will be obvious 
that for any I/FAG, all equivalent 171'A-based 
grammar can be constructed within the class of 
pushdown LTA. 
2.,4 Parsing, LTA-based Gramnlars 
Parsing algorithm lbr the I/l'A-based granlnlar is 
a <;traiglltforward extension of the CFG case. In the 
CFG case, an active edge is represented by a 
rewriting rule with a dot in the right hand side. The 
dot shows that the terminals and ilOll-termillals up 
to that location have been ah'eady t'(')tllld, and the 
rule application becomes a success when the (lot 
reaches the end of the rule. If we regafd the 
right-haild side of a rule as an automaton accepting 
a sequence of terminals and non-terminals, with the 
clot representing the current state, this picture can 
be easily generalized to the LTA-based grammar 
case. 
Figure 7 shows an example parse for the sentence 
"'He eats dinner". Figure 5 shows the dictionary 
content of the verb "'eats", which is basically the 
same as llgure 2. Figure 6 shows the dictionary of 
"he" and "dinner". We suppose here that these 
words have no associated trees for simplicity. The 
basic strategy is left to right bottom-up chart 
pall sing. 
First, edges el, e, and e3 are loaded into the chart 
and set to the initial state. They correspond to "'he", 
"'eats" and "'dinner" respectively. The parsing 
proceeds from left to right, and the parser triggers 
the I,TA of el first. Since its only possible 
transition is a null transition, it arrives at the llnal 
state immediately and creates an edge e4 labelled 
,~,'111)/. 
Then the focus moves one step to the right on the 
chart and the I,TA of e, is activated. It tries to find 
Hie tree T,I, and finds that an edge labelled d(;D is 
necessary to its right. Since there is no such edge, 
the I/I'A creates an active edge t with a hole Uoh, as 
in the case of'CFG, and the I/I'A goes into a pause 
waiting tBr the hole to be filled. 
Creation of e5 from e3 is simihu" to the creation of 
% t'rorn el. Then e5 starts the completion step as in 
the CI:(I case. At this step, the active edge created 
above is fouild, and es is found to match the hole. 
Then the I/FA of e, is reactivated, arrives at the 
stale Sl, then creates an edge ca. 
Next the I/I'A of e(, is actiwited. It tries to find 
"t'a, or Ta3 In searching for Ta;, an active edge with 
a hole t)o.vlmcJd is created. While searching for Ta,, 
the I.TA finds that an edge labelled mutT/to the left 
r,, __ i- _ 
(--~df_) ( dob ) 
C~dC> <i, > 
Ta3 ~ "~ 2 
final state :senle/ice 
Ca) tree set (b) tree automaton 
Figure 5:1 ,TA of"eats" 
I .,\miv¢ edges are not shm~n hi the lT~tm: 
929 
final state su\]Tj final state : 
s'ubj I dob t prepol!j 
he dinner 
Figure 6: LTAs of"he" and "'dirmer'" 
is what is necessary, and finds that e4 satisfies this 
condition. By accepting Ta2, tile LTA creates the 
edge ev, label it as senlence and advances to tile 
final state. There is no more possible action on the 
Chart, and the parsing is completed successlhlly. 
Please note that the algorithm exemplified above 
does not depend on the concrete form of LTA. The 
satne algorithm can be applied to pushdown 
atttomata and other class of automata having 
internal memories. 
3 Translation Module 
We buih a bi-directional translation system between 
the ,lapanese and English s usitlg the 
proposed method. It translates conversational texts 
as will appear in a dialogue between two people, to 
help them communicate in a foreign travel situation. 
Figttre 8 shows an overview of the system. 
3.1 Translation Engine 
Since each word in tile dictionary has its own tree 
set and tree automaton, a simple implementation 
will lead to inefficiency. To cope with this problem, 
we provided two mechanisms to share tile UI'A. A 
"rule template" mechanism is provided to share the 
triplet, tmrnely Aw it1 tile definition of I+TA, while a 
"'shared tree" mechanism is provided to sllare the 
elementary trees among different A+. 
The rule template is applied just after dictionary 
loading, and assigns an LTA to a word that matches 
the condition in tile template. It is mainly used for 
words such as cotnmon nouns. A shared tree is 
represented by a pointer to an elementary tree in the 
pool, and is loaded into the systetn when it is tised 
lbr tile first time. 
The  conversion method is based on 
synchronous deriwttion of analysis and generation 
trees, basically the same as the syntax directed 
translation by Aho and Ulhnan (1969) and the 
synchronous LTAG by Shieber and Schabes (1990). 
In this method, elementary analysis tree of each 
word is paired with another tree (elementary 
generation tree). Starting from the root, at each 
(?- 
A ~4 
AL?I 
".. C-c,,-s~b 
I I +3 /'~_ 
_ I +, 
r~777--~ suhl - I I ~- 85 ,)1% "+ 
T . e: ! I, e.; *.. 
he i - ' eats dinner i 
Figure 7: ExamlJle Parse 
node of the analysis tree+ the direct descendant 
nodes are reordered in the generation tree, 
according to tile correspondence of elementary 
analysis and generation trees. Tiffs translation 
mechanisna is basically independent of how the 
analysis tree is constructed, hence the grammar 
l'ornmlistri. In our implerrientation, the gerleration 
tree is a call graph of target  generating 
ftirmtioils, which enables detailed procedural 
control over the syntactic generation process. 
3.2 Grammars and Dictionaries for 
English to Japanese Transhltion 
The English to Japanese translation grammar and 
dictionary has been developed. In order to achieve 
wide coverage for general input and high quality 
for tile target dolnain, we developed general 
gramrrlar l't.iles and donlain-specific rules 
sinltlltaneot, isly. (\]eneral rules are based on a 
standard X-bar theory. Nodes of a tree are 
associated with attribute-value structure in a 
standaM way. As nonterminals, we employed a 
grammatical function-centered approach as lank 
Grammar (Sleator and Temperley, 1991). A phrase 
Input i,,+ 
++ 
I__ I/~c Tcn'i +>tatcsj 
~ Grammar R,tos ~ i,al A~na++ ~~ __ 
t ,Z,L,,<,,,Cy x,<,,,,,,<,,<,<+_,, 
<<+ 
i_ ~Jt. d ...... \[ ()12,/C,'\[}t ll.ll\]\] 
1 J} 
Output t % 
K(II'~2-~(I }' I '+~}'okl -ll>O I\[I\])UI'-H ,~J<J " m 
/h'-~ub\] l/llillt'l'-d++ti C+H 
Figut'e 8: Trai~slatior~ Modttle 
930 
level node is assigned attribute-values that express 
their syntactic ftlnction Stlch as subject, direct 
object, etc, instead of a single part-of-speech 
symbol such as NP. This approach is suitable to 
capture idiosyncratic behaviour of words. 
Domain-specific rules are mostly pattern-like 
rules with special attention to aspects that are 
important for carrying conversations, such as 
modality and the degree of politeness. The English 
to Japanese translation dictionary contains about 
seventy thousand words. The number of words that 
required individual I,TA was a few thousand at tile 
time of this report. 
3°3 Current Status of Implementation 
The system has been iml~lemented using C++, 
and runs on Windows 98 and NT. The requirelnent 
is Pentium I\] 400MItz or above for tile CPU~ about 
61) MB of memory, and 200 MB of disk space. 
Most of the disk space is used for statistical data for 
disambiguation. 
We performed a preliminary evaluation of the 
translation quality of l';nglish to Japanese 
translation. A widely used COlnmercial systeln was 
chosen as a reference system, of which the 
dictionaries were expanded for the target domain, 
t:ive hundred sentellces were randomly chosen from 
a large (about 40K) pool of conversatiolml texts of 
the target domain. '\['hen the output of our system 
and the reference system were mixed, and then 
presented to a single evahmtor at a random order. 
The evaluator classified them into (Bur levels 
(natural, good, understandable and bad). The result 
showed that tile number of sentences classified to 
"'natural" increased about 45% compared to that of 
tlle reference system, i.e. tile ratio of the ntlmber of 
sentences was arotllld 1.45. The ntllllber o|" 
sentences classified as "bad" decreased about 40% 
in the same measure. 
We applied this module to an experimental 
speech translation system (Watanabe et al., 2000). 
4 l)iscussions 
The proposed granllnar fornlalism is a kind of 
lexicalized granll'nar fcnTnalisnl and shares its 
advantages. The largest difference frolll other 
strongly lexicalized granunar tbrnlalisms is that it 
employs lexicalized tree automata (I,TA) to 
describe the tree set associated with a word, which 
allows a finite description of a non-finite set of 
local trees. These automata's role is equivalent to 
additional tree operations in other formalisms. In 
addition, an LTA provide an extended domain of 
locality (EDOL) of the word. 
If all the LTAs are finite automata in the string 
automaton representation, then the tree  
recognized by this grammar is regular and its yield 
is a context-free . The grammar can accept 
general Tree Adjoining Language (TAL) if" the 
LTAs belong to the class of pushdown atttomata in 
the string autonlaton representation. This is a 
reflection of tile thct that pushdown tree automata 
can accept the indexed s (Gdcseg and 
Steinby, 1997), of which the TAL is a subclass. 
As shown in the section 2.4, the control strategy 
of bottom-up chart parsing does not rely ell the 
concrete content of the I,TA, which is an adwmtage 
of the proposed formalism. This implies that we can 
alter even tile grammar class without affecting the 
parsing. Suppose the current L'I'As are finite 
automata, hence the yield  is context-free. 
If we want to introduce a word e that induces a 
non-context-fi'eeness, such as e in a"b"ec"d", then 
what we have to do is to write a pushdown 
automaton in tile figure 4 li)r the word e. We 
change neither tile grammar formalism nor the 
parsing algorithm, and the change is localized to the 
LTA of e. 
Writing automata by hand may seem much more 
complex than writing trees, but our experience 
shows that it is not nlucll different fronl 
convelltional granHllar development. As long as 
appropriate notations are used, writing automata for 
a word anlounts to detornlining possible t'olnl of 
trees headed by that word, a task ahvays required in 
gramil-iar development. In fact, thei'e is tess alllOtllli 
of work since tile gralllnlar writer does not need to 
pay attention to assigning proper nontcrminals 
and/or proper attributes to internal nodes of trees in 
order to control their growth. 
It is another advantage of the proposed 
formalisnl that it can utilize various autorriata 
operations, such as conlposition and intersection. 
For exanlple, a word can append an atltOlllatoll to 
thai of the headword when it becomes a child, 
which enables to specify a constraint fi'onl a 
h)wer-positioned word to a higherq)ositioned v~oi'd 
in tile tree. Another example is cootdination. Two 
edges are conjoined when tile unapplied parts of 
I,TAs have nonempty intersection as automata, and 
tile conjoined edge is given with this intersection as 
the lTfA. Verb phrase conjunction such as ",lohn 
eats cookies and drinks beer" is handled in this 
manner, by conjoining "'eats cookies" and "'drinks 
beer. The intersected automaton will accept the 
subject tree and other sentence-level trees. 
931 
In the proposed method, elementary trees are 
always anchored by the syntactic headword. For 
example, a verb iit a relative clause is in the EDOL 
of the antecedent. Then, if the embedded verb puts 
a constraint on the antecedent, that constraint is not 
expressed in a straightforward manner, which may 
seem a weakness of the method. We just poiltt out 
that this type of problem occurs when the syntactic 
head and the semantic head are different, and is 
common to lexicalized grammars as long as a tree 
is anchored to one word, because constraints are 
often reciprocal. In our current implementation, the 
constraint written in the verb's dictionary is found 
and checked by the relative-clause-tree accepting 
automaton of the antecedent noun. 
There have been many work on syntactic analysis 
based on automata attached to the headword. Evans 
and Weir (1998) used finite state automata as 
representation of trees that can be merged and 
minimized to improve parsing efficiency, lit their 
method, the granlnlar is fixed to be I,TAG or seine 
lexicalized grammar and the automata are obtained 
by automatic conversion from the trees. Our 
nlethod differs frol11 theirs ilt the poiltt that ours 
employs trees as the basic object of automata, 
which enables to handle general recursive 
adjunction in LTAG, while their automata work Oll 
tile nonterillinal and ternlinal synlbols, lit the center 
O\[" Ot.lr method is the notion of" the local orallllllal of 
a word. "\['he whole grammar is divided into the 
global part and the set of h)cal gralltntars specific to 
the words, which is represented by tile LTAs. 
Alshawi (1996) introduced ttead Atitornata, a 
weighted finite machine that accepts a pair o\[" 
sequences of relation symbols. The difference is 
similar as above. Since the tree automata in our 
method are used to define the set of the local trees, 
their role will be equivalent to building the head 
automata themselves, but not to combining the trees 
that are already built, like the I lead Automata. 
5 Conchlsion 
We proposed a new lexicalized grammar formalism, 
called Lexicalized Tree Automata-based Grammar. 
In this formalism, the trees anchored to a word are 
described by a tree automaton associated with it. 
We showed a chart parsing algo,ithm that does not 
depend on the concrete content of the automata. We 
have imt)lemented a bi-directional translation 
module between Japanese and English liar 
conversational texts using this formalism. A 
preliminary evaluation of English to Japanese 
translation quality revealed a promising result. 
Acknowledgement 
We would like to thank Shinichiro Kamei tia" useful 
discussions and Yoshinori Ishihara for his help in 
the implelnentation work. 

References 

Abeilld, A., Schabes, Y. and .loshi, A.K. (1990). t.'si,z<.,, 
l.exicalized 771(;s fin" Machine 77"an.dalion. tn 
f~roceedin<<.,,s oi( 'OI, LVG- 90. p p. 1-6. 

Aho. A.V.. and Ulhn:.ul..I.1). (1969). I'roFe#'lie.v qf,Slvntax 
Directed 7)'anslations..Iotlrllal of C(Hllrltlicr {tlld g}stelll 
Sciences. vol.3, pp.319-334. 

AIshawi. II. (19961. Ilead ..haomam and Bilini,,ual Tilin<<,c 
7)'aHs/alioll with Minima/ Rel)re.ventalion.v. In 
Proceedin.Ts of 34 If' .4#lnlla\[ .lleetlll£, (g {'onl/)lttattolla/ 
Linguistics, pp. 167-176. 

P, rown. P,.D. (1996). ICvamlJle-Based ,l/achine 7)'all.v/all'oil 
ill the Pan gloss ,71'.s'lenl. hi Proceedinw of ('Ol.l,VG-96. 
pp. 169-174. 

\]'vans, R. and Weir, I).,I. (1998). ,4 .S'tructure-,S'harink, 
Parser jot Lexicalized (;rantmars. In tb'oc'eedi, g.s" of 
COLING-,.ICL "98. pp.372-378. 

Furuse, O. and lida. It. (1994). ('onstituenl Boundary 
Parsing .for 1Crample-Ba,sed .Wachine 77"culs/ation. In 
l'roceedin£,s o/('Ol, lNG-94, pp. 105- I I I. 

Gdcsc,, F. and Stcinby, M. (1997). Tree lA.ulguagcs. In 
t/andbook of Formal l.anguages. G. P, osenberg and it. 
Salomaa. trillers. Springer Vcrlag. Vol.3. pp. 1-68. 

Joshi. A.K. and Schabcs. Y. (1992). Tree-..tcOoinin£4 
(Trammars and l, exicali_-ed Grammars. II1 "\['I'CC i\tltOlllala 
and lxmguages. M. NiXal and A. l>odelski, cd,, Elsevier 
Science l~ublishcrs B.V., pp.409-43 I. 

ltuland. T., ltupp. C_I., Spilkcr. J.. \Vcbcr, It. itlld \VorilL K. 
(1998). Jlakin<t~ The Most qf :WulttTUiciO'.- .-t Multi-I'ar.~er 
.llu/ti-.S'lrategy .-Irc/H'tecture for Hie l?olm.vt /~r,'.>c'e,v.vin~ of 
.q'poken l.ang ta£,e In IJroceedin,v.v oJ I('.S'/,/~ '98. 
pp. 1163-. I 166. 

Schabes, Y.. Abcill6. A. and Joshi. A. (1988). /~ar.viu£, 
,~'t#'ategiex with "l, exicali=ed' (h'ammars. In />roceedin£,.s" 
o/( OI, I.\G 8<S. pp.>78-_ 8.~. 

Schabcs. Y. and \Vatcrs. R.C. (1995). Tree ht.~'ertion 
(;#'CIIIIIII\[ll': .l ('u/Uc-JTme. Par.valUe /.'ormali.vm l/lat 
Lexicah-e.~" ('onte:~t-I;)'ee Oralllnlglr without ('han,,,,in<,.,, the 
Trees Prod\teed. Computational l,inguisiics. Vol. 21, 
pp.479-513. 

Shicbcr. S.N,1. and Schabcs, Y. (1990)..S'vnchronou.s Tree 
..le(ioinin£, Grammars. In Prlweedinr~4s ¢)/" (.'()I.I.V(;'90. 
pp.253-258. 

Slcaim. I).I).K. and Tempcrley. \[). (199t). Par\in,, IGz,c, li.dl 
Wl'lJt el l.l'tgk (;#'anlnlar. CM\[J lechnical P, eport 
('N'llJ-C'.q-91- 196. 

Takcda. K. (19961. I~altermBa.seU .~lachine 77"an.v/alien. In 
l'roceedmgs qICOI.IN(; '96. pp. I 155- I 158. 

\Valai~abe. T.. ()kunltlra, A...'qakai. S.. Yail/abal'la. K. alld 
l)oi..'4. (2000)..IIIlOIIIHIIL' \]IIlL'ITJI'L'ILIII'OII. "\['O apl:,ear in 
NI!(_' Technical Jourrial. Vol.53. No.6. (in .lapancseL 
