Proceedings of the 8th International Workshop on Tree Adjoining Grammar and Related Formalisms, pages 1–8,
Sydney, July 2006. c©2006 Association for Computational Linguistics
The Hidden TAG Model: Synchronous Grammars for Parsing
Resource-Poor Languages
David Chiang∗
Information Sciences Institute
University of Southern California
4676 Admiralty Way, Suite 1001
Marina del Rey, CA 90292, USA
chiang@isi.edu
Owen Rambow
Center for Computational Learning Systems
Columbia University
475 Riverside Dr., Suite 850
New York, NY, USA
rambow@cs.columbia.edu
Abstract
This paper discusses a novel probabilis-
tic synchronous TAG formalism, syn-
chronous Tree Substitution Grammar with
sister adjunction (TSG+SA). We use it
to parse a language for which there is
no training data, by leveraging off a sec-
ond, related language for which there is
abundant training data. The grammar for
the resource-rich side is automatically ex-
tracted from a treebank; the grammar on
the resource-poor side and the synchro-
nization are created by handwritten rules.
Our approach thus represents a combina-
tion of grammar-based and empirical nat-
ural language processing. We discuss the
approach using the example of Levantine
Arabic and Standard Arabic.
1 Parsing Arabic Dialects and Tree
Adjoining Grammar
The Arabic language is a collection of spoken
dialects and a standard written language. The
standard written language is the same throughout
the Arab world, Modern Standard Arabic (MSA),
which is also used in some scripted spoken com-
munication (news casts, parliamentary debates).
It is based on Classical Arabic and is not a na-
tive language of any Arabic speaking people, i.e.,
children do not learn it from their parents but in
school. Thus most native speakers of Arabic are
unable to produce sustained spontaneous MSA.
The dialects show phonological, morphological,
lexical, and syntactic differences comparable to
∗This work was primarily carried out while the  rst au-
thor was at the University of Maryland Institute for Advanced
Computer Studies.
those among the Romance languages. They vary
not only along a geographical continuum but also
with other sociolinguistic variables such as the ur-
ban/rural/Bedouin dimension.
The multidialectal situation has important neg-
ative consequences for Arabic natural language
processing (NLP): since the spoken dialects are
not of cially written and do not have standard or-
thography, it is very costly to obtain adequate cor-
pora, even unannotated corpora, to use for train-
ing NLP tools such as parsers. Furthermore, there
are almost no parallel corpora involving one di-
alect and MSA.
The question thus arises how to create a statisti-
cal parser for an Arabic dialect, when statistical
parsers are typically trained on large corpora of
parse trees. We present one solution to this prob-
lem, based on the assumption that it is easier to
manually create new resources that relate a dialect
to MSA (lexicon and grammar) than it is to man-
ually create syntactically annotated corpora in the
dialect. In this paper, we deal with Levantine Ara-
bic (LA). Our approach does not assume the exis-
tence of any annotated LA corpus (except for de-
velopment and testing), nor of a parallel LA-MSA
corpus.
The approach described in this paper uses a spe-
cial parameterization of stochastic synchronous
TAG (Shieber, 1994) which we call a  hidden TAG
model. This model couples a model of MSA
trees, learned from the Arabic Treebank, with a
model of MSA-LA translation, which is initial-
ized by hand and then trained in an unsupervised
fashion. Parsing new LA sentences then entails si-
multaneously building a forest of MSA trees and
the corresponding forest of LA trees. Our imple-
mentation uses an extension of our monolingual
parser (Chiang, 2000) based on tree-substitution
1
grammar with sister adjunction (TSG+SA).
The main contributions of this paper are as fol-
lows:
1. We introduce the novel concept of a hidden
TAG model.
2. We use this model to combine statistical ap-
proaches with grammar engineering (specif-
ically motivated from the linguistic facts).
Our approach thus exempli es the speci c
strength of a grammar-based approach.
3. We present an implementation of stochas-
tic synchronous TAG that incorporates vari-
ous facilities useful for training on real-world
data: sister-adjunction (needed for generating
the  at structures found in most treebanks),
smoothing, and Inside-Outside reestimation.
This paper is structured as follows. We  rst
brie y discuss related work (Section 2) and some
of the linguistic facts that motivate this work (Sec-
tion 3). We then present the formalism, probabilis-
tic model, and parsing algorithm (Section 4). Fi-
nally, we discuss the manual grammar engineering
(Section 5) and evaluation (Section 6).
2 Related Work
This paper is part of a larger investigation into
parsing Arabic dialects (Rambow et al., 2005; Chi-
ang et al., 2006). In that investigation, we exam-
ined three different approaches:
• Sentence transduction, in which a dialect sen-
tence is roughly translated into one or more
MSA sentences and then parsed by an MSA
parser.
• Treebank transduction, in which the MSA
treebank is transduced into an approximation
of a LA treebank, on which a LA parer is then
trained.
• Grammar transduction, which is the name
given in the overview papers to the approach
discussed in this paper. The present paper
provides for the  rst time a complete tech-
nical presentation of this approach.
Overall, grammar transduction outperformed
the other two approaches.
In other work, there has been a fair amount of
interest in parsing one language using another lan-
guage, see for example (Smith and Smith, 2004;
Hwa et al., 2004). Much of this work, like ours,
relies on synchronous grammars (CFGs). How-
ever, these approaches rely on parallel corpora.
For MSA and its dialects, there are no naturally
occurring parallel corpora. It is this fact that has
led us to investigate the use of explicit linguistic
knowledge to complement machine learning.
3 Linguistic Facts
We illustrate the differences between LA and
MSA using an example:
(1) a. a0a1a3a2a5a4a7a6a9a8a7a10a11a0a13a12 a14a16a15a11a17a19a18a21a20a23a22a19a24a25a27a26a28a10a11a0 (LA)
AlrjAl
the-men
byHbw
like
$
not
Al$gl
the-work
hdA
this
the men do not like this work
b. a4a30a29a32a31a33a10a11a0a34a0a35a36a2a37a22a19a24a25a13a26a28a10a11a0a34a38a34a17a19a39a23a40 (MSA)
lA
not
yHb
like
AlrjAl
the-men
h*A
this
AlEml
the-work
the men do not like this work
Lexically, we observe that the word for ‘work’
is a4a30a6a9a8a7a10a41a0 Al$gl in LA but a4a30a29a42a31a9a10a11a0 AlEml in MSA.
In contrast, the word for ‘men’ is the same in both
LA and MSA: a22a19a24a25a27a26a28a10a11a0 AlrjAl. There are typically
also differences in function words, in our example
a12 $ (LA) and a40 lA (MSA) for ‘not’. Morpholog-
ically, we see that LA a14a9a15a21a17a19a18a21a20 byHbw has the same
stem as MA a38a43a17a19a39 yHb, but with two additional
morphemes: the present aspect marker b- which
does not exist in MSA, and the agreement marker
-w, which is used in MSA only in subject-initial
sentences, while in LA it is always used.
Syntactically, we observe three differences.
First, the subject precedes the verb in LA (SVO
order), but follows in MSA (VSO order). This is
in fact not a strict requirement, but a strong pref-
erence: both varieties allow both orders, but in the
dialects, the SVO order is more common, while in
MSA, the VSO order is more common. Second,
we see that the demonstrative determiner follows
the noun in LA, but precedes it in MSA. Finally,
we see that the negation marker follows the verb
in LA, while it precedes the verb in MSA. (Lev-
antine also has other negation markers that pre-
cede the verb, as well as the circum x m- -$.) The
two phrase structure trees are shown in Figure 1
in the convention of the Linguistic Data Consor-
tium (Maamouri et al., 2004). Unlike the phrase
2
S
NP-TPC
a22a19a24a25a13a26 a10a41a0
‘men’i
VP
V
a14a16a15a21a17a19a18a21a20
‘like’
NEG
a12
‘not’
NP-SBJ
ti
NP-OBJ
N
a4a7a6a9a8a7a10a11a0
‘work’
DET
a0a1a3a2
‘this’
S
VP
NEG
a40
‘not’
V
a38a43a17a19a39
‘like’
NP-SBJ
a22a19a24a25a13a26a28a10a11a0
‘men’
NP-OBJ
DET
a0a35a3a2
‘this’
N
a4a30a29a42a31a9a10a11a0
‘work’
Figure 1: LDC-style left-to-right phrase structure trees for LA (left) and MSA (right) for sentence (1)
a14a16a15a21a17a19a18a11a20 ’like’
a4a30a6a33a8a7a10a11a0
‘work’
a0a1a36a2 ‘this’
a22a19a24a25a13a26 a10a41a0
‘men’
a12
‘not’
a38a43a17a19a39 ‘like’
a4a30a29a42a31a9a10a11a0
‘work’
a0a35a3a2 ‘this’
a22a19a24a25a13a26 a10a41a0
‘men’
a40
‘not’
Figure 2: Unordered dependency trees for LA (left) and MSA (right) for sentence (1)
NP
NNP
Qintex
S
NP VP
VBD
sold
NP
PRT
RP
off
NP
NNS
assets
(α1) (α2) (α3) (α4)
Figure 3: Example elementary trees.
structure trees, the (unordered) dependency trees
for the MSA and LA sentences are isomorphic, as
shown in Figure 2. They differ only in the node
labels.
4 Model
4.1 The synchronous TSG+SA formalism
Our parser (Chiang, 2000) is based on syn-
chronous tree-substitution grammar with sister-
adjunction (TSG+SA). Tree-substitution grammar
(Schabes, 1990) is TAG without auxiliary trees or
adjunction; instead we include a weaker composi-
tion operation, sister-adjunction (Rambow et al.,
2001), in which an initial tree is inserted between
two sister nodes (see Figure 4). We allow multi-
ple sister-adjunctions at the same site, similar to
how Schabes and Shieber (1994) allow multiple
adjunctions of modi er auxiliary trees.
A synchronous TSG+SA is a set of pairs of el-
ementary trees. In each pair, there is a one-to-one
correspondence between the substitution/sister-
adjunction sites of the two trees, which we repre-
sent using boxed indices (Figure 5). A derivation
then starts with a pair of initial trees and proceeds
by substituting or sister-adjoining elementary tree
pairs at coindexed sites. In this way a set of string
pairs 〈S,Sprime〉 is generated.
Sister-adjunction presents a special problem
for synchronization: if multiple tree pairs sister-
adjoin at the same site, how should their order on
the source side relate to the order on the target
side? Shieber’s solution (Shieber, 1994) is to al-
low any ordering. We adopt a stricter solution: for
each pair of sites,  x a permutation (either iden-
tity or reversal) for the tree pairs that sister-adjoin
there. Owing to the way we extract trees from the
Treebank, the simplest choice of permutations is:
if the two sites are both to the left of the anchor
or both to the right of the anchor, then multiple
sister-adjoined tree pairs will appear in the same
order on both sides; otherwise, they will appear in
the opposite order. In other words, multiple sister-
adjunction always adds trees from the anchor out-
ward.
A stochastic synchronous TSG+SA adds prob-
abilities to the substitution and sister-adjunction
operations: the probability of substituting an ele-
mentary tree pair 〈α,αprime〉 at a substitution site pair
3
S
NP
NNP
Qintex
VP
VP
VBD
sold
NP
NNS
assets
PRT
RP
off
⇒
S
NP
NNP
Qintex
VP
VBD
sold
PRT
RP
off
NP
NNS
assets
Figure 4: Sister-adjunction, with inserted material shown with shaded background









S
NPi↓ 1 VP
V
a14a9a15a21a17a19a18a21a20
‘like’
NP
ti
NP↓ 2
,
S
VP
V
a38a43a17a19a39
‘like’
NP↓ 1 NP↓ 2









Figure 5: Example elementary tree pair of a synchronous TSG: the SVO transformation (LA on left,
MSA on right)
4
〈η,ηprime〉 is Ps(α,αprime | η,ηprime), and the probability of
sister-adjoining 〈α,αprime〉 at a sister-adjunction site
pair 〈η,i,ηprime,iprime〉 is Psa(α,αprime | η,i,ηprime,iprime), where
i and iprime indicate that the sister-adjunction occurs
between the i and (i + 1)st (or iprime and (iprime + 1)st)
sisters. These parameters must satisfy the normal-
ization conditions
summationdisplay
α,αprime
Ps(α,αprime | η,ηprime) = 1 (1)
summationdisplay
α,αprime
Psa(α,αprime | η,i,ηprime,iprime) +
Psa(STOP | η,i,ηprime,iprime) = 1 (2)
4.2 Parsing by translation
We intend to apply a stochastic synchronous
TSG+SA to input sentences Sprime. This requires pro-
jecting any constraints from the unprimed side of
the synchronous grammar over to the primed side,
and then parsing the sentences Sprime using the pro-
jected grammar, using a straightforward general-
ization of the CKY and Viterbi algorithms. This
gives the highest-probability derivation of the syn-
chronous grammar that generates Sprime on the primed
side, which includes a parse for Sprime and, as a by-
product, a parsed translation of Sprime.
Suppose that Sprime is a sentence of LA. For the
present task we are not actually interested in the
MSA translation of Sprime, or the parse of the MSA
translation; we are only interested in the parse of
Sprime. The purpose of the MSA side of the grammar
is to provide reliable statistics. Thus, we approxi-
mate the synchronous rewriting probabilities as:
Ps(α,αprime | η,ηprime)
≈ Ps(α | η)Pt(αprime | α) (3)
Psa(α,αprime | η,i,ηprime,iprime)
≈ Psa(α | η,i)Pt(αprime | α) (4)
These factors, as we will see shortly, are much eas-
ier to estimate given the available resources.
This factorization is analogous to a hidden
Markov model: the primed derivation is the obser-
vation, the unprimed derivation is the hidden state
sequence (except it is a branching process instead
of a chain); the Ps and Psa are like the transition
probabilities and the Pt are like the observation
probabilities. Hence, we call this model a  hidden
TAG model. 
4.3 Parameter estimation and smoothing
Ps and Psa are the parameters of a monolingual
TSG+SA and can be learned from a monolingual
Treebank (Chiang, 2000); the details are not im-
portant here.
As for Pt, in order to obtain better probability
estimates, we further decompose Pt into Pt1 and
Pt2 so they can be estimated separately (as in the
monolingual parsing model):
Pt(αprime | α) ≈ Pt1(¯αprime | ¯α,wprime,tprime,w,t) ×
Pt2(wprime,tprime | w,t) (5)
where w and t are the lexical anchor of α and its
POS tag, and ¯α is the equivalence class of α mod-
ulo lexical anchors and their POS tags. Pt2 repre-
sents the lexical transfer model, and Pt1 the syn-
tactic transfer model. Pt1 and Pt2 are initially as-
signed by hand; Pt1 is then reestimated by EM.
Because the full probability table for Pt1 would
be too large to write by hand, and because our
training data might be too sparse to reestimate it
well, we smooth it by approximating it as a linear
combination of backoff models:
Pt1(¯αprime | ¯α,wprime,tprime,w,t) ≈
λ1Pt11(¯αprime | ¯α,wprime,tprime,w,t) +
(1 − λ1)(λ2Pt12(¯αprime | ¯α,wprime,tprime) +
(1 − λ2)Pt13(¯αprime | ¯α)) (6)
where each λi, unlike in the monolingual parser,
is simply set to 1 if an estimate is available for that
level, so that it completely overrides the further
backed-off models.
The initial estimates for the Pt1i are set by hand.
The availability of three backoff models makes it
easy to specify the initial guesses at an appropri-
ate level of detail: for example, one might give a
general probability of some ¯α mapping to ¯αprime using
Pt13, but then make special exceptions for partic-
ular lexical anchors using Pt11 or Pt12.
Finally Pt2 is reestimated by EM on some held-
out unannotated sentences of Lprime, using the same
method as Chiang and Bikel (2002) but on the syn-
tactic transfer probabilities instead of the mono-
lingual parsing model. Another difference is that,
following Bikel (2004), we do not recalculate the
λi at each iteration, but use the initial values
throughout.
5 A Synchronous TSG-SA for Dialectal
Arabic
Just as the probability model discussed in the pre-
ceding section factored the rewriting probabilities
5
into three parts, we create a synchronous TSG-SA
and the probabilities of a hidden TAG model in
three steps:
• Ps and Psa are the parameters of a monolin-
gual TSG+SA for MSA. We extract a gram-
mar for the resource-rich language (MSA)
from the Penn Arabic Treebank in a pro-
cess described by Chiang and others (Chiang,
2000; Xia et al., 2000; Chen, 2001).
• For the lexical transfer model Pt2, we cre-
ate by hand a probabilistic mapping between
(word, POS tag) pairs in the two languages.
• For the syntactic transfer model Pt1, we cre-
ated by hand a grammar for the resource-poor
language and a mapping between elementary
trees in the two grammars, along with initial
guesses for the mapping probabilities.
We discuss the hand-crafted lexicon and syn-
chronous grammar in the following subsections.
5.1 Lexical Mapping
We used a small, hand-crafted lexicon of 100
words which mapped all LA function words and
some of the most common open-class words to
MSA. We assigned uniform probabilities to the
mapping. All other MSA words were assumed
to also be LA words. Unknown LA words were
handled using the standard unknown word mecha-
nism.
5.2 Syntactic Mapping
Because of the underlying syntactic similarity be-
tween the two varieties of Arabic, we assume that
every tree in the MSA grammar extracted from the
MSA treebank is also a LA tree. In addition, we
de ne tree transformations in the Tsurgeon pack-
age (Levy and Andrew, 2006). These consist of
a pattern which matches MSA elementary trees
in the extracted grammar, and a transformation
which produces a LA elementary tree. We per-
form the following tree transformations on all el-
ementary trees which match the underlying MSA
pattern. Thus, each MSA tree corresponds to at
least two LA trees: the original one and the trans-
formed one. If several transformations apply, we
obtain multiple transformed trees.
• Negation (NEG): we insert a $ negation
marker immediately following each verb.
The preverbal marker is generated by a lex-
ical translation of an MSA elementary tree.
• VSO-SVO Ordering (SVO): Both Verb-
Subject-Object (VSO) and Subject-Verb-
Object (SVO) constructions occur in MSA
and LA treebanks. But pure VSO construc-
tions (without pro-drop) occur in the LA cor-
pus only 10ordering in MSA. Hence, the goal
is to skew the distributions of the SVO con-
structions in the MSA data. Therefore, VSO
constructions are replicated and converted to
SVO constructions. One possible resulting
pair of trees is shown in Figure 5.
• The bd construction (BD): bd is a LA noun
that means ‘want’. It acts like a verb in
verbal constructions yielding VP construc-
tions headed by NN. It is typically followed
by an enclitic possessive pronoun. Accord-
ingly, we de ned a transformation that trans-
lated all the verbs meaning ‘want’/‘need’ into
the noun bd and changed their respective
POS tag to NN. The subject clitic is trans-
formed into a possessive pronoun clitic. Note
that this construction is a combination lexical
and syntactic transformation, and thus specif-
ically exploits the extended domain of local-
ity of TAG-like formalisms. One possible re-
sulting pair of trees is shown in Figure 6.
6 Experimental Results
While our approach does not rely on any annotated
corpus for LA, nor on a parallel corpus MSA-
LA, we use a small treebank of LA (Maamouri et
al., 2006) to analyze and test our approach. The
LA treebank is divided into a development corpus
and a test corpus, each about 11,000 tokens (using
the same tokenization scheme as employed in the
MSA treebank).
We  rst use the development corpus to deter-
mine which of the transformations are useful. We
use two conditions. In the  rst, the input text is
not tagged, and the parser hypothesizes tags. In
the second, the input text is tagged with the gold
(correct) tag. The results are shown in Table 1.
The baseline is simply the application of a pure
MSA Chiang parser to LA. We see that important
improvements are obtained using the lexical map-
ping. Adding the SVO transformation does not
improve the results, but the NEG and BD trans-
formations help slightly, and their effect is (partly)
6








S
VP
N
a1a3a20
‘want’
PRP$
a0
‘my’
Sprime↓ 1
,
S
VP
V
a1a3a39a2a1a4a3
‘want’
Sprime↓ 1








Figure 6: Example elementary tree pair of a synchronous TSG: the BD transformation (LA on left, MSA
on right)
cumulative. (We did not perform these tuning ex-
periments on input without POS tags.)
The evaluation on the test corpus con rms these
results. Using the NEG and BD transformations
and the small lexicon, we obtain a 17.3% error re-
duction relative to the baseline parser (Figure 2).
These results show that the translation lexicon
can be integrated effectively into our synchronous
grammar framework. In addition, some syntac-
tic transformations are useful. The SVO trans-
formation, we assume, turns out not to be useful
because the SVO word order is also possible in
MSA, so that the new trees were not needed and
needlessly introduced new derivations. The BD
transformation shows the importance not of gen-
eral syntactic transformations, but rather of lexi-
cally speci c syntactic transformations: varieties
within one language family may differ more in
terms of the lexico-syntactic constructions used
for a speci c (semantic or pragmatic) purpose than
in their basic syntactic inventory. Note that our
tree-based synchronous formalism is ideally suited
for expressing such transformations since it is lex-
icalized, and has an extended domain of locality.
Given the impact of the BD transformation, in fu-
ture work we intend to determine more lexico-
structural transformations, rather than pure syntac-
tic transformations. However, one major impedi-
ment to obtaining better results is the disparity in
genre and domain which affects the overall perfor-
mance.
7 Conclusion
We have presented a new probabilistic syn-
chronous TAG formalism, synchronous Tree
Substitution Grammar with sister adjunction
(TSG+SA). We have introduced the concept of
a hidden TAG model, analogous to a Hidden
Markov Model. It allows us to parse a resource-
poor language using a treebank-extracted prob-
abilistic grammar for a resource-rich language,
along with a hand-crafted synchronous grammar
for the resource-poor language. Thus, our model
combines statistical approaches with grammar en-
gineering (speci cally motivated from the linguis-
tic facts). Our approach thus exempli es the
speci c strength of a grammar-based approach.
While we have applied this approach to two
closely related languages, it would be interesting
to apply this approach to more distantly related
languages in the future.
Acknowledgments
This paper is based on work done at the 2005
Johns Hopkins Summer Workshop, which was
partially supported by the National Science Foun-
dation under grant 0121285. The  rst author
was additionally supported by ONR MURI con-
tract FCPO.810548265 and Department of De-
fense contract RD-02-5700. The second author
was additionally supported by contract HR0011-
06-C-0023 under the GALE program. We wish
to thank the other members of our JHU team (our
co-authors on (Rambow et al., 2005)), especially
Nizar Habash and Mona Diab for their help with
the Arabic examples, and audiences at JHU for
their useful feedback.

References
Daniel M. Bikel. 2004. On the Parameter Space of
Generative Lexicalized Parsing Models. Ph.D. the-
sis, University of Pennsylvania.
John Chen. 2001. Towards Efficient Statistical Parsing
Using Lexicalized Grammatical Information. Ph.D.
thesis, University of Delaware.
David Chiang and Daniel M. Bikel. 2002. Recovering
latent information in treebanks. In Proceedings of
the Nineteenth International Conference on Compu-
tational Linguistics (COLING), pages 183 189.
David Chiang, Mona Diab, Nizar Habash, Owen Ram-
bow, and Sa ullah Shareef. 2006. Parsing Arabic
dialects. In Proceedings of EACL.
David Chiang. 2000. Statistical parsing with an
automatically-extracted tree adjoining grammar. In
38th Meeting of the Association for Computational
Linguistics (ACL’00), pages 456 463, Hong Kong,
China.
Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara
Cabezas, and Okan Kolak. 2004. Bootstrapping
parsers via syntactic projection across parallel texts.
Natural Language Engineering.
Roger Levy and Galen Andrew. 2006. Tregex and
Tsurgeon: tools for querying and manipulating tree
data structures. In Proceedings of LREC.
Mohamed Maamouri, Ann Bies, and Tim Buckwalter.
2004. The Penn Arabic Treebank: Building a large-
scale annotated Arabic corpus. In NEMLAR Con-
ference on Arabic Language Resources and Tools,
Cairo, Egypt.
Mohamed Maamouri, Ann Bies, Tim Buckwalter,
Mona Diab, Nizar Habash, Owen Rambow, and
Dalila Tabessi. 2006. Developing and using a pilot
dialectal Arabic treebank. In Proceedings of LREC,
Genoa, Italy.
Owen Rambow, K. Vijay-Shanker, and David Weir.
2001. D-Tree Substitution Grammars. Computa-
tional Linguistics, 27(1).
Owen Rambow, David Chiang, Mona Diab, Nizar
Habash, Rebecca Hwa, Khalil Sima’an, Vincent
Lacey, Roger Levy, Carol Nichols, and Sa ullah
Shareef. 2005. Parsing Arabic dialects. Final Re-
port, 2005 JHU Summer Workshop.
Yves Schabes and Stuart Shieber. 1994. An alternative
conception of tree-adjoining derivation. Computa-
tional Linguistics, 1(20):91 124.
Yves Schabes. 1990. Mathematical and Computa-
tional Aspects of Lexicalized Grammars. Ph.D. the-
sis, Department of Computer and Information Sci-
ence, University of Pennsylvania.
Stuart B. Shieber. 1994. Restricting the weak genera-
tive capacity of Synchronous Tree Adjoining Gram-
mar. Computational Intelligence, 10(4):371 385.
David A. Smith and Noah A. Smith. 2004. Bilingual
parsing with factored estimation: Using English to
parse Korean. In Proceedings of the 2004 Confer-
ence on Empirical Methods in Natural Language
Processing (EMNLP04).
Fei Xia, Martha Palmer, and Aravind Joshi. 2000. A
uniform method of grammar extraction and its appli-
cations. In Proceedings of the 2000 Conference on
Empirical Methods in Natural Language Processing
(EMNLP00), Hong Kong.
