Proceedings of the COLING/ACL 2006 Student Research Workshop, pages 73–78,
Sydney, July 2006. ©2006 Association for Computational Linguistics
Extraction of Tree Adjoining Grammars from a Treebank for Korean  
 
 
 Jungyeul Park
UFR Linguistique  
Laboratoire de linguistique formelle 
Université Paris VII - Denis Diderot 
jungyeul.park@linguist.jussieu.fr 
 
 
  
 
Abstract 
We present the implementation of a system that extracts not only lexicalized grammars but also feature-based lexicalized grammars from the Korean Sejong Treebank. We report on practical experiments in which we extract TAG grammars and tree schemata. Above all, the full-scale syntactic tags and well-formed morphological analyses in the Sejong Treebank allow us to extract syntactic features. In addition, we modify the Treebank before extracting lexicalized grammars, and we convert the lexicalized grammars into tree schemata to address the limited lexical coverage of the extracted lexicalized grammars.
1 Introduction  
An electronic grammar is an interface between the complexity and diversity of natural language and the regularity and effectiveness of language processing, and it is one of the most important elements of natural language processing. Since traditional manual grammar development is a time-consuming and labor-intensive task, many efforts toward automatic and semi-automatic grammar development have been made over the last decades.

Automatic grammar development means that a system extracts a grammar from a Treebank, which contains an implicit Treebank grammar. The grammar extraction system takes syntactically analyzed sentences as input and produces a target grammar. The extracted grammar may be the same as the Treebank grammar or may differ from it, depending on the user's specific purpose. An automatically extracted grammar has the advantages of coherence and rapid development. However, since it always depends on the Treebank that the extraction system uses, its coverage may be limited by the scale of the Treebank. Moreover, a reliable Treebank is hard to find, especially in the public domain.

Semi-automatic grammar development means that a system generates the grammar from a description of language-specific syntactic (or linguistic) variations and their constraints. The meta-grammar of Candito (1999) and the tree description of Xia (2001) are good examples of semi-automatic grammar development. Even with semi-automatic grammar development, we need a good description of the linguistic phenomena of the specific language, which requires a very high level of linguistic knowledge, and semi-automatically generated grammars can easily have an overflow problem.

Since a grammar can be extracted automatically with little effort once a reliable Treebank is available, in this paper we implement a system that extracts a Lexicalized Tree Adjoining Grammar (LTAG) and a Feature-based Lexicalized Tree Adjoining Grammar (FB-LTAG) from the Korean Sejong Treebank (SJTree). SJTree contains 32,054 eojeols (the unit of segmentation in a Korean sentence), that is, 2,526 sentences. SJTree uses 43 part-of-speech tags and 55 syntactic tags.

Although there are many previous works on extracting grammars from a Treebank, this work is the first attempt to extract syntactic features. The 55 full-scale syntactic tags and the well-formed morphological analyses in SJTree allow us to extract syntactic features automatically and to develop an FB-LTAG.

First, we briefly present feature structures, with a focus on FB-LTAG, and other previous works on extracting a grammar from a Treebank. Then, we explain our grammar extraction scheme and report experimental results. Finally, we present our conclusion.
2 Feature structures and previous works on extracting grammars from a Treebank
A feature structure is a way of representing grammatical information. Formally, a feature structure consists of a specification of a set of features, each of which is paired with a particular value (Sag et al., 2003). In a unification framework, a feature structure is associated with each node in an elementary tree (Vijay-Shanker and Joshi, 1991). This feature structure contains information about how the node interacts with other nodes in the tree. It consists of a top part, which generally contains information relating to the super-node, and a bottom part, which generally contains information relating to the sub-node (Han et al., 2000).

In FB-LTAG, the feature structure of a new node created by substitution inherits the union of the features of the original nodes. The top feature of the new node is the union (f₁ ∪ f) of the top features of the two original nodes, while the bottom feature of the new node is simply the bottom feature (g₁) of the root node of the substituting tree, since the substitution node has no bottom feature, as shown in Figure 1.
 
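[Figure: a tree rooted in Y (t:f₁, b:g₁) substitutes at the node Y↓ (t:f) of a tree rooted in X; the resulting node Y carries t:f₁ ∪ f and b:g₁.]

Figure 1. Substitution in FB-LTAG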
 
The node being adjoined into splits: its top feature (f) unifies with the top feature (f₁) of the root node of the adjoining tree, while its bottom feature (g) unifies with the bottom feature (g₂) of the foot node of the adjoining tree, as shown in Figure 2.
 
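[Figure: an auxiliary tree with root Y (t:f₁, b:g₁) and foot Y* (t:f₂, b:g₂) adjoins at the node Y (t:f, b:g) of a tree rooted in X; in the result, the upper Y carries t:f₁ ∪ f and b:g₁, and the lower Y carries t:f₂ and b:g₂ ∪ g.]

Figure 2. Adjunction in FB-LTAG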
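
The two operations can be illustrated with a minimal Python sketch. It assumes flat feature structures represented as dictionaries and treats unification as a union that fails on conflicting values; the function names `unify`, `substitute` and `adjoin` are hypothetical and are not taken from the system described in this paper.

```python
from typing import Dict, Optional

Features = Dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Union of two flat feature structures; None if two values clash."""
    result = dict(a)
    for key, value in b.items():
        if key in result and result[key] != value:
            return None
        result[key] = value
    return result


def substitute(node_top: Features, root_top: Features, root_bottom: Features):
    """Substitution: the new node gets top = f1 ∪ f and bottom = g1.

    node_top is f (the substitution node has no bottom feature);
    root_top and root_bottom are f1 and g1 of the substituting tree.
    """
    top = unify(root_top, node_top)
    if top is None:
        return None
    return {"top": top, "bottom": dict(root_bottom)}


def adjoin(node_top: Features, node_bottom: Features,
           root_top: Features, root_bottom: Features,
           foot_top: Features, foot_bottom: Features):
    """Adjunction: the node (f, g) splits into a root part and a foot part.

    Root: top = f1 ∪ f, bottom = g1.  Foot: top = f2, bottom = g2 ∪ g.
    """
    new_root_top = unify(root_top, node_top)
    new_foot_bottom = unify(foot_bottom, node_bottom)
    if new_root_top is None or new_foot_bottom is None:
        return None
    return {
        "root": {"top": new_root_top, "bottom": dict(root_bottom)},
        "foot": {"top": dict(foot_top), "bottom": new_foot_bottom},
    }
```

A value clash (for example, two different values for the same feature) makes the operation fail, which is how feature structures constrain which trees may combine.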
Several approaches to extracting grammars, especially for the TAG formalism, have been proposed. Chen (2001) extracted lexicalized grammars from the English Penn Treebank, and other works are based on Chen's procedure, such as Johansen (2004) and Nasr (2004) for French and Habash and Rambow (2004) for Arabic. Chiang (2000) used Tree Insertion Grammars, a variant of the TAG formalism, for his extraction system based on the English Penn Treebank. Xia et al. (2000) developed a uniform method of grammar extraction for English, Chinese and Korean. Neumann (2003) extracted Lexicalized Tree Grammars from the English Penn Treebank for English and from the NEGRA Treebank for German. As mentioned above, none of these works tried to extract syntactic features for FB-LTAG.
3 Grammar extraction scheme  
Before extracting a grammar automatically, we transform each bracketed sentence in SJTree into a tree data structure. Then, traversing the tree with a depth-first algorithm, for each non-terminal node we determine the head and the type of operation (substitution or adjunction) for its child nodes.
3.1 Determination of a head  
To determine the head, we assume that the right-most child node is the head among its sibling nodes, since Korean is an end-focus language. For instance, the second NP is marked as the head in an [NP NP] composition, while the first NP is marked for adjunction in the extracted grammar G₁, which uses eojeols directly without modification of SJTree (see Section 4 for details of the extraction experiments). Likewise, in a [VP@VV VP@VX] composition, where the first VP has a VV (verb) anchor and the last VP has a VX (auxiliary verb) anchor, the principal verb in the first VP is marked for adjunction and the auxiliary verb in the second VP is the head; that is, the extracted auxiliary verb tree takes every argument of the whole sentence. This phenomenon can be explained by argument composition. The head nodes of the extracted grammar for the verb balpyoha.eoss.da ('announced') in (1) are shown in bold face in Figure 3, which represents the bracketed sentence structure in SJTree.
 
(1) 일본 외무성은 즉각 해명 성명을 발표했다.
    ilbon oimuseong.eun
    Japan ministry_of_foreign_affairs.Nom
    jeukgak haemyeng
    immediately elucidation
    seongmyeng.eul balpyo.ha.eoss.da
    declaration.Acc announce.Past.Ter
    'The ministry of foreign affairs in Japan immediately announced their elucidation.'
  
(S (NP_SBJ (NP ilbon/NNP)
           (NP_SBJ oimuseong/NNG+eun/JX))
   (VP (AP jeukgak/MAG)
       (VP (NP_OBJ (NP haemyeng/NNG)
                   (NP_OBJ seongmyeng/NNG+eul/JKO))
           (VP balpyo/NNG+ha/XSV+eoss/EP+da/EF+./SF))))
Figure 3. Bracketed sentence in SJTree for (1) 
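
As an illustration of the first steps, the following Python sketch reads a bracketed sentence into a tree and marks the right-most child of every non-terminal node as its head. The `Node` class and the functions `parse`, `mark_heads` and `lexical_head` are hypothetical names introduced here, and the sketch assumes that every leaf phrase has the form (LABEL analysis), as in Figure 3; it is not the implementation used in our system.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None   # morphological analysis of a leaf phrase
    is_head: bool = False


def parse(text: str) -> Node:
    """Turn a bracketed sentence into a tree of Node objects."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read() -> Node:
        nonlocal pos
        assert tokens[pos] == "("
        node = Node(tokens[pos + 1])
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1  # consume ")"
        return node

    return read()


def mark_heads(node: Node) -> None:
    """Depth-first traversal: the right-most child is taken as the head."""
    if node.children:
        node.children[-1].is_head = True
        for child in node.children:
            mark_heads(child)


def lexical_head(node: Node) -> Optional[str]:
    """Follow head children down to the lexical anchor."""
    while node.children:
        node = next(c for c in node.children if c.is_head)
    return node.word


SENTENCE = """
(S (NP_SBJ (NP ilbon/NNP)
           (NP_SBJ oimuseong/NNG+eun/JX))
   (VP (AP jeukgak/MAG)
       (VP (NP_OBJ (NP haemyeng/NNG)
                   (NP_OBJ seongmyeng/NNG+eul/JKO))
           (VP balpyo/NNG+ha/XSV+eoss/EP+da/EF+./SF))))
"""

tree = parse(SENTENCE)
mark_heads(tree)
print(lexical_head(tree))
```

Following the head children from S leads to the verb eojeol, which is the anchor of the main extracted tree.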
3.2 Distinction between substitution and adjunction operations
Unlike other Treebank corpora such as the English Penn Treebank and the French Paris 7 Treebank, the full-scale syntactic tags in SJTree allow us to determine easily which nodes should be marked for substitution or adjunction. Among the 55 syntactic tags in SJTree, nodes labeled NP (noun phrase), S (sentence), VNP (copular phrase) or VP (verb phrase) whose tags end with _CMP (attribute), _OBJ (object) or _SBJ (subject) are marked for substitution, and nodes labeled with any other syntactic tag, except head nodes, are marked for adjunction. Under this distinction, some VNP and VP phrases may be marked for substitution, which means that these VNP and VP phrases are arguments of a head, because SJTree labels the nominalized forms of VNP and VP as VNP and VP rather than NP. In Figure 4, for example, the NP_SBJ and NP_OBJ nodes are marked for substitution and the AP node is marked for adjunction.

Child nodes marked for substitution are replaced by substitution terminal nodes (e.g. NP_SBJ↓), and the extraction procedure is called recursively on the subtree rooted at the child node itself. Child nodes marked for adjunction are removed from the main tree, and the extraction procedure is likewise called recursively on the corresponding subtree, to which we add the parent of the given child node as a root node and its sibling node as a foot node (e.g. VP*). As defined in the TAG formalism, the foot node has the same label as the root node of the subtree for an adjunction operation.
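
The decision rule above can be sketched in a few lines of Python. The function name `operation` and the two constant sets are hypothetical; the sketch assumes that a syntactic tag is a category optionally followed by an underscore and a function suffix, as in NP_SBJ.

```python
# Categories and function suffixes that mark an argument (substitution).
ARGUMENT_CATEGORIES = {"NP", "S", "VNP", "VP"}
ARGUMENT_FUNCTIONS = {"CMP", "OBJ", "SBJ"}


def operation(label: str, is_head: bool) -> str:
    """Return 'head', 'substitution' or 'adjunction' for a child node."""
    if is_head:
        return "head"
    category, _, function = label.partition("_")
    if category in ARGUMENT_CATEGORIES and function in ARGUMENT_FUNCTIONS:
        return "substitution"
    return "adjunction"


# Children of the nodes in the bracketed sentence for (1).
for label, is_head in [("NP_SBJ", False), ("AP", False),
                       ("NP_OBJ", False), ("NP", False), ("VP", True)]:
    print(label, operation(label, is_head))
```

A bare NP that is not a head, such as the first NP in an [NP NP] composition, falls through to adjunction, in line with the treatment of G₁ in Section 3.1.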
 
 
3.3 Reducing trunk  
Grammars extracted as explained above are not always "correct" TAG grammars. Since nodes marked for adjunction are removed, redundant intermediate nodes remain in the main tree. We remove these redundant nodes. Figure 4 shows how the redundant intermediate nodes are removed from the extracted tree for the verb balpyoha.eoss.da ('announced') in (1).
 
[Figure: the extracted tree (S NP_SBJ↓ (VP (VP NP_OBJ↓ (VP balpyoha.eoss.da)))) is reduced to (S NP_SBJ↓ (VP NP_OBJ↓ (VP balpyoha.eoss.da))).]

Figure 4. Removing redundant intermediate nodes from extracted trees
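
A minimal Python sketch of this reduction step follows. It assumes, as a simplification, that a node is redundant when it has exactly one child and that child is a non-terminal with the same label; trees are written as (label, children) tuples with leaves as plain strings. The function name `reduce_trunk` is hypothetical.

```python
def reduce_trunk(tree):
    """Collapse a node whose only child is a non-terminal with the same label."""
    if isinstance(tree, str):          # leaf: substitution node or anchor
        return tree
    label, children = tree
    children = [reduce_trunk(child) for child in children]
    only_child = children[0] if len(children) == 1 else None
    if only_child is not None and not isinstance(only_child, str) \
            and only_child[0] == label:
        return only_child              # drop the redundant intermediate node
    return (label, children)


# Tree for balpyoha.eoss.da after the AP adjunct has been removed.
before = ("S", ["NP_SBJ↓",
                ("VP", [("VP", ["NP_OBJ↓",
                                ("VP", ["balpyoha.eoss.da"])])])])
after = reduce_trunk(before)
print(after)
```

The innermost VP is kept because its only child is the lexical anchor, not another VP node.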
3.4 Extracting features  
The 55 full-scale syntactic tags and the morphological analyses in SJTree allow us to extract syntactic features automatically and to develop an FB-LTAG. Automatically extracted FB-LTAG grammars ultimately use a reduced tagset, because FB-LTAG grammars encode their syntactic information in feature structures. For example, the NP_SBJ syntactic tag in LTAG is changed into NP, and a syntactic feature <case=subject> is added. As a result, we use a reduced tagset of 13 tags for FB-LTAG grammars. From the full-scale syntactic tags that end with _SBJ (subject), _OBJ (object) and _CMP (attribute), we extract <case> features, which describe argument structures in the sentence.
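
The conversion from a full-scale tag to a reduced tag plus a <case> feature can be sketched as follows. Only <case=subject> for _SBJ is given in the text; the values "object" and "attribute" for _OBJ and _CMP are assumptions taken from the glosses of the suffixes, and the function name `split_tag` is hypothetical. The sketch also assumes that any other function suffix is simply dropped from the reduced tag.

```python
from typing import Dict, Tuple

# Suffix -> <case> value ("object" and "attribute" are assumed values).
CASE_BY_SUFFIX = {"SBJ": "subject", "OBJ": "object", "CMP": "attribute"}


def split_tag(tag: str) -> Tuple[str, Dict[str, str]]:
    """Split a full-scale syntactic tag into a reduced tag and features."""
    category, _, function = tag.partition("_")
    features: Dict[str, str] = {}
    if function in CASE_BY_SUFFIX:
        features["case"] = CASE_BY_SUFFIX[function]
    return category, features


print(split_tag("NP_SBJ"))
print(split_tag("AP"))
```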
Alongside <case> features, we also extract <mode> and <tense> features from the morphological analyses in SJTree. However, since the morphological analyses of verbal and adjectival endings in SJTree are simply divided into EP, EF and EC, which stand for non-final endings, final endings and conjunctive endings respectively, <mode> and <tense> features cannot be extracted directly from SJTree. In this paper, we analyze the 7 non-final endings (EP) and 77 final endings (EF) used in SJTree in order to extract <mode> and <tense> features automatically. In general, EF carries <mode> inflections and EP carries <tense> inflections. Conjunctive endings (EC) are not related to <mode> and <tense> features, so we only extract <ec> features with their string values. <ef> and <ep> features are also extracted with their string values. Some non-final endings, such as si, are extracted as <hor> features, which carry an honorific meaning. In the extracted FB-LTAG grammars, we present lexical heads in the bare infinitive, with morphological features such as <ep>, <ef> and <ec> that link them to their inflected forms.

<det> is another feature that can be extracted automatically from SJTree; unlike the other extracted features, it is extracted from both syntactic tags and morphological analyses. For example, <det=-> is extracted from dependent nouns, which always need modifiers (identified by morphological analysis), while <det=+> is extracted from _MOD phrases (identified by syntactic tags). <det=+> is also extracted from the syntactic tag DP, which contains MMs (determinatives or demonstratives).¹
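
The extraction of <ep>, <ef>, <ec>, <mode>, <tense> and <hor> from a morphological analysis can be sketched as below. The lookup tables contain only the entries that appear in this paper (eoss, da and si); the complete tables for the 7 non-final and 77 final endings are not reproduced here. The function name `morph_features`, the value "+" for <hor> and the choice not to record an honorific ending as <ep> are assumptions made for illustration.

```python
from typing import Dict

# Illustrative entries only; the full tables are not given in this paper.
TENSE_BY_EP = {"eoss": "past"}
MODE_BY_EF = {"da": "decl"}
HONORIFIC_EP = {"si"}


def morph_features(analysis: str) -> Dict[str, str]:
    """Extract features from an analysis such as 'ha/XSV+eoss/EP+da/EF'."""
    features: Dict[str, str] = {}
    for morpheme in analysis.split("+"):
        form, _, pos = morpheme.rpartition("/")
        if pos == "EP" and form in HONORIFIC_EP:
            features["hor"] = "+"          # assumed value for <hor>
        elif pos == "EP":
            features["ep"] = form
            if form in TENSE_BY_EP:
                features["tense"] = TENSE_BY_EP[form]
        elif pos == "EF":
            features["ef"] = form
            if form in MODE_BY_EF:
                features["mode"] = MODE_BY_EF[form]
        elif pos == "EC":
            features["ec"] = form
    return features


print(morph_features("balpyo/NNG+ha/XSV+eoss/EP+da/EF+./SF"))
```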
The feature extraction procedure is implemented in two phases. In the first phase, we convert syntactic tags and morphological analyses into feature structures as explained above. In the second phase, we complete the feature structures on the nodes of the dorsal spine. For example, we copy the features of the VV bottom onto the VV top, the VP top and bottom, and the S bottom, because nodes on the dorsal spine share a certain number of features with the VV bottom. The initial tree for the verb balpyoha.eoss.da is completed as in Figure 5 for an FB-LTAG (see Park (2006) for details).
                                                           
¹ Korean does not need features such as <person> in English or <gender> and <number> in French. Han et al. (2000) proposed several features for a Korean FB-LTAG that we do not use in this paper, such as <adv-pp>, <top> and <aux-pp> for nouns and <clause-type> for predicates. While postpositions are separated from eojeols during our grammar extraction procedure, Han et al. treated them as "one" inflectional morphology of the noun phrase eojeol. As we explain in Section 4, separating postpositions from eojeols is much more effective for the lexical coverage of the extracted grammars. In Han et al., <adv-pp> simply contains the string value of adverbial postpositions. <aux-pp> adds the semantic meaning of auxiliary postpositions such as 'only' and 'also', which we cannot extract automatically from SJTree or other Korean Treebank corpora, because syntactically annotated Treebank corpora generally do not contain such semantic information. <top> marks the presence or absence of a topic marker in Korean such as neun; however, topic markers are annotated like subjects in SJTree, which means that only <case=subject> is extracted for topic markers. <clause-type> indicates the type of the clause, with values such as main, coord(inative), subordi(native), adnom(inal), nominal and aux-connect. Since the distinction between clause types other than the main clause is very vague in Korean, we do not adopt this feature. Instead, <ef> is extracted if the clause is a main clause and <ec> is extracted for the other types.
[Figure: initial tree (S NP↓ (VP NP↓ (VP (VV balpyoha)))). The VV bottom carries <ep> = eoss, <ef> = da, <mode> = decl and <tense> = past; these values are shared, through the variables x, y, i and j, with the VV top, the top and bottom of both VP nodes, and the S bottom. The two NP↓ nodes carry <cas> = nom, <det> = + and <cas> = acc, <det> = +, respectively.]

Figure 5. Extracted FB-LTAG grammar for balpyoha.eoss.da ('announced')
4 Extraction experiments and results   
4.1 Extraction of lexicalized trees  
In this paper, we extract lexicalized trees not only without modifying the Treebank, but also after modifying it under some constraints in order to improve the lexical coverage of the extracted grammars.
 
• G₁: Using eojeols directly without modification of SJTree.
• G₂: Separating symbols and postpositions from eojeols. Separated symbols are extracted and divided into α and β trees based on their types. Every separated postposition is an α tree. Complex postpositions consisting of two or more postpositions are extracted as a single α tree.² Finally, NP β trees are converted into α trees and the syntactic tags in NP α trees are removed.
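
The separation step for G₂ can be sketched as follows. The tag sets below list only the postposition and symbol tags that occur in example (1); a complete list would have to be taken from the Sejong part-of-speech tagset. The function name `split_eojeol` is hypothetical.

```python
from typing import List, Tuple

# Only the tags seen in example (1); the full sets are an open assumption.
POSTPOSITION_TAGS = {"JX", "JKO"}
SYMBOL_TAGS = {"SF"}


def split_eojeol(analysis: str) -> Tuple[str, str, List[str]]:
    """Return (core, postposition, symbols) for one eojeol analysis.

    Consecutive postpositions are kept together as one unit, since a
    complex postposition is extracted as a single tree.
    """
    core, postpositions, symbols = [], [], []
    for morpheme in analysis.split("+"):
        pos = morpheme.rpartition("/")[2]
        if pos in POSTPOSITION_TAGS:
            postpositions.append(morpheme)
        elif pos in SYMBOL_TAGS:
            symbols.append(morpheme)
        else:
            core.append(morpheme)
    return "+".join(core), "+".join(postpositions), symbols


print(split_eojeol("oimuseong/NNG+eun/JX"))
print(split_eojeol("balpyo/NNG+ha/XSV+eoss/EP+da/EF+./SF"))
```

Because the noun oimuseong/NNG is separated from eun/JX, the same noun tree can be reused with any other postposition, which is what improves lexical coverage.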
 
Figures 6 and 7 show the lexicalized grammars G₁ and G₂ extracted from (1), respectively. In principle, the extraction order follows the word order of the sentence.
 
 
[Figure: elementary trees extracted from (1) for G₁. Initial (α) trees: (NP_SBJ oimuseong/NNG+eun/JX), (NP_OBJ seongmyeng/NNG+eul/JKO) and (S NP_SBJ↓ (VP NP_OBJ↓ (VP balpyo/NNG+ha/XSV+eoss/EP+da/EF+./SF))). Auxiliary (β) trees: (NP_SBJ (NP ilbon/NNP) NP_SBJ*), (NP_OBJ (NP haemyeng/NNG) NP_OBJ*) and (VP (AP jeukgak/MAG) VP*).]

Figure 6. Extracted lexicalized grammars G₁
                                                           
² For extracting the trees of symbols and postpositions, we add new SYM and POSTP syntactic tags, which SJTree does not use. See Figure 11 for extracted symbol and postposition trees.
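[Figure: elementary trees extracted from (1) for G₂. Initial (α) trees: NP trees for the nouns ilbon/NNP, oimuseong/NNG, haemyeng/NNG and seongmyeng/NNG; the tree (S NP_SBJ↓ (VP NP_OBJ↓ (VP balpyo/NNG+ha/XSV+eoss/EP+da/EF))) for the verb; and trees rooted in NP_SBJ and NP_OBJ for the postpositions eun/JX and eul/JKO (under POSTP). Auxiliary (β) trees: (VP (AP jeukgak/MAG) VP*) for the adverb and a tree for the symbol ./SF (under SYM) adjoining to S.]

Figure 7. Extracted lexicalized grammars G₂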
4.2 Extraction of feature-based lexicalized trees
We extract feature-based lexicalized trees using the reduced tagset, because FB-LTAG grammars encode their syntactic information in feature structures. The extracted grammar G₃ removes the full syntactic tags in favor of the reduced tagset, adds the extracted feature structures and uses infinitive forms as lexical anchors.

• G₃: Using the reduced tagset, with an infinitive as lexical anchor, and adding the extracted feature structures.

The G₃ row in Table 1 below shows the results of the extraction procedure above. Figure 8 shows the feature-based lexicalized grammars G₃ extracted from (1).
[Figure: elementary trees extracted from (1) for G₃, using the reduced tagset. Initial (α) trees: NP trees for ilbon (NNP), oimuseong, haemyeng and seongmyeng (NNG); trees (NP NP↓ POSTP) for the postpositions eun (JX, <cas> = nom) and eul (JKO, <cas> = acc); and the tree for the verb balpyoha (VV) with b: <ep> = eoss, <ef> = da, <mode> = decl, <tense> = past, whose NP↓ nodes carry <cas> = nom, <det> = + and <cas> = acc, <det> = +. Auxiliary (β) trees: (VP (ADVP (ADV jeukgak)) VP*) and a tree for the symbol . (SF, under SYM) adjoining to S.]

Figure 8. Extracted feature-based lexicalized grammars G₃.³
 
      # of ltrees            Average frequency
      (lexicalized trees)    per ltree
G₁    18,080                 1.38
G₂    15,551                 2.57
G₃    12,429                 3.21

Table 1. Results of experiments in extracting lexicalized and feature-based lexicalized grammars
                                                           
³ To simplify the figure, we show only the feature structures that are necessary for understanding.
4.3 Extraction of tree schemata 
As mentioned in the Introduction, one of the most serious problems in automatic grammar extraction is limited lexical coverage. To address this problem, we generalize our extracted lexicalized grammars into templates, which we call tree schemata. To form a tree schema, the lexical anchor is removed from an extracted tree and replaced by an anchor mark (for example, @NNG where the lexical anchor of the extracted lexicalized tree is a common noun). The number of tree schemata is much smaller than the number of lexicalized trees. Table 2 shows the number of tree schemata and their average frequency for each template grammar. T₁ denotes the tree schemata of G₁, and likewise for T₂ and T₃.
 
      # of tree schemata    Average frequency
                            per tree schema
T₁    1,158                 21.55
T₂    1,077                 37.05
T₃    385                   103.65

Table 2. Results of experiments in converting template grammars
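
The conversion of lexicalized trees into tree schemata can be sketched as follows. The sketch assumes that trees are stored as bracketed strings whose anchor is a single form/TAG token; how anchors made of several morphemes are abstracted is not covered here. The function name `to_schema` is hypothetical.

```python
import re
from collections import Counter

# A single-morpheme anchor such as "haemyeng/NNG".
ANCHOR = re.compile(r"[^\s()+/]+/([A-Z]+)")


def to_schema(ltree: str) -> str:
    """Replace the lexical anchor by an anchor mark such as @NNG."""
    return ANCHOR.sub(r"@\1", ltree)


ltrees = ["(NP ilbon/NNP)", "(NP oimuseong/NNG)",
          "(NP haemyeng/NNG)", "(NP seongmyeng/NNG)"]
schemata = Counter(to_schema(t) for t in ltrees)
average_frequency = sum(schemata.values()) / len(schemata)
print(schemata, average_frequency)
```

In this toy example the four noun trees from (1) collapse into two schemata, (NP @NNP) and (NP @NNG), which illustrates why the number of tree schemata is much smaller than the number of lexicalized trees.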
5 Evaluations 
First of all, the lexical coverage of G₁ and G₂ is tested on a part of the Sejong corpus that contains about 770,000 "morphologically analyzed" eojeols. After the modification of SJTree, the lexical coverage of the extracted grammar G₂ is increased to 17.8 % compared with G₁. G₂ and G₃ have the same lexical coverage since they have the same lexical entries.
 
The grammars extracted in this paper are evaluated by their size and their coverage. The size of a grammar is the number of tree schemata as a function of the number of sentences, as shown in Figure 9. The coverage of a grammar is computed from the number of occurrences of unknown tree schemata in the corpus over the total number of occurrences of tree schemata, as shown in Table 3.
 
 
[Figure: two panels, (a) Threshold = 1 and (b) Threshold = 2.]

Figure 9. The size of grammars
 
 
      Threshold = 1    Threshold = 2
G₁    0.9326           0.9591
G₂    0.9326           0.9525
G₃    0.9579           0.9638

Table 3. Coverage of grammars: 90% of the corpus as training set (2,273 sentences) and 10% as test set (253 sentences)
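
One way to compute such a coverage figure is sketched below. It assumes that coverage is reported as one minus the proportion of test-set occurrences whose tree schema was not seen in the training set; the threshold of Table 3 is not modeled, and the function name `coverage` and the toy data are hypothetical.

```python
from typing import Iterable, List


def coverage(train: Iterable[str], test: List[str]) -> float:
    """Share of test occurrences whose tree schema is known from training."""
    if not test:
        raise ValueError("the test set is empty")
    known = set(train)
    unknown = sum(1 for schema in test if schema not in known)
    return 1.0 - unknown / len(test)


# Toy data: one of the four test occurrences is an unknown schema.
train_schemata = ["(NP @NNG)", "(NP @NNG)", "(NP @NNP)"]
test_schemata = ["(NP @NNG)", "(NP @NNP)", "(NP @NR)", "(NP @NNG)"]
print(coverage(train_schemata, test_schemata))
```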
 
We manually compare our 163 tree schemata for predicates from T₃, which contain 14 subcategorization frames, with the 11 subcategorization frames of the FB-LTAG grammar proposed in Han et al. (2000), in order to evaluate our coverage of a hand-crafted grammar.⁴ Our extracted template grammars cover 72.7 % of their hand-crafted subcategorization frames.⁵
6 Conclusion 
In this paper, we have presented a system for automatic grammar extraction that produces lexicalized and feature-based lexicalized grammars from a Treebank. To address the limited lexical coverage of the extracted grammars, we also separated symbols and postpositions from eojeols and then converted the grammars into template grammars. The extracted grammars and the lexical-anchor-less template grammars may be used by parsers to analyze Korean sentences, and the frequency information may be used to resolve ambiguities among the possible syntactic analyses produced by parsers.
References  
Candito, Marie-Hélène. 1999. Organisation modulaire et paramétrable de grammaires électroniques lexicalisées. Ph.D. thesis, Université Paris 7.
                                                           
⁴ Our extracted tree schemata contain not only subcategorization frames but also some phenomena of syntactic variation, the number of lexicalized trees and the frequency information, while Han et al. (2000) present only subcategorization frames and some phenomena.
⁵ Three subcategorization frames in Han et al. (2000) that contain prepositional phrases are not covered by our extracted tree schemata. Generally, prepositional phrases in SJTree are labeled with _AJT, which is marked for adjunction. Since SJTree makes no distinction between noun adverbial phrases and prepositional phrases, as in [S na.neun [NP_AJT ojeon.e 'morning'] [NP_AJT hakgyo.e 'to school'] ga.ss.da] ('I went to school this morning'), we do not consider _AJT phrases as arguments.
Chen, John. 2001. Towards Efficient Statistical Parsing Using Lexicalized Grammatical Information. Ph.D. thesis, University of Delaware.
Chiang, David. 2000. Statistical Parsing with an Automatically-Extracted Tree Adjoining Grammar. In Data Oriented Parsing, CSLI Publication, pp. 299-316.
Habash, Nizar and Owen Rambow. 2004. Extracting a Tree Adjoining Grammar from the Penn Arabic Treebank. In Proceedings of Traitement Automatique des Langues Naturelles (TALN-04). Fez, Morocco, 2004.
Han, Chunghye, Juntae Yoon, Nari Kim, and Martha Palmer. 2000. A Feature-Based Lexicalized Tree Adjoining Grammar for Korean. IRCS Technical Report 00-04. University of Pennsylvania.
Johansen, Ane Dybro. 2004. Extraction des grammaires LTAG à partir d’un corpus étiquette syntaxiquement. DEA mémoire, Université Paris 7.
Nasr, Alexis. 2004. Analyse syntaxique probabiliste pour grammaires de dépendances extraites automatiquement. Habilitation à diriger des recherches, Université Paris 7.
Neumann, Günter. 2003. A Uniform Method for Automatically Extracting Stochastic Lexicalized Tree Grammar from Treebank and HPSG. In A. Abeillé (ed.), Treebanks: Building and Using Parsed Corpora, Kluwer, Dordrecht.
Park, Jungyeul. 2006. Extraction d’une grammaire d’arbres adjoints à partir d’un corpus arboré pour le coréen. Ph.D. thesis, Université Paris 7.
Sag, Ivan A., Thomas Wasow, and Emily M. Bender. 2003. Syntactic Theory: A Formal Introduction, 2nd ed. CSLI Lecture Notes.
Vijay-Shanker, K. and Aravind K. Joshi. 1991. Unification Based Tree Adjoining Grammar. In J. Wedekind (ed.), Unification-based Grammars, MIT Press, Cambridge, Massachusetts.
Xia, Fei, Martha Palmer, and Aravind K. Joshi. 2000. A Uniform Method of Grammar Extraction and Its Application. In The Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000), Hong Kong, Oct 7-8, 2000.
Xia, Fei. 2001. Automatic Grammar Generation from Two Different Perspectives. Ph.D. thesis, University of Pennsylvania, PA.
 
