ABL: Alignment-Based Learning 
Menno van Zaanen 
School of Computer Studies 
University of Leeds 
LS2 9JT L(~eds 
UK 
menno@scs, leeds, ac. uk 
Abstract 
This \])al)er introdu(:es a new tyl)e of grammar 
learning algorithm, iilst)ired l)y sl;ring edit dis- 
tance (Wagner and Fis(:her, 1974). The algo- 
rithm takes a (:ortms of tlat S(~lltell(:es as input 
and returns a (:ortms of lat)elled, l)ra(:ket(~(1 sen-~ 
ten(:(~s. The method works on 1)airs of unstru(:- 
tllr(?(l SelltellC(~,s that have one or more words in 
(:onunon. W\]lc, ll two senten('es are (tivided into 
parts that are the same in 1)oth s(mten(:es and 
parl;s tha|; are (litl'erent, this intbrmation is used 
to lind l)arl;s that are interchangeal)le. These 
t)arts are taken as t)ossil)le (:onstituents of the 
same tyl)e. After this alignment learning stel) , 
the selection learning stc l) sel(~('ts the most l)rot)- 
at)le constituents from all 1)ossit)le (:onstituents. 
This m(;thod was used to t)ootstrat) structure 
(m the ATIS (:ortms (Mar(:us et al., 1f)93) and 
on the OVIS ~ (:ort)us (Bommma et ~d., 1997). 
While the results are en(:om:aging (we ol)|;ained 
Ul) to 89.25 % non-crossing l)ra(:kets precision), 
this 1)at)er will 1)oint out some of the shortcom- 
ings of our at)l)roa(:h and will suggest 1)ossible 
solul;ions. 
1 Introduction 
Unsupervised learning of syntactic structure is 
one of the hardest 1)rol)lems in NLP. Although 
people are adept at learning grammatical struc- 
ture, it is ditficult to model this 1)recess and 
therefore it is hard to make a eomtmter learn 
strllCtllre. 
We do not claim that the algorithm described 
here models the hmnan l)rocess of  
learning. Instead, the algorithm should, given 
unstructured sentences, find the best structure. 
This means that the algorithm should assign 
1Opcnbam" Vcrvoer hfformatie Systeeln (OVIS) 
stands for Pul)lic Transt)ort hfformation System. 
sl;ru('ture to sentences whi(:h are similar to the 
,~;tru(:ture peot)le would give to sentences, lint 
not ne(:essarily in the same |lille or Sl);~(;e l'e- 
strictions. 
The algorithm (:onsists of two t)hases. The 
tirst t)hase is a constituent generator, whi(:\]l gen- 
erates a m()tiw~ted set of possible constituents 
1)y aligning sentenc(:s. The se(:ond i)hase re- 
stri(:ts tllis set l)y selecting the best constituents 
from the set. 
The rest of this t)aper is organized as ibl- 
lows. Firstly, we will start t)y describing l)revi- 
ous work in machine learning of  stru(:- 
ture and then we will give a descrit)tion of the 
ABL algorithm. Next, some results of al)t)lying 
the ABL algorithm to different corpora will 1)e 
given, followed 1)y a discussion of the algorithm 
alia flltllre resear(;h. 
2 Previous Work 
I,e;wning metl,o(ls can t)e grouped into suitor- 
vised and unsut)ervised nmthods. Sul)ervised 
methods are initial|seal with structured input 
(i.e. stru(:ture(\] sent(m(:es for grannnar learning 
methods), while mlsut)ervised methods learn l)y 
using mlstru(:tured data only. 
In 1)ractice, SUl)ervised methods outpertbrm 
mlsut)ervised methods, since they can adapt 
their output based on the structured exami)les 
in the initial|sat|on t)hase whereas unSUl)ervised 
lnethods emmet. However, it is worthwhile 
to investigate mlsupcrvised gramlnar learning 
methods, since "the costs of annotation are pro- 
hibitively time and ext)ertise intensive, and the 
resulting corpora may 1)e too suscet)tible to re- 
stri(:tion to a particular domain, apt)lication, or 
genre". (Kehler and Stolcke, 1.999) 
There have 1)een several approaches to the un- 
supervised learning of syntactic structures. We 
will give a short overview here. 
961 
Memory based learifing (MBL) keeps track of 
possible contexts and assigns word types based 
on that information (Daelemans, 1995). Red- 
ington et al. (1998) present a method that 
bootstraps syntactic categories using distribu- 
tional information and Magerman and Marcus 
(1990) describe a method that finds constituent 
boundaries using mutual information values of 
the part of speech n-grams within a sentence. 
Algorithms that use the minimmn description 
length (MDL) principle build grammars that 
describe the input sentences using the minimal 
nunfl)er of bits. This idea stems from intbrnm- 
tion theory. Examples of these systems can be 
found in (Grfinwald, 1994) and (de Marcken, 
1996). 
The system by Wolff (1982) pertbrms a 
heuristic search while creating and Inerging 
symbols directed by an evaluation function. 
Chen (1.995) presents a Bayesian grammar in- 
duction method, which is tbllowed by a post- 
pass using the inside-outside algorithm (Baker, 
1979; Lari and Young, 1990). 
Most work described here cmmot learn com- 
plex structures such as recursion, while other 
systems only use limited context to find con- 
stituents. However, the two phases in ABL 
are closely related to some previous work. 
Tim alignment learning phase is etlb.ctively a 
compression technique comparat)le to MDL or 
Bayesian grammar induction methods. ABL 
remembers all possible constituents, building 
a search space. The selection h;arning phase 
searches this space, directed by a probabilistic 
evaluation function. 
3 Algorithm 
We will describe an algorithm that learns struc- 
ture using a corpus of plain (mlstructured) sen- 
tences. It does not need a structured train- 
ing set to initialize, all structural information 
is gathered from the unstructured sentences. 
The output of the algorithm is a labelled, 
bracketed version of the inlmt corpus. Although 
the algorithm does not generate a (context-fl'ee) 
grammar, it is trivial to deduce one from the 
structured corpus. 
The algorithm builds on Harris's idea (1951) 
that states that constituents of the same type 
can be replaced by each oth, er. Consider the sen- 
Wh, at is a family fare 
Wh, at is th, e payload of an African Swallow 
Wh, at is & family fare)x 
Wh, at is (the payload of an African Swallow)x 
Figure 1: Example bootstrapping structure 
For each sentence sl in the corpus: 
For every other sentence s2 in the corpus: 
Align s~ to s2 
Find the identical and distinct parts 
between s~ and s2 
Assign non-terminals to the constituents 
(i.e. distinct parts of s~ and s2) 
Figure 2: Alignment learning algorithm 
fences as shown in figure 1. 2 The constituents a 
.family fare and the payload of an African Swal- 
low both have the same syntactic type (they 
are both NPs), so they can be replaced by each 
other. This means that when the constituent in 
the first sentence is replaced by the constituent 
in the second sentence, the result is a wflid sen- 
tence in the ; it is the second sentence. 
The main goal of the algorithm is to estab- 
lish that a family .fare and the payload of art, 
African Swallow are constituents and have the 
same type. This is done by reversing Harris's 
idea: 'i1" (a group o.f) words car-,, be; replaced by 
each other, they are constituents and h.ave th, e 
same type. So the algorithm now has to find 
groups of words that can be replaced by each 
other and after replacement still generate valid 
sentences. 
The algorithm consists of two steps: 
1. Alignment Leanfing 
2. Selection Learning 
3.1 Alignment Learning 
The model learns by comparing all sentences 
in the intmt corpus to each other in pairs. An 
overview of the algorithm can be tbund in fig- 
ure 2. 
Aligning sentences results in "linking" iden- 
tical words in the sentences. Adjacent linked 
words are then grouped. This process reveals 
2All sentences in the examlfles can be fbund in the 
ATIS corlms. 
962 
.f,'o,,,. Sa,,. F,'a,.ci.,'co (to Dallas).  
./'rout (Dallas to)| San Francisco 02 
(Sa,, l.o)  Dallas 02 
O, DaUas #o Sa,,. J';'a,.cisco)2 
• \[;1"0 ~II, 
.fF()Ii't 
(San Francisco), to (Dallas)2 
(Dalla.gj to (Sa,,. 
Figure 3: Ambiguous alignments 
1;t1(; groul)S of identical words, 1)ut it also llIlC()v- 
ers the groups of distinct wor(ls in the sentences. 
In figure 1 What is is the identical part of the 
sentences and a fam, ily J'a~v, and the payload of 
an A./ricau, Swallow are the distinct l)arts. The 
distinct parts are interchangeable, so they are 
(tetermilmd to 1)e constituents o17 the same I;yl)e. 
We will now Cxl)lain the stel)s in the align- 
men| learning i)hase in more de, tail. 
3.1.1 Edit Distance 
q\[b find the identi(:al word grouI)S in |;he sen- 
tences, we use the edit; distan(:e algorithm by 
Wagner and Fischer (197d:), which finds the 
minimum nmnl)er of edit operations (insertion, 
(lelei;ion and sul)stii;ul;ion) l;o change one sen- 
te, nce into the other, ld(mti(:al wor(ts in the sen- 
t(races can 1)e t'(mnd at \])\]a(;es W\]l(~,l'e lie edit op- 
eration was al)plied. 
The insl;antia,tiol~ of the algoril;hm that fin(is 
l;}le longest COllllllOll Slll)S(}(\]ll(}ll(;( ~, ill two Sell- 
tences sometimes "links" words that are, too 
far apart, in figure 3 when: 1)esides the o(:cm'- 
rences of.from,, the ocem:rences of San }4"au, ci.sco 
or Dallas are linked, this results in unintended 
constituents. We woukt r;d;her have the lnodel 
linking to, resulting in a sl;1"u(;I;llre with the 1101111 
phrases groul)ed with the same type corre(:tly. 
Linking San Francisco or Dallas results i~l 
constituents that vary widely in size. This stems 
from the large distance between the linked 
urords in the tirsi; sentence mid in th(; s(:cond 
sentence. This type of alignlnent can t)e ruled 
out by biasing the cost fimction using distances 
between words. 
3.1.2 Grouping 
An edit distance algorithm links identical words 
in two sentences. When adjacent wor(ls are 
linked in l)oth sentences, they can l)e grouped. 
A groul) like this is a part of a senten(:e that can 
also be tbmM in the other sentence. (In figure 1, 
What is is a group like this.) 
The rest of the sentences can also be grouped. 
The words in these grout)s arm words that are 
distinct in the two sentences. When all of these 
groups fl:om sentence, one would 1)e relflaced by 
the respective groups of sentence two, sentence 
two is generated. (a family fare and th, c pay- 
load of an African Swallow art: of this type of 
group in figure 1.) Each pair of these distinct 
groups consists of possilfle constil;uents Of the 
same type. :~ 
As can be, seen in tigure 3, it is possible that 
empty groups can lm learned. 
a.l.a Existing Constituents 
At seine 1)oint it may be t)ossible that the model 
lem'ns a co11stituent that was already stored. 
This may hal)l)en when a new sentence is com- 
pared to a senlaen(;e in the partially structured 
corpus. In this case,, no new tyl)e, is intro(hu:ed~ 
lint the, consti|;ucnl; in l;he new sentence gel;s l;he 
same type of the constituent in the sentence in 
the partially structm:ed corpus. 
It may even t)e the case that a partially si;ruc- 
tured sentence is compared to another partially 
sl;rtlctllre(1 selll;elR,e. This occm:s whel~ a s(:n- 
fence that (;onl;ains some sl;ructure, which was 
learner1 1)y COlnl)aring to a sentelme in the par- 
t;\]ally structure(l (;Ol;pllS~ is (;Olllt)ar(~,(\] 1;o all- 
other (t)art;ially stru(:ture(t) sente, n(:e. When 
the ('omparison of these two se, nl;ence, s yields 
a constituent thai: was ah:ea(ly t)resent in both 
senten(:es, the tyl)es of these constitueld;S are 
merged. All constituents of these types are ut)- 
dated, so the, y have the same tyl)e. 
By merging tyl)es of constituents we make t;he 
assuml)tion that co\]lstil;uents in a (:ertain con- 
text can only have one tyl)e. In section 5.2 we 
discuss the, imt)li(:atiolls of this assmnpl;ion and 
propose an alternative at)t)roach. 
3.2 Selection Learning 
The first step in the algorithm may at some 
point generate COllstituents that overlap with 
other constituents, hi figure 4 Give me all 
flights .from Dallas to Boston receives two over- 
lal)ping structures. One constituent is learned 
3Since the alger||Inn does not know any (linguist;|c) 
llalIICS for the types, the alger|finn chooses natural num- 
bers to denote different types. 
963 
( Book Delta 128 )flwn Dallas to Boston 
?'Give m¢(all.fligh, ts)'f,'om Dallas to Boston) 
Give me ( help on classes ) 
l?igure 4: Overlapping constituents 
by comparing against Book Delta 128 firm Dal- 
las to Boston and the other (overlapl)ing) con- 
stituent is tbund by aligning with Give me help 
on classes. 
The solution to this problem has to do with 
selecting the correct constituents (or at least 
the better constituents) out of the possible con- 
stitnents. Selecting constituents can be done in 
several dittbrent ways. 
ABL:incr Assume that the first constituent 
learned is the correct one. This means that 
when a new constituent overlaps with older 
constituents, it can 1)e ignored (i.e. they are 
not stored in the cortms). 
ABL:leaf The model corot)rites the probabil- 
ity of a constituent counting the nmnber of 
times the particular words of the constituent 
have occurred in the learned text as a con- 
stituent, normalized by the total number of 
constituents. 
Ple,f(c) = \]c' C C: yield(c') = yicld(c)l 
ICI 
where C is the entire set: of constituents. 
ABL:braneh In addition to the words of the 
sentence delimited by the constituent, the 
model computes the probability based on the 
part of the sentence delimited by the words 
of the constituent and its non-terminal (i.e. 
a normalised probability of ABL:leaf). 
Pb,.~na,,(clroot(c ) = r) = 
e c: y/el(l(,-') -- yield(c) A ; "1 
Ic" c: ,'oot(c") = 
The first method is non-probabilistic and may 
be applied every time a constituent is found that 
overlaps with a known constituent (i.e. while 
learning). 
The two other methods are probabilistic. The 
model computes the probability of the con- 
stituents and then uses that probability to select 
constituents with the highest probability. These 
methods are ~pplied afl;er the aligmnent learn- 
ing phase, since more specific informatioil (in 
the form of 1)etter counts) can be found at that 
time. 
In section 4 we will ewfluate all three methods 
on the ATIS and OVIS corpus. 
3.2.1 Viterbi 
Since more than just two constituents can over- 
lap, all possible combinations of overlapping 
constitueni;s should be considered when com- 
Imting the best combination of constituents, 
which is the product of the probabilities of the 
separate constituents as in SCFGs (cf. (Booth, 
1969)). A Viterbi style algorithm optimization 
(1967) is used to etficiently select the best com- 
bination of constituents. 
When conll)uting the t)r()t)ability of a com- 
bination of constituents, multiplying the sepa- 
rate probabilities of the constituents biases to- 
wards a low nnmber of constituents. Theretbre, 
we comtmte the probability of a set of con- 
stituents using a normalized version, the geo- 
metric mean 4, rather than its product. (Cara- 
ballo and Charniak, 1998) 
4 Results 
The three different ABL algorithms m~d two 
1)aseline systems have been tested on the ATIS 
and OVIS corpora. 
The ATIS corlms ti'om the P(;nn Treebank 
consists of 716 sentences containing 11,777 (:on- 
stituents. The larger OVIS corpus is a Dutch 
corpus containing sentences on travel intbrma- 
tion. It consists of exactly 10,000 sentences. We 
have removed all sentences containing only one 
word, resulting in a corpus of 6,797 sentences 
and 48,562 constituents. 
The sentences of the corpora are stript)ed 
of their structures. These plain sentences are 
used in the learning algorithms and the result- 
ing structure is compared to the structure of the 
original corpus. 
All ABL methods are tested ten times. Th(, 
ABL:incr method is applied to random orders of 
the input corpus. The probabilistic ABL meth- 
ods select constituents at random when differ- 
ent combinations of constituents have the same 
probability. The results in table 1 show the 
4The geometric mean of a set of constituents 
^... A = VFI  =, P( d 
964 
LEFT 
I{IGI/T 
ABL:INCll 
AI3L:LEAF 
ABL:BI/ANCII 
NCBP 
32.6O 
82.70 
83.24 (1.17) 
81.42 (0.11) 
8, .31 (0.01) 
AT1S OVIS 
NCBI{ ZCS NCB\] ) NCBR ZCS 
76.82 
92.91 
87.21 (0.67) 
86.27 (0.06) 
89.31 (0.01) 
1.J2 
38.83 
18.56 (2.32) 
21.63 (0.5O) 
29.75 (0.00) 
51.23 
75.85 
88.71 (0.79) 
85.32 (0.02) 
89.2.5 (0.oo) 
73.17 
86.66 
84.36 (1.\]0) 
79.96 (0.03) 
8>o4 (0.0|)) 
25.22 
48.08 
30.87 (0.09) 
42.20 (0.01) 
laJ)h, 1: Results of I;he 
mean ;rod standard deviations (between bra(:k- 
ets). 
The two base, line systcnis, left and right, onty 
t)uiM left: mid right brnnching trees respectively. 
Three, metrics hnve been compnl;cd. NCBP 
stmlds for Non-(\]rossing Bra.(:kets Precision, 
which denotes the percentage, of learned (:on- 
stituents th~,t do not overlai) with any con- 
sl;it;uent;s in I;he m'igi'n, al (:orpus. NCIH~ is the 
Non-Crossing Brackets ll.e(:all mid shows |;he 
t)(;rt'ent~ge of constituents in the original col 
t)us thai; (1o not overlap with :my constituents 
in the learned (:oft)us. Finnlly, Z(LS' strums ti)l' 
Zero-(Jrossing Sentences a,nd r(',l)reseuts the per- 
c(ml;age of sentence, s that (t(1 not have m~y over- 
lnt)l)ing constii;uenl;s. 
4.1 Evaluation 
'l-'tm incr modet 1)erfi)rms (tuii:e well (:onsi(hwing 
the t'~mt that it; (:;mnot re(:ov(w t'roln incorre(:t 
(:()nstituents, with a t)re(:ision a,nd re(:~dl of ()V(~l' 
8t) %. The order of the senl;en(:es how(we, r is 
quite iml)orbmt , since |tie sl;ml(tard deviation 
of the inc'r model is quite high (est)e~(:ialty with 
the ZCS, reaching 3.22 % on the OV!S (:orpus). 
We expected the prot)nl)ilistic nmtho(ts to 
i)erform t)o,l;ter, trot the lc((f modet performs 
slightly worse. The, ZCS, however, is somewhat 
better, re, suiting in 21.63 % on the AT1S cor- 
pus. Furthermore, d;he standard deviations of 
the le,:f model (&lid Of the branch, model) are 
c\]ose to 0 %. The st;~tisti(:al methods generate 
more precise, results. 
Ttm bra'n, ch, modet dearly outl)erfornl all 
o~,her models. Using more Sl)e(:itic statistics gen- 
erate better results. 
Although the resull;s of the N FIS (:orpus mM 
OVlS corIms differ, the, conclusions that (:ml })e 
reached are similm:. 
ATIS and OVIS corpora 
4.2 ABL Compared to Other Methods 
It; is difficult to corot)are the results of the ABL 
model ag~dnst other lnethods, since, often dif 
thrent corpora or m(',trics m:e used. The meth- 
ods describe, d by Pereira and Schabcs (1.9(.)2) 
comes reasonably close to ours. The unsuper- 
vised nmthod le~rns structure on plain sentences 
from the ATIS corlms resulting in 37.35 % pre- 
cision, while the "un.supcrvised ABL signili(:mltly 
outperforms this method, reaching 85.31% l)re- 
cision. Only their s'uperviscd version results in 
n slightly higher pre('ision of 90.36 %. 
The syste, nl th;d; simt)ly buihts right branch- 
ins structures results in 82.70 % precision mid 
92.91% teeM1 on the ATIS cortms, where ABL 
got 85.31% and 89.31%. This wa,s expected, 
sin(:e English is a right |)rmmhing ; a 
left branching sysl;Clll t)(~rff)l.'ltle(| lllllCh woFsc 
(32.60 % pre(:ision and 76.82 % rccnll). C(m- 
versely, right branching wouht not do very well 
on ~ ,l~q)mmse, corpus (~ left 1)r~m(:hing lan- 
gua.ge). Sin(:e A\]31, does not have a 1)ref(~renc(~ 
fi)r direction built; in, we exi)ect ABL to t)ertbrm 
similarly on n Ja,t)anese (:orpus. 
5 Discussion and Future Extensions 
5.1 Recurs|on 
All ABL methods des('ribed here can lem:n re- 
cursive structures and have been fomtd when 
~t)plying ABI, to the NIl?IS and OVIS (:orlms. 
As (:ml be sc(m in figure 5, the learned recur- 
sive structure, is similm: to the, original struc- 
ture. Some structure has t)een removed to make 
it easier to see where the recurs|on occurs. 
Roughly, recursive structures arc built in two 
steps. First, the algorithm generates the struc- 
ture with difl'cro, nt non-terminals. Then, the 
two nonq;ermimds are merged as described in 
so, el;ion 3.1.3. The merging of the non-terminals 
m~y occur anywhere in the cortms , sin(:e all 
merged non-terndnals are ut)dated. 
965 
learned 
original 
learned 
original 
Please ezplain the (field FLT DAY in the (table)is)is 
Please explain (the .field FLT DAY in (the table)NP)Np 
Explain classes QW and (QX and (Y)a2)~'e 
Explain classes ((QW)Np and (QX)NI, and (Y)NP)NP 
Fignre 5: Recursive structures learned in the A TIS corpus 
Show me the ( morning )x flights 
Show me the ( nonstop )x fli.qhts 
Figure 6: Wrong syntactic type 
5.2 Wrong Syntactic Type 
In section 3.1.3 we made the assumt)tion that a 
constituent in a certain context can only have 
one type. This assumption introduces some 
problems. 
The sentence John likes visiting relatives il- 
lustrates such a problem. The constituent vis- 
iting relatives can be a noun phrase or n verb 
phrase. 
Another prol)lem is ilhlstrated in figure 6. 
When applying the ABL learning algorithm to 
these sentences, it will determine that morning 
and nonstop are of the same type. Untbrtu- 
nately, morning is a noun, while nonstop is an 
adverb) 
A fixture extension will not only look at the 
type of the constituents, lint also at the con- 
text; of the constituents. Ii5 the example, the 
constituent morning nlay also take the t)lace of 
a subject position in other sentences~ 1)ut the 
constituent nonstop never will. This intbrnm- 
tion can be used to determine when to merge 
constituent types, efl'ectively loosening the as- 
sunlption. 
5.3 Weakening Exact Match 
When the ABL algorithms try to learn with two 
conlpletely distinct sentences, nothing can be 
learned. If we weaken the exact match between 
words in the alignment step of the algorithm, it 
is possible to learn structure even with distinct 
sentences. 
Instead of linking exactly matching words, 
the algorithm should match words that are 
equivalent. An obvious way of implementing 
this is by making use of cquivalence classes. (See 
5Harris's implication does hold in these sentences. 
nonstop can also be replaced by for example cheap (an- 
other adverb) and morning can be replaced by evenin.q 
(another noun). 
for example (Redington et al., 1998).) The idea 
1)ehind equivalence classes is that words which 
are closely related are grouped together. 
A big advantage of equivalence classes is that 
they can be learned in an unsupervised way, so 
the resulting algorithm remains nnsui)ervised. 
Words that are in the same equivalence class 
are. said to be sufficiently equivalent, so the 
aligmnent algoritlnn may assunm they are sin> 
ilar and may thus link them. Now sentences 
that do not have words in common, but do have 
words in the same equivalence class in common, 
can be used to learn structure. 
When using equivalence classes, more con- 
stituents are learned and more terminals in con- 
stitnents may l)e seen as similar (according to 
the equivalence classes). This results in a much 
richer structm'ed corlms. 
5.4 Alternative Statistics 
At the moment we have tested two diflbrent 
ways of computing the probal)ility of a con- 
stituent: ABL:leaf which computes the t)rot) - 
ability of the occurrence of the terminals in a 
constituent, and ABL:b','anch which coml)utes 
the probability of the occurrence of |;11(; termi- 
nals together with the root non-terminal in a 
(-onstitueut, based on the learned corpus. 
Of course, other models can bc imt)lemented. 
One interesting possibility takes a DOP-like ap- 
proach (Bod, 1998), which also takes into ac- 
count the inner structure of the constituents. 
6 Conclusion 
Wc have introduced a new grammar learning al- 
gorithm based 055 c()mparing and aligning plain 
sentences; neither pre-labelled or bracketed sen- 
tences, nor pre-tagged sentences arc used. It 
uses distinctions between sentences to find pos- 
sible constituents and afterwards selects the 
most probable ones. The output of the algo- 
rithm is a structured version of the corpus. 
By l;aking entire sentences into account, the 
context used by the model is not limited by win- 
dow size, instead arbitrarily large contexts are 
966 
used. Furthermore, the model has the ability to 
learn recursion. 
~\['ln'ee ditl'erent instances of the algorithm 
have l)een al)t)lied to two corpora of differ- 
eat size, the ATIS corpus (716 sentences) and 
the OVIS corpus (6,797 sentences), generating 
promising results. Although t;he OVIS corpus 
is almost ten t;imes the size of the ATIS cor- 
pus, these corpora describe a small conceptual 
domain. We plan to ~l)t)ly the ~flgori~hms to 
larger domain corpora in the near fllture. 

References 

J. K. Barker. 1979. Trainabh; grammars for 
speech recognition. In J. J. Wolf and 1). H. 
Klatt, editors, Speech, Communication Papers 
for the Ninety-seventh Meetin.q of the Acous- 
tical Society of America, pages 547-550. 

R,ens Bod. 1998. Beyond Grammar An 
_F, zpcricncc-Bascd Th, eory of Language. Stan- 
Jbrd, CA: CSLI Publications. 

R.. Bonnema, R. Bod, and R,. Scha. 1997. A 
DOP model for semantic iil|;ertn'el;ation. In 
Proceedings of the Association for Compu~ 
tational Ling'aistics/Eurwpean Ch, apter of th, c 
Association for Computational Linguistics, 
Madrid, p~ges 159 167. Sommerset, N J: A s- 
soci~tion tbr Compul;ational Linguistics. 

T. Booth. 1969. Probal)ilistic representation of 
formed s. In Co'~@',rcnce ll, cco'rd o/" 
1959 "lEnth, Annual Symposium on ,5'witcl~,in.q 
and Automata Theory, pages 74 8:1. 

Sharon A. Caraballo and Eugene Charniak. 
1998. New figures of merit for best-first prol)- 
abilist;ie chart parsing. Computational Lin- 
guistics, 24(2):275 298. 

Stanley F. Chert. 1995. Bayesian gralmnar in- 
duction for  modeling. \]:ll Proceed- 
ings of the Association J'or Computational 
Linguistics, pages 228 235. 

Walter Daelemmls. 1995. Memory-based lexi- 
cal acquisition and 1)rocessing. In P. Stefh;ns, 
editor, Mach, inc Translation and the Lexicon, 
vohmm 898 of Lecture Notes in Artificial In- 
telligence, pages 85 98. Berlin: Springer Ver- 
lag. 

Carl G. de Marcken. 1996. Unsupervised Lan- 
g'aage Acquisition. I)h.D. thesis, Departnmnt 
of Electrical Engineering mid Comtmter Sci- 
ence, Massachusetts Institute of Technology, 
Cambridge, MA, sep. 

Peter Oriinwald. 1994. A nfinimmn de, scription 
lengl;h approach to grammar inference. In 
G. Scheler, S. Wernter, and E. R,iloif, editors, 
Connectionist, Statistical and S!pnbolic Ap- 
proaches to Learning for Natural Language, 
vohnne 1004 of Lecture Notes in dl~ pages 
203-216. Berlin: Springer Verlag. 

Zellig Harris. 1951. Methods in Structural Lin- 
guistics. Chicago, IL: University of Chicago 
Press. 

Andrew Kehler and Andreas Stoleke. 1999. 
Preface. In A. Kehler and A, Stolcke, edi- 
tors, Unsuper'viscd Learning in Natural Lan- 
guage Processing. Association for Comlmta- 
tional Linguistics. Proceedings of the work- 
shop. 

K. Lari and S. J. Young. 1990. The estima- 
tion of stochastic context-free grammars us- 
ing the inside-outside ~dgorithm. Computer 
Speech and Language, 4:35 56. 

\]). Magerman and M. Marcus. 1990. Pars- 
ing natural  using mutual intbrma- 
tion statistics. In Pwcecdin.qs o,f th, e National 
Con.fcrcnce on Artificial Intelli.qence, p~ges 
984 989. Cambridge, MA: MIT Press. 

M. Marcus, B. Santorini, and M. Marcinkiewicz. 
1993. Building a large annotated corpus of 
english: the Penn tr(,ebank. Computational 
Linguistics, 19(2):31.3 330. 

F. Pereira and Y. Schgd)e,s. 1992. Inside-outside 
reestimation fl:om pm:tially t)racketed cor- 
pora. In l'rocccdings of th, c Association for 
Computational Lin.quistics, pages :128-135, 
Newark, Debm~are. 

Martin Redington, Nick Chater, and Steven 
Finch. 1998. Distrilmtional information: A 
powerflfl cue for acquiring synt;actic cate- 
gories. Cwnitivc Science, 22(4):4:25 469. 

A. Viterbi. t967. Error bmmds for convoh> 
tiona\] codes and an asymptotically ol)timum 
decoding algorithm. Institute of Electrical 
and Electronics Engineers Transactions on 
Information Th, cory, 13:260 269. 

Robert A. Wagner and Michael J. Fischer. 
1974:. The string-to-string correction prob- 
lem. Journal of th, e Association for Comput- 
ing Machinery, 21(1):168-173, jmL 

J. G. Wollf. 1982. Lmlguage acquisition, data 
compression and generalization. Langv, agc ~ 
Communication, 2:57-89. 
