Theory Refinement and Natural Language Learning 
Hervd Ddjean* 
Seminar fiir S1)rachwissenschaft 
UniversitSt Tiibingen 
dej ean~sf s. nphil, uni-l;uebingen, de 
Abstract 
This 1)aI)er l)resents a learning system for identify- 
ing synta(:tie structures. This system relies on the 
use of backgromld knowledge an(1 default wflues in 
order to buihl utl an initial grammar and the use of 
theory retiimlnent in order to iml)rove this granunar. 
This contbilmtion provides a good machine learning 
fl:amework for Natural Lmlguage Learning. We il- 
lustrate this 1)oint with the 1)resentation of ALLiS, 
a learlfiug system which generates a regular ext)res- 
sion grmnmar of non-recursive phrases fi'om brack- 
eted corpora. 
1 Introduction 
Apl)lying Machine lx~'arning t('.chniques to Natural 
Language Processing is a booming domain of ru- 
s(:~ar('h. One of the reasons is the. (levelopment of cor- 
1)ora with morl)ho-synta(:ti(: and syntacti(: mmota- 
tion (Marcus et al., 1993), (Sampson, 1995). One re- 
cent l)opular sul)task is the learning of non-re(:ursive 
Nouns Phrases (NP) (\]{amshaw and Mart:us, 1995), 
(~\['jong Kim Sang an(1 Vcenstra, 1999), (Mufioz et 
al., 1999), (Group, 1998), (Cardie and Pierce, 1999), 
(Buchholz el; al., 1999). 
When other learning tcchui(lues (symboli(: or sta- 
tistical) are widely used in Natural Language Bern'h- 
int, theory retlnelnent (Aliecker and Schmid, 199G), 
(Mooney, 1993) seems to be ignored (excel)t (Brunk 
and Pazzmfi, 1995)). Theory refinement consists of 
iml)roving an existing knowledge base so that it; bet- 
ter at:cords with data. No work using theory re- 
finement apI)lied to tile grammar learning paradigm 
seems to have been develol)ed. We would like to 
point out in this article the adequacy between the- 
ory refinement and Natural Language Learning. 
To illustrate this claim, we present ALLiS (Archi- 
tecture for Learning Linguistic Structures), a learn- 
ing system which uses theory refinement in order to 
learn non-reeursive noun phrases (also called base 
norm t)hrases) and non-recursive verbal 1)hrases. We 
will show that this technique comliine(1 with the 
* This research is flmded be the TMll. network L('m'ning 
Coml)Ut;~tional (lrmnmars www. leg- www. uia. ac. be/lcg/ 
use of default vahms provides a good architecture 
to learn natural  structures. 
This article is organised as follows: Section 2 gives 
all overview of theory refinement. Section 3 CXl)Iains 
the advantage of combining default vahles and the- 
ory refinement to build a learning system. Section d 
des(:ril)es th('. genc'ral characteristics (if ALLiS, and 
Sc(:tion 5 explains the learning algorithm. The eval- 
uation of ALLiS is described Section 6. The exam- 
ples which illustrate this arti(:le corresl)ond to En- 
glish NPs. 
2 Theory Refinement 
We 1)resent here a l)rief illtro(hlction to theory re- 
finement. For a more (tetailed presentation, we refer 
the reader to (Abecker and Schmid, 1996), (Brunk, 
1996) or (Ourston and Mooney, 1990). (Mooney, 
1993) detines it; as: 
Theory retinement systems develot)ed in 
Machine Learning automatically modify a 
Knowle(lge Base to render it; consiste.nt 
with a set of (:lassifie(1 training examples. 
This technique thus (:onsists of trot)roving a given 
Knowledge Base (here a grammar) on tile l/asis 
of examtIles (here a treebank). Some iml)OSe to 
modif'y the initial knowledge 1)ase as little as pos- 
sible. Applied in conjmmtion with existing learning 
techniques (Explanation-Based Learning, Inductive 
Logic l)rogramming), TR seems to achieve' better 
results than these techniques used alone (Mooney, 
1997). Theory retinement is mainly used (and has its 
origin) in Knowledge Based Systems (KBS) (Craw 
and Sleeman, 1990). It consists of two main steps: 
1.. Build a more or less correct grammar on the 
basis of background knowled.qe. 
2. Refine this grmnmar using training examt)les: 
(a) Identify the revision 1)oints 
(b) Correct them 
The first; step consists of acquiring an initial gram- 
nmr (or more generally a knowledge base). In this 
work, the initial grainmar is automatically induce(1 
229 
from a tagged and bracketed corpus. The second 
step (the refinement) compares the prediction of the 
initial gralnmar with the training corpus in order to 
firstly identify the revision point.s, i.e. points that 
are not correctly described by the grammar, and 
secondly, to correct these revision points. The er- 
ror identification and refinement operations are ex- 
plained Section 5.3. 
The main difference between a TR, system and 
other symbolic learning systems is that a TR system 
must be able to revise existing rules given to tile 
system as background knowledge. (A system such 
as TBL (Brill, 1993) can not be considered us TR 
since it only acquires new rules). In the case of other 
techniques, new rules are learned in order to improve 
the general efficiency of the system (selection of the 
"best rule" according to a preference function) and 
not in order to correct a specific rule. 
3 Theory Refinement, Default values 
and Natural Language L~arning 
This section explains how default values combined 
with theory refinement can provide a good inachine 
learning framework for NLP. 
3.1 The Use of Default Values 
The use of default values is not new in NLP (Brill, 
1993), (Vergne and Giguet, 1998). We can observe 
that often (but not necessarily) in a , an 
element belongs to a predominant class (Vergne and 
Giguet, 1998). Some systems such as stochastic 
models use this property implicitly. Some others use 
it explicitly. For instance, the general principle of 
the Transformation-Based Learning (Brill, 1993) is 
to assign to each element its most frequent category, 
and then to learn transformation rules which cor- 
rect its initial categorisation. A second example is: 
the parser described ill (Vergne and Giguet, 1998). 
They first assign to each granunatical word a de- 
fault category (default tag), and then might modify 
it thanks to local contexts and grammatical relation 
assignment (in order to deal with constraints due to 
long distance relations which can not be expressed 
by local contexts). 
The main work is done by the lexicon and by de- 
fault values (even if further operations are ol)viously 
necessary) 
These approaches are thus different for the disam- 
biguation often used in tagging. The default rules 
are not numerous (one per tag), easy to automati- 
cally generate but they nevertheless produce a sat- 
isfactory starting level. 
3.2 The Combination of Default Values 
with TR 
The idea on which ALLiS relies is tile following: a 
first; "naive gramnmr" is lmilt up using default val- 
ues, and then TR is used in order to provide a "more 
realistic gralnmar". Tiffs initial grammar assigns to 
each element its default category (the algorithm is 
explained ill Section 5.2). Tile rules learned are cat- 
egorisation rules: assign a category to an clement (a 
tag or a word). Since an element is automatically 
assigned to its default category, the system has not 
to learn the categorisation rules for its category, and 
just learns categorisation rules which correspond to 
cases in which the element does not belong to its de- 
fault category. This minimised the number of rules 
that have to be learned. Suppose the element c can 
belong to several categories (a frequent case). Tile 
frst rule learned is tile "default" rule: assign(e, dc), 
where dc is the defimlt category of c. Then ALLiS 
just learns rules for cases where c does no 1)elong to 
its default category. The numerous rules concern- 
ing the default category are replaced by the simple 
default rule. 
4 ALLiS 
The goal of ALLiS x is to automatically build a regu- 
lar expression grammar from a bracketed and tagged 
corpus 9. In this training data, only the structures 
we want to learn are marked at their boundaries by 
square brackets a. The following sentence shows an 
example of tile training corpus for the NP structure 
(only base-NPs occur inside brackets). 
In/IN \[ early/JJ trading/NN \] in/IN \[ 
Itong/NNP Kong/NNP \] \[ Monday/NNP 
\] ,/, \[ gold/NN \] was/VBD quoted/VBN 
at/IN \[ $/$ 366.50/CD \] \[ an/DT 
ounce/NN \] ./. 
ALl, iS uses an internal formalism in order to rep- 
resent the grmmnar rules. In order to parse a text, 
a module converts its formalism into a regular ex- 
pression grammar which can be used by a parser us- 
ing such representation (two modules exist: one for 
the CASS parser (Almey, 1996) and one for XFST 
(Karttunen et al., 1997)). 
Following the principle of theory refinement, the 
learning task is composed of two steps. The first step 
is the generation of the initial grammar. This gram- 
mar is generated ti'om examples and baekgromxd 
knowledge (Section 5). This initial grammar pro- 
vides an incomplete and/or incorrect analysis of the 
data. The second step is the refinement of this gram- 
mar (Section 5.3). During this step, the validity of 
the grammar rules is checked and the rules are im- 
proved (refined) if necessary. This improvement cor- 
responds to find contexts in which elements which 
i http ://www. sfb441, uni- tuebingen, de/'dej ean/ 
chunker, html. 
2'\]'he WSJ corpus (Marcus et al., 1993). 
a(Mufioz et al., 1999) showed that this representation 
tends to provide better results than the representation used 
in (Ramshaw and Marcus, 1995) where each word is tagged 
with a tag I(inside), O(outskte), or B(breaker). 
230 
are considered to be meinbers (}f the structure do 
hoe 1}elong to this stru{:ture (and re(:it}r{)c~dly ). 
We give here a simple exami)le to illustrate tit(.* 
learning process. The first step (initial grammar 
generation) categorises the tag JJ (adje{:tive) as be- 
longing by default to the NP strlleture if it; oec:irs 
before a noun. Tit{; second ste t) (refinemellt) tinds 
ouI; that some adjectives do not ol}ey to these rules 4. 
Tim refiimment is triggered in order t{} modify the 
default rule so that these ex{'el)tions can t}e {:(}rre(:tly 
processed. 
Thus, the learning algorithm siml}ly consists of 
eategorising the elements of the eorlms (tags and 
words) into st}celtic categories, and this eategorisa- 
lion allows the extraction of the stru{:i;ures we want 
to learn. These (:ategories are exi)lained in the next 
section. 
5 The Learning System 
5.1 The Background Knowledge. 
\]n order to ease the learning, the system uses 1)aek- 
ground knowledge. This knowledge, 1}rovide, s a for- 
real and general (les('ription of the S\[;lll{:tllr{)s that 
ALLiS can learn. We SUl)l)ose that the stru{:tures are 
(:Oml}(}se(l of a mmleus with (}t)ti{mal left; an(1 right 
ad.\]un{:ts. We here give int'{}rmal (letiniti{}ns, |h(', for- 
inal/disl;rilmti{mal ones are given in Section 5.2. 
Tit(; nucleus is the head of the structure.. We 
authorise the presence of several nuclei in the. stone 
strll{;l;ure. 
All the other e.lements in the st;ruet;nre (except 
the linker) me (:onsidered as adjuncts. They are 
in dependence relation with the, head of tit{! st:rue- 
tm'e. Tit(; adjuncts are (',hara(:l;eris(;{I 1}y l;heir \])osi- 
tion (left/right) re, lative to the nu(:leus. 
A linker is a Sl)e{:ial elenle, nt; which 1}uihls an en- 
{lo{:entri{: st;ruetlne with t;w(} elentenl;s. It usually 
{;{)rrest}on(ts t;o c(}or(lilla{;i{)n 5. 
An ehBment (micleus or adjunct) might possess the 
break 1)r{}I)erty. This llotiolt is iu|;rodtlce(| (;IS ill 
(l{axnshaw and Marcus, 19\[)5), (Mufioz et al., 1999)) 
in order to deal with sequences where adjacent m~elei 
compose several structures (Section 5.2.4). 
This 1}attern can 1)e seen as a variant of the X- 
l)ar template (Ilead, Spec and COral)), whieh was 
already used in a learning sysl;ent (Berwiek, 1.985) 
(alth(mgh Coral) is not useflfl for the n{m-reeursive 
structures). 
The possible ditferent categories of air element are 
summm'ised in Figure 5.1. 
The tbllowing sentence shows an exmnl}le of cat- 
egorisation (elements which do not al}l)ear in the 
structure (NP) are tagged O): 
4For examl}le: \[ the/l)T 7/el) %/NN |}(mchmarl:/NN is- 
sue/NN \] due/aa \[ October/NNP l.q,qg/Cl) \]. 
5Only linkers occurring between the two coordinated eh!- 
Ill{!ll\[,S al'{, ~ i)l'oc(!,qs{~(\], 
CAT /% 
OUT IN 
A l/r NU +/- B +/- B 
Figure 1: The different categories. 
Ill 0 earlYAB+ tradingNU in 0 I{ongN U KOngNU 
MondayNuB+ ,0 goldNu was 0 quoted 0 at 0 $A 
366.50NU allAB+OlllleeNu -0 
The stru(:ture is fornmlly defined as: 
S+A *\[A *NUb_A *\]+ * l,b+ 1,1> r,b- At,b+ 
\]A *k * A * * " l,bq 1,1}- NUb+ r,b- Ar,b+ 
2File syml}ol , {:orresl)onds to the Kleene star. For 
the sake of legit}ility, we do not introduce linkers in 
this expression. But each symbol X (NU, A) cml 
1)e de, tined 1}y the rule {; X --~- X I X 1 X where l is 
the list; of linkers. The symbols B+ and \]Y- indicate 
whether the element has the. breaker i)rot)erty or n(}t. 
Since lit(; {:orpus does not contain information 
about these distrilmtional categories, ALLiS has to 
figure them {}ut. This {:ategorisation relies on the 
distritmtional behaviour of the ele.nlents, and ean be 
automati{:ally achieved. 
5.2 The Initial Categorisation 
The general idea t{} {:ateg(}rise elements is t(} use sl}e- 
{:iti{: (:ont{:xts which 1)oint {}ut s(}nie of the distrilm- 
l;iona\] t)r(}perti{:s of lhe (;ategory. ~\]Plle {;al;eg()risaf, ioll 
is a sequential 1)ro{;ess. First; the mmM have to be 
found out. 14)r each tag (}f the {:(}rl)uS , we ai)l}ly 
the fim{'ti{m J'm~ (described bek}w). This flmction 
se\]e('ts a fist of elements w\]fich are eateg(}rised as 
mmIei. The fun{:tion ft) is al}p\]ied to this list in 0> 
der to tigure out mmlei which a,'e breakers. Then 
the a(ljmtcts are, found out, and the function fb is 
also apl}\]ied l;(} them to figure out breakers. 
5.2.1 Categorisation of Nuclei 
The (;Olltext used to find out the nuclei relies ott this 
siml}le following ol)servation: A structure requires 
at least; {me nucleus r. Thus, the elements tha.t; oc- 
cur alone in a structure are assimilated to nucleus, 
since a structure requires a nucleus. For example, 
the tags PRP (pronouns) and NNP (proi)er nora:s) 
may eomp{}se a.lone a structure, (respectively 99% 
6The regular exl)ression lbrmalism does not allow such 
rule, and then, this recursion is simulated. 
7The partial structures (structures without mtcleus) tel>re- 
sent 2% of all the structures it: the, training corl)us , and I,h(m 
introduce little noise. 
231 
and 48% of these tags appear alone in NP) but the 
tag JJ appears alone only 0.009%. We deduce that 
PRP and NNP belong to the nuclei and not JJ. But 
this criterion does not allow tile identification of all 
the nuclei. Some often appear with adjuncts (an En- 
glish noun (NN) often s occurs with a deternfiner or 
an adjective and thus appears alone only 13%). The 
single use of this characteristic provides a continuum 
of values where the automatic set up of a threshold 
between adjuncts and nuclei is probleinatic and de- 
pends on the structure. To solve this problem, the 
idea is to decompose the categorisation of nuclei in 
two steps. First we identify characteristic adjuncts. 
The adjuncts can not appear alone since they de- 
pend on a nucleus. The function fchar is built so 
that it provides a very small value for adjuncts. If 
the value of an element is lower than a given thresh- 
old (0char = 0.05), then it is categorised as a char- 
acteristic adjunct. 
= ~c x 
~c P corresi)onds to the number of occurrences of 
the pattern P in the corpus C. For exmnple the 
number of occurrences in the training corpus of the 
pattern \[JJ\] is 99: and the number of occurrences of 
the pattern JJ (including the pattern \[ JJ \]) is 11097. 
So lobar(J J) = 0.009, value being low enough to 
consider JJ as a characteristic adjunct. The list pro- 
vi(le(1 by fchar for English NP is: 
Achar = {DT, P1H~$, POS, .l.l, .I.IR, JJS," "," "} 
These elements can correspond to left; or right ad- 
juncts. All the adjuncts are not identified, but tiffs 
list allows the identification of the nuclei as ex- 
plained in the next paragraplL 
The second step consists of introducing these ele- 
ments into a new pattern used by the funetioll fro,. 
This pattern matches elements surrounded by these 
characteristic adjuncts. It; thus matches nuclei which 
often appear with adjuncts. Since a sequence of ad- 
juncts (as an adjunct alone) can not alone compose a 
complete structure, X only matches elements which 
correspond to a nucleus. 
= Ec { Ache,.* x A l 
The flmction fnu is a good discrimination function 
between nuclei and adjuncts and provides very low 
values for adjuncts and very high values for nuclei 
(table 1). 
5.2.2 Categorisation of Adjuncts 
Once the nuclei are identified, we can easily find out 
tlm adjuncts. They correspond to all the other ele- 
inents which appear in the context: There are two 
SAt least, in the training corpus. 
x Freq(x) fml(X) intcleus 
POS 1759 0.00 no 
PRP$ 1876 0.01 no 
JJ 11097 0.02 no 
DT 18097 0.03 no 
RB 919 0.06 no 
NNP 11046 0.74 yes 
NN 21240 0.87 yes 
NNPS 164 0.93 yes 
NNS 7774 0.95 yes 
WP 527 0.97 yes 
PRP 3800 0.99 yes 
Table 1: Detection of some nuclei of the English NP 
(nouns and pronouns). 
kinds of adjuncts: the left and the right adjuncts. 
The contexts used are: 
\[ _ NU \] fbr tile left, adjuncts 
\[ AI* NU _ \] for the right adjuncts 
If an element appears at tile position of the under- 
score, it is categorised as adjunct. Once tile left 
adjuncts are found out, they can be used for the cat- 
egorisation of the right adjuncts. They thus appear 
in the context as optional dements (this is helpfifl 
to capture circumpositions). 
Since tilt; Adjective Phrase occurring inside an NP 
is not marked in the Upenn treebank, we introduce 
the class of all'inner of adjunct. The contexts used 
to find out the acljuncts of the left adjunct are: 
\[ _ A 1 NU \] for the left adjuncts of A 1 
\[ al* A 1 _ NU \] for the right adjuncts of A 1 
The contexts are similar for adjuncts of right a(1- 
juncts. 
5.2.3 The Linkers 
By definition, a linker commcts two elements, and 
appears between them. The contexts used to find 
linkers are: 
\[ NU _ NU \] linker of nuclei 
\[ A _ A NU \] linker of left adjuncts 
\[ NU A _ A \] linker of right adjuncts 
Elements which occur in these contexts trot which 
have already been categorised as nucleus or adjuncts 
are deleted from the list. 
5.2.4 The Break Property 
A sequence of several nuclei (and the adjuncts which 
depend on them) can 1)elong to a mfique structure 
or compose several ad.iacent structures. An element 
is a breaker if its presence introduces a break into a 
sequence of adjacent nuclei. For example, the pres- 
once of the tag DT in the sequence NN DT JJ NN 
introduces a break before the tag DT, although the 
232 
sequence NN aJ NN (without \])T) can compose a 
single structure in the training corlms. 
... \[the/DT coming/VB¢ week/NN\] 
\[the/l)T foreign/JJ exchange/NN mar- 
ket/NN\] ... 
The tag DT introduces a 1)reak on its lefl;, but some 
tags can introduce a break on thoir right or on their 
left, and right. For instance, the tag WDT (NU by 
defmflt) introduces a bre, ak on its M't and on its 
right. In other words, this tag can not belong to 
the same structure as the preceding adjacent nucleus 
and to the same structure as the following adjacent 
Ilucleus. 
... \[ raih'oads/NNS and/CC trucking/NN 
companies/NNS \] \[ that/WDT \] tw,- 
ga,UW~D iu/llN \[ 19S0/CD \] ... 
... i,l/l:N \[ which/WDT \] \[ peop>/~,'NS \] 
generally/I/\]l arc/V'lTP ... 
lit order to detect wlfich tag has the t)re, ak 1)rot)erl;y, 
we build up two fimctions 'fb h;fl; and fb right" 
.l't, i,;n(x) = ~. Nu x 
" W\] ) 
• fb ,.i~O,t (x) - ~' x \] \[ N, )~-,c' X NU NU : {'nuch'.i} 
• wb (;wb: 
corpus wiLhout l)ra(:kcl,s 
These funcl;ions are used to coniput(; l;he, break 1)rop- 
erl;y for nuclei, but; also for adjullcts (\]\]i i.hi.q case, 
the pattern X is conll)leted })y a(hling the, elemeut 
NU to the left; or 1;o the right of X (tim 1)ot(mtial 
adjunct) according 1;o the kind of adjull(:t (left or 
right adjunct)). The table 2 shows some vahles for 
some tags. An olon)eIll; Call \])e a left brcakem (DT), a 
right breaker (no example fi)r English NI? at the tag 
level), or both (PRP). The break property is gener- 
ally well-markc'd and the thre~shold is easy to set up 
(0.66 in practice). 
TAG fb h;\[I; fb right_ 
1)31 0.97 (yes) O.O0(no) 
pR> 0.97 (yes) O.OS(yes) 
POS 0.95 (yes) 0.O0(no) 
l'lU'$ 0.94 (yes) 0.00(no) 
J,l 0.44 (no) O.O0(no) 
NN 0.04 (no) O.\].l.(no) 
NNS 0.03 (,,o) 0.14(no) 
Table 2: Breaker determina.tion. Values of the flmc- 
tions ot' b left and fb- right for soille elements. 
In the refinement step (Se(:tion 5.3), the breaker 
property can be extended to words. Thus, the word 
yesterday is considered as a right breaker, although 
its tag (NN) is not. 
5.3 The Refinement Step 
5.3.1 The notion of reliability 
Tile preceding functions ident, i(y the category of D, II 
elenmnt when it occurs in the str'ltctltT'e. \]lilt, fill ele- 
lnent can occur in the structure as well as out; of the 
structure. For example, the tag VBG is only consid- 
ered as adjunct when it occurs in the structure. Nev- 
ertheless, it mainly occurs out of the structure (84% 
of its occurrences). If an element mainly 9 occurs out 
of the st, ructure, it is considered as non-reliable and 
its default category is OUT. For each dement occur- 
ring inside the structure, its rdiable is tested. The 
initial grammar corresponds to the grammar which 
only contains the reliable elements. Its precision and 
its recall are aromM 86%. 
Itow is determined the reliability of an element? 
This notion of reliabh; element is contextual and de- 
pends on th(! category of the element. 
For the uuclei, the context is eml)ty. W'e just com- 
tmte l;he ratio })el;ween l;hc number of occurrences in 
the strll(;tur(} over th(} lllllllber of occtlrrelices occur- 
ring outside of the sl, ructure. 
For the adjmicts, the context in(-ludes an adja- 
cent nuc:leus (on the right for left adjun(tts or on the 
h~ft for right a(ljun(-l;s). For instance, the tag .lJ is 
categorise(l as left a@mct for the English NP. It ap- 
1)ears 9617 times 1)efore a lm(:leus and 9,189 times 
in the stru(-ture. It is thus (:onsi(lered as relial)le, 
and its default category is left adjmmt. In the case 
where the tag .\]J occurs without mmleus on its righl; 
(a pr(,dicative us(,,), it is not considered as adjunct 
mM this kind of occnrr(mces ix not used to deter- 
niine the rclialfility of the element. On l;he coutrary, 
the tag VIIG appears 468 times before a mmleus, 
but, in this context, it occurs only 138 times in the 
structure. This is not enough (29%) to consider t.he 
element as a relial)le left; adjunct, and thus il;s de- 
5mlt ca(;egoi'y is OUT. For the adjunct of adjunct, 
the conl;ext includes a(tjm/ct and imcleus. 
5.a.2 Detection of errors 
Once the inil;iM glmnmar ix built ufh its errors have 
I;o 1)e corrected. The detection of the errors corre- 
sponds to a lniscategorisation of a tag. An aut;omatic 
error done by the initial grammar is to wrongly anal- 
yse the structures coml)osed with non-reliable ele- 
ments (false negative examples). Each time that a 
non-reliable elenmnt occurs in the structure corm- 
spends to an error. For instance, the initial grammar 
cmi not correctly recognise the following sequence as 
an NP, the default category of the tag VBG being 
OUT (outside the structure): 
... \[the/1)T eoming/VBG week/NN\] ... 
~')'l'he threshold used is of 50%. 
233 
The second kind of errors corresponds to se- 
quences wrongly recognised as structures (false pos- 
itive exmnples). This kind of error is generated by 
reliable elements which exceptionally do not occur ill 
the structure. In the following example, orde~NN 
occurs outside of the structure, although the default 
category of the tag NN is NU (nucleus), and thus 
the initial grammar recognises an NP. 
... in/IN order/NN to/TO 1,ay/VB ... 
5.3.3 Correction 
In both kinds of errors, the stone technique is used 
to correct them. I~r this purpose, ALLiS disposes 
of two operators, the eontextualisation and the lex- 
icalisation. 
The contcxtualisation consists of finding out con- 
texts in order to fix the errors. The idea is to add 
constraints for recategorising non-reliable elements 
as reliable m. The t)resence of some specific elements 
can completely change the behaviour of a tag. The 
table 3 shows the list of contexts where the tag VBN 
is not categorised as OUT but as left Adjmmt. 
PRP$ VBG NU 
IN\[ VBG NU 
DT VBG NU 
aJ VBG NU 
POS VBG NU 
VBG \[ VBG NU 
Table 3: Some contexts where the non-reliable ele- 
ment VBN becomes reliable. 
For each tag occurring in the, structure, all tile 
possible contexts 11 are generated. For the non- 
reliable tags (first kind of error), we evaluate the 
reliability of them contextually, and we delete the 
contexts in which the tag is still non-reliable (tile 
list of contexts can be empty, and in this case the 
error can not be fixed). For the reliable tags (second 
kind of error), we keel) the contexts in which the tag 
is categorised OUT. 
The lcxicalisation consists of introducing lexical 
information: the word level. Some words can have 
a specific behaviour which does not appear at the 
Part-Of-Speech (POS) level. For instance, the word 
yesterday is a left breaker, behaviour which can not 
be figured out at the POS level (Table 4). The in- 
troduction of the lexicalisation improves the result 
by 2% (Section 6). 
The lexicalisation and the contextualisation can 
be combined when both separately a.re not powerflfl 
enough to fix the error. For example, the word about 
tagged RB (default category of RB: OUT) followed 
l°q'he same techifique is used in (Sima'an, 1997). 
HThe contexts depend on the category of the tag, but are 
just composed of one element. 
word(context) default cat. new cat. 
about/RB ( _ Cl)) OUT A1,B+ 
order (IN _ TO) NU OUT 
yesterday/NN NUB- NUB + 
operating/VBG OUT A1,B_ 
last/.JJ A1,B- A1, B+ 
Table 4: Some specific lexical behaviours. 
by the tag CD is recategorised as left: adjunct and 
left breaker (Table 4). 
6 Evaluation 
We now show some results and give some compar- 
isons with other works (Table 5). The results are, 
quite sinfilar to other approaches. Two rates are 
measured: precision an recall. 
~, ~ Nu~nbcr of correct proposed pattc'rns 
Number of correct patterns 
p z Number o.f correct proposed patterns 
Nu'mber of proposed pattcrns 
Tile training data. are comt)osed of the sections 
15-18 of tile Wall Street Journal Corpus (Marcus el; 
al., 1993), and we use the section 20 for the test 
corpus 12. The data is tagged with the Brill tag- 
ger. The works generating syinbolic rules like ALLiS 
are (Rainshaw and Marcus, 1.995) (Transformation- 
Based learning) and (Cardie and Pierce, 1.998) 
(error-driven pruning of treebank grammars). AL- 
LiS provides better results than them. (Argamon 
et al., 1.998) use a Memory-Based Shallow Learning 
system, (Tjong Kim Sang and Veenstra, 1999) the 
Memory-Based Learning nmtho(l and (Mufioz el; at., 
1999) uses a network of linear functions. The latter 
work seems to integrate better lexical intbrmation 
since ALLiS gets better results with POS only. 
POS only with words 
NP MPRZ99 90.3/90.9 92.40/93.10 
ALLiS 9t.0/91.2 92.56/92.36 
TV99 92.50/92.25 
ADK98 91.6/91.6 
RM95 90.7/90.5 92.3/91.8 
CP98 91.1/90.7 
VP ALLiS 91.39/90.52 92.15/91.95 
Table 5: Results for NP and VP structures (preci- 
sion/recall). 
The main errors done by ALLiS are due to er- 
rors of tagging (the corpus is tagged with the Brill 
tagger) or errors in the bracketing of the training 
cortms. Then, the second type of errors concerns 
12This data set; is avMlable via ftp://ftp, cis. upenn, edu/ 
pub/chunker/. 
234 
tim (:o()rdin;dx:(1 sl,ru(:tures. 'l'h(;s(~ two tyl)(;s (}l'l'Ol'S 
correspond to 51% of the, overall errors. We (:an tin(l 
1;11(: sam(; l;yl)olop;y in other works (\]{anlshaw :rod 
Marcus, 1995), (Ca rdi(: and Pierc(:, 1998). \Ve did 
some tries in order to manually imt)r()v(', th(; final 
,grammar: I)ut l;lm only tyt/(; ()f errors whi(:h can 1)e 
mmnmlly iml)r()v(;(1 (:(m(:(wns tim t)r()t)h;m of the (tuo- 
laI;ion tamks (th(~ inll)rov(mw.nl is al)()ul ()l' 0.2% i~l 
l)re(:ision mid recall). 
Error types ~j/- % 
tagging/bracketing errors 57 28.5% 
coordination d5 22.5% 
germM 15 7.5% 
a dv(wl) 13 6.5% 
ai)l)ositiv(:s 13 6.5% 
(lUOi;al;ion nl;lrks~ 1)ml(:tualion \] 0 5 % 
past; 1)artMl)le 9 :1.5% 
that (IN) fi 3% 
Tal)h'. 6: Typology of the 200 tirst errors. 

References 

An(h'eas Ab(;ck(;r and Klaus Schmid. :1996. From 
tlmory l'(:filmm(mt 1o kl) lIl}liIl((~ll}lllC(~: ;/ I)()sition 
:-d;a,l;(:ln(;lll;. \]11 U6'Al'96, Budat)(>l;, \]hmgary. 

Steven Abney. 1996. Partial parsing via finite-state 
cascades. In Proceedings of the ESSLLI '96 Robust 
Parsing Workshop. 

Shlomo Argamon, Ido Dagan, and Yuval Krymolowski. 1998. A memory-based approach to learning shallow natural  patterns. In COLING '98, Montrdal. 

Robert C. 13erwick. 1985. Th, c acquisition of syntac- 
tic k'nowlcdg('.. M\[T press, Cmnbri(lg('.. 

Eric Brill. 1993. A Corpv, s-Bascd Ap\]nvach 1,o 
Language Learning. Ph.l). thesis, 1)el)artnmnl; of 
Computer and hfformation Science, University of 
\]'ennsylvania. 

Clifl'ord Alan Brunk and Michael Pazzmii. 1995. 
A lexically based semantics bias for theory rex'i- 
:don. In Morgan l(aufl'man, e(litor, 7'welflh lnl, cr- 
national Co't@rc'nc(: on Mach.ine Learning, pages 
81-89. 

Cliflbrd Alto1 Brtmk. 1996. An invest, igation 
of Knowlc@c h~,tensivc Appwach, cs to Concept 
Learning and Theory Refinement. Ph.\]). thesis, 
University of Caliibrnia, irvine. 

Sabine Buchholz, Jorn Veenstra, and Walter Daelemans. 1999. Cascaded grammatical relation assignment. In Proceedings of EMNLP/VLC-99, 
pages I)-239 2."16, University of Maryland, USA. 

Claire, Cardie and David Pierce. 1998. Error-driven 
pruning of treebank grammars for base noun 
phrase identitification. In Proceedings of the 17th 
International Conference on Computational Linguistics (COLING-ACL '98). 

Claire, Ca rdie and David Pierce. 1999. The. rol(~ of 
lexicalization an(| pruning for base noun 1)}n'ase 
grammars. \]n l)~vccc:dings of I, he Sizl, c.cnth, Na- 
l, ional Confl:renc(: on Artificial Intelligence. 

Susan Craw and I). Sleenmn. 1990. Automating the 
r(~lineln(!nl; of l~nowledge-based syst;cnls. \]n Pro- 
cecdings of the £'(MI'90 Cm@~v',,cc: pages 167 
172. 

The XTAC l{.esear(:h Group. 1998. A lexicalized 
tree adjoining grammar fbr english. Technical Re- 
port i\]{CS 98-18, University of Pemlsylvania. 

Lauri Karl;I;un(~ll, ~l'am~is Gal, and Andrd K(!Illp(~,. 
1997. Xerox tinite-state tool. Technical report, 
Xerox \]{.es(!arch Centr(~ Eurol)e, Grenobl(,,. 

Mitchell Marcus, Beatrice Santorini, and Marc Ann 
Marcinkiewicz. 1993. Building a large annotated 
corpus of english: the penn treebank. Computational 
Linguistics, 19(2):313-330. 

Raymond J. Mooney. 1(.)93. In(hlction over the l111- 
exl)lained: Using overly-general domain theories 
(o aid (:on(:(;p( learning. Mach, i',,r~ Lc:arni',,g, 10:79. 

Raymond J. Mooney. 1997. Inductive logic programming for natural  processing. In Sixth International Inductive Logic Programming 
Workshop, pages 205-22d, Sl;o(:ldlohn,Swedcn. 

Marcia Munoz, Vasin Punyakanok, Dan Roth, and 
Dav Zimak. 1999. A learning approach to shallow 
parsing, In Proceedings of EMNLP-WVLC'99. 

l)irk ()m'st(m and \]{aymond Mooney. 1990. Chang- 
ing the rules: A (:(mq)rt~hensiv(,, al)prtmcli to th('.ory 
l'(tfillOlll(tlll.. \]ll l'roccc:dings of t/w ls'igM, National 
Co'l@re'm:r, o'n Arl,~ific:ial \[nl, elligr:nce, 1)ag(!s 815 
820. 

Lance A. Ramshaw and Mitchell P. Marcus. 1995. 
Text chunking using transformation-based learning. In ACL Third Workshop on Very Large Corpora, pages 82-94. 

Geotti'ey Sampson. 1995. English, for the Comp'utcr. 
The SUSANNE Uorpus and Analytic ,5'ch, cme. 
Oxford: Clarendon Press. 

Khalid Sima'an. 1997. Explanation-based learning 
of data oriented parsing. In T. Mark Ellison, editor, Cmoputational Natural Language Learning (CoNLL), ACL/EACL-97, Madrid. 

Erik Tjong Kim Sang and Jorn Veenstra. 1999. 
Representing text chunks. In Proceedings of 
EACL'99, Association for Computational Linguistics, Bergen. 

Jacques Vergn(~ mid Enunan:le, l Giguet. 1998. Re- 
gards thdoriqu(',s sur le "tagging". In proceed- 
ings of 7}'aitemc'nt Automcztiquc des Langucs Na- 
tu~vlh'.s (TALN I998), 1)aris. 
