Structural Patterns vs. String Patterns for Extracting
Semantic Information from Dictionaries
SIMONETTA MONTEMAGNI
Dipartimento di Linguistica - Università di Pisa
Via Santa Maria 36 - 51600 Pisa Italy 
e-mail: GRAMMAR @ ICNUCEVM 
LUCY VANDERWENDE 
Microsoft Corp, Research Division 
Redmond, WA 98052 
e-mail: LUCYV @ MICROSOFT.COM 
1. Introduction 
As the research on extracting semantic information from
on-line dictionaries proceeds, most progress has been made in
the area of extracting the genus terms. Two methods are
being used -- pattern matching at the string level and at the
structural analysis level -- both of which seem to yield
equally promising results.
Little theoretical work, however, is being done to determine
the set of possible differentiae to be identified, and therefore
also the set of possible semantic relations that can be
extracted from them. In fact, Wilks remarks that as far as
identifying the differentiae and organizing that information
into a list of properties is concerned, "such demands are
beyond the abilities of the best current extraction
techniques" (Wilks et al. 1989, p.227). However, the
current state of the art in computational linguistics demands
that semantic information beyond genus terms be available
now, on a large scale, to push forward the current theories,
whether that is knowledge-based parsing or parsing first
with a syntactic component, followed by a semantic
component.
In this paper, we will focus on analyzing the definitions not
for the genus terms, but for the semantic relations that can
be extracted from the differentiae (Calzolari 1984).
Although many have accepted the use of syntactic analyses
for this purpose for some time now (for example Jensen
and Binot 1987, Klavans 1990, Ravin 1990, and
Vanderwende 1990, all of which use the PLNLP English
Parser to provide the structural information), many others
still do not. We will demonstrate with examples why only
patterns based on syntactic information (henceforth,
structural patterns) provide reliable semantic relations for
the differentiae. Patterns that match definition text at the
string level (henceforth, string patterns) are conceivable, but
cannot capture the variations in the differentiae as easily as
structural patterns. In addition, although it is possible to
parse the definition texts using a grammar designed for one
dictionary (e.g. a grammar of "Longmanese," see Alshawi
1989), we have found that a general, broad-coverage
grammar of English or of Italian provides a level of analysis
that is as good as, and possibly superior to, a
dictionary-specific grammar.¹ Moreover, no extra effort is
required to apply a broad-coverage text parser to the
definitions of more than one dictionary, as we found for the
Longman Dictionary of Contemporary English (henceforth,
LDOCE) and Webster's 7th New Collegiate Dictionary
(henceforth, W7) for English, and for Il Nuovo Dizionario
Garzanti (henceforth, Garzanti) and the Italian DMI Database
(henceforth, DMI) for Italian.
The result of analyzing the differentiae of the definitions is
presented in the form of a semantic frame; there is one
semantic frame for each word sense of the entry. The
contents of the frame will be any number of semantic
relations (including the genus term) with, as values, the
word(s) extracted from the definition text. Except for a
commitment to the theoretical notion that a word has
distinguishable senses, the semantic frames are intended to
be theory-independent. The semantic frames presented in
this paper correspond to a description of the semantic
frames produced by the lexicon-producer (Wilks, op. cit., p.
217-220) and so can be the input to a knowledge-based
parser. Also, these semantic frames represent the
appropriate level of semantic information that is needed by
a semantic component that has the task of resolving the
ambiguities remaining after a syntactic component has
assigned an initial analysis (see Jensen & Binot 1987,
Vanderwende 1990). More generally, the result of this
acquisition process is the construction of a Lexical
Knowledge Base to be used as a component for any NLP
system.
2. Semantic Relations 
The semantic relations that are needed to provide a
semantically-motivated analysis of the input text have not
yet been enumerated by anyone. It is possible that this is
due to the absence of information on a large scale that can
be used to test any hypothesis of a necessary and sufficient
set of semantic relations. Semantic relations associate a
particular word sense with word(s) extracted automatically
from the dictionary, and those words may be further
specified by additional relations. The values of the semantic
relations therefore have more content than binary features
and are not abstract semantic primitives, but rather
representations of the implicit links to other semantic
frames.
¹ For example, the grammar for English was used, without
modification, to parse over 4000 noun definitions. With a
parser that forces an NP analysis, over 75% of these
definitions parsed as full NPs. These are very good results,
especially since many of the remaining 25% do not form
complete NPs and so were parsed correctly.
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 546 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
An example of a semantic relation that can be identified in
the differentiae is LOCATION-OF. The definition of
'market' (LDOCE n,1) is expressed as follows:
"a building, square, or open place where people meet to buy
and sell goods, esp. food, or sometimes animals."
As we will show later, it is possible from the structural
description of this definition to extract the following values
for the semantic relation LOCATION-OF:
............................................................
MARKET
LOCATION-OF MEET
(HAS-SUBJECT 'PEOPLE')
BUY
(HAS-OBJECT 'GOODS,' 'FOOD,' 'ANIMALS')
SELL
(HAS-OBJECT 'GOODS,' 'FOOD,' 'ANIMALS')
............................................................
Figure 1. Semantic frame for the definition of the noun
"market."
According to this semantic frame, the verbs "meet," "buy,"
and "sell" are related as LOCATION-OF to the noun
"market." Although the words extracted from the
definitions are not disambiguated themselves according to
their senses, as much information as possible is included in
the semantic frame as the definition being analyzed
provides. In this example, the word "meet" is further
specified by a semantic relation HAS-SUBJECT that has
"people" as its value. Also, since the verbs "buy" and "sell"
are conjoined, both verbs have a HAS-OBJECT relation
with all the syntactic objects identified in the analysis,
namely "goods", "food" and "animals."
Semantic information of this type is necessary, for example,
in order to automatically interpret noun compounds. Given
the (partial) semantic frame above for "market" and given
that "vegetable" has a purpose relation to "food"
(information also automatically derived by applying
structural patterns to the definition text), the noun
compound "vegetable market" is interpreted automatically
as:
"Market is a location for the purpose of buying and/or
selling vegetables." (see Vanderwende 1992)
Examples of other semantic relations that were required to
interpret noun compounds are:
SUBJECT-OF, OBJECT-OF, FOOD, MATERIAL,
TIME, HUMAN, IS-FOR, LOCATION-NOUN, MADE-OF,
CAUSED-BY, CAUSES, MEASURE, and MEANS.
3. Structural Patterns
The acquisition of semantic relations from online
dictionaries proceeds by applying patterns to the structural
descriptions of the definitions and example sentences. The
patterns embody knowledge of which relations certain
recurring elements and constructions convey in the context
of the dictionary. For instance, the relation PURPOSE is
conveyed in Italian by the phrases: "con lo/allo scopo di,"
"al fine di," "per," "usato per," "atto a," "che serve a," and
"utile a" followed by a noun phrase or an infinitival clause.
In English, this same relationship is conveyed by quite
similar phrases, also followed by a noun phrase, present
participle, or infinitival clause: "for (the) purpose(s) of,"
"for," "used for," "intended for," and past participle
followed by "to."
After locating the pattern within the definition, the true
extraction process consists in identifying the values to be
associated with the semantic relation detected. Typically
the values of the semantic relations are the heads of the
pattern itself or of the complement(s) in terms of structural
patterns, or the next content word(s) in terms of string
patterns. However, extracting even more specific
information from the differentiae, for example that the verb
"meet" has "people" as its subject when it is the
LOCATION-OF "market", also involves the identification
of functional arguments of verbs and, in the case of nouns,
identification of adjectives and "with" complements.
A simple example of a structural pattern is the pattern that
extracts the semantic relation PURPOSE from the parsed
definition text. The pattern can be paraphrased (in part) as:
if the verb "used" is post-modified by a PP with the
preposition "for," then extract the head(s) of that PP and
return those as the value of the PURPOSE relation. If the
PP has a verb as its head and an OBJECT attribute, return
the head(s) of the OBJECT as the values of a HAS-OBJECT
relation; and if it has a SUBJECT attribute, return the
head(s) of the SUBJECT as the values of a HAS-SUBJECT
relation.
Consider the relevant section of the parsed definition of
'cellar' (LDOCE n,1):
[Parse tree not recoverable from the source text.]
Figure 2. Parse tree for the definition of the noun "cellar."
The parse tree shown above² is but one representation of the
structural description of this definition. Below is an excerpt
of the record structure containing the functional information
for tree node PP1 above:
....................................................
NODE     'PP1'
PRMODS   PREP1  "for"
HEAD     VERB2  "storing"
PSMODS   NOUN2  "goods"
PRP      PREP1  "for"
OBJECT   NOUN2  "goods"
....................................................
Figure 3. Functional information for the prepositional
phrase in Figure 2.
Following the structural pattern for PURPOSE, we see in
Figure 2 that the VERB1, "used", is post-modified by a PP
with the preposition "for" and so the base form of the PP
head, VERB2 ("store"),³ is extracted as the value of the
PURPOSE relation associated with "cellar". In addition, an
OBJECT has also been identified in the structural
description, namely NOUN2, and so its head "goods" (in
this case, the noun itself) is the value of the HAS-OBJECT
of "store". The result of this pattern will be the partial
semantic frame for "cellar":
............................................................ 
CELLAR 
PURPOSE 'STORE' 
(HAS-OBJECT 'GOODS') 
............................................................ 
Figure 4. Partial semantic frame for "cellar."
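The PURPOSE pattern paraphrased above can be sketched in Python as
follows. This is only an illustration under stated assumptions: the
Node class, its attribute names, and the hand-written BASE_FORMS table
are hypothetical stand-ins for the record structure of Figure 3, not
the representation used by the PLNLP system, and the sketch handles a
single head per constituent rather than coordinated heads.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical parse record, loosely modelled on Figure 3."""
    label: str                       # node name, e.g. "PP1"
    cat: str                         # category: "VERB", "NOUN", "PP"
    word: str = ""                   # surface form of a terminal node
    prep: str = ""                   # preposition of a PP node
    head: Optional["Node"] = None    # HEAD attribute
    obj: Optional["Node"] = None     # OBJECT attribute
    subj: Optional["Node"] = None    # SUBJECT attribute
    psmods: list = field(default_factory=list)  # post-modifiers


# Toy lemma table (assumption); a real system would use its morphology.
BASE_FORMS = {"storing": "store"}


def base_form(word: str) -> str:
    """Return the upper-cased base form used as a frame value."""
    return BASE_FORMS.get(word, word).upper()


def extract_purpose(verb: Node) -> dict:
    """Apply the paraphrased PURPOSE pattern to a verb node."""
    frame: dict = {}
    if verb.cat != "VERB" or verb.word != "used":
        return frame
    for pp in verb.psmods:
        if pp.cat != "PP" or pp.prep != "for" or pp.head is None:
            continue
        entry = {"value": base_form(pp.head.word)}
        if pp.head.cat == "VERB":
            if pp.obj is not None:
                entry["HAS-OBJECT"] = [base_form(pp.obj.word)]
            if pp.subj is not None:
                entry["HAS-SUBJECT"] = [base_form(pp.subj.word)]
        frame.setdefault("PURPOSE", []).append(entry)
    return frame


# The "cellar" fragment of Figures 2 and 3: "used for storing goods".
noun2 = Node("NOUN2", "NOUN", word="goods")
verb2 = Node("VERB2", "VERB", word="storing")
pp1 = Node("PP1", "PP", prep="for", head=verb2, obj=noun2)
verb1 = Node("VERB1", "VERB", word="used", psmods=[pp1])

cellar_frame = extract_purpose(verb1)
```

With these toy records, cellar_frame holds a PURPOSE entry whose value
is 'STORE' with HAS-OBJECT 'GOODS', mirroring Figure 4.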
4. Inadequacy of String Patterns 
Some patterns to identify semantic relations are relatively
trivial and can be handled by string patterns. For example,
no matter where the string is found in the definition text,
"for (the) purpose(s) of" as well as "con lo/allo scopo di"
always indicates a PURPOSE relationship between the
definiendum and the head of the phrase (noun or verb)
following "of/di". Markowitz et al. (1986) also discuss
patterns at the string level, based on defining formulae,
which extract such features as stative or active for
adjectives, or member-set relations for nouns. These are
adequate because the patterns described are generally all
found at or close to the beginning of the definition text. But
the most interesting patterns that identify the differentiae
and the (possibly embedded) semantic relations expressed
therein rely on complex structural information, information
which cannot be expressed adequately in string patterns.
² The parse trees in this paper are altered representations
isomorphic to actual machine output which IBM ASD has
not allowed us to reproduce. Heads of constituents are
directly below their parent node and the nodename is in
bold.
³ PPs are analyzed with a preposition premodifier and a
nominal as the head.
The following addition makes the pattern for extracting the
PURPOSE relation, paraphrased in the previous section,
more complete:
if the PP with "for" is not a post-modifier of a verb "used",
then a PURPOSE relation between the definiendum and the
head(s) of the PP can be hypothesized if the nearest noun
that the PP post-modifies is the genus term.⁴
Consider the syntactic analysis of the relevant portion of
text in the definition of "laboratory" (W7 n,1) shown below
in Figure 5. Since PP2 and PP4 are coordinated, the
structural relation to the rest of the analysis will be tested
for the conjoined constituent, PP1. The nearest noun phrase
that PP1 post-modifies is NP1, the head of which, NOUN1,
is indeed the genus term (also identified automatically by
structural patterns applying to this analysis). Thus, part of
the semantic frame for Sense 1 of "laboratory" will be:
............................................................ 
LABORATORY 
PURPOSE 'STUDY,' 'TESTING,' 'ANALYSIS' 
............................................................ 
[Parse tree not recoverable from the source text.]
Figure 5. Semantic frame for "laboratory" and the parse
from which it was derived.
Now consider the syntactic analysis of the relevant portion 
of text in the definition of "council" (LDOCE n): "a group 
of people appointed or elected to make laws, rules, or 
decisions, for a town, church, etc., or to give advice": 
⁴ Currently, for English, an abstract relation IS-FOR is
extracted which will satisfy any searches for a PURPOSE
relation.
[Parse tree not recoverable from the source text.]
Figure 6. "for"-PP that does not create a PURPOSE
relationship.
The nearest noun phrase that PP1 post-modifies is NP1,
which is a coordinated construction. None of the heads of
NP1, "laws", "rules" or "decisions", can be identified as the
genus term, and so the pattern does not succeed in
extracting a PURPOSE relation from this definition.
In order to write a string pattern that would correctly
identify the semantic relations above, the pattern would
have to identify conjoined heads and apply some measure of
distance from the genus while counting conjoined phrases
as single units. In addition, string patterns would also have
to skip parentheses, identify functional arguments, and
abstract from the surface realizations of the pattern, e.g.
pre- and post-modification (similar observations are made
in Klavans 1990). Even if the language of dictionary
definitions is characterized by its regularity, variations of
the defining formulae exist. These restrictions seem to be
far too complex at the string level, while writing the pattern
at the level of syntactic analysis describes the dependency
in an intuitive manner, namely in terms of heads and
modifiers.
The inadequacy of string patterns is not only evident when
extracting the semantic relations directly related with the
definiendum, but also when extracting those relations that
show further specifications. In particular, the HAS-SUBJECT
and HAS-OBJECT relations cannot possibly be
extracted reliably without structural information. Wider
syntactic context is also required to correctly extract the
semantic features such as COLOR, SHAPE, TASTE, and
SMELL not only as features of the definiendum, but also as
further specifications of the words extracted as the values of
semantic relations.
The structural pattern that extracts semantic features such as
COLOR and TASTE would seem to be trivial: modifying
adjectives or nouns that express these properties. The
attachment of these modifiers, however, can be established
only on the basis of syntactic information (and sometimes
syntax is not enough). And only those modifiers should be
extracted that relate to the definiendum or those that relate
to some other word within the definition which stands in
some semantic relation (for instance HAS-PART, MADE-OF,
and so forth) with the definiendum. In the latter case
the information extracted still has an indirect link with the
lemma being defined, but it is not expected to be interpreted
as a semantic feature of the definiendum itself.
Consider these examples from the Garzanti dictionary
(followed by their English glosses):
acagiù: "albero tropicale dai frutti saporiti."
(mahogany tree: tropical tree with tasty fruits)
alchechengi: "pianta erbacea con bacche di color arancio
racchiuse in un involucro membranaceo, commestibile."
(winter cherry: herbaceous plant with orange berries,
contained in a membranaceous covering, edible).
The TASTE and the COLOR features should not be
extracted as semantic features of the definiendum. In the
case of "acagiù," this is clear due to the lack of agreement
between "albero" (tree) and "saporiti" (tasty): the adjective
cannot modify the head noun/genus term because they do
not agree in number. "Saporito", however, is the value of
the semantic feature TASTE of "frutto" (fruit), which is in
turn the value of the HAS-PART relation of the
definiendum, also extracted by means of a structural pattern
from the definition text. The semantic frame for "acagiù"
is shown in Figure 7:
............................................................
ACAGIÙ
IS-A ALBERO
HAS-PART FRUTTO
(TASTE SAPORITO)
............................................................
Figure 7. Semantic frame for the definition of "acagiù."
In the case of "alchechengi," the PP "di color arancio" ("of
orange color") does not contribute a COLOR feature to the
definiendum since it cannot modify the head/genus "pianta"
("plant") given its embedded position within the syntactic
structure:
Figure 8. Parse tree for the definition of "alchechengi."
If we consider the structural description of the definition for
"alchechengi," we can see clearly that the embedding of
PP2 within the syntactic structure, followed by another
modifier of "bacche," AJP1, makes it impossible for PP2,
"di color arancio," to modify the head noun "pianta", and so
the semantic feature COLOR "arancio" is extracted for
"bacche", which is in a PART-OF relation with the
definiendum.
Syntactic information is not always sufficient for resolving
the correct assignment of semantic features. Consider the
DMI definition for "agnolotto" (a kind of ravioli):
agnolotto: "involucro di pasta all'uovo rotondo o
rettangolare."
(ravioli: round or rectangular covering of egg pastry)
The attachment of the adjectival phrase "rotondo o
rettangolare" is ambiguous and cannot be determined on the
basis of syntactic information, but only based on semantic
information; the correct analysis would read a "round or
rectangular covering" and not "a round or rectangular egg."
Despite this syntactic ambiguity, the range in ambiguity for
extracting semantic relations and features is quite reduced if
we start from syntactic structures instead of from simple
strings.
5. Why a general text parser is sufficient 
There are two reasons why a general text parser is essential
for providing the syntactic analyses. First, of the four
dictionaries that have been explored in this research,
Garzanti and DMI (for Italian) and LDOCE and W7 (for
English), only LDOCE attempts to use a restricted
vocabulary in the definition texts. Therefore, the scope of
the vocabulary is the same as unrestricted text. Moreover,
the language used in dictionaries cannot appropriately be
called a specialized language given that it does not operate
in a specialized domain. Second, at the syntactic level, the
variety of constructions can be compared to that of textual
corpora. The regularity of the language used within
dictionary definitions, lexically and syntactically
constrained, lies in the frequent occurrence of lexical and
syntactic patterns to express particular conceptual
categories or semantic relations. This regularity, which is
crucial with respect to the extraction of semantic
information, can be considered almost irrelevant from the
point of view of parsing because of the variety of lexical
choices and phrasal constructions used to express the
patterns. A parser, therefore, is faced with the same range
of problems in analyzing ordinary texts as in dictionary
definitions and so the use of a general purpose grammar is a
fundamental choice in the definition of our research
framework.
One of the main disadvantages ascribed to using general
text parsers is the ambiguity still remaining at the end of the
syntactic analysis. It has often been observed that
descriptions associated with syntactically ambiguous
constructions in free text can be disambiguated in the
context of dictionary definitions. For example, within our
system the default strategy in free text is to attach a
prepositional phrase to the nearest available head and to
keep track of the alternative possible attachment sites. In
the context of dictionary definitions, the choice resulting
from such a default strategy can often be overridden on the
basis of lexical and/or syntactic conditions which
disambiguate the potential ambiguity; for instance, with
regard to the PP attachment case, there is a class of genus
terms (such as "atto," act; "effetto," effect; "processo,"
process) that, together with given structural conditions,
make the attachment decision possible.
Also, while functional assignment may be ambiguous in
Italian in some cases (Chanod et al. 1991), we can assume
that constructions used within dictionary definitions and
example sentences are always unmarked, and consequently
that the ambiguity derived from taking into account also
marked orders of sentence constituents (such as
Subject-Object-Verb, Object-Verb-Subject and so forth) is
very unlikely to occur in the dictionary text.
Rather than taking these observations as justification for
building a dictionary specific parser, we use first a broad
coverage parser, followed by a post-processor which tailors
the output of the parser based on the differences observed
between dictionary text and general text. As it turns out, the
size of the post-processor is negligible compared to the size
of the grammar. This supports our claim that the variety of
syntactic constructions in dictionary text is comparable to
that of textual corpora. If dictionary text were substantially
different from general text, we would have had to write
more rules in the post-processor and it would have to be
bigger than it in fact is. The structural patterns for the
extraction of semantic information naturally operate on the
result of the post-processor (see Montemagni 1992).
Two kinds of refinements have been devised in order to
achieve more appropriate results with respect to the
language used within dictionaries:
(1) rule out ambiguity in the attachment of modifiers or in
the assignment of functional roles which is not applicable in
the context of dictionary definitions;
(2) handle parses that are incomplete due to either
dictionary specific constructions not occurring in free texts,
or, more generally, to gaps in the lexical or grammatical
knowledge of the system.
While the first refinement operates on a complete analysis
but aims to reduce the high degree of ambiguity typical of
free text by exploiting peculiarities of dictionary language,
the second refinement concerns the robustness of the system
in the absence of a complete parse.
For an example of refining the parse in order to reduce the
ambiguity, consider the Garzanti definition (n,1) of
"computazione" (computation) defined as "atto, effetto del
computare" (the act or result of computing). The first
structural description below shows the NP parse for general
text. This default analysis shows PP1 "del computare" to be
attached to the closest available head, NOUN2 "effetto",
while the alternative attachment site is marked with a
question mark. The second parse below shows the
resolution of the PP attachment ambiguity; PP1 now
modifies the coordinated nouns NOUN1 and NOUN2.
[Parse trees not recoverable from the source text.]
Figure 9. Resolution of PP attachment ambiguity.
This refinement is made when a prepositional phrase or an
infinitival clause post-modifies coordinated head nouns that
are the top nodes of the syntactic analysis. This is the
typical pattern of the definitions of deverbal nouns; the PP
indicates which verb the definiendum is derived from. The
lexical and syntactic conditions which make the
disambiguation possible are defined in the post-processor to
the general text analysis.
The solution to a robust phrasal analysis while parsing
dictionary text with a general grammar can be seen and
faced from two different perspectives. The first perspective
is dictionary specific and deals with incomplete parses due
to input which would be considered ungrammatical outside
of the context of dictionary definitions. The second
perspective copes with incomplete knowledge of language
use by exploiting the general technique of fitted parsing
provided by the system for handling ill-formed input
(Jensen et al. 1983).
Dictionary definitions are quite often formulated as
condensed fragments of real texts, with elided elements
which make the definition syntactically ill-formed and
interpretable only by reference to a wider context. This is
the case with noun definitions consisting of a noun phrase
pre-modified by a prepositional phrase, where the latter
specifies the usage domain of the word sense expressed by
the former. The general grammar is unable to produce an
NP node covering the whole input string given that the
sequence PP-NP does not freely occur within ordinary texts.
It is the refinement stage that should reshape the analysis
and restore it as regular input on the basis of specialized
dictionary use. The analysis below of the Garzanti
definition (n,1,2) "nettare" (nectar), defined as "nella
mitologia classica, la bevanda degli dei" (in classical
mythology, the drink of gods) exemplifies this kind of
refinement.
[Parse trees not recoverable from the source text.]
Figure 10. Refinement of fitted parse into NP.
The first of the two parses above has been generated by the
general grammar; the XXXX1 label at the top node shows
that the parse is incomplete. The second one has been
rebuilt during the refinement stage: the XXXX1 has been
replaced by the proper label NP1. In this case, knowledge
of dictionary peculiarities resolves the initial partial parse
and converts it into a complete and successful analysis.
Not all incomplete parses can be so easily restructured.
Some are due to gaps in the system with respect to lexical
as well as phrase construction knowledge. Those cases are
handled by facilities in the fitting procedure, provided by
the system to cope with unrestricted input. When the
grammar is unable to produce a complete analysis, then a
reasonable approximate but incomplete structure is assigned
to the input.
Such a rough parse can still be used as input for further
processing stages and for the extraction procedure itself. By
allowing structural patterns to apply to incomplete parses as
well, the automatic extraction of semantic information is
not threatened. There is, however, a difference in the
extraction procedure applied to complete (i.e. computed by
the general grammar or restored during the post-processing
stage) and incomplete analyses. While structural patterns
are used to extract semantic information from definitions
and example sentences successfully parsed, partial
structural patterns and string patterns are combined when
handling incomplete parses. By differentiating the
extraction procedure for the two kinds of results, the
procedure becomes robust and overcomes the variability of
parsing performance.
Finally, we give a brief account of the parsing performance
of the Italian grammar for a corpus of 1000 definitions. The
general Italian grammar provided complete parses for about
65-70% of these definitions. An improvement comprising
10-15% of the total was achieved during the refinement
stage. For the unresolved incomplete parses, approximately
15-20%, a different extraction procedure, based on a
combination of partial structural patterns and string patterns
as described above, has been hypothesized. Even if this
procedure is at an early development stage, it is possible to
evaluate the first results. Because of the robust strategy, the
extraction procedure can be applied to the entire corpus of
definitions, without the worry that incomplete parses would
affect the extraction of semantic information. Some
information is extracted in any case; in the worst case the
information is not very deep or detailed (at least the genus
term is extracted). The results can be differentiated by
degree of detail, but the extraction procedure never fails to
produce some results.
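The differentiated extraction procedure can be sketched in Python as a
dispatch on the status of the parse. All three pattern functions
below are crude, hypothetical stand-ins (the string-level genus rule
in particular is an assumption made for the example), and the
"market" definition is treated as an incomplete parse in the second
call purely for illustration.

```python
def structural_patterns(parse: dict) -> dict:
    # Stand-in: a real pattern inspects the tree; here relations are given.
    return dict(parse["relations"])


def partial_structural_patterns(parse: dict) -> dict:
    # Stand-in: relations recovered from well-formed fragments, if any.
    return dict(parse.get("fragment_relations", {}))


def string_patterns(text: str) -> dict:
    # Crude stand-in: take the first word after a leading article as genus.
    words = [word.strip(",.;") for word in text.split()]
    if words and words[0] in {"a", "an", "the"}:
        words = words[1:]
    return {"IS-A": words[0].upper()} if words else {}


def extract_frame(parse: dict) -> dict:
    """Choose the extraction procedure according to parse status."""
    if parse["complete"]:
        frame = structural_patterns(parse)
        frame["detail"] = "full"
        return frame
    frame = string_patterns(parse["text"])  # shallow but always available
    frame.update(partial_structural_patterns(parse))
    frame["detail"] = "partial"
    return frame


market_text = "a building, square, or open place where people meet"

full_frame = extract_frame({
    "complete": True,
    "text": market_text,
    "relations": {"LOCATION-OF": ["MEET", "BUY", "SELL"]},
})
partial_frame = extract_frame({"complete": False, "text": market_text})
```

A complete parse yields the detailed frame, while an incomplete one
still yields at least a genus term, so every definition produces some
result, differentiated by degree of detail.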
6. Conclusion 
Viewed ideally, the choice between structural patterns and
string patterns is obviously in favor of structural patterns,
because they are more suitable for achieving accuracy in the
extraction of semantic information from dictionaries.
Controversy arises only when considering the reliability of
parsing the dictionary definitions themselves. In this paper,
we show the feasibility of applying structural patterns to
parsed definitions in order to extract semantic information
from dictionaries, with the goal of deriving and making
explicit the basic general knowledge implicitly stored in any
standard printed dictionary. Structural patterns, much more
than string patterns, provide the rich semantic information
that makes the lexicon a relational network expanding in n
dimensions. Not only are semantic features or relations
directly related to the definiendum extracted, but also further
specifications of the words extracted as values of semantic
features or relations.
We have also described a robust procedure for extracting
this semantic information. The syntactic analysis of the
definition text provided by a general text parser is evaluated
automatically and, if necessary, a post-process is applied to
refine the parse given the context of a dictionary. The
results of the structural patterns are differentiated according
to the success of the parse. In this way, the use of a
grammar improves the quality and the reliability of the
semantic information extracted.
Note 
For the specific concerns of the Italian Academy, 
Vanderwende is responsible for sections 1-3 and the English 
part of section 4, and Montemagni is responsible for the 
Italian part of section 4 and sections 5-6. 

References 

Alshawi, H. 1989. "Analysing the Dictionary Definitions" in
Boguraev & Briscoe, eds., Computational Lexicography for
Natural Language Processing, Longman, London, pp. 153-170.

Boguraev, B., and T. Briscoe, eds. 1989. Computational 
Lexicography for Natural Language Processing, Longman, 
London. 

Calzolari, N. 1984. "Detecting Patterns in a Lexical Data
Base" in Proceedings of the 10th International Conference
on Computational Linguistics, Stanford, CA, pp. 170-173.

Chanod, J.P., B. Harriehausen and S. Montemagni. 1991. 
"A two-stage algorithm to parse multi-lingual argument 
structures" in Proceedings of the International Conference 
on Current Issues in Computational Linguistics, University 
Sains Malaysia, Penang, June 21-27 1991. 

Jensen, K., and J.-L. Binot. 1987. "Disambiguating
Prepositional Phrase Attachments by Using On-Line
Dictionary Definitions" in Computational Linguistics
13.3-4.251-60.

Jensen, K., G.E. Heidorn, L.A. Miller, and Y. Ravin. 1983.
"Parse Fitting and Prose Fixing" in American Journal of
Computational Linguistics 9.3-4.123-36.

Klavans, J., M. Chodorow, and N. Wacholder. 1990. "From 
Dictionary to Knowledge Base via Taxonomy" in 
Electronic Text Research, University of Waterloo, Centre 
for the New OED and Text Research, Waterloo, Canada. 

Markowitz, J., T. Ahlswede and M. Evens. 1986. 
"Semantically Significant Patterns in Dictionary 
Definitions" in Proceedings of the 24th Annual Meeting of 
the Association for Computational Linguistics, June 10-13 
1986, pp. 112-119. 

Montemagni, S. 1992. "Tailoring a broad coverage
grammar for the analysis of dictionary definitions" to
appear in Proceedings of EURALEX-92, August 4-9,
Tampere.

Ravin, Y. 1990. "Disambiguating and Interpreting Verb
Definitions" in Proceedings of the 28th Annual ACL
Conference, June 6-9, Pittsburgh, pp. 260-267.

Vanderwende, L. 1990. "Using an On-line Dictionary to 
Disambiguate Verbal Phrase Attachment" in Proceedings of 
the 2nd IBM Conference on NLP, March 13-15, La Defense, 
Paris. 

Vanderwende, L. 1992. "Understanding Noun Compounds 
Using Semantic Information Extracted from On-Line 
Dictionaries." Dissertation in preparation, Georgetown 
University, Washington DC. 

Wilks, Y., D. Fass, C. Guo, J. McDonald, T. Plate, and B.
Slator. 1989. "A Tractable Machine Dictionary as a
Resource for Computational Semantics" in Boguraev &
Briscoe, eds., Computational Lexicography for Natural
Language Processing, Longman, London, pp. 193-228.
