An Object-Oriented Linguistic Engineering Environment using 
LFG (Lexical Functionnal Grammar) and CG (Conceptual 
Graphs) 
J~r6me Vapillon, Xavier Briffault, Gdrard Sabah, Karim Chibout 
Language and Cognition Group 
LIMSI- CNRS 
B.P. 133, 91403 Orsay Cedex, FRANCE 
vap/xavier/gs/chibout ©rl ims i. fr 
Abstract 
In order to help computational linguists, 
we have conceived and developed a lin- 
guistic software engineering environment, 
whose goal is to set up reusable and evo- 
lutive toolkits for natural language pro- 
cessing. This environment is based on a 
set of natural language processing compo- 
nents, at the morphologic, syntactic and 
semantic levels. These components are 
generic and evolutive, and can be used 
separately or with specific problem solv- 
ing units in global strategies built for man- 
machine communication (according to the 
general model developed in the Language 
and Cognition group: Caramel). All these 
tools are complemented with graphic in- 
terfaces, allowing users outside the field of 
Computer Science to use them very eas- 
ily. In this paper, we will present first the 
syntactic analysis, based on a chart parser 
that uses a LFG grammar for French, and 
the semantic analysis, based on conceptual 
graphs. Then we will show how these two 
analyses collaborate to produce semantic 
representations and sentences. Before con- 
cluding, we will show how these modules 
are used through a distributed architecture 
based on CORBA (distributed Smalltalk) 
implementing the CARAMEL multi-agent 
architecture. 
1 Introduction 
1.1 Generalities 
Natural language processing is nowadays strongly 
related to Cognitive Science, since linguistics, psy- 
chology and computer science have to collaborate 
to produce systems that are useful for man-machine 
communication. This collaboration has allowed for- 
malisms that are both theoretically well-founded and 
implementable to emerge. In this paradigm, we have 
conceived and developed a linguistic software engi- 
neering environment, whose goal is to set up reusable 
and evolutive toolkits for natural language process- 
ing (including collecting linguistic data, analysing 
them and producing useful data for computer pro- 
cesses). Based on a large number of graphical, very 
intuitive, interfaces, this environment has two main 
goals: 
* to provide tools usable by users outside the field 
of Computer Science (e.g., computational lin- 
guists) for them to be able to easily collect data 
and test their linguistic hypotheses 
* to allow computer scientists to exploit these 
data in computer programs 
Remark: in the text, some figures describe the 
structure of our tools; we have used Booch's conven- 
tions (Booch, 1994) about object oriented analysis 
and conception. They are summarized here: 
I name I class operations 
cent ains/uses 
1,2...N relation cardinality 
inherits 
Figure 1: symboles used in the figures 
1.2 Extensions to LFG formalism 
Four types of equations are defined in classical LFG 
(Bresnan and Kaplan, 1981): 
1. unifying structures (symholised by "-"), 
2. constrained unification of structures, only true 
if a feature is present in both structures, but 
may not be added (symbol "=c"), 
3. obligatory presence of a feature (symbol "~"), 
4. obligatory absence of a feature (symbol tilde). 
99 
We have defined three non-standard types of equa- 
tions used in our parser: 
1. obligatory difference between two values (sym- 
bol "#"), 
2. disjunction of obligatory differences (a sequence 
of obligatory differences separated by the sym- 
bol "1") (this can also be viewed as the negation 
of a conjonction of obligatory presences) 
3. prioritary union, copy into a F-Structure the 
attributes of the other that are not present in 
the first one, nor inconsistent with it. 
Among other existing systems (e.g., A. An- 
drew's system , Charon, The "Konstanz LFG 
Workbench" and Xerox "LFG Workbench"; see 
http://clwww.essex.ac.uk/LFG for more details on 
these systems), only the last one is a complete en- 
vironment for editing grammars and lexicons. Our 
system adds to this feature an open architecture and 
many interfaces that make it very easy to use. 
2 The LFG Environment 
2.1 Foundation: a LFG parser 
According to the principles of lexical functional 
grammars, the process of parsing a sentence is de- 
composed into the construction of a constituent 
parts structure (c-structure) upon which a func- 
tional structure is inserted. C- structure construc- 
tion is based on a chart parser, that allows the sys- 
tem to represent syntactic ambiguities (Kay, 1967), 
(Winograd, 1983). In order to be used within a LFG 
parser, a classical chart has to be complemented 
with a new concept: completed arcs (which repre- 
sent a whole syntactic structure) have to be differ- 
enciated between completed arcs linked with a cor- 
rect F-Structure, and those which are linked to an 
F- Structure that cannot be unified or that does not 
respect well-formedness principles. 
2.2 Visualising the Chart 
In the Chart interface, words are separated by 
nodes, numbered from 1 to numberOfWords + 1. 
Each arc is represented by a three segment polygon 
(larger arcs are above the narrower, for readibility 
reason). 
Active arcs are grey and positioned under the 
words. Completed arcs with uncorrect F-Structures 
are red and also placed under the words. Com- 
pleted arcs with correct F-Structures are blue and 
above the words. Lastly, completed arcs with F- 
Structures that don't respect well formedness prin- 
ciples are grey and above the words. The user can 
select the kind of arc he is interested in. By clicking 
on an arc with the left button, the arc and all its 
daughters become green, thus showing the syntac- 
tic hierarchy. By clicking with the middle button, a 
iii suaJ saU°n vl°.,~ . ®! 
PlIOPOSITiON:t~, 
GNSSCOMPL',I_t .... \] 
CONTEXTE :t 4 ' 
........ _GNSIMPLE:t2 
SN:I~/ 
ii I - G; ;SCOMPm__.__....~1:2 ~1): ~ ,,, GVCO!IUR;~ 7 P__L~_ :nilC N$$COMPL;s SV:te iii ,PI).O r,I :L-dl. Nk'd .... V :flit 'ONCT '..m I 
i 
me Z chien 3 mangeait 4 $ 
Figure 2: The Chart Interface 
menu appears within which one can choose to exam- 
ine the applied rule or the F-Structures (see below 
for the corresponding interface). 
2.3 Visttalising F-Structures 
As shown in Figures 3 and 4, F-Structures are repre- 
sented by attribute-value pairs (a value may itself be 
a F- Structure). In addition to such a graphical rep- 
resentation, a linear representation (more suitable 
for storing data on files or printing them) has been 
developed and it is possible to switch from one to 
the other. This allows us to keep track of previous 
results and to use them for testing the evolution of 
the system. 
2.4 Lexicon and lexicon management 
Since LFG is a "lexical" grammar, it is important 
to have powerful and easy to use lexicon manage- 
ment tools. To be as flexible as possible, we have 
choosen to use several lexica at the same time in the 
same analyser. The lexicon manager contains a list 
of lexica ordered by access priority. For each word 
analysed, the list is searched, and the first analysis 
encountered is returned. 
Two kinds of lexica are currently used; this kind 
of structuration is quite flexible: 
• if the user uses a big lexicon, but wants to re- 
define a few items for his own needs, he just 
has to define a new small lexicon containing the 
modified items, and to give it a high priority. 
• if the user has a big lexicon with a slow ac- 
cess, the access can be optimised by putting the 
mm 
u 
m 
n 
n 
m 
\[\] 
N 
m 
n 
mm 
m 
\[\] 
m 
\[\] 
100 
UlJlitaires 
PTypeProv = 
Pred = 
Neg = 
Temps = 
TypeAux = 
$uj = 
Mode = 
PType = 
Aft= 
AUX = 
Sujet - * 
Trans = direct 
Pronominal = - 
id : 2.262000 
I 
assert 
/'manger'< Suj \[ Suj Oh) >/ 
imparfait 
au;,C, voir 
Def = defini 
Numeral = ~ 
Pred = /'chien'/ 
Genre = masc 
Num = sing 
Pers = prs3 
Rel = ~ 
Article - + 
id ' 2261 ,¢,38 
indicatif 
assert 
÷ 
Figure 3: graphical representation of a F-Structure 
words frequently used in a direct access lexicon 
stored in memory. 
Our lexicon currently contains 7000 verbs, all the 
closed classes words (e.g., prepositions, articles, con- 
junctions), 12000 nouns and about 2500 adjectives. 
To mitigate the consequences of some lacks of this 
lexicon, a set of subcategorisation frames is indepen- 
dently associated with the lexicon (3000 frames). 
The user may also define a direct access lexicon, 
whose equations are written in a formalism close to 
the standard LFG formalism. Dedicated interfaces 
have been developped for editing these lexica, with 
syntactic and coherence checking. 
Example of an entry of a canonical form: 
@chien={ 'chien canonique' 
CAT-- N; 
T Pred = chien} 
Example of an entry of an inflected form: 
#chiennes={ "chien fern plur' "chiennes f-flechie" 
T Num = plur; 
T Genre =fem; 
@chien-~'chien canonique'} 
All these lexica conform to the specification de- 
fined by an abstract lexicon class. It is possible, and 
101 
UUlitalr~ 
PTypeProv = assert 
Pred =/'manger'< Suj I Sul Obj >/ 
Neg - - 
Temps = imparfait 
TypeAux = auxAvoir 
Mode = indicatif 
PType = assert 
Aft= + 
AU× = ~ 
$ujet = + 
Trans = direct 
Pronominal = - 
Suj = FS:\[ 
Def = deflni 
Numeral = ~ 
Pred =/'chien'/ 
Genre = masc 
Num = sing 
Pers = prs3 
Rel = ~ 
Article = ÷ 
} 
• l I¢ 
Figure 4: textual representation of a F-Structure 
very easy, to add new kinds of lexica, provided they 
conform to this specification. 
2.5 Tracking failure causes 
A specific feature ("Error ") allows the system to 
keep a value that makes explicit the reason why the 
unifying process has failed. Possible situations are 
listed below: 
1. Unifying failure. The values of a given feature 
are different between the two F-Structures to 
be unified. The generated F-Structure contains 
the feature Error , whose value is an associa- 
tion of the two uncompatible values. Example: 
Num = sing --+ plur. 
2. A feature present in an equation has non 
value in either of the two F-Structures to 
be unified. Example: with the equation 
'~ Suj Num = ~ Num" and two F-Structures 
without the Num feature, the generated F- 
Structure contains "Num -- nil-+ nil" . 
3. While making a constrained unification (e.g., 
J, Num =c sing ) a feature does not exist. We 
obtain: Num = sing --* nil. 
4. An obligatory feature is absent.Example: Num 
-- obligatoire. 
5. A forbidden feature is present. The forbid- 
den state for a feature is represented by adding 
the value "tilde" to the feature (e.g., Num -- 
"tilde"). Therefore, this is the same situation 
as the simple unification. A failure results from 
the case when a F-Structure contains this fea- 
ture. Example: Num=sing-+ "tilde". 
6. A feature has a forbidden w~lue. Example: 
Num= "tilde" sing. 
7. When a disjunction of constraints is the rea- 
son of the failure, the block itself is set as the 
value of the "Error" feature in the resulting F- 
Structure. 
These errors can be recovered through the interface 
(errors are highlighted in the representation), which 
allows the user to track them easily. Moreover, these 
well defined categories make it easy to find the real 
cause of the error and to correct the grammar and 
the lexicon. 
2.6 Structure of the rules 
Smalltalk80 specific features (mainly the notions of 
"image" and incremental compilation) have been 
heavily exploited in the definition of the internal 
structure of the grammar rules. Basically a rule is 
defined as the rewriting of a given constituent (left 
part of the rule), equations being linked to the right 
constituents. Each non terminal constituent of the 
grammar is then defined as a Smalltalk class, whose 
instance methods are the rules whose left part is this 
constituent (e.g., NP is a class, NP --* ProperNoun 
and NP --~ Det Adj* Noun are instance methods of 
this class). 
The Smalltalk compiler has been redefined on 
these classes so that it handles LFG syntax. There- 
fore, all the standard tools for editing, searching, 
replacing (Browsers) may be used in a very natural 
way. A specific interface may also be used to consult 
the rules and to define sets rules to be used in the 
parser. 
A great interest of such a configuration is to allow 
the user to define his own (sub-)set or rules by defin- 
ing sub-classes of a category when he wants to define 
different rules for this category (since a method with 
a given name cannot have two different definitions). 
On the use of the Envy/Manager source 
code manager to maintain the syntactic rules 
base. Envy/Manager is a source code manager for 
team programming in Smalltalk, proposed by OTI. 
It is based on a client-server architecture in which 
the source code is stored in a common database ac- 
cessible by all the developpers. Envy stores all the 
successive versions of classes and methods, and pro- 
vides tools for managing the history. Applications 
are defined as sets of classes, methods, and exten- 
sions of classes, that can be independently edited 
and versioned. Very fine grained ownership and ac- 
cess rights can be defined on the software compo- 
nents. The structuration of our syntactic rules base 
enables us to benefit directly of these functionali- 
ties, and hence to be able to manage versions, access 
rights, comparisons of versions (Figure 5)... on all 
our linguistic data. 
I Non termin~d Constituent I iMAGE 
I User CItes r I DellN~on(m e t 
I Application Versi(x~ ~ CI~a Version ~ Method Version 
ENV Y/MANA GER 
Figure 5: Structuring the set of rules 
Content of the rules. The current grammar 
contains about 250 rules that covers most of the 
classical syntactic structures of French simple sen- 
tences. They have been tested on data coming from 
the TSNLP european project. In addition to these 
simple sentences, difficult problems are also han- 
dled: clitics, complex determiners, completives, var- 
ious forms of questions, extraction and non limited 
dependancies, coordinations, comparatives. Some 
extensions are currently under development, includ- 
ing negation, support verbs, circonstant subordinate 
phrases and ellipses. 
3 Conceptual graphs 
Conceptual graphs (Sowa, 1984) form the basis of 
the semantic and encyclopedic representations used 
in our system. Conceptual graphs are bipartite 
graphs composed of concepts and relations. A con- 
ceptual graph database is generally composed of the 
following subparts: 
• a lattice of concepts and relation types 
• a set of canonical graphs, associated with con- 
cepts and relation types, used for example to 
express the selectionnal restrictions on the ar- 
guments of semantic relations. 
• a set of definitions, associated with concepts 
and relation types, used to define the meaning 
of concepts. 
• a set of schemas and prototypes. 
• a set of operations, such as join, contraction, 
expansion, projection... 
• a database containing the description of a situ- 
ation in terms of conceptual graphs. 
The framework we describe here aims at managing 
all this information in a coherent manner, and at 
facilitating the association with the linguistic pro- 
cesses described above. 
Graphs can be visualized, modified, saved, 
searched through different interfaces, using graph- 
ical or textual representations. Operations can be 
performed programmatically or using the interface 
shown in Figure 7. 
102 
Figure 6: Graphical representation of "a cheap horse 
is scarce" (with second order concepts) 
The lattice, and the different items of informa- 
tion associated with concepts and relations types, 
can be visualized, modified, searched and saved us- 
ing graphical or textual representations (Figure 10). 
An "individual referents inspector" allows to in- 
spect the cross-references between references, con- 
cepts and graphs. 
4 Analysing a sentence 
The processus of analysis from sentence to seman- 
tic representation can be separated into three sub- 
processes. After the sentence has been segmented, 
we obtain the lexical items in LFG-compliant form 
via the lexieal manager. After parsing, we obtain 
some edges with their respective F-Structures. (Del- 
monte, 1990) has developed a parser which uses basic 
entries with mixed morphological, functionnal and 
semantic informations. The rules use different level 
information. We propose to map the semantic struc- 
ture on the syntactic one in a manner that avoids 
too many interdependencies. We use a intermedi- 
ate structure (named "syntax-semantic table") that 
expresses the mapping between the value of a LFG 
Pred and a concept, as well as connected concepts 
and relations. Semantic data in the lexical knowl- 
edge base are defined by using conceptual graphs, 
as shown in the paragraph 4.1 below about some 
verb examples. Selectional restrictions defined with 
canonical graphs are then used to filter the graphs, 
when more than one is obtained at this level. 
4.1 Semantic verb classification in the 
lexical knowledge base 
The lexical knowledge base is based on a hierarchical 
representation of French verbs. We have developped 
a systematic and comprehensive representation of 
verbs in a hierarchical structure, data coming from 
the French dictionary "Robert". Our method relies 
on classification method proposed by (Talmy, 1985) 
and (Miller, Fellbaum and Gross, 1989), (Miller and 
Fellbaum, 1991). We chose a description with a 
structure composed of a basic action (the first of the 
most general uperclasses, e.g. stroll and run can 
be associated with walk as a basic action, andwalk, 
ride, pass point atmoving, which is a step further in 
generality) associated with thematic roles that spec- 
ify it (i.e., object, mean, manner, goal, and method). 
The basic actions are in turn defined with the same 
structure, based on a more general basic action. 
The hierarchy of verbs depends on the thematic 
relations associated with them. A verb V1 is the 
hyperonym (respectively a hyponym) of a verb V2 
(which is noted VI~-V2, respectively VI-<V2) if they 
share a common basic action and if, in the thematic 
relations structure associated with it, we have: 
* absence (for the hyperonym) or presence (for 
the hyponym) of a particular thematic relation: 
e.g. for the pair divide /cut ; to cut is to divide 
using a sharp instrument, thus divide ~- cut 
• presence of a generic value thematic relation 
vs. a specific value (example cut (object is 
generic:solid object ~- behead (object is ahead)) 
For every verb: 
• the semantic description pointed out is coded 
in the lexical knowledge base as a definitional 
graph. 
type cut (*x) is \[divide: *x\]- 
(obj) \[Object: (car) --* \[solid\] 
(method) --+ \[traverse\]--~ (Object: ~'y) 
(mean) -+ \[Instrument\]--* (car) --+ 
\[shar,\]. 
• a canonical graph makes explicit the selectional 
restrictions 
Canonical graph for cut is 
(Agent) -~ \[Animate\] 
(Obj) ---+ \[Object: *y\]--~ (car) ---+ \[solid\]. 
4.2 An example 
Below, we give an example for the sentence "Un av- 
ocat vole une pomme" (a lawyer steals an apple), 
where "avocaf' is ambiguous and refers to a lawyer 
or to an avocado. A semantic representation of 
this sentence is derived from its non-ambiguous F- 
Structure. 
The entries in the translation table (from LFG 
pred \[in French\] to conceptual graphs types \[in En- 
glish\]) are as follow: 
103 
Figure 7: Conceptual graph operation manager, showing the result of a join between two graphs, and the 
liste of available operations. 
'avocat' --. (Lawyer Avocado) 
'pomme' ---* (Apple). 
'voler(derober)' --* (Steal(Agent ~ I Suj; Ob- 
jet --* 1" Obj)), 
Explanations: the first item between quotes is 
the Pred value, followed by a list of types of con- 
cepts (or types of relations) and their mapping def- 
inition structure in the F-Structure. ~ represents 
the local F-Structure. T represents the F-Structure 
that contains the local F-Structure. For example, 
Agent --* ~ Suj means that a concept of Type "Steal" 
is connected to a concept that can be found in the F- 
Structure of the feature "Suj" in the local F- Struc- 
ture. From these data, the following graphs (Figure 
8) are obtained. 
The "Deft feature of the F-structure gives us in- 
formation about the referents of concepts. For ex- 
ample, the F- Structure for 'apple' contains "Def = 
indefini", which implies the use of a generic referent 
for the concept (corresponds to an apple, indicated 
by a star in Figure 8). Then, since canonical graphs 
express selectional restrictions, they are used to fil- 
ter the results through the join operation. For ex- 
2) Avocado'.* Agent SteaJ Object Apple:' 
Figure 8: Graphs from the sentence "Un avocat vole 
une pomme" 
ample, "Steal" needs an animated agent (Figure 9), 
therefore graphs with the "Avocado" concept can be 
removed from the selection. 
Figure 9: Canonical Graph for "Steal" 
These principles are the bases of the system cur- 
rently available, but we are working on improve- 
ments and extensions. We want to address the 
issue of adjunct processing, prepositional comple- 
IL 
104 
Figure 10: Lattice visualizer, showing (bottom right) the canonical graphs of "d~concerter" (to disconcert), 
the "graph origin" inspector (top right), and the menu of operations (bottom left) 
ments (with problem of second order concepts), etc. 
5 Concluding remarks: Sharing the 
tools on a network with CORBA 
The different tools described in this paper are cur- 
rently being extended to be CORBA-compatible. 
CORBA (Common Object Request Broker Archi- 
tecture) (Ben-Natan, 1995), has been defined by 
the OMG as an interoperability norm for heteroge- 
neous languages (Smalltalk, C++, JAVA) and plat- 
forms (UNIX, Macintosh, PC). CORBA defines a 
common interface definition language (IDL), as well 
as a set of services (naming service, security, con- 
currency management...). CORBA objects can be 
distributed worldwide (for example using Internet) 
using an ORB (Object Request Broker). Various 
tools implement this CORBA norm. We have used 
Distributed Smalltalk (Pare Place Digitalk) to real- 
ize the distributed implementation of an analyser. 
With this system, users can currently make an anal- 
ysis, see the results of this analysis, the F-structures, 
see the syntactic rules base... With this kind of 
architecture, systems necessiting a large amount of 
ressources can be distributed amongst workstations 
on a network and/or be used by clients having few 
ressources. Moreover these ressources can be phys- 
ically located in any place of a network, allowing 
thus to distribute the responsibility of their man- 
agement and maintenance to different persons. With 
the communication possibilities offered by Internet, 
it makes it possible to coordinate the cooperative 
efforts of several teams in the world around a sin- 
gle, coherent, though distributed system. We are 
continuing our work toward the implementation of a 
complete distributed multi-agent system, following 
the CARAMEL architecture (Sabah and Briffault, 
1993), (Sabah, 1995), (Sabah, 1997). 

References 
Ben-Natan, R. 1995, CORBA, a Guide to the Com- 
mon Object Request Brocker. McGraw-Hill. 
Booch G. 1994, Analyse et conception orientees 
objets , Addison-Wesley, Reading Mass. 
Bresnan Joan and Ronald Kaplan 1981, Lexical 
functional grammars ; a formal system for gram- 
matical representation, The mental representation 
of grammatical relations, MIT Press, Cambridge, 
Mass. 
Delmonte R. 1990, Semantic Parsing with LFG and 
Conceptual Representations, Computers and the 
Humanities, Kluwer Academic Publishers, 24 , p. 
461-488. 
Kaplan R.M. and J.T. Maxwell \].994, Grammar 
Writer's Workbench, Xerox Corporation, Version 
2.0. 
Kay Martin 1967, Experiments with a powerful 
parser, Proceedings 2nd COLIN(\], , p. 10. 
Kay Martin 1979, Functional grammars, Proceed- 
ings 5th. annual meeting of the Berkeley linguistic 
society, Berkeley, p. 142- 158. 
Miller A. G., C. Fellbaum and D. Gross 1989, 
WORDNET a Lexical Database Organised on 
Psycholinguistic Principles, Proceedings IJCAI, 
First International Lexical Acquisition Workshop, 
Detroit. 
Miller G. A. and C. Fellbaum 1991, Semantic net- 
works of English, Cognition, 41 , p. 197-229. 
Pitrat Jacques 1983, R~alisation d'un analyseur- 
g@n@rateur lexicographique g@n~ral, rapport de 
recherche GR22, Institut de programmation, Paris 
VI, 79/2. 
Sabah G@rard 1995, Natural Language Understand- 
ing and Consciousness, Proceedings AISB - work- 
shop on "Reaching for Mind", Sheffield. 
Sabah G~rard 1997, The fundamental role of 
pragmatics in Natural Language Understanding 
and its implications for modular, cognitively mo- 
tivated architectures, Studies in Computational 
Pragmatics: Abduction, Belief, and Context, Uni- 
versity College Press, to appear, London. 
Sabah G@rard and Xavier Briffault 1993, Caramel: 
a Step towards Reflexion in Natural Language Un- 
derstanding systems, Proceedings IEEE Interna- 
tional Conference on Tools with Artificial Intelli- 
gence, Boston, p. 258-265. 
Sowa John 1984, Conceptual structures: informa- 
tion processing in mind and machine , Addison 
Wesley, Reading Mass. 
Talmy L. 1985, Lexicalisation patterns: Semantic 
structure in lexical forms, Language typology and 
syntactic description, 3 , Cambridge University 
Press, New York, p. 57-149. 
Winograd Terry 1983, Language as a cognitive pro- 
cess, Volume I syntax, Addison Wesley, Reading 
Mass. 
