SOLVING AMBIGUITIES IN THE SEMANTIC 
REPRESENTATION OF TEXTS 
Marie-Claude Landau 
IBM France Paris Scientific Center 
3-5 Place Vendôme 75021 PARIS cedex 01 FRANCE
Abstract 
One of the issues of Artificial Intelligence is 
the transfer of the knowledge conveyed by 
Natural Language into formalisms that a 
computer can interpret. In the Natural Lan- 
guage Processing department of the IBM 
France Paris Scientific Center, we are de- 
veloping and evaluating a system prototype 
whose purpose is to build a semantic rep- 
resentation of written French texts in a rig- 
orous formal model (the Conceptual Graph 
model, introduced by J.F. Sowa [10]).
The semantic representation of texts may 
then be used in various applications, such 
as intelligent information retrieval. The ac- 
curacy of the semantic representation is 
therefore crucial in order to obtain valid results
in any subsequent applications. In this
article we explain how ambiguities related
to Natural Language may be solved by se- 
mantic analysis using the Conceptual Graph 
model. 
Key words 
Natural Language Understanding, Computa- 
tional Linguistics, Conceptual Graph Model. 
Introduction
In the system prototype we have been developing, the analysis of a
text is carried out in two steps: first syntactic and then
semantic [1].
We assume that the syntax of a text conveys some meaning, but since
our syntactic analyzer does not take semantics into account, a lot
of ambiguities remain:
• Lexical ambiguities, coming from the fact that the same word may
correspond to several lemmas in the syntactic dictionary. This kind
of ambiguity can be almost completely solved by the syntactic
analyzer.
• Structural ambiguities, a consequence of the multiple possible
attachments of the syntactic components in a sentence. This kind of
ambiguity may be solved to a large extent by the semantic analyzer.
• Anaphoric ambiguities, that could be solved in part by syntactic
analysis within a sentence [3], but cannot be solved across
different sentences because a syntactic analyzer processes each
sentence independently. In our system, the resolution of anaphoric
ambiguities is done solely by the semantic analyzer.
• Ellipses, that could also be solved in part by syntactic analysis.
But an incomplete syntactic analysis may in some cases be
complemented by the semantic analysis.
• Semantic ambiguities coming from polysemous lemmas, that can only
be solved at the semantic level (unless a polysemy leads to
different syntactic constructions).
In this article, we concentrate especially on the practical solving
of the different kinds of ambiguities, showing that these problems
are inter-related and may be solved by unified methods.
The Conceptual Graph model 
The Conceptual Graph model is a very promising unified model,
because it generalizes many ideas contained in preceding works on
natural language semantics, such as Fillmore [7], Schank [9],
Montague [5], Wilks [12], and Kamp [8].
For the sake of clarity, we briefly recall here the Conceptual Graph
model introduced by J.F. Sowa [10]. A Conceptual Graph is an
oriented graph made up of concept nodes related by conceptual
relation edges. The concepts are represented by boxes, the relations
by circles. Example:
[Figure: Conceptual Graph relating the concepts GIRL: 'Sue', EAT and APPLE through conceptual relations.]
The concepts may have referents which 
specialize them. A referent can be a con- 
stant ('Sue') to denote individuals, a variable 
to denote cross-references, or more com- 
plex expressions. Most of the relations are 
binary relations (OBJ), some are unary. The 
concepts are organized in a concept type 
lattice with a partial ordering relation. The 
top concept type is ENTITY. Example: 
[Figure: fragment of the concept type lattice. Levels, from top to bottom: ENTITY; ECONOMIC_ENTITY, MEASURE_UNIT; INTEREST_RATE, CURRENCY, TIME_UNIT; DOLLAR, FRANC, MONTH.]
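As an illustration of the partial ordering on concept types, here is a minimal Python sketch. The parent links in `PARENTS` are an assumption (the exact edges of the lattice figure are not given in the text), and the function names `ancestors`, `is_subtype` and `most_specialized` are hypothetical helpers introduced for this example only.

```python
# Hypothetical parent links: the edges of the lattice are assumed here
# for illustration; only the type names come from the text.
PARENTS = {
    "ENTITY": [],
    "ECONOMIC_ENTITY": ["ENTITY"],
    "MEASURE_UNIT": ["ENTITY"],
    "INTEREST_RATE": ["ECONOMIC_ENTITY"],
    "CURRENCY": ["ECONOMIC_ENTITY", "MEASURE_UNIT"],
    "TIME_UNIT": ["MEASURE_UNIT"],
    "DOLLAR": ["CURRENCY"],
    "FRANC": ["CURRENCY"],
    "MONTH": ["TIME_UNIT"],
}


def ancestors(concept_type):
    """Return the type itself plus every type above it in the ordering."""
    seen = {concept_type}
    todo = [concept_type]
    while todo:
        for parent in PARENTS[todo.pop()]:
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen


def is_subtype(a, b):
    """True when a <= b in the partial ordering (a specializes b)."""
    return b in ancestors(a)


def most_specialized(a, b):
    """Return the more specific of two comparable types, else None."""
    if is_subtype(a, b):
        return a
    if is_subtype(b, a):
        return b
    return None
```

Under these assumed links, `is_subtype("DOLLAR", "ENTITY")` holds because ENTITY is the top of the lattice, while `most_specialized("DOLLAR", "FRANC")` returns `None` since neither type specializes the other.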
Conceptual Graphs may be combined to- 
gether using various algorithms, the most 
important of which are the projection and the 
join algorithms. They are pattern-matching
algorithms which take the concept type
hierarchy into account.
The projection of one Conceptual Graph into 
another one is a restriction of the first graph 
to a sub-graph of the second one. The 
projection also gives the pending edges of 
the second Conceptual Graph in relation to 
the result. 
The join of two Conceptual Graphs forms a 
common overlap, while keeping the most 
specialized concept types in the result, and 
attaches to the common overlap the pending 
edges remaining in the two graphs. 
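[Figure: example graphs and results. G3 involves the concepts GIRL: 'Sue', DRIVE and FAST. The result of the projection of G1 into G2 involves BOY: 'John', DRIVE and CAR. The result of the join of G1 and G3 involves GIRL: 'Sue'; the remainder of the figure is not recoverable.]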
The semantic analyzer: general 
method 
The semantic analyzer produces one or 
more Conceptual Graphs for each sentence, 
including cross-references within a sentence 
or between different sentences. 
Our semantic analyzer is written in the 
VM/Programming in Logic (VM/PROLOG) 
programming language [11]. The semantic
analyzer takes as input the annotated syn- 
tactic tree(s) resulting from the syntactic 
analysis. Applying compositionality rules, it 
links together the Conceptual Graphs corre- 
sponding to each word or locution of the 
sentence, according to the indications given 
by the syntactic tree(s). 
The Conceptual Graphs for each word or locution are retrieved from a
semantic lexicon. The words of the Natural Language may be coded in
a semantic lexicon general to Natural Language and/or in a semantic
lexicon specific to an application. In our project, we have
concentrated on developing specialized semantic lexicons, in order
to get fast results on texts dealing with a specific subject
(economics, pharmacology).
In cases of polysemy there may be several 
entries (hence several Conceptual Graphs) 
for one word in the semantic lexicon. If, 
however, a word is missing in the semantic 
lexicon, default options are taken. 
The directed join algorithm as a 
disambiguation tool 
The Conceptual Graphs for words are linked by an algorithm that we
call the directed join. The directed join is a deterministic version
of the join algorithm described by J.F. Sowa: we force a given
concept box in the first graph to be mapped onto a given concept box
in the second graph, by means of attachment point labels which lie
inside the concept boxes. The join may then be propagated along the
edges related to those initial concept boxes.
Semantic constraints on the concept types, contained in the concept
type lattice, make it possible to rule out invalid polysemous
combinations, and in some cases to discard non-pertinent syntactic
analyses.
In addition, we have implemented a directed join management
algorithm which allows the "best" possible solution to be chosen.
Indeed, when two semantic structures must be linked together, all
the conceptual choices (corresponding to the different entries for
each word in the semantic lexicon) are simultaneously taken into
account by the directed join management algorithm, which only keeps
the solutions leading to a maximum overlap between the two sets of
Conceptual Graphs (according to the link constraints).
For example, suppose we have the following coding for the verb
"passer" ("to go from ... to") in the semantic lexicon:
VERB('passer',1) is [figure: Conceptual Graph not reproduced]
VERB('passer',2) is [figure: Conceptual Graph not reproduced]
For the sentence "le dollar est passé de 6 francs à 5 francs" ("The
dollar went down from 6 francs to 5 francs"), the directed join
algorithm will only give solution 2, automatically discarding
solution 1.
SOLUTION 1 is [figure: Conceptual Graph not reproduced]
SOLUTION 2 is [figure: Conceptual Graph not reproduced]
Therefore, the final result is usually not the 
combinatorial product of all the entries of 
polysemous words in the semantic lexicon. 
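The selection of the entries with maximum overlap can be sketched as follows. This is a simplified illustration, not the system's VM/PROLOG implementation: each lexicon entry is reduced to a mapping from attachment point labels to expected concept types, and the two entries for "passer", the label names and the small type hierarchy are all invented for the example.

```python
# Hypothetical "is a kind of" links, used only for this illustration.
PARENT = {
    "DOLLAR": "CURRENCY", "FRANC": "CURRENCY", "CURRENCY": "MEASURE",
    "PARIS": "PLACE", "MEASURE": "ENTITY", "PLACE": "ENTITY",
    "PERSON": "ENTITY",
}


def is_subtype(a, b):
    """Walk up the (assumed) hierarchy from a until b or the top is reached."""
    while a != b and a in PARENT:
        a = PARENT[a]
    return a == b


# Invented lexicon entries: attachment point label -> expected concept type.
PASSER = {
    1: {"subject": "PERSON", "from": "PLACE", "to": "PLACE"},        # motion
    2: {"subject": "CURRENCY", "from": "MEASURE", "to": "MEASURE"},  # change of value
}


def overlap(entry, context):
    """Count compatible attachment points, or return None on a type clash."""
    score = 0
    for label, actual in context.items():
        expected = entry.get(label)
        if expected is None:
            continue  # no attachment point for this label: nothing to match
        if not is_subtype(actual, expected):
            return None  # semantic constraint violated: discard the entry
        score += 1
    return score


def best_entries(entries, context):
    """Keep only the entries whose overlap with the context is maximal."""
    scored = {key: overlap(entry, context) for key, entry in entries.items()}
    valid = {key: s for key, s in scored.items() if s is not None}
    if not valid:
        return []
    best = max(valid.values())
    return sorted(key for key, s in valid.items() if s == best)


# "le dollar est passé de 6 francs à 5 francs", with amounts simplified
# to their unit type for the purpose of the sketch.
context = {"subject": "DOLLAR", "from": "FRANC", "to": "FRANC"}
```

With these assumed entries, `best_entries(PASSER, context)` keeps entry 2 only, because DOLLAR is not a kind of PERSON and entry 1 is therefore ruled out before any overlap is counted.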
We thus see that the directed join algorithm is a powerful tool
which can help disambiguate polysemy. It also helps fill in the gaps
of incomplete syntactic information, as well as solve anaphors, as
we shall explain below.
Processing of incomplete syntactic 
information 
We prefer to speak here of incomplete syntactic information rather
than of ellipses, in that the solving of true ellipses has not yet
been done in our system.
In our system, the solving of incomplete syntactic information deals
with missing subjects of complement clauses (infinitive verbs,
verbal prepositional groups). The choice of the missing subject is
made according to:
• the preposition introducing the complement clause (if applicable),
• the subject, object and dative of the main verb (i.e. the verb to
which the complement clause is syntactically related),
• in some cases, the adverbial phrases of the complement clause.
For this processing it is necessary to have a knowledge base about
the verbs of the Natural Language, along with their possible
prepositional syntactic constructions. This knowledge base is
organized into classes of verbs for which similar syntactic
constructions lead to the same choice for the
missing subject. Surprisingly enough, we
have found that these classes also corre- 
spond in French to semantic classes (ne- 
cessity, motion, perception, accompaniment, 
intention, delegation of power, etc.). Our al- 
gorithm has been written for the French lan- 
guage and should be partially or totally 
rewritten for other Natural Languages. 
Here is an example of the kind of results we get:
"Le directeur demande à son employé de faire réparer le terminal par
le service d'entretien" ("The manager asks his employee to have the
terminal repaired by the maintenance people")
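The role of the verb knowledge base in choosing the missing subject can be sketched in Python as follows. The class names, the rule table and the function `missing_subject` are hypothetical and only illustrate the idea of mapping a verb class and an introducing preposition to one of the roles of the main verb; they do not reproduce the actual knowledge base.

```python
# Hypothetical knowledge base: class names and rules are illustrative only.
VERB_CLASS = {
    "demander": "request",
    "vouloir": "intention",
    "promettre": "commitment",
}

# (verb class, preposition introducing the complement clause) -> role of
# the main verb that supplies the missing subject of the complement.
SUBJECT_RULE = {
    ("request", "de"): "dative",
    ("intention", None): "subject",
    ("commitment", "de"): "subject",
}


def missing_subject(main_verb, preposition, roles):
    """Return the filler of the role selected by the verb class, if any."""
    verb_class = VERB_CLASS.get(main_verb)
    role = SUBJECT_RULE.get((verb_class, preposition))
    if role is None:
        return None  # unknown verb or construction: no choice is made
    return roles.get(role)


# "Le directeur demande à son employé de faire réparer le terminal ..."
roles = {"subject": "directeur", "dative": "employé"}
chosen = missing_subject("demander", "de", roles)  # the employee
```

Grouping verbs into classes keeps the rule table small: adding a new verb only requires assigning it to an existing class.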
Sometimes, the solution is not so straightforward. For example, let
us consider the sentences:
"J'ai entendu jouer les enfants" ("I heard the children playing")
"J'ai entendu jouer la musique" ("I heard the music playing")
In one of these sentences (both in French and in English), the noun
phrase following the infinitive is its subject, in the other it is
its object. Yet the structure of the sentences
appears the same. Only by checking the 
semantic constraints with the directed join 
algorithm will the right interpretation be 
given. This is why, in our system, the proc- 
essing of incomplete syntactic information is 
done at the level of semantic analysis rather 
than at the level of syntactic analysis. 
Processing of anaphors 
In this paragraph we group together the 
solving of the following co-reference prob- 
lems, since the same resolution method is 
used: 
• Personal pronouns ("he", "them", ...)
• Demonstrative pronouns ("this one", "those ones", ...)
• Demonstrative determiners ("this person", ...)
• Noun ellipses ("another one", "that of", ...)
• Possessive pronouns ("theirs", ...)
• Possessive determiners ("her coat", ...)
The solving of a co-reference problem consists
in instantiating the anaphoric element
by assigning to it a concept type and possi- 
bly a referent which have already been used 
in the text. In some cases, it is also neces- 
sary to have a look-ahead procedure which 
scans the text forwards. 
Backward search algorithm 
In our system, the backward search is done 
by scanning a LIFO stack of concepts and 
referents. 
Before starting to build a Conceptual Graph 
for the sentence, all the nouns (proper or 
common nouns, not preceded by a
demonstrative determiner) and anaphors are 
processed in the order in which they appear 
in the sentence. 
We assign to each of the nouns a new referent number (or a new set
of referents in the case of polysemy) and we store in a LIFO stack
the sentence sequence number, the lemma, the noun Conceptual
Graph(s), its referent(s), its gender and number. This processing of
nouns is done once and for all: different syntactic analyses give
rise to the same referent number for the same noun at the same place
in the sentence.
As for the anaphors, the stack is scanned in LIFO order and gender
and number are checked. The result of this search is a set of
possible solutions. In fact, the set of possible solutions for an
anaphor may be viewed as an "extended polysemy". For reasons of
pragmatism and performance, the search is limited to a fixed number
of sentences upward in the text. This number is a parameter that may
be specified by the user.
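The backward search described above can be sketched in Python. This is a simplified illustration under stated assumptions: the record layout `StackEntry`, the function name `backward_search`, the gender and number codes, and the default window of two sentences are all hypothetical, and the stack passed in is assumed to hold only the entries that precede the anaphor.

```python
from dataclasses import dataclass


@dataclass
class StackEntry:
    sentence: int      # sentence sequence number
    lemma: str
    concept_type: str  # stands in for the noun's Conceptual Graph(s)
    referent: int
    gender: str        # "m" or "f" (assumed coding)
    number: str        # "sg" or "pl" (assumed coding)


def backward_search(stack, gender, number, current_sentence, window=2):
    """Scan the stack from the most recent entry backwards, keeping the
    entries that agree in gender and number with the anaphor and that lie
    at most `window` sentences before the current one."""
    candidates = []
    for entry in reversed(stack):  # LIFO order: most recent entry first
        if current_sentence - entry.sentence > window:
            break  # entries are pushed in text order, so the rest is older
        if entry.gender == gender and entry.number == number:
            candidates.append(entry)
    return candidates


# "Le pilote et le garçon sont arrivés hier. Il projette de ..."
stack = [
    StackEntry(1, "pilote", "PILOT", 1, "m", "sg"),
    StackEntry(1, "garçon", "BOY", 2, "m", "sg"),
]
candidates = backward_search(stack, "m", "sg", current_sentence=2)
```

Both nouns agree with "Il" in gender and number, so both are returned, the most recent one first. Choosing between them is left to the linkage with the context.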
When the set of graphs corresponding to an 
anaphor is linked to its context (e.g. a pro- 
noun subject to a verb), the "best" solutions 
are chosen by the directed join management 
algorithm, as explained above in the example
of polysemy ("to go from ... to").
Then the solution corresponding to the most 
recent entry in the concept stack is selected, 
to avoid having too many solutions. This is 
done by way of a projection of the Concep- 
tual Graph contained in the stack into the 
result of the directed join. However this se- 
lection of the most recent solution may 
backtrack: this is useful if the set of graphs 
for the anaphor has to be linked several 
times. (This is the case for coordinated 
verbs with the same subject, or for infinitives 
with the same subject as the main verb, for 
example). In this case, thanks to the di- 
rected join management algorithm, the best 
solution of the whole process is chosen. 
Example:
"Le pilote et le garçon sont arrivés hier. Il projette de piloter
l'avion" ("The pilot and the boy came yesterday. He plans to pilot
the plane")
Suppose we have the following entries in the semantic lexicon:
garçon (boy) < PERSON in the lattice
avion (plane) < VEHICLE in the lattice
VERB('projeter',1) is (to plan) [figure: Conceptual Graph not reproduced]
SUBS('pilote',1) is (pilot) [figure: Conceptual Graph not reproduced]
VERB('piloter',1) is (to pilot) [figure: Conceptual Graph not reproduced]
The result for the first sentence is: [figure: Conceptual Graph not reproduced]
The result for the second sentence is: [figure: Conceptual Graph not reproduced]
Forward search algorithm 
If no solution has been found in the stack with the backward search
algorithm, or if the solutions found have led to a failure in the
linkage to the context, then the forward search algorithm is
activated. This is easy since we already have in the stack the
information concerning all the nouns of the sentence. If the forward
search also leads to a failure, our system simply prompts the user.
If no answer is given (or if we are in batch mode), the system
instantiates the anaphor to the most general concept in the lattice,
which is ENTITY.
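The order of fallbacks just described (backward search, forward search, user prompt, then ENTITY) can be sketched as follows. The function `resolve_anaphor` and its callable parameters are hypothetical; each search callable is assumed to return only the solutions that survived the linkage to the context.

```python
def resolve_anaphor(backward, forward, ask_user=None):
    """Try each source of solutions in turn and fall back to ENTITY.

    `backward` and `forward` are callables returning a list of solutions
    that were successfully linked to the context (possibly empty).
    `ask_user` is an optional callable; batch mode passes None.
    """
    for search in (backward, forward):
        solutions = search()
        if solutions:
            return solutions
    if ask_user is not None:
        answer = ask_user()
        if answer:
            return [answer]
    return ["ENTITY"]  # most general concept type in the lattice


# Batch mode, with both searches failing:
result = resolve_anaphor(lambda: [], lambda: [])
```

Passing the searches as callables means the forward search is only run when the backward search has actually failed.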
However, it is not always sufficient to activate the forward search
algorithm only in cases of total failure of the backward search
algorithm. In fact, some syntactic constructions (corresponding to
cataphoric relations) should automatically start the forward search
algorithm, even though there might be some solutions given by the
backward search algorithm. Such cataphoric relations may correspond
to set expressions that emphasize a word which appears later in the
sentence (at least, in French): "Il marche bien, ce programme"
(literally, "It works well, this program"). "Il" ("it") refers to
"programme" ("program").
Miscellaneous problems related to
anaphors
In the case of demonstrative determiners, the information
corresponding to the concept type is already given by the noun. But
there may be set expressions for which the noun following the
demonstrative does not correspond exactly to a previous word in the
text.
Example: "La hausse du dollar s'est intensifiée hier à Paris. Cette
évolution a provoqué ..." ("The rise of the dollar sharpened
yesterday in Paris. This change caused ...") In this case, the
search in the stack must not be made according to words: instead, a
projection of the Conceptual Graph(s) of the noun ("change") must be
made into the Conceptual Graphs of the stack.
For noun ellipses ("another one", "that of"), the thing to do is to
search only for a concept type in the stack, and to assign a new
referent to it. For example, the sentence: "Le déficit de 1988 est
équivalent à celui de 1987" ("The deficit of 1988 is equivalent to
that of 1987") gives the following solution:
[Figure: resulting Conceptual Graph, involving the concepts DEFICIT and TIME; not recoverable.]
In order to solve possessive pronouns ("theirs"), concept types have
to be found both for the possessed entity and for the owner, and the
two have to be linked together with an appropriate conceptual
relation.
Example: "Le garçon a fait ses devoirs et la fille a fait les siens"
("The boy did his homework and the girl did hers")
A difficult problem is plural anaphors, since they may correspond to
several entries in the stack (implicit coordination).
Example: "L'homme est arrivé avec la femme. Ils sont allés déjeuner"
("The man arrived with the woman. They went to lunch").
In this case, we search either for a plural antecedent that is not
syntactically coordinated, or for a set of antecedents which have a
common ancestor in the lattice, favoring elements which are already
syntactically coordinated. This requires storing information
concerning syntactic coordination of nouns in the stack.
Further to the problem of plural anaphors, it may happen that an
anaphoric element is quantified ("those three persons", "the three
of them", etc.). In such a case, and wherever applicable, the
referents must be collected upwards in the stack until the target
sum is reached.
In addition, in order to prevent the generation of absurd Conceptual
Graphs, pragmatic rules based on syntax are applied. For the
resolution of a given anaphor, this processing mainly consists in
forbidding the stack entries whose syntactic structures in the
sentence are incompatible with the syntactic structure of the
anaphor [4]. (For example, a possessive determiner cannot refer to
the possessed entity).
The semantic coherence checking 
algorithm 
We have seen that the directed join and di- 
rected join management algorithms are use- 
ful in solving polysemy, incomplete syntactic 
information and anaphors. But this is not 
sufficient, because these problems may be 
inter-related. For example, we may have coordinated verbs with the
same subject, this subject being polysemous or, even worse, a
pronoun. We may also want to carry the polysemous or pronoun subject
of a main verb over to its infinitive complement.
In such cases, we have to check that the 
same solution for the subject has been taken 
everywhere in the resulting Conceptual 
Graph. This is the purpose of the semantic 
coherence checking algorithm. First, it en- 
sures that different polysemous entries of 
one occurrence of a word in the sentence do 
not appear in the final result for the sen- 
tence. Secondly, it checks that the same 
solution for a pronoun has been selected 
throughout the processing. In cases of fail- 
ure, the backtrack is activated. The back- 
track on a pronoun is cut as soon as a 
satisfactory solution is found. This semantic
coherence checking algorithm uses the
projection algorithm.
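The coherence condition can be illustrated with a deliberately simplified Python sketch. It assumes that each linking step records which solution it selected for each word occurrence or pronoun, and it compares those records by plain equality instead of using the projection algorithm. The names `coherent` and `first_coherent` are hypothetical.

```python
def coherent(choices):
    """Check that every occurrence received one and the same solution.

    `choices` is an iterable of (occurrence, solution) pairs, one pair per
    linking step in which the occurrence took part.
    """
    selected = {}
    for occurrence, solution in choices:
        # setdefault stores the first solution seen for this occurrence
        # and returns it on every later call for comparison.
        if selected.setdefault(occurrence, solution) != solution:
            return False
    return True


def first_coherent(alternatives):
    """Return the first coherent combination and stop searching, which
    mimics cutting the backtrack once a satisfactory solution is found."""
    for choices in alternatives:
        if coherent(choices):
            return choices
    return None


# "Il projette de piloter l'avion": the pronoun is linked twice, once as
# subject of "projette" and once as subject of "piloter".
alternatives = [
    [("Il", "garçon"), ("Il", "pilote")],   # incoherent: two antecedents
    [("Il", "pilote"), ("Il", "pilote")],   # coherent
]
chosen = first_coherent(alternatives)
```

In this toy example the first combination is rejected because the pronoun would denote two different antecedents within one sentence, and the second combination is kept.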
Conclusion 
Our prototype is still under development, 
and we do not claim to have solved all the 
ambiguities which can be found in Natural 
Language. However, the Conceptual Graph model, along with the
appropriate algorithms, has proven to be useful for the resolution
of the ambiguities which occur most often in real texts.
As far as the treatment of anaphors is con- 
cerned, we plan to extend it, as follows: 
• The search for a referent will be applied 
to every proper noun and to every com- 
mon noun preceded by a definite article, 
in order to introduce more cohesion in 
the representation of the text. ("Mr John 
Akers, manager of IBM ... Mr Akers ... 
John ... the manager"). 
• But, in order to avoid wrong interpreta- 
tions, the local context of a noun (i.e. its 
qualifiers) will then be stored in the 
stack of concepts and referents. This 
should also allow the solving of qualified 
noun ellipses ("the red one"), but the 
problem of the scope of a local context 
then arises. 
• The solving of anaphors referring to 
statements is theoretically feasible with 
the Conceptual Graph model, by the use 
of conceptual pointers between PROP- 
OSITIONS. 
• The resolution of anaphors within long 
quotations, which introduce a context 
change, should take the context change 
into account. 
Finally, some ambiguities may only be solved by the application of
rules of common sense and/or deduction. A deductive component has
been implemented in our system [6] [2]. This deductive component,
applying appropriate production rules, should be invoked either
during the text processing, or as post-processing on the set of
Conceptual Graphs for a text.

References 

[1] A. Berard-Dugourd, J. Fargues, M.C. Landau, Natural Language Analysis Using Conceptual Graphs, International Computer Science '88, Hong-Kong, December 19-21, 1988.

[2] A. Berard-Dugourd, J. Fargues, M.C. Landau, J.P. Rogala, Natural Language Information Retrieval from French Texts, 3rd Workshop on Conceptual Graphs sponsored by AAAI, St-Paul, Minnesota, August 27, 1988.

[3] P. Bosch, Some Good Reasons for Shallow Pronoun Processing, IBM Conference on Natural Language Processing, Thornwood, NY, October 24-26, 1988.

[4] L. Danlos, Génération automatique de textes en langues naturelles, pp 191-208, Masson, Paris, 1985.

[5] D.R. Dowty, R.E. Wall and S. Peters, Introduction to Montague Semantics, D. Reidel Publishing Company, Dordrecht (Holland), 1981.

[6] J. Fargues & al., Conceptual Graphs for semantics and knowledge processing, IBM Journal of Research and Development, Vol 30, No. 1, January 1986, pp 70-79.

[7] C.J. Fillmore, The Case for Case, Universals in Linguistic Theory, E. Bach and R.T. Harms Eds, Holt, Rinehart & Winston, New York, 1968, pp 1-88.

[8] H. Kamp, Events, Discourse Representations and Temporal References, Langages 64, pp 39-64, Larousse Publishing Company, Paris, France, December 1981.

[9] R.C. Schank Ed., Conceptual Information Processing, North-Holland, Amsterdam, 1975.

[10] J.F. Sowa, Conceptual Structures. Information Processing in Mind and Machine, Addison Wesley Publishing Company, Reading, MA, 1984.

[11] VM/Programming in Logic (VM/Prolog), IBM PO 5785-ABH, available through IBM branch offices.

[12] Y.A. Wilks, Making preferences more active, Artificial Intelligence, Vol 11, 3, 1978, pp 197-224.
