Conceptual Language Models for Dialog Systems
Renato De Mori 
LIA CNRS  BP 1228  
84911 Avignon Cedex 9 - France 
renato.demori@lia.univ-avignon.fr
Frederic Béchet 
LIA CNRS  BP 1228  
84911 Avignon Cedex 9 - France  
frederic.bechet@lia.univ-avignon.fr
 
1 Introduction
The purpose of computer speech understanding is to 
find conceptual representations from signs coded into 
the speech signal. 
 
Contrary to speech interpretation by humans in which 
the same discourse may be interpreted differently by 
different subjects, for practical applications of computer 
understanding the result of interpretation should be 
unique for a given signal. Usually it is represented by an
object that is an instance of a class corresponding to a
semantic structure, which can be fairly complex even if
it is built with instances of conceptual constituents
belonging to a small set of major ontological categories.
 
The mapping process that leads to a semantic interpretation
can be derived manually, because human interpretation of
sentences can be completely explained with a logical
formalism, or it can be inferred by machine learning
algorithms in order to ensure broad coverage of possible
sentence patterns. Theories and practical implementations
of these approaches are proposed in [1],[2].
 
The limited coverage of the manual approach and the limited
precision of machine learning can both be mitigated by
manually analyzing a limited number of examples in detail
and generalizing each analysis with automatic methods. In
particular, a well-structured lexicon can be very useful,
in which the meaning of words is represented together with
suggestions of possible syntactic and conceptual structures.
 
Word associations found with networks of word rela-
tions [3] can also be useful for suggesting compositions 
of semantic constituents into conceptual structures. 
Thus, given an observed example, other examples can 
be manually derived and generalized automatically. 
 
Computer understanding of a spoken sentence is a
problem-solving activity whose central engine is a search
process involving various types of models.
 
Searching for concepts can be combined with searching 
for words. This suggests that statistical language models 
(LMs) could be adapted based on expectations of con-
cepts predicted by a system belief. With this perspec-
tive, it is important to notice that, while the observation 
of only certain words may be sufficient for hypothesiz-
ing a conceptual structure, complete details of word 
phrases expressing a conceptual structure have to be 
known in order to adapt a generic LM to the expectation 
of such a structure.  
 
This paper introduces a search method and a learning
paradigm based on these considerations.
 
The search engine built with this method finds the best 
common path between the system knowledge repre-
sented by the composition of Stochastic Finite State 
Transducers (SFST) and a Stochastic Finite State 
Automaton (SFSA) representing the lattice of word hy-
potheses generated by an Automatic Speech Recogni-
tion System (ASR).  
2 Hypothesis evaluation and search
Let a dialogue system have a belief which generates 
expectations B about conceptual structures.  
 
Expectation uncertainty is represented by a probability 
distribution P(B) which is non-zero for a set of concep-
tual structures expected at a given time. Thus for a gen-
eral concept structure Γ and a description Y of the 
speech signal, one gets:   
 
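$$
\Gamma^{*} = \arg\max_{\Gamma} P(\Gamma \mid Y) = \arg\max_{\Gamma} P(\Gamma, Y)
$$

$$
P(\Gamma, Y) = \sum_{B} P(\Gamma, Y, B) \approx \max_{B} P(\Gamma, Y, B)
$$

$$
P(\Gamma, Y, B) = \sum_{W} P(W, \Gamma, Y, B) \approx \max_{W} P(Y \mid W)\, P(W, \Gamma, B)
$$

$$
\Gamma^{*} \approx \arg\max_{\Gamma, W, B} \left\{ P(Y \mid W)\, P(W, \Gamma, B) \right\} \tag{1}
$$

$$
P(W, \Gamma, B) = P(\Gamma \mid BW)\, P(W \mid B)\, P(B)
$$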
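As an illustration of (1) and of the factorization of P(W,Γ,B), the following minimal Python sketch scores each candidate triple (Γ, W, B) in the log domain and keeps the best one. It is not the search engine described in this paper: the function name, the table layout and all probability values are hypothetical toy choices, and the exhaustive enumeration stands in for the lattice search.

```python
import math

def best_interpretation(log_p_y_given_w, log_p_w_given_b,
                        log_p_gamma_given_bw, log_p_b):
    """Return the (gamma, words, belief) triple maximizing
    log P(Y|W) + log P(Gamma|BW) + log P(W|B) + log P(B)."""
    best_score, best_triple = -math.inf, None
    for (gamma, belief, words), log_pg in log_p_gamma_given_bw.items():
        score = (log_p_y_given_w[words]              # acoustic term P(Y|W)
                 + log_pg                            # P(Gamma|BW)
                 + log_p_w_given_b[(words, belief)]  # belief-adapted LM P(W|B)
                 + log_p_b[belief])                  # belief prior P(B)
        if score > best_score:
            best_score, best_triple = score, (gamma, words, belief)
    return best_triple, best_score

# Hypothetical toy values: the acoustics slightly prefer "two pairs",
# but the belief-adapted LM expects a destination.
W_TO_PARIS = ("to", "paris")
W_TWO_PAIRS = ("two", "pairs")
log_p_b = {"B1": math.log(0.7), "F": math.log(0.3)}
log_p_y_given_w = {W_TO_PARIS: math.log(0.4), W_TWO_PAIRS: math.log(0.6)}
log_p_w_given_b = {
    (W_TO_PARIS, "B1"): math.log(0.5),
    (W_TWO_PAIRS, "B1"): math.log(0.01),
    (W_TO_PARIS, "F"): math.log(0.05),
    (W_TWO_PAIRS, "F"): math.log(0.05),
}
# P(Gamma|BW) is 1 (log 0) when Gamma is inferred unambiguously from W;
# triples that are absent from the table have probability 0.
log_p_gamma_given_bw = {
    ("DEST(paris)", "B1", W_TO_PARIS): 0.0,
    ("NULL", "F", W_TO_PARIS): 0.0,
    ("NULL", "F", W_TWO_PAIRS): 0.0,
}

best, score = best_interpretation(log_p_y_given_w, log_p_w_given_b,
                                  log_p_gamma_given_bw, log_p_b)
```

With these toy numbers the expected destination wins even though the acoustic score alone favors the other word sequence.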
 
A general concept structure Γ can be represented as a 
string of parenthesized terminals and non-terminals.  
 
These expressions can be decomposed into chunks. A
sentence may contain just one or a few chunks of an
incomplete structure. Thus, a system should be able to
generate interpretation hypotheses about parts of a
conceptual structure. In this case, the symbol Γ refers
only to a set of components.
 
Probability P(Γ|BW) can simply be set to 0 for a
conceptual structure that cannot be inferred from W.
If the conceptual structure is part of the expectations of
the system belief and can be inferred unambiguously from
W, then P(Γ|BW) = 1, as in many practical applications,
including the one considered in this paper.
Otherwise, let $[c_1 \ldots c_\gamma \ldots c_\Gamma]$ be the
sequence of concept symbols corresponding to the
preterminal symbols in Γ. Probability P(Γ|BW) can be
expressed as follows:
 
$$
P(\Gamma \mid BW) = P([c_1 \ldots c_\gamma \ldots c_\Gamma] \mid BW)
= P(c_1 \mid BW) \prod_{\gamma=2}^{\Gamma} P\{c_\gamma \mid [c_1 \ldots c_{\gamma-1}]\, BW\} \tag{2}
$$
 
For a class of applications, the probability
$P\{c_\gamma \mid [c_1 \ldots c_{\gamma-1}]\, BW\}$ is one
for at least some values of γ.
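The chain-rule decomposition in (2) can be sketched in Python as follows. This is only an illustration under stated assumptions: the two probability tables, the concept names and the convention that missing conditional entries are deterministic (probability one) are hypothetical, and the conditioning on B and W is assumed to be already folded into the tables.

```python
def concept_sequence_prob(concepts, first_prob, cond_prob):
    """P(Gamma|BW) = P(c_1|BW) * prod over gamma >= 2 of
    P(c_gamma | [c_1 ... c_{gamma-1}] BW)."""
    if not concepts:
        raise ValueError("at least one concept symbol is required")
    prob = first_prob[concepts[0]]
    for g in range(1, len(concepts)):
        history = tuple(concepts[:g])
        # Assumption: entries missing from the table are deterministic.
        prob *= cond_prob.get((concepts[g], history), 1.0)
    return prob

# Hypothetical tables for one fixed pair (B, W).
first_prob = {"REQUEST": 0.8}
cond_prob = {
    # FLIGHT after REQUEST is not listed: treated as probability one.
    ("DEST", ("REQUEST", "FLIGHT")): 0.5,
}

p_gamma = concept_sequence_prob(["REQUEST", "FLIGHT", "DEST"],
                                first_prob, cond_prob)
```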
 
Let Φ be the set of conceptual components, chunks of
them, or conceptual structures known to the system.
Expectations derived from the system belief can be
grouped into a set B1. Let B2 be the complement of B1
with respect to Φ, and let F be a filler structure
representing all the conceptual structures that are outside
the application or simply not covered by the system
knowledge. B1, B2 and F are the possible values for B in
(1). Their probabilities P(B) can be established
subjectively or by counting user responses that are
consistent with the belief, consistent with the application
but not with the belief, and inconsistent with the
application knowledge.
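A minimal sketch of the count-based option, assuming each logged user response has already been labeled as B1 (consistent with the belief), B2 (consistent with the application only) or F (inconsistent with the application knowledge). The function name, the labels as strings and the optional additive smoothing are illustrative assumptions, not part of the method described here.

```python
from collections import Counter

CATEGORIES = ("B1", "B2", "F")

def estimate_belief_priors(labels, smoothing=0.0):
    """Relative-frequency estimate of P(B) for B in {B1, B2, F}.
    `smoothing` is an optional additive constant (an assumption)."""
    counts = Counter(labels)
    total = sum(counts[c] for c in CATEGORIES) + smoothing * len(CATEGORIES)
    if total == 0:
        raise ValueError("no labeled responses and no smoothing")
    return {c: (counts[c] + smoothing) / total for c in CATEGORIES}

# Hypothetical log of ten labeled user responses.
labels = ["B1"] * 7 + ["B2"] * 2 + ["F"]
priors = estimate_belief_priors(labels)
```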
 
Probability P(W|B) is that of an LM which is adapted to 
the system belief. It can be obtained with an LM built in 
the following way.  
 
Each conceptual structure Γ, or part of one, is represented
by a finite-state network N(Γ).
 
All the networks corresponding to structures in B1 are
connected in parallel in a single structure with an
associated probability P(B1). A similar structure is built
for the automata corresponding to structures in B2. A
filler F is also considered, containing a network derived
from a trigram LM. A network N(Γ) is obtained by
concatenating finite-state automata C(Γ), which represent
chunks of knowledge and are inferred with the procedure
described in the next section, with fillers F. These
automata output components of conceptual structures.
 
A search is performed by finding the most likely common
path in the network and in the automaton derived from a
lattice of word hypotheses generated by the speech
recognizer with the generic trigram LM. The system belief
changes the topology of the network by dynamically
changing the composition of sets B1 and B2. Network
recompilation can be avoided by simply putting all the
N(Γ) in parallel and dynamically assigning each network
of B1 a probability:
 
$$
P[N(\Gamma)] = \frac{P(B1)}{|B1|} \tag{3}
$$
where |B1| indicates the number of elements in B1.
Probabilities of networks in B2 are assigned in a similar
way.
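The reassignment in (3) can be sketched as below: the set of networks stays fixed and only the per-network weights change when the belief changes. The function and network names and the probability values are hypothetical, and B2 is taken to be every known network not in B1, as defined earlier.

```python
def network_priors(all_networks, b1, p_b1, p_b2):
    """Assign P[N(Gamma)] = P(B1)/|B1| to the networks in B1 and
    P(B2)/|B2| to the others, without rebuilding any network."""
    b1 = set(b1) & set(all_networks)
    b2 = set(all_networks) - b1
    priors = {}
    for name in all_networks:
        if name in b1:
            priors[name] = p_b1 / len(b1)
        else:
            priors[name] = p_b2 / len(b2)
    return priors

# Hypothetical networks; the remaining mass (0.1) would go to the filler F.
networks = ["N_dest", "N_date", "N_time", "N_price"]
turn_1 = network_priors(networks, {"N_dest", "N_date"}, p_b1=0.6, p_b2=0.3)
# The belief changes at the next turn: only the weights are recomputed.
turn_2 = network_priors(networks, {"N_price"}, p_b1=0.6, p_b2=0.3)
```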
A word sequence W always corresponds to a path in F
and may also correspond to one or more conceptual
structures represented by paths in networks in B1 and B2.
In the latter case, the likelihood of W in F will be much
lower than its likelihood in B1 or B2, because phrases
recognized by the chunk automata of the network are
boosted, as will be shown later. Thus the best path for
W, in this case, will go through a network whose
automata produce as output the components of a
conceptual structure.
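The idea of a best common path can be illustrated with a small best-first search over pairs of states, one from the word lattice and one from the knowledge network. This is a toy sketch, not the SFST composition used in this work: the data layout, the state names, the deterministic network and all probabilities are assumptions made for the example. Costs are negative log probabilities, so they are non-negative and a Dijkstra-style search applies.

```python
import heapq
import math

def best_common_path(lattice, lat_start, lat_finals, net, net_start, net_finals):
    """Most likely path accepted by both the lattice and the network.
    lattice: state -> list of (word, next_state, log_prob)
    net:     state -> {word: (next_state, log_prob, output or None)}"""
    heap = [(0.0, lat_start, net_start, (), ())]
    seen = set()
    while heap:
        cost, ls, ns, words, outputs = heapq.heappop(heap)
        if (ls, ns) in seen:
            continue
        seen.add((ls, ns))
        if ls in lat_finals and ns in net_finals:
            return -cost, words, outputs
        for word, ls2, lat_lp in lattice.get(ls, ()):
            arc = net.get(ns, {}).get(word)
            if arc is None:
                continue  # the word is not allowed here by the network
            ns2, net_lp, out = arc
            new_outputs = outputs + ((out,) if out else ())
            heapq.heappush(heap, (cost - lat_lp - net_lp, ls2, ns2,
                                  words + (word,), new_outputs))
    return None

# Hypothetical lattice with two competing word sequences.
lattice = {
    0: [("to", 1, math.log(0.4)), ("two", 1, math.log(0.6))],
    1: [("paris", 2, math.log(0.5)), ("pairs", 2, math.log(0.5))],
}
# Hypothetical network: a boosted chunk path (s0 -> s1 -> s2) that outputs
# a concept, and a low-probability filler path (s0 -> f1 -> s2).
net = {
    "s0": {"to": ("s1", math.log(0.9), None),
           "two": ("f1", math.log(0.05), None)},
    "s1": {"paris": ("s2", math.log(0.9), "DEST(paris)")},
    "f1": {"pairs": ("s2", math.log(0.05), None)},
}
result = best_common_path(lattice, 0, {2}, net, "s0", {"s2"})
```

In this toy case the path through the chunk automaton wins over the filler path, and the concept it outputs is returned together with the words.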
 
3 Knowledge inference 
Usually, when an application is developed, a training
corpus, even if small, is available.
 
Semantic categories and functions are manually derived 
for an application. They can be modified when the ap-
plication is deployed in order to correct errors or add 
missing constituents.  
 
A number of words in the lexicon have lexical entries 
containing their syntactic category, syntactic constructs 
which can appear in the same sentence, semantic fea-
tures and constructs they can be part of. When one of 
these words is encountered in the training corpus, it is 
considered as a trigger for the semantic categories con-
tained in its lexical entry. The association between 
words and semantic features is part of the semantic 
knowledge of the system. 
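A lexical entry of this kind, and its use as a trigger, might be sketched as follows. The field names, the example words and the category labels are hypothetical; the entry is reduced to a syntactic category and a set of semantic features, leaving out the syntactic and semantic constructs mentioned above.

```python
from dataclasses import dataclass, field

@dataclass
class LexicalEntry:
    syntactic_category: str
    semantic_features: set = field(default_factory=set)

def triggered_categories(sentence, lexicon):
    """Map each semantic category to the words of the sentence
    that trigger it, according to their lexical entries."""
    found = {}
    for word in sentence.lower().split():
        entry = lexicon.get(word)
        if entry is None:
            continue  # the word has no lexical entry: not a trigger
        for feature in entry.semantic_features:
            found.setdefault(feature, []).append(word)
    return found

# Hypothetical lexicon for a flight-information application.
lexicon = {
    "flight": LexicalEntry("noun", {"FLIGHT"}),
    "leave": LexicalEntry("verb", {"DEPARTURE"}),
}
triggers = triggered_categories("When does the flight leave", lexicon)
```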
 
The presence of a category in the sentence under analy-
sis can be verified manually or by deriving it from the 
parse tree of the sentence. As lexical entries, grammars 
and rules for deriving semantic structures from parse 
trees may be imprecise or incomplete, a single example 
can be carefully examined and validated manually.  
 
Once a single example is available with a detailed syn-
tactic and semantic analysis, it can be generalized. A 
sentence may contain a complete or partial semantic 
structure or just one component concept. Let Γ represent 
such a semantic interpretation. Furthermore, each struc-
ture may correspond to a pattern made of phrases and 
fillers of the sentence represented by a sequence of 
words W. Semantic Classification Trees (SCT) pro-
posed in [1] can be used for automatically deriving sen-
tence patterns corresponding to conceptual structures.  
 
The purpose of learning is to build or modify an SFST
that accepts a sequence of words and outputs a semantic
interpretation Γ.
 
The initial analysis of an example starts by using a tag-
ger for replacing words with their preterminal syntactic 
categories.  
 
Then, semantic tags are associated with sequences of
syntactic tags, either manually or automatically using the
semantic knowledge. A tag expression made of syntactic
and semantic tags is obtained in this way as a
representation of Γ. As a by-product, expressions for the
constituents and components of Γ are built and added to
the semantic knowledge.
 
Generalization of the example uses a phrase generator to 
produce sequences of words from the tag expression. 
These sequences of words enrich the finite state transla-
tor which has to map word sequences into the concep-
tual structure Γ.  
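One simple way to picture such a phrase generator is to expand every tag of the expression into the words it can stand for and take all combinations. The sketch below does exactly that; the tag names and the tag lexicon are hypothetical, and a real generator would also have to respect syntactic constraints that a plain Cartesian product ignores.

```python
from itertools import product

def generate_phrases(tag_expression, tag_lexicon):
    """Expand a sequence of tags into every word sequence obtained by
    choosing one word per tag."""
    choices = [tag_lexicon[tag] for tag in tag_expression]
    return [" ".join(words) for words in product(*choices)]

# Hypothetical tag expression for a destination concept.
tag_lexicon = {
    "PREP_TO": ["to", "towards"],
    "CITY": ["paris", "avignon"],
}
phrases = generate_phrases(["PREP_TO", "CITY"], tag_lexicon)
```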
 
Further generalization can be obtained by inferring
synonyms with a WordNet. If generalization has produced
erroneous sequences of words, these sequences can be
removed by manual inspection or when it is observed that
the system has made an interpretation error because of
them. With a similar procedure, new sequences of words
can be added to the automaton for Γ.
 
Once it has been found that a word (noun or verb) has
contributed to hypothesizing a concept in the semantic
structure, the concept is added as a semantic feature in
the lexical entry of the word.
 
In summary, learning of semantic knowledge proceeds
through the following steps:
 
1 Set the semantic categories for the application. 
 
2 Set the lexical entries for the words that are semanti-
cally relevant for the application. 
 
3 For every analyzed sentence:
• if the semantic interpretation is correct, then do
nothing,
• if a phrase is misplaced in the representation of
a semantic structure, then remove it,
• if a phrase is missing from the representation of a
semantic structure, but the corresponding tag
expression is present in the semantic knowledge,
then the phrase is added to the corresponding SFST,
• if the tag expression does not exist in the
semantic knowledge, then it is built and
sequences of words are generated from it with
the generalization procedure outlined above.
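The four cases of step 3 can be sketched as a single update function. This is an illustration only: each SFST is reduced to the set of phrases it accepts, the semantic knowledge to a set of tag expressions per concept, and the status labels, names and toy generator are hypothetical.

```python
def update_knowledge(sfst, knowledge, concept, phrase, status,
                     tag_expr, generate):
    """Apply one case of step 3 for an analyzed sentence.
    sfst:      concept -> set of accepted phrases
    knowledge: concept -> set of known tag expressions (tuples)"""
    phrases = sfst.setdefault(concept, set())
    exprs = knowledge.setdefault(concept, set())
    if status == "correct":
        return                       # nothing to do
    if status == "misplaced":
        phrases.discard(phrase)      # remove the offending phrase
    elif status == "missed":
        if tag_expr in exprs:
            phrases.add(phrase)      # known expression: add the phrase
        else:
            exprs.add(tag_expr)      # build the new tag expression
            phrases.update(generate(tag_expr))
            phrases.add(phrase)      # assumption: keep the observed phrase too
    else:
        raise ValueError(f"unknown status: {status}")

def toy_generate(tag_expr):
    # Hypothetical stand-in for the phrase generator.
    return ["arriving in paris", "arriving in avignon"]

sfst = {"DEST": {"to paris", "paris pairs"}}
knowledge = {"DEST": {("PREP_TO", "CITY")}}
known = ("PREP_TO", "CITY")
new = ("VERB_ARRIVE", "PREP_IN", "CITY")

update_knowledge(sfst, knowledge, "DEST", "to paris", "correct", known, toy_generate)
update_knowledge(sfst, knowledge, "DEST", "paris pairs", "misplaced", known, toy_generate)
update_knowledge(sfst, knowledge, "DEST", "to avignon", "missed", known, toy_generate)
update_knowledge(sfst, knowledge, "DEST", "arriving in paris", "missed", new, toy_generate)
```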
 
A set of SFSTs is built in this way. They are added to the
LM to provide concept-specific components and, at the
same time, to produce semantic interpretations through a
translation process.
 
References 
 
[1] Kuhn R.  and De Mori R. (1995). The Application of 
Semantic Classification Trees to Natural Language Un-
derstanding. IEEE Trans. on Pattern Analysis and Ma-
chine Intelligence, 17 : 449-460.  
 
[2] Pieraccini R., Levin E., and Lee C.-H. (1991).
Stochastic Representation of Conceptual Structure in the
ATIS Task. Proceedings of the 1991 Speech and Natural
Language Workshop, 121-124, Morgan Kaufmann
Publishers, Los Altos, CA.
 
[3] Vossen P., Diez-Orzas P., and Peters W. (1997).
The multilingual design of EuroWordNet.
Proc. ACL/EACL Workshop on Automatic Information
Extraction and Building of Lexical Semantic Resources for
NLP Applications, Madrid, 1997.
