Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue, pages 134–143,
Sydney, July 2006. ©2006 Association for Computational Linguistics
Balancing Conflicting Factors in Argument Interpretation
Ingrid Zukerman, Michael Niemann and Sarah George
Faculty of Information Technology
Monash University
Clayton, VICTORIA 3800, AUSTRALIA
{ingrid,niemann,sarahg}@csse.monash.edu.au
Abstract
We present a probabilistic approach for the interpretation of arguments
that casts the selection of an interpretation as a model selection task.
In selecting the best model, our formalism balances conflicting factors:
model complexity against data fit, and structure complexity against belief
reasonableness. We first describe our basic formalism, which considers
interpretations comprising inferential relations, and then show how our
formalism is extended to suppositions that account for the beliefs in an
argument, and justifications that account for the inferences in an
interpretation. Our evaluations with users show that the interpretations
produced by our system are acceptable, and that there is strong support
for the postulated suppositions and justifications.
1 Introduction
The source-channel approach has often been used for word-based language
tasks, such as speech recognition and machine translation (Epstein, 1996;
Och and Ney, 2002). According to this approach, an addressee receives a
noisy channel (language or speech wave), and decodes this channel to
derive the source (idea). The selected source is that with the maximum
posterior probability.
In this paper, we apply the source-channel approach to the interpretation
of arguments. This approach enables us to cast argument interpretation as
a trade-off between conflicting factors, viz. model complexity against
data fit, and structure complexity against belief reasonableness. This
trade-off is inspired by the Minimum Message Length (MML) Criterion, a
model selection method that is the basis for several machine learning
techniques (Wallace, 2005). According to this trade-off, a more complex
model might fit the data better, but the plausibility (priors) of the
model must be taken into account to avoid over-fitting.[1]
Our argument interpretation mechanism has been implemented in a system
called BIAS (Bayesian Interactive Argumentation System). BIAS presents to
a user a set of facts about the world (evidence), and the user constructs
an argument about a particular goal proposition in light of this evidence.
BIAS then generates an interpretation of the user’s argument, i.e., it
tries to understand the argument. When people try to understand an
interlocutor’s discourse, their interpretation is in terms of their own
beliefs and inference patterns. Likewise, our system’s interpretations are
in terms of its underlying knowledge representation, a Bayesian network
(BN). The interpretations generated by BIAS include inferences that
connect the propositions in a user’s argument, suppositions that postulate
a user’s beliefs that are necessary to make sense of the argument, and
explanatory extensions that justify the inferences in the interpretation
(and in the argument). BIAS does not generate its own arguments; rather,
it integrates these components to make sense of the user’s argument.
In this paper, we first describe our basic formalism, which is used to
calculate the probability of interpretations that include only inferences,
and then show how progressive enhancements of this formalism are used for
more informative interpretations.
In Section 2, we explain what an argument interpretation is, and briefly
describe the interpretation process. Next, we discuss our probabilistic
formalism for selecting an interpretation, which is the focus of this
paper. In Section 4, we present the results of our evaluations, followed
by a discussion of related work, and concluding remarks.

[1] Other model selection criteria such as Akaike Information Criterion
(AIC) and Bayes Information Criterion (BIC) (Box et al., 1994) also argue
for model parsimony, but they do so by penalizing models with more free
parameters.

2 Argument interpretation
We define an interpretation of a user’s argument as the tuple
{SC, IG, EE}, where SC is a supposition configuration, IG is an
interpretation graph, and EE are explanatory extensions.
• A Supposition Configuration is a set of suppositions attributed to the
  user (in addition to or instead of shared beliefs) to account for the
  beliefs in his or her argument.
• An Interpretation Graph is a domain structure, in our case a subnet of
  the domain BN, that connects the nodes mentioned in the argument. The
  nodes and arcs that are included in an interpretation graph but were not
  mentioned by the user fill in additional detail from the BN, bridging
  inferential leaps in the argument.
• Explanatory Extensions are domain structures (subnets of the domain BN)
  that are added to an interpretation graph to justify an inference.
  Contrary to suppositions, these explanations contain propositions
  believed by the user and the system. The presentation of these
  explanations is motivated by the results of our early trials, where
  people objected to belief discontinuities between the antecedents and
  the consequent of inferences, i.e., increases in certainty or large
  changes in certainty (Zukerman and George, 2005).
To illustrate these components, consider the example in Figure 1. The top
segment contains a short argument, and the bottom segment contains its
interpretation. The middle segment contains an excerpt of the domain BN
which includes the interpretation; the probabilities of some nodes are
indicated with linguistic terms.[2] The interpretation graph, which
appears inside a light gray bubble in the BN excerpt, includes the extra
node GreenInGardenAtTimeOfDeath (boxed). Note that the propagated beliefs
in this interpretation graph do not match those in the argument. To
address this problem, the system supposes that the user believes that
TimeOfDeath11=TRUE, instead of the BN belief of Probably (boldfaced and
gray-boxed). This fixes the mismatch between the probabilities in the
argument and those in the interpretation, but one problem remains: in
early trials we found that people objected to belief discontinuities, such
as the "jump" in belief from possibly having opportunity to possibly not
murdering Mr Body (this jump appears both in the original argument and in
the interpretation, whose beliefs now match those in the argument as a
result of the supposition). This prompts the generation of the explanatory
extension GreenHadMeans[ProbablyNot] (white boldfaced and dark-gray
boxed). The three elements added during the interpretation process (the
extra node in the interpretation graph, the supposition and the
explanatory extension) appear in boldface italics in the interpretation at
the bottom of the figure.

[2] We use the terms Very Probable, Probable, Possible and their
negations, and Even Chance. These terms, which are similar to those used
in (Elsaesser, 1987), are most consistently understood by people according
to our user surveys.

[Figure 1: Sample argument, BN excerpt and interpretation. Panels:
ARGUMENT (top), EXCERPT OF DOMAIN BN (middle), INTERPRETATION (bottom).
BN nodes shown: GreenLadderAtWindow, GreenVisitBodyLastNight,
NbourHeardGreen&BodyArgueLastNight, GreenInGardenAt11, TimeOfDeath11,
GreenInGardenAtTimeOfDeath, GreenHadOpportunity, GreenHadMotive,
GreenHadMeans, GreenMurderedBody.]

2.1 Proposing Interpretations
The problem of finding the best interpretation is exponential. In previous
work, we proposed an anytime algorithm to propose interpretation graphs
and supposition configurations until time runs out (George et al., 2004).
Here we apply our algorithm to generate interpretations comprising
supposition configurations (SC), interpretation graphs (IG) and
explanatory extensions (EE) (Figure 2).

Algorithm GenerateInterpretations(Arg)
while {there is time}
{
  1. Propose a supposition configuration SC that accounts for the beliefs
     stated in the argument.
  2. Propose an interpretation graph IG that connects the nodes in Arg
     under supposition configuration SC.
  3. Propose explanatory extensions EE for interpretation graph IG under
     supposition configuration SC if necessary.
  4. Calculate the probability of interpretation {SC, IG, EE}.
  5. Retain the top N (=6) most probable interpretations.
}
Figure 2: Anytime algorithm for generating interpretations

Supposition configurations are proposed first, as instantiated beliefs
affect the plausibility of interpretation graphs, which in turn affect the
need for explanatory extensions. The proposal of supposition
configurations, interpretation graphs and explanatory extensions is driven
by the probability of these components. In each iteration, we generate
candidates for a component, calculate the probability of these candidates
in the context of the selections made in the previous steps, and
probabilistically select one of these candidates. That is, higher
probability candidates have a better chance of being selected than lower
probability ones (our selection procedures are described in George et al.,
2004). For example, say that in Step 1, we selected supposition
configuration SCa. Next, in Step 2, the probability of candidate IGs is
calculated in the context of the domain BN and SCa, and one of the IGs is
probabilistically selected, say IGb. Similarly, in Step 3, one of the
candidate EEs is selected in the context of SCa and IGb. In the next
iteration, we probabilistically select an SC (which could be a previously
chosen one), and so on. To generate diverse interpretations, if SCa is
selected again, a different IG will be chosen.
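
The loop in Figure 2 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the BIAS implementation: the callbacks `propose_sc`, `propose_ig`, `propose_ee` and `score` are hypothetical placeholders that return (candidate, probability) pairs or a probability, candidates are assumed to be hashable, and roulette-wheel sampling stands in for the selection procedures of George et al. (2004), which are not reproduced here.

```python
import random
import time


def pick(cands, rng):
    """Roulette-wheel choice from a list of (item, probability) pairs."""
    items = [item for item, _ in cands]
    weights = [p for _, p in cands]
    return rng.choices(items, weights=weights, k=1)[0]


def generate_interpretations(propose_sc, propose_ig, propose_ee, score,
                             budget_s=0.1, max_iters=1000, top_n=6, seed=0):
    """Anytime loop: sample {SC, IG, EE} triples until time (or the
    iteration cap, an assumption added for testability) runs out."""
    rng = random.Random(seed)
    deadline = time.monotonic() + budget_s
    tried_igs = {}   # SC -> IGs already chosen under that SC
    scored = {}      # (SC, IG, EE) -> probability of the interpretation
    iters = 0
    while time.monotonic() < deadline and iters < max_iters:
        iters += 1
        sc = pick(propose_sc(), rng)                       # Step 1
        tried = tried_igs.setdefault(sc, set())
        fresh = [(ig, p) for ig, p in propose_ig(sc) if ig not in tried]
        if not fresh:
            continue      # every IG under this SC has been used already
        ig = pick(fresh, rng)                              # Step 2
        tried.add(ig)     # a repeated SC must yield a different IG
        ee = pick(propose_ee(sc, ig), rng)                 # Step 3
        scored[(sc, ig, ee)] = score(sc, ig, ee)           # Step 4
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]                                  # Step 5
```

Excluding already-used IGs per SC is one simple way to obtain the diversity described above; other bookkeeping schemes would serve equally well.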
3 Probabilistic formalism
Following (Wallace, 2005), our approach requires the specification of
three elements: background knowledge, model and data. Background knowledge
is everything known to the system prior to interpreting a user’s argument,
e.g., domain knowledge, shared beliefs with the user, and dialogue
history; the data is the argument; and the model is the interpretation.
We posit that the best interpretation is that with the highest posterior
probability.

\[ \mathit{IntBest} = \operatorname{argmax}_{i=1,\ldots,q} \Pr(SC_i, IG_i, EE_i \mid Arg) \]

where q is the number of interpretations.
After applying Bayes rule, this probability is represented as follows.[3]

\[ \Pr(SC_i, IG_i, EE_i \mid Arg) = \alpha \Pr(SC_i, IG_i, EE_i) \times \Pr(Arg \mid SC_i, IG_i, EE_i) \tag{1} \]

where α is a normalizing constant that ensures that the probabilities of
the interpretations sum to 1:

\[ \alpha = \frac{1}{\sum_{j=1}^{q} \Pr(SC_j, IG_j, EE_j) \times \Pr(Arg \mid SC_j, IG_j, EE_j)} \]

The first factor represents model complexity, and the second factor
represents data fit.
• Model complexity measures how difficult it is to produce the model
  (interpretation) from the background knowledge. The higher/lower the
  complexity of a model, the lower/higher its probability.
• Data fit measures how well the data (argument) matches the model
  (interpretation). The better/worse the match between the argument and an
  interpretation, the higher/lower the probability that the speaker
  intended this interpretation when he or she uttered the argument.
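
As a small numerical illustration of Equation 1, the sketch below normalizes the product of a prior (model complexity) and a likelihood (data fit) over a set of candidate interpretations and returns the index of the most probable one. The function names and the input numbers are illustrative assumptions, not values from our system.

```python
def posteriors(priors, likelihoods):
    """Return alpha * prior * likelihood for each candidate interpretation."""
    if len(priors) != len(likelihoods):
        raise ValueError("one prior and one likelihood per interpretation")
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    if total == 0:
        raise ValueError("all candidates have zero probability")
    alpha = 1.0 / total          # normalizing constant of Equation 1
    return [alpha * j for j in joint]


def best_interpretation(priors, likelihoods):
    """Index of the interpretation with the highest posterior probability."""
    post = posteriors(priors, likelihoods)
    return max(range(len(post)), key=post.__getitem__)
```

With hypothetical priors (0.5, 0.3, 0.2) and likelihoods (0.1, 0.6, 0.3), the second candidate wins even though the first has the highest prior, because its data fit is much better.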
Model Complexity
Model complexity is a function $\{B, M\} \to [0, 1]$ that represents the
prior probability of the model M (i.e., the interpretation) in terms of
the background knowledge B. The calculation of model complexity depends on
the type of the model: numerical or structural.
The probability of a numerical model depends on the similarity between the
numerical values (or distributions) in the model and those in the
background knowledge. The higher/lower this similarity, the higher/lower
the probability of the model. For instance, a supposition configuration SC
comprising beliefs that differ significantly from those in the background
knowledge will lower the probability of an interpretation. One of the
functions we have used to calculate belief probabilities is the Zipf
distribution, where the parameter is the difference between beliefs, e.g.,
between the supposed beliefs and the corresponding beliefs in the
background knowledge (Zukerman and George, 2005). That is, the probability
of a supposed belief in proposition P according to model M
($\mathrm{bel}_M(P)$), in light of the belief in P according to background
knowledge B ($\mathrm{bel}_B(P)$), is

\[ \Pr(\mathrm{bel}_M(P) \mid \mathrm{bel}_B(P)) = \theta \, |\mathrm{bel}_M(P) - \mathrm{bel}_B(P)|^{-\gamma} \]

where θ is a normalizing constant, and γ determines the penalty assigned
to the discrepancy between the beliefs in P. For example,

\[ \Pr(\mathrm{bel}_M(P) = \mathrm{TRUE} \mid \mathrm{bel}_B(P) = \mathrm{Probable}) > \Pr(\mathrm{bel}_M(P) = \mathrm{TRUE} \mid \mathrm{bel}_B(P) = \mathrm{EvenChance}) \]

as TRUE is closer to Probable than to EvenChance.

[3] In principle, $\Pr(SC_i, IG_i, EE_i \mid Arg)$ can be calculated
directly. However, it is not clear how to incorporate the priors of an
interpretation in the direct calculation.
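
The Zipf-style belief probability can be sketched as follows. Several details are assumptions made for illustration only: beliefs are mapped to a nine-point ordinal scale (`LEVELS`), the discrepancy is the distance between scale positions, the distance is offset by 1 so that identical beliefs do not raise zero to a negative power, and θ is obtained by normalizing over all levels. The value of γ is arbitrary.

```python
# Assumed ordinal belief scale, from certainly false to certainly true.
LEVELS = ["False", "VeryProbablyNot", "ProbablyNot", "PossiblyNot",
          "EvenChance", "Possibly", "Probably", "VeryProbably", "True"]


def zipf_belief_prob(model_belief, background_belief, gamma=2.0):
    """Probability of a supposed belief given the background belief.

    Weight of each level is (distance + 1) ** -gamma; the +1 offset is an
    assumption that keeps the zero-distance case finite.
    """
    b = LEVELS.index(background_belief)
    weights = [(abs(i - b) + 1) ** -gamma for i in range(len(LEVELS))]
    theta = 1.0 / sum(weights)           # normalizing constant
    return theta * weights[LEVELS.index(model_belief)]
```

Under these assumptions, supposing TRUE is more probable when the background belief is Probably than when it is EvenChance, which reproduces the inequality above.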
The probability of a structural model (e.g., an interpretation graph) is
obtained from the probabilities of the elements in the structure (e.g.,
nodes and arcs) in light of the background knowledge. The simplest
calculation assumes that the probability of including nodes and arcs in an
interpretation graph is uniform. That is, the probability of an
interpretation graph comprising n nodes and a arcs is a function of
• the probability of n,
• the probability of selecting n particular nodes from N nodes in the
  domain BN: $\binom{N}{n}^{-1}$,
• the probability of a, and
• the probability of selecting a particular arcs from the arcs that
  connect the n selected nodes.
This calculation generally prefers small models to larger models.[4]
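
A minimal sketch of this structural prior is given below. It assumes that the four quantities are combined as a product and that arcs, like nodes, are selected uniformly; the probabilities of n and of a are left as caller-supplied parameters (`p_n`, `p_a`) because their distributions are not specified here.

```python
from math import comb


def structure_prior(n, a, total_nodes, arcs_among_selected, p_n, p_a):
    """Prior of a graph with n nodes and a arcs under uniform selection.

    p_n and p_a are the (assumed, caller-supplied) probabilities of
    choosing n nodes and a arcs; the remaining factors are the inverse
    binomial coefficients for picking particular nodes and arcs.
    """
    if not 0 <= n <= total_nodes or not 0 <= a <= arcs_among_selected:
        raise ValueError("n and a must not exceed the available nodes/arcs")
    pick_nodes = 1.0 / comb(total_nodes, n)
    pick_arcs = 1.0 / comb(arcs_among_selected, a)
    return p_n * pick_nodes * p_a * pick_arcs
```

For a hypothetical BN with 20 nodes, a 4-node graph receives a higher prior than a 5-node graph when `p_n` and `p_a` are held equal. Because the binomial coefficient is symmetric, the node-selection factor starts to rise again once n exceeds half of the nodes.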

[4] In the rare cases where n > N/2, smaller models do not yield lower
probabilities.

Data fit
Data fit is a function $\{M, D\} \to [0, 1]$ that represents the
probability of the data D (argument) given the model M (interpretation).
This probability hinges on the similarity between the model and the data:
the closer the data is to the model, the higher is the probability of the
data.
The calculation of the similarity between numerical data and a numerical
model is the same as the calculation of the similarity between a numerical
model and background knowledge.
The similarity between structural data and a structural model is a
function of the number and type of operations required to convert the
model into the data, e.g., node and arc insertions and deletions. For the
example in Figure 1, to convert the interpretation graph into the
argument, we must delete one node (GreenInGardenAtTimeOfDeath) and its
incident arcs. The more operations need to be performed, the lower the
similarity between the data and the model, and the lower the probability
of the data given the model.
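
The counting of edit operations can be sketched with plain set differences. The representation of graphs as a set of node names plus a set of (source, target) arc pairs, and the constant per-operation penalty `per_op`, are assumptions made for this illustration; the example graph is a toy one, not the BN of Figure 1.

```python
def edit_operations(model_nodes, model_arcs, data_nodes, data_arcs):
    """Count the insertions and deletions that turn the model into the data."""
    return {
        "node_deletions": len(model_nodes - data_nodes),
        "node_insertions": len(data_nodes - model_nodes),
        "arc_deletions": len(model_arcs - data_arcs),
        "arc_insertions": len(data_arcs - model_arcs),
    }


def structural_fit(ops, per_op=0.5):
    """Assumed scoring: each operation multiplies the fit by per_op."""
    return per_op ** sum(ops.values())


# Toy example: the model contains one extra node X and its incident arcs.
model_nodes = {"A", "X", "B"}
model_arcs = {("A", "B"), ("A", "X"), ("X", "B")}
data_nodes = {"A", "B"}
data_arcs = {("A", "B")}
ops = edit_operations(model_nodes, model_arcs, data_nodes, data_arcs)
```

Here `ops` records one node deletion and two arc deletions, so the fit is lower than for a model that matches the data exactly.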
We now discuss our basic probabilistic formal-
ism, which accounts for interpretation graphs, fol-
lowed by two enhancements: (1) a more complex
model that accounts for suppositions; and (2) in-
creases in background knowledge that yield a pref-
erence for larger interpretation graphs under cer-
tain circumstances, and account for explanatory
extensions.
3.1 Basic formalism: Interpretation graphs
In the basic formalism, the model contains only an interpretation graph.
Thus, Equation 1 is simply

\[ \Pr(IG_i \mid Arg) = \alpha \Pr(IG_i) \times \Pr(Arg \mid IG_i) \tag{2} \]

The difference in the calculations of model complexity and data fit for
numerical and structural information warrants the separation of structure
and belief, which yields

\[ \Pr(IG_i \mid Arg) = \alpha \Pr(\mathrm{bel}\,IG_i, \mathrm{struc}\,IG_i) \times \Pr(\mathrm{bel}\,Arg, \mathrm{struc}\,Arg \mid \mathrm{bel}\,IG_i, \mathrm{struc}\,IG_i) \]

After applying the chain rule of probability

\[ \begin{aligned}
\Pr(IG_i \mid Arg) = {} & \alpha \Pr(\mathrm{bel}\,IG_i \mid \mathrm{struc}\,IG_i) \times \Pr(\mathrm{struc}\,IG_i) \times {} \\
& \Pr(\mathrm{bel}\,Arg \mid \mathrm{struc}\,Arg, \mathrm{bel}\,IG_i, \mathrm{struc}\,IG_i) \times {} \\
& \Pr(\mathrm{struc}\,Arg \mid \mathrm{bel}\,IG_i, \mathrm{struc}\,IG_i)
\end{aligned} \]

Note that $\Pr(\mathrm{bel}\,IG_i \mid \mathrm{struc}\,IG_i)$ does not
calculate the probability of (or belief in) the nodes in $IG_i$. Rather,
it calculates how probable are these beliefs in light of the structure of
$IG_i$ and the expectations from the background knowledge. For instance,
if the belief in a node is p, it calculates the probability of p. This
probability depends on the closeness between the beliefs in $IG_i$ and the
expected ones. Since the beliefs in $IG_i$ are obtained algorithmically by
means of Bayesian propagation from the background knowledge, they match
precisely the expectations. Hence,
$\Pr(\mathrm{bel}\,IG_i \mid \mathrm{struc}\,IG_i) = 1$.
We also make the following simplifying assumptions for situations where
the interpretation is known (given): (1) the probability of the beliefs in
the argument depends only on the beliefs in the interpretation (and not on
its structure or the argument’s structure), and (2) the probability of the
argument structure depends only on the interpretation structure (and not
on its beliefs). This yields

\[ \Pr(IG_i \mid Arg) = \alpha \Pr(\mathrm{struc}\,IG_i) \times \Pr(\mathrm{bel}\,Arg \mid \mathrm{bel}\,IG_i) \times \Pr(\mathrm{struc}\,Arg \mid \mathrm{struc}\,IG_i) \tag{3} \]

Table 1: Probability - Basic formalism
Model complexity (against background)
  ↓ Pr(struc IG_i)               ↑ structural complexity (model size)
Data fit with model
  ↑ Pr(struc Arg | struc IG_i)   ↓ structural discrepancy
    Pr(bel Arg | bel IG_i)         numerical discrepancy

Table 1 summarizes the calculation of these probabilities separated
according to model complexity and data fit. It also shows the trade-off
between structural model complexity and structural data fit. As seen at
the start of Section 3, smaller structures generally have a lower model
complexity than larger ones. However, an increase in structural model
complexity (indicated by the ↑ next to the structural complexity and the ↓
next to the resultant probability of the model) may reduce the structural
discrepancy between the argument structure and the structure of the
interpretation graph (indicated by the ↓ next to the structural
discrepancy and the ↑ next to the probability of the structural data fit).
For instance, the smallest possible interpretation for the argument in
Figure 1 consists of a single node, but this interpretation has a very
poor data fit with the argument.
3.2 A more informed model
In order to postulate suppositions that account for the beliefs in an
argument, we expand the basic model to include supposition configurations
(beliefs attributed to the user in addition to or instead of the beliefs
shared with the system). Now the model comprises the pair
$\{SC_i, IG_i\}$, and Equation 2 becomes

\[ \Pr(SC_i, IG_i \mid Arg) = \alpha \Pr(SC_i, IG_i) \times \Pr(Arg \mid SC_i, IG_i) \tag{4} \]

Similar probabilistic manipulations to those performed in Section 3.1
yield

\[ \begin{aligned}
\Pr(SC_i, IG_i \mid Arg) = {} & \alpha \Pr(\mathrm{struc}\,IG_i \mid SC_i) \times \Pr(SC_i) \times {} \\
& \Pr(\mathrm{bel}\,Arg \mid SC_i, \mathrm{bel}\,IG_i) \times \Pr(\mathrm{struc}\,Arg \mid \mathrm{struc}\,IG_i)
\end{aligned} \tag{5} \]

(Recall that suppositions pertain to beliefs only, i.e., they don’t have a
structural component.)

Table 2: Probability - More informed model
Model complexity (against background)
    Pr(struc IG_i | SC_i)          structural complexity
  ↓ Pr(SC_i)                     ↑ numerical discrepancy
Data fit with model
    Pr(struc Arg | struc IG_i)     structural discrepancy
  ↑ Pr(bel Arg | SC_i, bel IG_i) ↓ numerical discrepancy

Table 2 summarizes the calculation of these probabilities separated
according to model complexity and data fit (the elements that differ from
the basic model are boldfaced). It also shows the trade-off between belief
model complexity and belief data fit. Making suppositions has a higher
model complexity (lower probability) than not making suppositions (where
$SC_i$ matches the beliefs in the domain BN). However, as seen in the
example in Figure 1, making a supposition that reduces or eliminates the
discrepancy between the beliefs in the argument and those in the
interpretation increases the belief data fit considerably, at the expense
of a more complex belief model.
3.3 Additional background knowledge
An increase in our background knowledge means
that we take into account additional factors about
the world. This extra knowledge in turn may
cause us to prefer interpretations that were pre-
viously discarded. We have considered two ad-
ditions to background knowledge: dialogue his-
tory, and users’ preferences regarding inference
patterns.
Dialogue history
Dialogue history influences the salience of a node, and hence the
probability that it was included in a user’s argument. We have modeled
salience by means of an activation function that decays with time
(Anderson, 1983), and used this function to moderate the probability of
including a node in an interpretation (instead of using a uniform
distribution). We have experimented with two activation functions: (1) a
function where the level of activation of a node is based on the frequency
and recency of the direct activation of this node; and (2) a function
where the level of activation of a node depends on its similarity with all
the (activated) nodes, together with the frequency and recency of their
activation (Zukerman and George, 2005).
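
The first kind of activation function (frequency and recency of direct activation) can be sketched as below. The functional form is an assumption: each past mention contributes a power-law term that decays with elapsed time, so that frequent and recent mentions yield higher activation. The `decay` and `floor` parameters, the time units and the normalization into node-inclusion probabilities are all illustrative choices rather than the functions used in our system.

```python
def activation(mention_times, now, decay=0.5):
    """Sum of power-law decayed contributions of past mentions."""
    return sum((now - t) ** -decay for t in mention_times if t < now)


def node_inclusion_probs(history, now, decay=0.5, floor=1e-3):
    """Turn activations into a non-uniform distribution over nodes.

    history maps a node name to the times at which it was mentioned; the
    small floor keeps never-mentioned nodes selectable.
    """
    raw = {node: floor + activation(times, now, decay)
           for node, times in history.items()}
    total = sum(raw.values())
    return {node: value / total for node, value in raw.items()}
```

For a hypothetical history in which node A was mentioned at times 1, 8 and 9, node B at time 2 and node C never, the distribution at time 10 ranks A above B and B above C.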
To illustrate the influence of salience, compare the preferred
interpretation graph in Figure 1 (in the light gray bubble) with an
alternative path through NbourHeardGreen&BodyArgueLastNight and
GreenVisitBodyLastNight. The preferred path has 4 nodes, while the
alternative one has 5 nodes, and hence a lower probability. However, if
the nodes in the longer path had been recently mentioned, their salience
could overcome the size disadvantage. Thus, although the chosen
interpretation graph may have a worse data fit than the smallest graph, it
still may have the best overall probability in light of the additional
background knowledge.
Inference patterns
In a formative evaluation of an earlier version
of our system, we found that people objected
to inferences that had increases in certainty or
large changes in certainty (Zukerman and George,
2005). An example of an increase in certainty is
A [Probably] implies B [VeryProbably].
A large change in certainty is illustrated by
A [VeryProbably] implies B [EvenChance].
We then conducted another survey to deter-
mine the types of inferences considered acceptable
by people (from the standpoint of the beliefs in
the antecedents and the consequent). The results
from our preliminary survey prompted us to distinguish between three types
of inferences: BothSides, SameSide and AlmostSame.
• BothSides inferences have antecedents with beliefs on "both sides" of
  the consequent (in favour and against), e.g.,
  A[VeryProbably] & B[ProbablyNot] implies C[EvenChance].
• All the antecedents in SameSide inferences have beliefs on "one side" of
  the consequent, but at least one antecedent has the same belief level as
  the consequent, e.g.,
  A[VeryProbably] & B[Possibly] implies C[Possibly].
• All the antecedents in AlmostSame inferences have beliefs on one side of
  the consequent, but the closest antecedent is one level "up" from the
  consequent, e.g.,
  A[VeryProbably] & B[Possibly] implies C[EvenChance].
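
The three inference types can be told apart mechanically, as in the sketch below. The numeric scale in `LEVEL` and the reading of one level "up" as "one level more certain than the consequent" are assumptions made for this illustration; inferences that fit none of the three types (such as increases in certainty or large changes in certainty) are labelled "Other".

```python
# Assumed ordinal scale: negative = against, positive = in favour.
LEVEL = {"VeryProbablyNot": -3, "ProbablyNot": -2, "PossiblyNot": -1,
         "EvenChance": 0, "Possibly": 1, "Probably": 2, "VeryProbably": 3}


def classify_inference(antecedents, consequent):
    """Classify an inference by antecedent beliefs relative to the consequent."""
    c = LEVEL[consequent]
    diffs = [LEVEL[a] - c for a in antecedents]
    if any(d > 0 for d in diffs) and any(d < 0 for d in diffs):
        return "BothSides"       # antecedents on both sides of the consequent
    if 0 in diffs:
        return "SameSide"        # one antecedent matches the consequent
    closest = min(diffs, key=abs)
    # Assumed reading of "up": closest antecedent is one level more certain.
    if abs(closest) == 1 and abs(c + closest) > abs(c):
        return "AlmostSame"
    return "Other"
```

Applied to the three examples above, the function returns BothSides, SameSide and AlmostSame respectively, while the two objectionable examples given earlier are classified as Other.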
Our survey contained six evaluation sets, which
were done by 50 people. Each set contained an ini-
tial statement (we varied the polarity of the state-
ment in the various sets), three alternative argu-
ments that explain this statement, and the option
to say that no argument is a good explanation. The
respondents were asked to rank these options in
order of preference.
All the evaluation sets contained one argument
that was objectionable according to our prelim-
inary survey (there was an increase in belief or
a large change in belief from the antecedent to
the consequent). The two other arguments, each of which comprises a single
inference, were distributed among the six evaluation sets as follows.
• Three sets had one BothSides inference and one SameSide inference, each
  with two antecedents.
• Two sets had one SameSide inference, and one AlmostSame inference, each
  with two antecedents.
• One set had one SameSide inference with two antecedents, and one
  BothSides inference comprising three antecedents.
In order to reduce the effect of the respondents’
domain bias, we generated two versions of the sur-
vey, where for each evaluation set we swapped the
antecedent propositions in one of the inferences
with the antecedent propositions in the other.
Our survey showed that people prefer BothSides
inferences (which contain antecedents for and
against the consequent). They also prefer Same-
Side to AlmostSame for antecedents with beliefs
in the negative range (VeryProbNot, ProbNot and
PossNot); and they did not distinguish between
SameSide and AlmostSame for antecedents with
beliefs in the positive range. Further, BothSides in-
ferences with three antecedents were preferred to
SameSide inferences with two antecedents. This
indicates that persuasiveness carries more weight
than parsimony.
These general preferences are incorporated into our background knowledge
as expectations for a range of acceptable beliefs in the consequents of
inferences in light of their antecedents. The farther the actual beliefs
in the consequents are from the expectations, the lower the probability of
these beliefs. Hence, it is no longer true that
$\Pr(\mathrm{bel}\,IG_i \mid SC_i, \mathrm{struc}\,IG_i) = 1$
(Section 3.1), as we now have a belief expectation that goes beyond
Bayesian propagation. As done at the start of Section 3, the probability
of the beliefs in an interpretation is a function of the discrepancy
between these beliefs and expected beliefs. We calculate this probability
using a variant of the Zipf distribution adjusted for ranges of beliefs.
Explanatory extensions are added to an interpretation in order to overcome
these belief discrepancies, yielding an expanded model that comprises the
tuple $\{SC_i, IG_i, EE_i\}$. Equation 2 now becomes

\[ \Pr(SC_i, IG_i, EE_i \mid Arg) = \alpha \Pr(SC_i, IG_i, EE_i) \times \Pr(Arg \mid SC_i, IG_i, EE_i) \tag{6} \]

We make simplifying assumptions similar to those made in Section 3.1,
i.e., given the interpretation graph and supposition configuration, the
beliefs in the argument depend only on the beliefs in the interpretation,
and the argument structure depends only on the interpretation structure.
These assumptions, together with probabilistic manipulations similar to
those performed in Section 3.1, yield

\[ \begin{aligned}
\Pr(SC_i, IG_i, EE_i \mid Arg) = {} & \alpha \Pr(\mathrm{struc}\,IG_i \mid SC_i) \times \Pr(SC_i) \times {} \\
& \Pr(\mathrm{bel}\,IG_i \mid SC_i, \mathrm{struc}\,IG_i, \mathrm{bel}\,EE_i, \mathrm{struc}\,EE_i) \times {} \\
& \Pr(\mathrm{struc}\,EE_i \mid SC_i, \mathrm{struc}\,IG_i, \mathrm{bel}\,EE_i) \times {} \\
& \Pr(\mathrm{bel}\,EE_i \mid SC_i, \mathrm{struc}\,IG_i, \mathrm{struc}\,EE_i) \times {} \\
& \Pr(\mathrm{bel}\,Arg \mid SC_i, \mathrm{bel}\,IG_i) \times \Pr(\mathrm{struc}\,Arg \mid \mathrm{struc}\,IG_i)
\end{aligned} \tag{7} \]

The calculation of the probability of an explanatory extension is the same
as the calculation for structural model complexity at the start of
Section 3. However, the nodes in an explanatory extension are selected
from the nodes directly connected to the interpretation graph. In
addition, as for the basic model (Section 3.1), the beliefs in the nodes
in explanatory extensions are obtained algorithmically by means of
Bayesian propagation. Hence, there is no discrepancy with expected
beliefs, i.e.,
$\Pr(\mathrm{bel}\,EE_i \mid SC_i, \mathrm{struc}\,IG_i, \mathrm{struc}\,EE_i) = 1$.
Table 3 summarizes the calculation of these probabilities (the elements
that differ from the basic model and the enhanced model are boldfaced). It
also shows the trade-off between structural and belief model complexity.
Presenting explanatory extensions has a higher structural complexity
(lower probability) than not presenting them. However, explanatory
extensions can reduce the numerical discrepancy between the beliefs in an
interpretation and the beliefs expected from the background knowledge,
thereby increasing the belief probability of the interpretation. For
instance, in the example in Figure 1, the added explanatory extension
eliminates the unacceptable jump in belief.
Table 4 summarizes the trade-offs discussed in this section.

Table 3: Probability - Additional background knowledge
Model complexity (against background)
    Pr(struc IG_i | SC_i)                                     structural complexity
    Pr(SC_i)                                                  numerical discrepancy
  ↓ Pr(struc EE_i | SC_i, struc IG_i, bel EE_i)             ↑ structural complexity
  ↑ Pr(bel IG_i | SC_i, struc IG_i, bel EE_i, struc EE_i)   ↓ numerical discrepancy
Data fit with model
    Pr(struc Arg | struc IG_i)                                structural discrepancy
    Pr(bel Arg | SC_i, bel IG_i)                              numerical discrepancy

Table 4: Summary of Trade-offs
  ↓ Pr model structure (IG)   ⇒   ↑ Pr struct. data fit
  ↓ Pr model belief (SC)      ⇒   ↑ Pr belief data fit
  ↓ Pr model structure (EE)   ⇒   ↑ Pr model belief

4 Evaluation
We evaluated separately each component of an interpretation:
interpretation graph, supposition configuration and explanatory
extensions.
4.1 Interpretation graph
We prepared four evaluation sets, each of which
was done by about 20 people (Zukerman and
George, 2005). In three of the sets, the partici-
pants were given a simple argument and a few can-
didate interpretations (ranked highly by our sys-
tem). The fourth set featured a complex argument,
and only one interpretation (other candidates had
much lower probabilities). The participants were
asked to give each interpretation a score between
1 (Very UNreasonable) and 5 (Very reasonable).
Table 5 shows the results obtained for the interpretation selected by our
formalism for each set, which was the top scoring interpretation. The
first row shows the average score given by our subjects to this
interpretation, the second row shows the standard deviation, and the third
row the statistical significance, derived using a paired Z-test against
alternative options (no alternatives were presented for the fourth set).
Our results show that the interpretations generated by our system were
generally acceptable, but that some people gave low scores. Our subjects’
feedback indicated that these scores were mainly due to mismatches between
beliefs in the argument and in its interpretation, and due to belief
discontinuities. This led to the addition of suppositions and explanatory
extensions.

Table 5: Evaluation results: Interpretation graph
Set #             1      2      3      4
Avg. score        3.38   3.68   3.35   4.00
Std. dev.         1.45   1.11   1.39   1.02
Stat. sig. (p)    0.08   0.15   0.07   NA

4.2 Supposition configuration
We prepared four evaluation sets, each of which
was done by 34 people (George et al., 2005). Each
set consisted of a short argument, plus a list of sup-
position options as follows: (a) four suppositions
that had a reasonably high probability according
to our formalism, (b) the option to make a free-
form supposition in line with the domain BN, and
(c) the option to suppose nothing. We then asked
our subjects to indicate which of these options was
required for the argument to make sense. Specif-
ically, they had to rank their preferred options in
order of preference (but they did not have to rank
options they disliked). Overall, there was strong support for the
supposition preferred by our formalism. In three of the evaluation sets,
it was ranked first by most of the trial subjects (30/34, 19/34, 20/34),
with no other option a clear second. Only in the fourth set was the
supposition preferred by our formalism equal-first with another option,
but it was still ranked first 10 times (out of 34).
4.3 Explanatory extensions
We constructed two evaluation sets, each of which
was done by 20 people. Each set consisted of a
short argument and two alternative interpretations
(with and without explanatory extensions). There
was strong support for the explanatory extensions
proposed by our formalism, with 57.5% of our
trial subjects favouring the interpretations with ex-
planatory extensions, compared to 37.5% of the
subjects who preferred the interpretations without
such extensions, and 5% who were indifferent.
5 Related Research
An important aspect of discourse understanding involves filling in
information that was omitted by the interlocutor. In this paper, we have
presented a probabilistic formalism that balances conflicting factors when
filling in three types of information omitted from an argument.
Interpretation graphs fill in details in the argument’s inferences,
supposition configurations make sense of the beliefs in the argument, and
explanatory extensions overcome belief discontinuities.
Our approach resembles the work of Hobbs et
al. (1993) in several respects. They employed
an abductive approach where a model (interpre-
tation) is inferred from evidence (sentence); they
made assumptions as necessary; and used guid-
ing criteria pertaining to the model and the data
for choosing between candidate models. There are
also significant differences between our work and
theirs. Their interpretation focused on problems of
reference and disambiguation in single sentences,
while ours focuses on a longer discourse and the
relations between the propositions therein. This
distinction also determines the nature of the task,
as they try to find a concise model that explains
as much of the data as possible (e.g., one referent
that fits many clues), while we try to find a
representation for a user’s argument. Additionally,
their domain knowledge is logic-based, while ours
is Bayesian; and they used weights to apply their
hypothesis selection criteria, while our criteria are
embodied in a probabilistic framework.
Plan recognition systems also generate one or
more interpretations of a user’s utterances, employing
different resources to fill in information
omitted by the user, e.g., (Allen and Perrault,
1980; Litman and Allen, 1987; Carberry and Lambert,
1999; Raskutti and Zukerman, 1991). These
plan recognition systems used a plan-based approach
to propose interpretations. The first three
systems applied different types of heuristics to se-
lect an interpretation, while the fourth system used
a probabilistic approach moderated by heuristics
to select the interpretation with the highest prob-
ability. We use a probabilistic domain repre-
sentation in the form of a BN (rather than plan
libraries), and apply a probabilistic mechanism
that represents explicitly the contribution of background
knowledge, model complexity and data fit
to the generation of an interpretation. Our mech-
anism, which can be applied to other domain rep-
resentations, balances different types of complex-
ities and discrepancies to select the interpretation
with the highest posterior probability.
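The selection step described in this paragraph can be illustrated with a minimal sketch. It assumes that each candidate interpretation comes with two probabilities: a prior given the background knowledge (lower for more complex structures) and a likelihood of the argument given the interpretation (data fit). The candidate names and numbers are invented for illustration, and the code does not reproduce the probability calculations of our formalism.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    prior: float       # P(interpretation | background knowledge); hypothetical value
    likelihood: float  # P(argument | interpretation); hypothetical value


def log_posterior_score(candidate):
    """Unnormalised log posterior: log prior plus log likelihood."""
    return math.log(candidate.prior) + math.log(candidate.likelihood)


def select_interpretation(candidates):
    """Return the candidate with the highest (unnormalised) posterior."""
    return max(candidates, key=log_posterior_score)


# Hypothetical candidates trading structure complexity against data fit.
candidates = [
    Candidate("small graph, poor fit", prior=0.30, likelihood=0.05),
    Candidate("medium graph, good fit", prior=0.10, likelihood=0.40),
    Candidate("large graph, exact fit", prior=0.01, likelihood=0.90),
]
best = select_interpretation(candidates)
```

With these illustrative numbers the medium-sized candidate is selected: the smallest candidate fits the argument poorly, and the better fit of the largest candidate does not offset its low prior.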
Several researchers used maximum posterior
probability as the criterion for selecting an inter-
pretation (Charniak and Goldman, 1993; Gertner
et al., 1998; Horvitz and Paek, 1999). They used
BNs to represent a probability distribution over the
set of possible explanations for the observed facts,
and selected the explanation (a node in the BN or
a value of a node) with the highest probability. We
also use BNs as our domain representation, but our
explanation of the facts (the user’s argument) is
a Bayesian subnet (rather than a single node) sup-
plemented by suppositions. Additionally, we cal-
culate the probability of an interpretation on the
basis of the fit between the argument and the interpretation,
and the complexity of the interpretation
in light of the background knowledge.
Our work on positing suppositions is related to
research on presuppositions (Kaplan, 1982; Gurney
et al., 1997), a type of supposition implied
by the wording of a statement. Like our sup-
positions, presuppositions are necessary to make
sense of what is being said, but they operate at
a different knowledge level than our suppositions.
This aspect of our work is also related to research
on the recognition of flawed plans (Quilici, 1989;
Pollack, 1990; Chu-Carroll and Carberry, 2000).
These researchers used a plan-based approach to
identify erroneous beliefs that account for a user’s
statements or plan, while we use a probabilistic ap-
proach. Our approach supports the consideration
of many possible options, and integrates supposi-
tions into a broader reasoning context.
Finally, the research reported in (Joshi et al.,
1984; van Beek, 1987; Zukerman and Mc-
Conachy, 2001) considers the addition of informa-
tion to planned discourse to prevent a user’s erro-
neous inferences from this discourse. Our mech-
anism adds explanatory extensions to an interpre-
tation to prevent inferences that are objectionable
due to discontinuities in belief. Since such non-
sequiturs may also be present in system-generated
arguments, the approach presented here may be in-
corporated into argument-generation systems.
6 Conclusion
We have offered a probabilistic approach to the in-
terpretation of arguments that casts the selection
of an interpretation as a model selection task. In
so doing, our formalism balances conflicting factors:
model complexity against data fit, and structure
complexity against belief reasonableness. We
have demonstrated the use of our basic formalism
for the selection of an interpretation graph, and
shown how a more complex model and additional
background knowledge account respectively for
the inclusion of suppositions and explanatory ex-
tensions in an interpretation. Our user evaluations
show that the interpretation graphs produced by
our formalism are generally acceptable, and that
there is strong support for the suppositions and ex-
planatory extensions it proposes.

References
J.F. Allen and C.R. Perrault. 1980. Analyzing intention
in utterances. Artificial Intelligence, 15(3):143–178.
J. R. Anderson. 1983. The Architecture of Cognition.
Harvard University Press, Cambridge, Massachusetts.
G.E.P. Box, G.M. Jenkins, and G.C. Reinsel. 1994.
Time Series Analysis: Forecasting and Control.
Prentice Hall.
S. Carberry and L. Lambert. 1999. A process model
for recognizing communicative acts and modeling
negotiation subdialogues. Computational Linguistics,
25(1):1–53.
E. Charniak and R. Goldman. 1993. A Bayesian
model of plan recognition. Artificial Intelligence,
64(1):53–79.
J. Chu-Carroll and S. Carberry. 2000. Conflict resolution
in collaborative planning dialogues. International
Journal of Human Computer Studies,
6(56):969–1015.
C. Elsaesser. 1987. Explanation of probabilistic inference
for decision support systems. In Proceedings
of the AAAI-87 Workshop on Uncertainty in Artificial
Intelligence, pages 394–403, Seattle, Washington.
M.E. Epstein. 1996. Statistical Source Channel Models
for Natural Language Understanding. Ph.D. thesis,
Department of Computer Science, New York
University, New York, New York.
S. George, I. Zukerman, and M. Niemann. 2004. An
anytime algorithm for interpreting arguments. In
PRICAI2004 – Proceedings of the Eighth Pacific
Rim International Conference on Artificial Intelligence,
pages 311–321, Auckland, New Zealand.
S. George, I. Zukerman, and M. Niemann. 2005. Modeling
suppositions in users’ arguments. In UM05 –
Proceedings of the 10th International Conference on
User Modeling, pages 19–29, Edinburgh, Scotland.
A. Gertner, C. Conati, and K. VanLehn. 1998. Procedural
help in Andes: Generating hints using a
Bayesian network student model. In AAAI98 – Proceedings
of the Fifteenth National Conference on Artificial
Intelligence, pages 106–111, Madison, Wisconsin.
J. Gurney, D. Perlis, and K. Purang. 1997. Interpreting
presuppositions using active logic: From contexts to
utterances. Computational Intelligence, 13(3):391–413.
J. R. Hobbs, M. E. Stickel, D. E. Appelt, and P. Martin.
1993. Interpretation as abduction. Artificial Intelligence,
63(1-2):69–142.
E. Horvitz and T. Paek. 1999. A computational architecture
for conversation. In UM99 – Proceedings of
the Seventh International Conference on User Modeling,
pages 201–210, Banff, Canada.
A. Joshi, B. L. Webber, and R. M. Weischedel. 1984.
Living up to expectations: Computing expert responses.
In AAAI84 – Proceedings of the Fourth National
Conference on Artificial Intelligence, pages
169–175, Austin, Texas.
S. J. Kaplan. 1982. Cooperative responses from a
portable natural language query system. Artificial
Intelligence, 19:165–187.
D. Litman and J.F. Allen. 1987. A plan recognition
model for subdialogues in conversation. Cognitive
Science, 11(2):163–200.
F.J. Och and H. Ney. 2002. Discriminative training and
maximum entropy models for statistical machine
translation. In ACL’02 – Proceedings of the Annual
Meeting of the Association for Computational
Linguistics, pages 295–302, Philadelphia, Pennsylvania.
M.E. Pollack. 1990. Plans as complex mental attitudes.
In P. Cohen, J. Morgan, and M.E. Pollack, editors,
Intentions in Communication, pages 77–103.
MIT Press.
A. Quilici. 1989. Detecting and responding to
plan-oriented misconceptions. In A. Kobsa and
W. Wahlster, editors, User Models in Dialog Systems,
pages 108–132. Springer-Verlag.
B. Raskutti and I. Zukerman. 1991. Generation and selection
of likely interpretations during plan recognition.
User Modeling and User Adapted Interaction,
1(4):323–353.
P. van Beek. 1987. A model for generating better explanations.
In Proceedings of the Twenty-Fifth Annual
Meeting of the Association for Computational
Linguistics, pages 215–220, Stanford, California.
C.S. Wallace. 2005. Statistical and Inductive Inference
by Minimum Message Length. Springer, Berlin,
Germany.
I. Zukerman and S. George. 2005. A probabilistic
approach for argument interpretation. User Modeling
and User-Adapted Interaction, Special Issue on
Language-Based Interaction, 15(1-2):5–53.
I. Zukerman and R. McConachy. 2001. WISHFUL:
A discourse planning system that considers a user’s
inferences. Computational Intelligence, 1(17):1–61.
