<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1028">
  <Title>Explaining away ambiguity: Learning verb selectional preference with Bayesian networks*</Title>
  <Section position="4" start_page="188" end_page="189" type="intro">
    <SectionTitle>
3 Bayesian networks
</SectionTitle>
    <Paragraph position="0"> A Bayesian network (Pearl, 1988), or Bayesian 1)el|el nel;work (BBN), eonsisi;s of a sol; of variables and a sel; of directed edges (:onneel;ing the w~riat)les. The variables and tile edges detine, a dire,(:te, d acyclic graph (DAG) where each wtrial)le is rei)resented 1)y ~ node.</Paragraph>
    <Paragraph position="1"> Ea(:h vmi~d)le is asso(:iated with a finite number of (nmi;u;dly ex('lusive) sl;ates. '1)) each wu:ial)le A with \]);n'eni;s \]31,..., I7~ is ;t|;l;;mll(',(l ;t conditio'n, al probability tabh', (CPT) l'(A\[131, ...,Hn). Given a BBN, Bayesiml int~rence (:~m 1)e used 1;o esi;ilm~l;e marginal and posterior probabilities given the evidence at hand ~md (;It(', infornlation six)red in the CPTs, the prior probabilities, by means of B~yes' rule, P(HIE ) =</Paragraph>
    <Paragraph position="3"> Baye, sian nel;works display ml exl;remely intereslfing t)roi)ert;y called explaining away. Word sense mnbiguity in the 1)recess of learning SP de,tines a 1)rot)lem that nlight, l)e solved by a model that imt)lements an explaining away strategy.</Paragraph>
    <Paragraph position="4"> Sul)t)ose we ;~re learning the, selectional 1)referen(:e of drink, and the network ill Figure 4 is the 'As a nmtt;er of fi*cl;, for this HMM there are (infinitely) many i)aramel;er vahles that nmxinfize the likelihood of t;he training data; i.e., l;he i)arame, l;ers are, not; idenl;ifiable. The intuil;ively correct solution is one of l;helll, \])ILL SO are infinitely lilalty ()|;her, intuitively incorre(:t; ones. Thus il, is no surprise l;hat the EM algorithm emmet; lind the intuitively correct sohlt;ion.</Paragraph>
    <Paragraph position="5">  as a Bayesian network. The variables ISLAND and BEVERAGE represent concepts in a semantic hierarchy. The wtriables java and water stand for I)ossible instantiations of the concet)ts.</Paragraph>
    <Paragraph position="6"> All the w,riables are Boolean; i.e., they are associated with two states, true or false. Suppose the tbllowing CPTs define the priors associated with each node. 2 The unconditional probabili-</Paragraph>
    <Paragraph position="8"> CPTs for the child nodes are</Paragraph>
    <Paragraph position="10"> w false. 0.01 0.01 0.99 0.99 These vahms mean that the occu,'rence of either concept is a priori unlikely. If either concept is true the word java is likely to occur. Similarly, if BE VERA GE occurs it; is likely to observe also the word water. As the posterior probabilities show, if java occurs, the belief~ in both concepts increase: P(II.j) = P(BIj ) = 0.3355. However, 'water provides evidence for BEVERAGE only.</Paragraph>
    <Paragraph position="11"> Overall there is more evidence for the hypothesis that the concept being expressed is BEVERAGE and not ISLAND. Bayesian networks implement this inference scheme; if we compute the conditional probabilities given that both words occurred, we obtain P(BIj , w) = 0.98 and P(I\[j, w) = 0.02. The new evidence caused the &amp;quot;island&amp;quot; hyt)othesis to be explained away!</Paragraph>
    <Section position="1" start_page="189" end_page="189" type="sub_section">
      <SectionTitle>
3.1 The relevance of priors
</SectionTitle>
      <Paragraph position="0"> Explaining away seems to depend on the specification of the prior prolmt)ilities. The priors 2I, 13, j and w abbreviate ISLAND, 1lEVERAGE, java and water, respectively.</Paragraph>
      <Paragraph position="1"> (-)coaNmo,v ) roe, \[0 ii~ \] L_ m Figm'e 5: A Bayesian network for the simple example.</Paragraph>
      <Paragraph position="2"> define the background knowledge awdlable to the model relative to the conditional probabilities of the events represented by the variables, but also about the joint distributions of several events. In the simple network above, we defined the probat)ility that either concept is selected (i.e., that the correst)onding variable is true) to be extremely small. Intuitively, there are many concepts and the probability of observing any particular one is small. This means that the joint probability of the two events is much higher in the case in which only one of them is true (0.0099) than in the case in which they are both true (0.0001). Therefore, via the priors, we introduced a bias according to which the hypothesis that one concept is selected will be t/wored over two co-occurring ones. This is a general pattern of Bayesian networks; the prior causes simpler explanations to be preferred over more complex ones, and thereby the explaining away effect.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>