<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1028"> <Title>Explaining away ambiguity: Learning verb selectional preference with Bayesian networks*</Title> <Section position="5" start_page="189" end_page="191" type="metho"> <SectionTitle> 4 A Bayesian network approach to learning selectional preference </SectionTitle>
<Section position="1" start_page="189" end_page="190" type="sub_section"> <SectionTitle> 4.1 Structure and parameters of the model </SectionTitle>
<Paragraph position="0"> The hierarchy of nouns in Wordnet defines a DAG. Its mapping into a BBN is straightforward. Each word or synset in Wordnet is a node in the network. If A is a hyponym of B there is an arc in the network from B to A. All the variables are Boolean. A synset node is true if the verb selects for that class. A word node is true if the word can appear as an argument of the verb. The priors are defined following two intuitive principles. First, it is unlikely that a verb a priori selects for any particular synset. Second, if a verb does select for a synset, say FOOD, then it is likely that it also selects for its hyponyms, say FRUIT. The same principles apply to words: it is likely that a word appears as an argument of the verb if the verb selects for any of its possible senses. On the other hand, if the verb does not select for a synset, it is unlikely that the words instantiating the synset occur as its arguments. "Likely" and "unlikely" are given numerical values that sum up to 1. The following table defines the scheme for the CPTs associated with each node in the network; pi(X) denotes the i-th parent of the node X.</Paragraph>
<Paragraph position="1">
P(X = 1 | pi(X) = 1 for some i) = likely
P(X = 1 | pi(X) = 0 for all i)  = unlikely
</Paragraph>
<Paragraph position="2"> For a root node, which has no parents, the prior P(X = 1) = unlikely is the unconditioned probability of the node. Now we can test the model on the simple example seen earlier. W+ is the set of words that occurred with the verb. The nodes corresponding to the words in W+ are set to true and the others left unset. For the previous example W+ = {meat, apple, bagel, cheese}, and the corresponding nodes are set to true, as depicted in Figure 5. With likely and unlikely respectively equal to 0.90 and 0.01, the posterior probabilities are P(F|m, a, b, c) = 0.9899 and P(C|m, a, b, c) = 0.0101 (F, C, m, a, b and c respectively stand for FOOD, COGNITION, meat, apple, bagel and cheese). Explaining away works. The posterior probability of COGNITION gets as low as its prior, whereas the probability of FOOD goes up to almost 1. A Bayesian network approach seems to actually implement the conservative strategy we thought to be the correct one for unsupervised learning of selectional restrictions.</Paragraph> </Section>
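The explaining-away behaviour is easy to check numerically. The sketch below (Python; not from the paper) implements the CPT scheme above with exact inference by enumeration on an assumed miniature fragment: two synsets, FOOD and COGNITION, with meat a child of both and apple, bagel, cheese children of FOOD only. Because it omits the fuller hierarchy of Figure 5, the posteriors differ slightly from the 0.9899 and 0.0101 reported above, but the pattern is the same: FOOD rises to almost 1 while COGNITION stays at its prior.

```python
from itertools import product

LIKELY, UNLIKELY = 0.90, 0.01

# Assumed toy fragment of the Wordnet-derived network (the paper's
# Figure 5 has a deeper hierarchy, so exact numbers differ).
parents = {
    "FOOD": [],
    "COGNITION": [],
    "meat": ["FOOD", "COGNITION"],   # ambiguous: food sense and cognition sense
    "apple": ["FOOD"],
    "bagel": ["FOOD"],
    "cheese": ["FOOD"],
}

def p_true(node, values):
    """CPT scheme of Section 4.1: likely if some parent is true,
    unlikely otherwise; a root's prior is unlikely."""
    pa = parents[node]
    if not pa:
        return UNLIKELY
    return LIKELY if any(values[p] for p in pa) else UNLIKELY

def joint(values):
    """Joint probability as the product of the network's CPT entries."""
    prob = 1.0
    for node in parents:
        p = p_true(node, values)
        prob *= p if values[node] else 1.0 - p
    return prob

# Evidence: the four observed words (W+) are set to true.
evidence = {w: True for w in ("meat", "apple", "bagel", "cheese")}

def posterior(query):
    """Exact inference by enumeration over the unobserved nodes."""
    hidden = [n for n in parents if n not in evidence]
    num = den = 0.0
    for combo in product((False, True), repeat=len(hidden)):
        values = {**evidence, **dict(zip(hidden, combo))}
        p = joint(values)
        den += p
        if values[query]:
            num += p
    return num / den

print(f"P(FOOD | W+)      = {posterior('FOOD'):.4f}")       # close to 1
print(f"P(COGNITION | W+) = {posterior('COGNITION'):.4f}")  # ~ its prior, 0.01
```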
<Section position="2" start_page="190" end_page="190" type="sub_section"> <SectionTitle> 4.2 Computational issues in building BBNs based on Wordnet </SectionTitle>
<Paragraph position="0"> The implementation of a BBN for the whole of Wordnet faces computational complexity problems typical of graphical models. A densely connected BBN presents two kinds of problems.</Paragraph>
<Paragraph position="1"> The first is the storage of the CPTs. The size of a CPT grows exponentially with the number of parents of the node (a word with 26 senses, for instance, has a CPT with 2^26 entries: storing a table of float numbers for this node alone requires around (2^26) * 8 = 537 MBytes of memory). This problem can be solved by optimizing the representation of these tables. In our case most of the entries have the same values, and a compact representation for them can be found (much like the one used in the noisy-OR model (Pearl, 1988)).</Paragraph>
<Paragraph position="2"> A harder problem is performing inference. The graphical structure of a BBN represents the dependency relations among the random variables of the network. The algorithms used with BBNs usually perform inference by dynamic programming on the triangulated moral graph. A lower bound on the number of computations necessary to model the joint distribution over the variables using such algorithms is 2^n, where n is the size of the maximal boundary set according to the visitation schedule.</Paragraph> </Section>
<Section position="3" start_page="190" end_page="191" type="sub_section"> <SectionTitle> 4.3 Subnetworks and balancing </SectionTitle>
<Paragraph position="0"> Because of these problems we could not build a single BBN for Wordnet. Instead we simplified the structure of the model by building a smaller subnetwork for each predicate-argument pair. A subnetwork consists of the union of the sets of ancestors of the words in W+. Figure 6 provides an example of the union of these "ancestral subgraphs" of Wordnet for the words java and drink (compare it with Figure 1).</Paragraph>
<Paragraph position="1"> This simplification does not affect the computation of the distributions we are interested in; that is, the marginals of the synset nodes.</Paragraph>
<Paragraph position="2"> A BBN provides a compact representation for the joint distribution over the set of variables in the network. If N is a Bayesian network with variables X1, ..., Xn, its joint distribution P(N) is the product of all the conditional probabilities specified in the network,</Paragraph>
<Paragraph position="3"> P(N) = \prod_{i=1}^{n} P(X_i \mid pa(X_i)), </Paragraph>
<Paragraph position="4"> where pa(X) is the set of parents of X. A BBN generates a factorization of the joint distribution over its variables. Consider a network of three nodes A, B, C with arcs from A to B and C. Its joint distribution can be characterized as P(A, B, C) = P(A)P(B|A)P(C|A). If there is no evidence for C the joint distribution is</Paragraph>
<Paragraph position="5"> P(A, B) = \sum_{C} P(A) P(B \mid A) P(C \mid A) = P(A) P(B \mid A) \sum_{C} P(C \mid A) = P(A) P(B \mid A). </Paragraph>
<Paragraph position="6"> The node C gets marginalized out. Marginalizing over a childless node is equivalent to removing it with its connections from the network. Therefore the subnetworks are equivalent to the whole network; i.e., they have the same joint distribution.</Paragraph>
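A small numerical check of this argument (a sketch, not from the paper): in the three-node network A -> B, A -> C with the likely/unlikely CPTs assumed above, summing the childless node C out of the full joint reproduces exactly the joint of the network with C removed. The identity holds for any CPTs, since the sum over C of P(C|A) is 1.

```python
from itertools import product

LIKELY, UNLIKELY = 0.90, 0.01

def bern(p, value):
    """Probability that a Boolean variable takes `value` when P(true) = p."""
    return p if value else 1.0 - p

def cond(parent):
    """Same CPT scheme as before: a child is likely true iff its parent is true."""
    return LIKELY if parent else UNLIKELY

def joint_full(a, b, c):
    """P(A, B, C) = P(A) P(B|A) P(C|A) in the network A -> B, A -> C."""
    return bern(UNLIKELY, a) * bern(cond(a), b) * bern(cond(a), c)

def joint_reduced(a, b):
    """P(A, B) = P(A) P(B|A) after removing the childless node C."""
    return bern(UNLIKELY, a) * bern(cond(a), b)

for a, b in product((False, True), repeat=2):
    marginalized = sum(joint_full(a, b, c) for c in (False, True))
    assert abs(marginalized - joint_reduced(a, b)) < 1e-12
    print(f"A={a!s:5} B={b!s:5}  P(A,B) = {marginalized:.6f}")
```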
<Paragraph position="7"> Our model computes the value of P(c|p, r), but we did not compute the prior P(c) for all nouns in the corpus. We assumed this to be a constant, equal to the unlikely value, for all classes. In a BBN the values of the marginals increase with their distance from the root nodes. To avoid undesired bias (see the table of results) we defined a balancing formula that adjusted the conditional probabilities of the CPTs in such a way that all the marginals have approximately the same value.</Paragraph> </Section> </Section> </Paper>
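The balancing formula is not spelled out here, but the bias it corrects can be reproduced in a few lines (a sketch assuming a single-parent chain, which the hierarchy is not in general): under the Section 4.1 CPT scheme the marginal P(X = 1) grows with depth, approaching unlikely / (1 - likely + unlikely) = 0.01 / 0.11, roughly 0.091, instead of staying at the 0.01 prior, so deeper synsets would look spuriously probable without balancing.

```python
LIKELY, UNLIKELY = 0.90, 0.01

# Marginal P(X = 1) along a chain root -> x1 -> x2 -> ... where each
# node has a single parent: m' = likely * m + unlikely * (1 - m),
# starting from the root prior.
m = UNLIKELY
for depth in range(7):
    print(f"depth {depth}: P(node = 1) = {m:.4f}")
    m = LIKELY * m + UNLIKELY * (1.0 - m)

# The sequence climbs from 0.0100 toward the fixed point
# unlikely / (1 - likely + unlikely) = 0.0909.
```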