<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1028"> <Title>Explaining away ambiguity: Learning verb selectional preference with Bayesian networks*</Title> <Section position="2" start_page="0" end_page="187" type="abstr"> <SectionTitle> Abstract </SectionTitle>
<Paragraph position="0"> This paper presents a Bayesian model for unsupervised learning of verb selectional preferences. For each verb the model creates a Bayesian network whose architecture is determined by the lexical hierarchy of Wordnet and whose parameters are estimated from a list of verb-object pairs extracted from a corpus. &quot;Explaining away&quot;, a well-known property of Bayesian networks, helps the model deal in a natural fashion with word sense ambiguity in the training data. On a word sense disambiguation test our model performed better than other state of the art systems for unsupervised learning of selectional preferences. Computational complexity problems, ways of improving this approach and methods for implementing &quot;explaining away&quot; in other graphical frameworks are discussed.</Paragraph>
<Paragraph position="1"> 1 Selectional preference and sense ambiguity
Regularities of a verb with respect to the semantic class of its arguments (subject, object and indirect object) are called selectional preferences (SP) (Katz and Fodor, 1964; Chomsky, 1965; Johnson-Laird, 1983). The verb pilot carries the information that its object will likely be some kind of vehicle; subjects of the verb think tend to be human; and subjects of the verb bark tend to be dogs. For the sake of simplicity we will focus on the verb-object relation although the techniques we will describe can be applied to other verb-argument pairs.</Paragraph>
<Paragraph position="2"> * We would like to thank the Brown Laboratory for Linguistic Information Processing; Thomas Hofmann; Elie Bienenstock; Philip Resnik, who provided us with training and test data; and Daniel Garcia for his help with the SMILE library of classes for Bayesian networks that we used for our experiments. This research was supported</Paragraph>
<Paragraph position="4"> Models of the acquisition of SP are important in their own right and have applications in Natural Language Processing (NLP). The selectional preferences of a verb can be used to infer the possible meanings of an unknown argument of a known verb; e.g., it might be possible to infer that xzzz is a kind of dog from the following sentence: &quot;The xzzz barked all night&quot;. In parsing a sentence selectional preferences can be used to rank competing parses, providing a partial measure of semantic well-formedness. Investigating SP might help us to understand the structure of the mental lexicon.</Paragraph>
<Paragraph position="5"> Systems for unsupervised learning of SP usually combine statistical and knowledge-based approaches. The knowledge-based component is typically a database that groups words into classes. In the models we will see, the knowledge base is Wordnet (Miller, 1990). Wordnet groups nouns into classes of synonyms representing concepts, called synsets, e.g., {car, auto, automobile, ...}. A noun that belongs to several synsets is ambiguous. A transitive and asymmetrical relation, hyponymy, is defined between synsets. A synset is a hyponym of another synset if the former has the latter as a broader concept; for example, BEVERAGE is a hyponym of LIQUID. Figure 1 depicts a portion of the hierarchy.</Paragraph>
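As a concrete illustration of synsets and hyponymy, here is a minimal sketch of ours, not part of the original paper; it assumes NLTK's WordNet interface (a descendant of the Wordnet database cited above) rather than Wordnet 1.x. It lists the synsets of an ambiguous noun and walks the hypernym (inverse hyponym) chains toward broader concepts.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

# An ambiguous noun belongs to several synsets: a beverage sense,
# the island, and so on.
for synset in wn.synsets('java', pos=wn.NOUN):
    print(synset.name(), '-', synset.definition())

# Hyponymy in reverse: each hypernym path climbs from a synset to
# ever broader concepts, e.g. from a beverage sense up toward LIQUID.
for path in wn.synset('coffee.n.01').hypernym_paths():
    print(' > '.join(s.name() for s in path))
```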
<Paragraph position="6"> The statistical component consists of predicate-argument pairs extracted from a corpus in which the semantic class of the words is not indicated. A trivial algorithm might get a list of words that occurred as objects of the verb and output the semantic classes the words belong to according to Wordnet.</Paragraph>
<Paragraph position="7"> For example, if the verb drink occurred with water and water ∈ LIQUID, the model would learn that drink selects for LIQUID. As Resnik (1997) and Abney and Light (1999) have found, the main problem these systems face is the presence of ambiguous words in the training data. If the word java also occurred as an object of drink, since java ∈ BEVERAGE and java ∈ ISLAND, this model would learn that drink selects for both BEVERAGE and ISLAND.</Paragraph>
<Paragraph position="8"> More complex models have been proposed.</Paragraph>
<Paragraph position="9"> These models, though, deal with word sense ambiguity by applying an unselective strategy similar to the one above; i.e., they assume that ambiguous words provide equal evidence for all their senses. These models choose as the concepts the verb selects for those that are in common among several words (e.g., BEVERAGE above). This strategy works to the extent that these overlapping senses are also the concepts the verb selects for.</Paragraph> </Section></Paper>
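The two baseline strategies described above lend themselves to a compact sketch. The code below is illustrative only, not the authors' Bayesian model, and again assumes NLTK's WordNet interface: it contrasts the trivial union of classes, which over-generates on ambiguous objects such as java, with the unselective overlap strategy, which keeps only classes evidenced by more than one object word.

```python
from collections import Counter
from nltk.corpus import wordnet as wn

def candidate_classes(noun):
    """Every synset a noun sense falls under, including all hypernyms.
    An ambiguous noun like 'java' contributes classes for all its senses."""
    classes = set()
    for synset in wn.synsets(noun, pos=wn.NOUN):
        for path in synset.hypernym_paths():
            classes.update(path)
    return classes

def trivial_sp(object_nouns):
    """Trivial strategy: union of all candidate classes. Over-generates,
    e.g. learning that drink selects for ISLAND because of 'java'."""
    classes = set()
    for noun in object_nouns:
        classes |= candidate_classes(noun)
    return classes

def overlap_sp(object_nouns):
    """Unselective overlap strategy: keep classes evidenced by more than
    one object word, so idiosyncratic senses tend to be filtered out."""
    counts = Counter()
    for noun in object_nouns:
        counts.update(candidate_classes(noun))  # each noun votes once per class
    return {c for c, n in counts.items() if n > 1}

# Hypothetical objects observed with the verb 'drink'.
objects = ['water', 'java', 'tea']
print(overlap_sp(objects))  # shared classes, e.g. the beverage/liquid senses
```

Here the overlap heuristic discards ISLAND only because no other object of drink evidences it, which is exactly the limitation noted above: the filtering succeeds just when the shared senses happen to coincide with the concepts the verb selects for.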