<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2094">
  <Title>Using a Probabilistic Class-Based Lexicon for Lexical Ambiguity Resolution</Title>
  <Section position="3" start_page="649" end_page="650" type="metho">
    <SectionTitle>
2 Lexicon Induction via EM-Based Clustering
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="649" end_page="649" type="sub_section">
      <SectionTitle>
2.1 EM-Based Clustering
</SectionTitle>
      <Paragraph position="0"> For clustering, we used the method described in Rooth et al. (1999). There, classes are derived from distributional data: a sample of pairs of verbs and nouns, gathered by parsing an unannotated corpus and extracting the fillers of grammatical relations. The semantically smoothed probability of a pair $(v, n)$ is calculated in a latent class (LC) model as $p_{LC}(v, n) = \sum_{c \in C} p_{LC}(c, v, n)$. The joint distribution is defined by $p_{LC}(c, v, n) = p_{LC}(c)\, p_{LC}(v|c)\, p_{LC}(n|c)$.</Paragraph>
      <Paragraph position="1"> By construction, conditioning of v and n on each other is solely made through the classes c. The parameters $p_{LC}(c)$, $p_{LC}(v|c)$, $p_{LC}(n|c)$ are estimated by a particularly simple version of the EM algorithm for context-free models.</Paragraph>
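The estimation step described in this subsection can be illustrated with a minimal Python sketch of EM for the latent class model $p_{LC}(c, v, n) = p_{LC}(c)\, p_{LC}(v|c)\, p_{LC}(n|c)$. This is not the implementation of Rooth et al. (1999): the names `em_latent_class` and `smoothed_pair_probability`, the random initialization, the fixed iteration count, and the toy counts in `TOY_PAIRS` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def em_latent_class(pair_counts, num_classes, iterations=50, seed=0):
    """Fit p(c), p(v|c), p(n|c) to (verb, noun) pair counts with EM (illustrative)."""
    rng = random.Random(seed)
    verbs = sorted({v for v, _ in pair_counts})
    nouns = sorted({n for _, n in pair_counts})
    classes = list(range(num_classes))

    def random_dist(keys):
        # Hypothetical initialization: random positive weights, normalized.
        weights = [rng.random() + 0.1 for _ in keys]
        total = sum(weights)
        return {k: w / total for k, w in zip(keys, weights)}

    p_c = {c: 1.0 / num_classes for c in classes}
    p_v = {c: random_dist(verbs) for c in classes}
    p_n = {c: random_dist(nouns) for c in classes}

    for _ in range(iterations):
        # E-step: distribute each pair count over classes by the posterior p(c|v,n).
        cnt_c = defaultdict(float)
        cnt_v = defaultdict(float)
        cnt_n = defaultdict(float)
        for (v, n), freq in pair_counts.items():
            joint = [p_c[c] * p_v[c][v] * p_n[c][n] for c in classes]
            total = sum(joint)
            if total == 0.0:
                continue
            for c in classes:
                post = freq * joint[c] / total
                cnt_c[c] += post
                cnt_v[c, v] += post
                cnt_n[c, n] += post
        # M-step: relative frequencies of the expected counts.
        mass = sum(cnt_c.values())
        p_c = {c: cnt_c[c] / mass for c in classes}
        p_v = {c: {v: cnt_v[c, v] / (cnt_c[c] or 1.0) for v in verbs} for c in classes}
        p_n = {c: {n: cnt_n[c, n] / (cnt_c[c] or 1.0) for n in nouns} for c in classes}
    return p_c, p_v, p_n


def smoothed_pair_probability(model, v, n):
    """Smoothed p_LC(v, n): sum over classes of p(c) * p(v|c) * p(n|c)."""
    p_c, p_v, p_n = model
    return sum(p_c[c] * p_v[c].get(v, 0.0) * p_n[c].get(n, 0.0) for c in p_c)


# Toy counts invented for this sketch; they are not taken from the paper's corpus.
TOY_PAIRS = {
    ("enter", "room"): 5, ("enter", "area"): 3,
    ("cross", "room"): 2, ("cross", "border"): 6,
    ("mobilize", "force"): 7, ("mobilize", "support"): 4,
}
model = em_latent_class(TOY_PAIRS, num_classes=2, iterations=20)
print(smoothed_pair_probability(model, "enter", "border"))
```

Because the probability of a pair is summed over classes, a pair that never occurred in the sample, such as the toy pair (enter, border), can still receive non-zero probability when some class supports both the verb and the noun.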
      <Paragraph position="2"> Input to our clustering algorithm was a training corpus of 1,178,698 tokens (608,850 types) of verb-noun pairs participating in the grammatical relations of intransitive and transitive verbs and their subject- and object-fillers.</Paragraph>
      <Paragraph position="3"> Fig. 1 shows an induced class from a model with 35 classes. Induced classes often have a basis in lexical semantics; class 19 can be interpreted as locative, involving location nouns such as &amp;quot;room&amp;quot;, &amp;quot;area&amp;quot;, and &amp;quot;world&amp;quot; and verbs such as &amp;quot;enter&amp;quot; and &amp;quot;cross&amp;quot;.</Paragraph>
    </Section>
    <Section position="2" start_page="649" end_page="650" type="sub_section">
      <SectionTitle>
2.2 Probabilistic Labeling with Latent
Classes using EM-estimation
</SectionTitle>
      <Paragraph position="0"> To induce latent classes for the object slot of a fixed transitive verb v, another statistical inference step was performed. Given a latent class model $p_{LC}(\cdot)$ for verb-noun pairs, and a sample $n_1, \ldots, n_M$ of objects for a fixed transitive verb, we calculate the probability of an arbitrary object noun $n \in N$ by $p(n) = \sum_{c \in C} p(c, n) = \sum_{c \in C} p(c)\, p_{LC}(n|c)$. This fine-tuning of the class parameters $p(c)$ to the sample of objects for a fixed verb is formalized again as a simple instance of the EM algorithm. In an experiment with English data, we used a clustering model with 35 classes. From the maximum probability parses derived for the British National Corpus with the head-lexicalized parser of Carroll and Rooth (1998), we extracted frequency tables for transitive verb-noun pairs. These tables were used to induce a small class-labeled lexicon</Paragraph>
      <Paragraph position="2"> sagen, dass man einen Pass haben muss, wenn man die Grenze überschreitet. There are some old provisions regarding passports which state that people crossing the {border/ frontier/ boundary/ limit/ periphery/ edge} should have their passport on them.</Paragraph>
      <Paragraph position="3"> (ID 301946) Es gibt schliesslich keine Lösung ohne die Mobilisierung der bürgerlichen Gesellschaft und die Solidarität der Demokraten in der ganzen Welt.</Paragraph>
      <Paragraph position="4"> There can be no solution, finally, unless civilian {company/ society/ companionship/ party/ associate} is mobilized and solidarity demonstrated by democrats throughout the world.</Paragraph>
      <Paragraph position="5"> of the transitive verbs cross and mobilize. Fig. 2 shows the topmost parts of the lexical entries for the transitive verbs cross and mobilize. Class 19 is the most probable class-label for the object-slot of cross (probability 0.692); the objects of mobilize belong with probability 0.386 to class 16, which is the most probable class for this slot. Fig. 2 shows for each verb the ten nouns n with highest estimated frequencies $f_c(n) = f(n)\, p(c|n)$, where $f(n)$ is the frequency of n in the sample $n_1, \ldots, n_M$. For example, the frequency of seeing mind as object of cross is estimated as 74.2 times, and the most frequent object of mobilize is estimated to be force.</Paragraph>
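The labeling step of this subsection can be sketched in Python as follows: the class weights $p(c)$ are re-estimated by EM on the object sample of one verb while $p_{LC}(n|c)$ stays fixed, and the estimated frequencies $f_c(n) = f(n)\, p(c|n)$ are then tabulated. The sketch assumes that $p(c|n)$ is obtained by Bayes' rule from $p(c)\, p_{LC}(n|c)$, which the text does not spell out. The function names, the uniform starting point, and the iteration count are hypothetical choices for illustration, not the authors' implementation.

```python
def fit_class_weights(object_counts, p_n_given_c, iterations=50):
    """Re-estimate p(c) on one verb's object sample, keeping p_LC(n|c) fixed."""
    classes = list(p_n_given_c)
    p_c = {c: 1.0 / len(classes) for c in classes}  # assumed uniform start
    for _ in range(iterations):
        expected = {c: 0.0 for c in classes}
        for n, freq in object_counts.items():
            joint = {c: p_c[c] * p_n_given_c[c].get(n, 0.0) for c in classes}
            norm = sum(joint.values())
            if norm == 0.0:
                continue  # noun not covered by the class model
            for c in classes:
                # E-step: expected share of f(n) that belongs to class c.
                expected[c] += freq * joint[c] / norm
        mass = sum(expected.values())
        if mass == 0.0:
            break  # no noun in the sample is covered by the class model
        # M-step: normalize the expected class counts.
        p_c = {c: expected[c] / mass for c in classes}
    return p_c


def estimated_frequencies(object_counts, p_c, p_n_given_c):
    """Tabulate f_c(n) = f(n) * p(c|n), with p(c|n) computed by Bayes' rule."""
    table = {}
    for n, freq in object_counts.items():
        joint = {c: p_c[c] * p_n_given_c[c].get(n, 0.0) for c in p_c}
        norm = sum(joint.values())
        for c in p_c:
            table[c, n] = freq * joint[c] / norm if norm else 0.0
    return table
```

For a noun covered by the class model, the values $f_c(n)$ sum over all classes to the observed frequency $f(n)$, so the table splits each observed count among the classes rather than inventing new mass.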
    </Section>
  </Section>
  <Section position="4" start_page="650" end_page="651" type="metho">
    <SectionTitle>
3 Disambiguation with Probabilistic
Cluster-Based Lexicons
</SectionTitle>
    <Paragraph position="0"> In the following, we will describe the simple and natural lexicon look-up mechanism which is employed in our disambiguation approach.</Paragraph>
    <Paragraph position="1"> Consider Fig. 3 which shows two bilingual sentences taken from our evaluation corpus (see Sect. 4). The source-words and their corresponding target-words are highlighted in bold face.</Paragraph>
    <Paragraph position="2"> The correct translation of the source-noun (e.g. Grenze) as determined by the actual translators is replaced by the set of alternative translations (e.g. { border, frontier, boundary, limit, periphery, edge }) as proposed by the word-to-word dictionary of Fig. 5 (see Sect. 4).</Paragraph>
    <Paragraph position="3"> The problem to be solved is to find a correct translation of the source-word using only minimal contextual information. In our approach, the decision between alternative target-nouns is made using only information provided by the governing target-verb. The key idea is to back up this minimal information with the condensed and precise information of a probabilistic class-based lexicon. The criterion for choosing an alternative target-noun is thus the best fit of the lexical and semantic information of the target-noun to the semantics of the argument-slot of the target-verb. This criterion is checked by a simple lexicon look-up where the target-noun with highest estimated class-based frequency is determined. Formally, choose the target-noun $\hat{n}$,</Paragraph>
    <Paragraph position="5"> where $f_c(n) = f(n)\, p(c|n)$ is the estimated frequency of n in the sample of objects of a fixed target-verb, $p(c|n)$ is the class-membership probability of n in c as determined by the probabilistic lexicon, and $f(n)$ is the frequency of n in the combined sample of objects and translation alternatives.</Paragraph>
    <Paragraph position="6"> Consider example ID 160867 from Fig. 3. The ambiguity to be resolved concerns the direct objects of the verb cross whose lexical entry is partly shown in Fig. 2. Class 19 and the noun border is the pair yielding a higher estimated frequency than any other combination of a class and an alternative translation such as boundary.</Paragraph>
    <Paragraph position="7"> Similarly, for example ID 301946, the pair of the target-noun society and class 6 gives the highest estimated frequency of the objects of mobilize.</Paragraph>
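The look-up described in this section can be sketched in Python as a search for the class and alternative pair with the highest estimated frequency. The sketch assumes that the lexical entry of the target-verb is available as a table mapping (class, noun) pairs to $f_c(n)$; the function name `choose_translation`, the table layout, and all numbers in `toy_entry` are hypothetical and are not values from Fig. 2.

```python
def choose_translation(alternatives, freq_table):
    """Return the (class, noun) pair with the highest f_c(n) among the alternatives.

    freq_table maps (class, noun) to an estimated frequency; None is returned
    when no alternative occurs in the table.
    """
    candidates = [pair for pair in freq_table if pair[1] in alternatives]
    return max(candidates, key=lambda pair: freq_table[pair], default=None)


# Invented estimated frequencies for the object slot of one target-verb.
toy_entry = {
    (19, "border"): 12.0,
    (19, "boundary"): 3.5,
    (7, "limit"): 1.2,
    (19, "mind"): 40.0,  # frequent object, but not a translation alternative
}
alternatives = {"border", "frontier", "boundary", "limit", "periphery", "edge"}
print(choose_translation(alternatives, toy_entry))
```

Restricting the search to the alternatives proposed by the dictionary is what keeps a frequent but irrelevant object of the verb from being selected.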
  </Section>
</Paper>