<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0207">
  <Title>Measuring Semantic Entropy</Title>
  <Section position="4" start_page="41" end_page="42" type="intro">
    <SectionTitle>
2 Method
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="41" end_page="41" type="sub_section">
      <SectionTitle>
2.1 Translational Distributions
</SectionTitle>
      <Paragraph position="0"> The first step in measuring semantic entropy is to compute the translational distribution Pr(T\[s) of each source word s in a bitext. A relatively simple method for estimating this distribution is described in (Me196b). Briefly, the method works as follows:  1. Extract a set of aligned text segment pairs from a parallel corpus, e.g. using the techniques in (G&amp;Cgla) or in (Me196a).</Paragraph>
      <Paragraph position="1"> 2. Construct an initial translation lexicon with likelihood scores attached to each entry, e.g. using the method in (Mel95) or in (G&amp;Cgl).</Paragraph>
      <Paragraph position="2"> 3. Assume that words always translate one-to-one. 4. Armed with the current lexicon, greedily &amp;quot;link&amp;quot; each word token with its most likely translation in each pair of aligned segments.</Paragraph>
      <Paragraph position="3"> 5. Discard lexicon entries representing word pairs that are never linked.</Paragraph>
      <Paragraph position="4"> 6. Estimate the parameters of a maximum-likelihood word translation model.</Paragraph>
      <Paragraph position="5"> 7. Re-estimate the likelihood of each lexicon en- null try, using the number of times n its components co-occur, the number of times k that they are linked, and the probability Pr(kln, model).</Paragraph>
      <Paragraph position="6"> 8. Repeat from Step 4 until the lexicon converges. After the lexicon converges, Step 4 is repeated one last time, keeping track of how many times each English (source) word is linked to each French (target) word. Using the link frequencies F(s, t) and the frequencies F(s) of each English source word s, the maximum likelihood estimates of Pr(t\[s), the probability that s translates to the French target word t, can be computed in the usual way: Pr(tls ) = F(8,0/F(s).</Paragraph>
    </Section>
    <Section position="2" start_page="41" end_page="42" type="sub_section">
      <SectionTitle>
2.2 Translational Entropy
</SectionTitle>
      <Paragraph position="0"> The above method constructs translation lexicons containing only word-to-word correspondences. The best it can do for compound words like &amp;quot;au chaurange&amp;quot; and &amp;quot;right away&amp;quot; is to link their translation to the most representative part of the compound. For example, a typical translation lexicon may contain the entries &amp;quot;unemployed/chaumage&amp;quot; and &amp;quot;right/imm~liatement.&amp;quot; This behavior is quite suitable for our purposes, because we are interested only in the degree to which the translational probability mass is scattered over different target words,  not in the particular target words over which it is scattered.</Paragraph>
      <Paragraph position="1"> The translational inconsistency of words can be computed following the principles of information theory z. In information theory, inconsistency is called entropy. Entropy is a functional of probability distribution functions (pdf's). If P is a pdf over the random variable X, then the entropy of P is defined as2</Paragraph>
      <Paragraph position="3"> Since probabilities are always between zero and one, their logarithms are always negative; the minus sign in the formula ensures that entropies are always positive. null The translational inconsistency of a source word s is proportional to the entropy H(T\]s) of its translational pdf P(TIs):</Paragraph>
      <Paragraph position="5"> Note that H(T\[s) is not the same as the conditional entropy H(TIS ). The latter is a functional of the entire pdf of source words, whereas the former is a function of the particular source word s. The conditional entropy is actually a weighted sum of the individual translational entropies:</Paragraph>
      <Paragraph position="7"/>
    </Section>
    <Section position="3" start_page="42" end_page="42" type="sub_section">
      <SectionTitle>
2.3 Null Links
</SectionTitle>
      <Paragraph position="0"> All languages have words that don't translate easily into other languages, and paraphrases are common in translation. Most bitexts contain a number of word tokens in each text for which there is no obvious counterpart in the other text. Semantically light words are more likely to be paraphrased or translated non-literally. So, the frequency with which a particular word gets linked to nothing is an important factor in estimating its semantic entropy.</Paragraph>
      <Paragraph position="1"> Ideally, a measure of translational inconsistency should be sensitive to which null links represent the same sense of a given source word and which ones represent different senses. Given that algorithms for making this distinction are currently beyond the state of the art, the simplest way to account for &amp;quot;null&amp;quot; links is to invent a special NULL word, and to pretend that all null links are actually links to NULL (BD-{-93). This heuristic produces undesired results, however, since it implies that the transla- null tion of a word which is never linked to anything is perfectly consistent. A better solution lies at the opposite extreme, in the assumption that each null link represents a different sense of the source word i See (C&amp;T91) for a good introduction.</Paragraph>
      <Paragraph position="2"> 21t is standard to use the shorthand notation P(x) for Prp(X = x).</Paragraph>
      <Paragraph position="3">  in question. Under this assumption, the contribution to the semantic entropy of s made by each null link is --F--~ log F--~&amp;quot; If F(NULLIs) represents the number of times that s is linked to nothing, then the total contribution of all these null links to the semantic entropy of s is</Paragraph>
      <Paragraph position="5"> The semantic entropy E(s) of each word s accounts for both the null links and the non-null links</Paragraph>
      <Paragraph position="7"/>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML