<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1027">
  <Title>Empirical Estimates of Adaptation: The chance of Two Noriegas is closer to p/2 than p^2</Title>
  <Section position="3" start_page="183" end_page="184" type="metho">
    <SectionTitle>
6. Neighborhoods (Near)
</SectionTitle>
    <Paragraph position="0"> Florian and Yarowsky's example, &amp;quot;It is at least on the Serb side a real setback to the x,&amp;quot; provides a nice motivation for neighborhoods. Suppose the context (history) mentions a number of words related to a peace process, but doesn't mention the word &amp;quot;peace.&amp;quot; Intuitively, there should still be some adaptation. That is, the probability of &amp;quot;peace&amp;quot; should go up quite a bit (positive adaptation), and the probability of many other words such as &amp;quot;piece&amp;quot; should go down a little (negative adaptation).</Paragraph>
    <Paragraph position="1"> We start by partitioniug the vocabulary into three exhaustive and mutually exclusive sets: hist, near and other (abbreviations for history, neighborhood and otherwise, respectively). The first set, hist, contains the words that appear in the first half of the document, as before. Other is a catchall for the words that are in neither of the first two sets.</Paragraph>
    <Paragraph position="2"> The interesting set is near. It is generated by query expansion. The history is treated as a query in an information retrieval document-ranking engine. (We implemented our own ranking engine using simple IDF weighting.) The neighborhood is the set of words that appear in the k= 10 or k = 100 top documents returned by the retrieval engine. To ensure that the three sets partition the vocabulary, we exclude the history fiom the neighborhood: near = words in query expansion of hist - hist The adaptation probabilities are estimated using a contingency table like before, but we now have a three-way partition (hist, near and other) of the vocabulary instead of the two-way partition, as illustrated below.</Paragraph>
    <Paragraph position="3">  In estilnating adaptation probabilities, we continue to use a, b, c and d as before, but four new variables are introduced: e, f, g and h, where c=e+g andd=f+h.</Paragraph>
    <Paragraph position="4">  other The table below shows that &amp;quot;Kennedy&amp;quot; adapts more than &amp;quot;except&amp;quot; and that &amp;quot;peace&amp;quot; adapts more than &amp;quot;piece.&amp;quot; That is, &amp;quot;Kennedy&amp;quot; has a larger spread than &amp;quot;except&amp;quot; between tile history and tile otherwise case.</Paragraph>
    <Paragraph position="5"> .l)Jior hist near other src w  When (.\]/' is small (d/'&lt; 100), I\]O smoothing is used to group words into bins by {!/\] Adaptation prol)abilities are computed for each bill, rather than for each word. Since these probabilities are implicitly conditional on ,qJ; they have ah'eady been weighted by (!fin some sense, and therefore, it is unnecessary to introduce an additional explicit weighting scheme based on (!/'or a simple transl'orm thereof such as II)1:.</Paragraph>
    <Paragraph position="6"> The experiments below split tile neighborhood into four chisses, ranging fronl belier nei.g, hbors to wmwe neighbotw, del)ending oil expansion frequency, e/\] el'(1) is a number between 1 and k, indicating how many of the k top scoring documents contain I. (Better neighbors appear in more of the top scoring documents, and worse neighbors appear in fewer.) All the neighborhood classes fall between hist and other, with better neighbors adapting tllore than ~,OlSe neighbors.</Paragraph>
  </Section>
  <Section position="4" start_page="184" end_page="185" type="metho">
    <SectionTitle>
7. Experimental Results
</SectionTitle>
    <Paragraph position="0"> Recall that the task is to predict the test portion (the second half) of a document given the histoo, (the first half). The following table shows a selection of words (sorted by the third cohunn) from the test portion of one of the test doculnents.</Paragraph>
    <Paragraph position="1"> The table is separated into thirds by horizontal lines. The words in the top third receive nmch higher scores by the proposed method (S) than by a baseline (B). These words are such good keywords that one can faMy contidently guess what the story is about. Most of these words receive a high score because they were mentioned in the history portion of the document, but &amp;quot;laid-off&amp;quot; receives a high score by the neighl)orhood mechanism. Although &amp;quot;hiid-off&amp;quot; is not mentioned explicitly in the history, it is obviously closely related to a number of words that were, especially &amp;quot;layoffs,&amp;quot; but also &amp;quot;notices&amp;quot; and &amp;quot;cuts.&amp;quot; It is reassuring to see tile neighborhood mechanism doing what it was designed to do.</Paragraph>
    <Paragraph position="2"> The middle third shows words whose scores are about the same as the baseline. These words tend to be function words and other low content words that give tts little sense of what the document is about. The bottoln third contains words whose scores are much lower than the baseline. These words tend to be high in content, but misleading.</Paragraph>
    <Paragraph position="3"> The word ' al us,&amp;quot; for example, might suggest that story is about a military conflict.</Paragraph>
    <Paragraph position="4">  The proposed score, S, shown in colunln 1, is:</Paragraph>
    <Paragraph position="6"> where near I through near 4 are four neighborl\]oods (k=100). Words in near4 are the best neighbors (e\[&gt;10) and words in near I are the worst neighbors (e/'= 1). Tile baseline, B, shown in column 2, is: Prl~(w)=df/D. Colun\]i\] 3 con\]pares the first two cohnnns.</Paragraph>
    <Paragraph position="7"> We applied this procedure to a year of the AP news and found a sizable gain in information on  average: 0.75 bits per word type per doculnent. In addition, there were many more big winners (20% of the documents gained 1 bit/type) than big losers (0% lost 1 bit/type). The largest winners include lists of major cities and their temperatures, lists of major currencies and their prices, and lists of commodities and their prices. Neighborhoods are quite successful in guessing the second half of such lists.</Paragraph>
    <Paragraph position="8"> On the other hand, there were a few big losers, e.g., articles that summarize the major stories of the clay, week and year. The second half of a summary article is almost never about the same subject its the first half. There were also a few end-of-document delimiters that were garbled in translnission causing two different documents to be treated as if they were one. These garbled documents tended to cause trouble for the proposed method; in such cases, the history comes fi'om one document and the test comes from another.</Paragraph>
    <Paragraph position="9"> In general, the proposed adaptation method performed well when the history is helpful for predicting the test portion of the document, and it performed poorly when the history is misleading. This suggests that we ought to measure topic shifts using methods suggested by Hearst (1994) and Florian &amp; Yarowsky (1999). We should not use the history when we believe that there has been a major topic shift.</Paragraph>
  </Section>
class="xml-element"></Paper>