<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1615">
  <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. Domain Adaptation with Structural Correspondence Learning</Title>
  <Section position="4" start_page="0" end_page="121" type="intro">
    <SectionTitle>
2 A Motivating Example
</SectionTitle>
    <Paragraph position="0"> Figure 1 shows two PoS-tagged sentences, one each from the Wall Street Journal (hereafter WSJ) and MEDLINE (corresponding words are in bold, and pivot features are italicized). We chose these sentences for two reasons. First, we wish to visually emphasize the difference between the two domains: the vocabularies differ significantly, and PoS taggers suffer accordingly. Second, we want to focus on the phrase &amp;quot;with normal signal transduction&amp;quot; from the MEDLINE sentence, depicted in Figure 2(a). The word &amp;quot;signal&amp;quot; in this sentence is a noun, but a tagger trained on the WSJ incorrectly classifies it as an adjective. We introduce the notion of pivot features: features which occur frequently in the two domains and behave similarly in both. Figure 2(b) shows some pivot features that occur together with the word &amp;quot;signal&amp;quot; in our unlabeled biomedical data. In this case our pivot features are all of type &lt;the token on the right&gt;. Note that &amp;quot;signal&amp;quot; is unambiguously a noun in these contexts: adjectives rarely precede past-tense verbs such as &amp;quot;required&amp;quot; or prepositions such as &amp;quot;from&amp;quot; and &amp;quot;for&amp;quot;.</Paragraph>
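The paragraph above defines pivot features as features that occur frequently, and behave similarly, in both domains. A minimal sketch of frequency-based pivot selection follows; the function name `choose_pivots` and the toy right-context feature strings are illustrative assumptions, not part of the paper, which may use additional selection criteria.

```python
from collections import Counter

def choose_pivots(source_feats, target_feats, m):
    """Pick the m features that occur most often in BOTH domains.

    source_feats, target_feats: lists of feature strings extracted from
    unlabeled source- and target-domain text. Hypothetical helper; the
    ranking simply follows the text's definition of pivots as features
    frequent in the two domains.
    """
    src = Counter(source_feats)
    tgt = Counter(target_feats)
    shared = set(src) & set(tgt)
    # Rank shared features by their smaller per-domain count, so a pivot
    # must be common in both domains, not merely in one of them.
    return sorted(shared, key=lambda f: min(src[f], tgt[f]), reverse=True)[:m]

# Toy <token on the right> features, in the spirit of Figure 2(b):
wsj = ["right_required", "right_from", "right_of", "right_of", "right_for"]
med = ["right_required", "right_from", "right_for", "right_transduction"]
print(choose_pivots(wsj, med, 3))
```

Any real feature template could be plugged in; the point is only that pivot candidates must clear a frequency bar in both domains.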
    <Paragraph position="1"> We now search for occurrences of the pivot features in the WSJ. Figure 2(c) shows some words that occur together with the pivot features in the WSJ unlabeled data. Note that &amp;quot;investment&amp;quot;, &amp;quot;buy-outs&amp;quot;, and &amp;quot;jail&amp;quot; are all common nouns in the financial domain. Furthermore, since we have labeled WSJ data, we expect to be able to label at least some of these nouns correctly.</Paragraph>
    <Paragraph position="2"> This example captures the intuition behind structural correspondence learning. We want to use pivot features from our unlabeled data to put domain-specific words in correspondence. That is, [Algorithm box (figure residue) -- Input: labeled source data {(x_t, y_t)}_{t=1}^T, unlabeled data from both domains {x_j}. Output: predictor f : X -&gt; Y. 1. Choose m pivot features; create m binary prediction problems p_l(x), l = 1 ... m. 2. For l = 1 to m ...]</Paragraph>
    <Paragraph position="4"> we want the pivot features to model the fact that in the biomedical domain, the word &amp;quot;signal&amp;quot; behaves similarly to the words &amp;quot;investment&amp;quot;, &amp;quot;buy-outs&amp;quot;, and &amp;quot;jail&amp;quot; in the financial news domain. In practice, we use this technique to find correspondences among all features, not just word features.</Paragraph>
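The correspondence idea above can be sketched in code: for each pivot, fit a linear predictor of that pivot from the remaining features on the pooled unlabeled data, then take an SVD of the stacked weight vectors so that non-pivot features which predict the same pivots (e.g. &amp;quot;signal&amp;quot; and &amp;quot;investment&amp;quot;) receive similar low-dimensional representations. This is a simplified sketch, not the paper's exact procedure: ridge regression stands in for whatever training loss the full algorithm uses, and the function name `scl_projection` is our own.

```python
import numpy as np

def scl_projection(X, pivot_ids, k, lam=1.0):
    """Sketch of the core correspondence step.

    X: (n_examples, n_features) binary feature matrix from the unlabeled
    data of both domains. pivot_ids: column indices of the chosen pivot
    features. For each pivot we fit a linear predictor of that pivot
    from the non-pivot features (ridge regression here for simplicity),
    stack the weight vectors into W, and keep the top-k left singular
    vectors as a shared projection theta. Non-pivot features whose
    columns of theta are close are the induced correspondences.
    """
    n, d = X.shape
    non_pivots = [j for j in range(d) if j not in set(pivot_ids)]
    Z = X[:, non_pivots]  # predictors: the non-pivot features
    # Ridge solution W = (Z^T Z + lam I)^-1 Z^T Y, one column per pivot.
    A = Z.T @ Z + lam * np.eye(len(non_pivots))
    W = np.linalg.solve(A, Z.T @ X[:, pivot_ids])   # (|non_pivots|, m)
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    theta = U[:, :k].T                              # (k, |non_pivots|)
    return theta, non_pivots
```

On a toy matrix where two non-pivot features each co-occur with the same pivot, their columns of theta come out (nearly) identical, which is exactly the &amp;quot;signal&amp;quot;/&amp;quot;investment&amp;quot; correspondence the text describes.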
  </Section>
</Paper>