<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0630">
  <Title>Automatically Merging Lexicons that have Incompatible Part-of-Speech Categories</Title>
  <Section position="3" start_page="247" end_page="249" type="intro">
    <SectionTitle>
2 Basics
</SectionTitle>
    <Paragraph position="0"> Our general strategy is to inspect the co-occurrence of tags on those lemmas that are found in both lexicons, and to use that information as a basis for generalizing, thus yielding POS mapping rules. To do this requires several steps, as described in the following subsections. As a preliminary step, we will introduce a way to represent POS tags using feature vectors. We then use these vectors to generate mapping rules. To obtain better accuracy, we can restrict the training examples to entries that occur in both lexicons.</Paragraph>
    <Paragraph position="1"> The generation algorithm also requires us to define a similarity metric between POS feature vectors.</Paragraph>
    <Section position="1" start_page="247" end_page="247" type="sub_section">
      <SectionTitle>
2.1 Part-of-speech feature vector
</SectionTitle>
      <Paragraph position="0"> A necessary preliminary step of our method is to introduce POS feature vectors. A feature vector is a useful representation of a POS tag, because it neatly summarizes all the information we need about which lemmas can and cannot have that POS tag, illustrated as follows.</Paragraph>
      <Paragraph position="1"> Given: * a lemma set M = { &amp;quot;apple&amp;quot;, &amp;quot;boy&amp;quot;, &amp;quot;calculate&amp;quot; } * a set of POS tags T = { &amp;quot;NN&amp;quot;, &amp;quot;VB&amp;quot; } A tiny example lexicon consisting of lemma and POS tag pairs might be as follows, where each cell with * indicates the existence of that lemma-POS pair in the lexicon:</Paragraph>
      <Paragraph position="2">
            | apple | boy | calculate
         NN |   *   |  *  |
         VB |       |     |     *
      </Paragraph>
      <Paragraph position="3"> which, when represented as POS feature vectors, will be: p1: &lt; 1, 1, 0 &gt; p2: &lt; 0, 0, 1 &gt; where p1 here is the &amp;quot;NN&amp;quot; POS represented by the set of words that can be nouns in a given lexicon, in this example { &amp;quot;apple&amp;quot;, &amp;quot;boy&amp;quot; }, and p2 similarly is the &amp;quot;VB&amp;quot; POS. The feature value for a lemma f in a vector p can be: * 0 to indicate that we are not sure whether p is a tag of f; * 1 to indicate that p is a tag for f; * 2 to indicate that p can never be a tag for lemma f.</Paragraph>
      <Paragraph position="4"> Obtaining information about the last of these (the value 2) is a non-trivial problem, which we will return to later in this paper. With ordinary lexicons, we only directly obtain feature vectors containing 0 and 1 values.</Paragraph>
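      <Paragraph position="5"> The construction above can be sketched in Python (an illustrative sketch, not code from the paper; the function and variable names are our own):

```python
# Illustrative sketch: build POS feature vectors (0 = "don't know",
# 1 = "is a tag") from a lexicon given as a set of (lemma, tag) pairs.
# The value 2 ("can never be a tag") is discussed later in the paper.

def pos_feature_vectors(lexicon, lemmas, tags):
    """Return one feature vector per POS tag, with one dimension per lemma."""
    return {tag: [1 if (lemma, tag) in lexicon else 0 for lemma in lemmas]
            for tag in tags}

lemmas = ["apple", "boy", "calculate"]
tags = ["NN", "VB"]
lexicon = {("apple", "NN"), ("boy", "NN"), ("calculate", "VB")}

vectors = pos_feature_vectors(lexicon, lemmas, tags)
# vectors["NN"] is [1, 1, 0] and vectors["VB"] is [0, 0, 1],
# matching p1 and p2 above.
```
</Paragraph>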
    </Section>
    <Section position="2" start_page="247" end_page="248" type="sub_section">
      <SectionTitle>
2.2 Mapping rule learning algorithm
</SectionTitle>
      <Paragraph position="0"> Given a feature vector for every POS tag in both lexicons--say, Brill's lexicon and the Moby lexicon--we use the following algorithm to learn mapping rules from POS tags in Brill's tagset to POS tags in the Moby tagset. The idea is to assume that a mapping rule between two POS tags holds if the similarity between their feature vectors exceeds a preset threshold, called a sim-threshold T.</Paragraph>
      <Paragraph position="1"> The similarity metric (SimScore) will be described later, but let's first look at the learning algorithm, as described in Algorithm 1.</Paragraph>
      <Paragraph position="2"> This algorithm does not exclude m-to-n mappings; that is, any Brill POS tag could in principle get mapped to any number of Moby POS tags.</Paragraph>
      <Paragraph position="3">
        B := {}
        foreach p in P do
          foreach q in Q do
            if SimScore(p, q) &gt; sim_threshold T then
              B := B ∪ { p → q };
            end
          end
        end
      </Paragraph>
      <Paragraph position="4"> Algorithm 1: Mapping rule learning algorithm</Paragraph>
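      <Paragraph position="5"> Algorithm 1 can be rendered in Python as follows (a sketch under stated assumptions: P and Q map POS tags to feature vectors, and sim_score is the similarity metric of Section 2.4; names are ours):

```python
# Sketch of Algorithm 1: collect a mapping rule p -> q whenever the
# similarity of the two tags' feature vectors exceeds the threshold.

def learn_mapping_rules(P, Q, sim_score, sim_threshold):
    rules = set()
    for p_tag, p_vec in P.items():        # tags of the first lexicon
        for q_tag, q_vec in Q.items():    # tags of the second lexicon
            if sim_score(p_vec, q_vec) > sim_threshold:
                rules.add((p_tag, q_tag))
    return rules

# Toy run with a simple positive-overlap similarity; note that
# m-to-n mappings are not excluded.
P = {"NN": [1, 1, 0]}
Q = {"N": [1, 1, 0], "V": [0, 0, 1]}
overlap = lambda p, q: sum(a == b == 1 for a, b in zip(p, q)) / len(p)
rules = learn_mapping_rules(P, Q, overlap, 0.5)
# rules == {("NN", "N")}
```
</Paragraph>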
    </Section>
    <Section position="3" start_page="248" end_page="248" type="sub_section">
      <SectionTitle>
2.3 Improving the training set by intersecting the lexicons
</SectionTitle>
      <Paragraph position="0"> We can obtain better results by considering only those lemmas that occur in both lexicons. This has the effect of eliminating unreliable features in the POS feature vectors, since lemmas that do not occur in both lexicons cannot be relied upon when judging similarity. This results in pruned versions of both lexicons.</Paragraph>
      <Paragraph position="1"> For example, pretend that the Brill and Moby lexicons contain only a handful of entries, and that the only lemma they share is &amp;quot;boy&amp;quot;, tagged &amp;quot;NN&amp;quot; in Brill's lexicon and &amp;quot;N&amp;quot; in the Moby lexicon. In this case, intersecting the lexicons prunes both down to their &amp;quot;boy&amp;quot; entries. After pruning, the only remaining lemma is &amp;quot;boy&amp;quot;, and the new POS feature vectors for &amp;quot;NN&amp;quot; and &amp;quot;N&amp;quot; have just one dimension, corresponding to &amp;quot;boy&amp;quot;: NN: &lt; 1 &gt; N: &lt; 1 &gt; Of course, in reality the lexicons are much bigger and the effect is not so drastic. In all experiments in this paper, we used lexicon intersection to prune the lexicons.</Paragraph>
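      <Paragraph position="2"> The pruning step can be sketched as follows (an illustrative sketch; the specific lexicon entries below are hypothetical, chosen only so that &amp;quot;boy&amp;quot; is the one shared lemma):

```python
# Sketch of lexicon intersection: keep only entries whose lemma occurs in
# both lexicons. Lexicons are modeled as sets of (lemma, tag) pairs.

def intersect_lexicons(lex_a, lex_b):
    shared = set(l for l, _ in lex_a).intersection(l for l, _ in lex_b)
    keep = lambda lex: {(l, t) for (l, t) in lex if l in shared}
    return keep(lex_a), keep(lex_b)

brill = {("apple", "NN"), ("boy", "NN")}        # hypothetical entries
moby = {("boy", "N"), ("calculate", "V")}       # hypothetical entries

pruned_brill, pruned_moby = intersect_lexicons(brill, moby)
# Only "boy" survives: pruned_brill == {("boy", "NN")},
# pruned_moby == {("boy", "N")}
```
</Paragraph>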
    </Section>
    <Section position="4" start_page="248" end_page="249" type="sub_section">
      <SectionTitle>
2.4 Similarity metric
</SectionTitle>
      <Paragraph position="0"> The similarity function we use calculates a similarity score between two feature vectors by counting the number of features with the same feature value 1 or 2, indicating that a lemma either can or cannot belong to that POS category. (Recall that the value 0 means &amp;quot;don't know&amp;quot;, so we simply ignore any features with value 0.) The score is normalized by the length of the feature vector. We also require that there be at least one positive match, in the sense that some lemma is shared by both POS categories; otherwise, if there are only negative matches (i.e., lemmas that cannot belong to either POS category), we consider the evidence too weak, and the similarity score is then defined to be zero. The whole algorithm is described in Algorithm 2.</Paragraph>
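      <Paragraph position="1"> This description can be sketched in Python (our own rendering of the prose above; Algorithm 2 itself is not reproduced here):

```python
# Sketch of SimScore: count positions where both vectors agree on a definite
# value (1 or 2), ignore "don't know" (0), and normalize by vector length.
# If there is no positive (1, 1) match, the evidence is too weak: return 0.

def sim_score(p, q):
    has_positive = any(a == 1 and b == 1 for a, b in zip(p, q))
    if not has_positive:
        return 0.0
    matches = sum(1 for a, b in zip(p, q) if a == b and a in (1, 2))
    return matches / len(p)

# sim_score([1, 1, 2], [1, 0, 2]) -> 2/3 (one positive, one negative match)
# sim_score([2, 2, 0], [2, 2, 0]) -> 0.0 (negative matches only)
```
</Paragraph>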
    </Section>
    <Section position="5" start_page="249" end_page="249" type="sub_section">
      <SectionTitle>
2.5 The &amp;quot;complete lexicon&amp;quot; assumption
</SectionTitle>
      <Paragraph position="0"> As mentioned earlier, ordinary lexicons do not explicitly contain information about which parts of speech a lemma cannot be used as. We have two choices. In the examples up to now, we used a value of 0 for any lemma-tag pair not explicitly listed in the lexicon, signifying that we do not know whether the POS category can include that lemma. However, having many &amp;quot;don't know&amp;quot; values significantly weakens our similarity scoring method. Alternatively, we can choose to assume that our lexicons are complete--a kind of closed-world assumption. In this case, we assume that any lemma-tag pair not found in the lexicon is not merely an omission, but really can never occur. This means we use the value 2 instead of the value 0.</Paragraph>
      <Paragraph position="1"> The &amp;quot;complete lexicon&amp;quot; assumption only makes sense when we are dealing with large, broad coverage lexicons (as is the case in this paper). It is not reasonable when dealing with small or specialized sublexicons.</Paragraph>
    </Section>
  </Section>
</Paper>