<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1059">
  <Title>Finite State Transducers Approximating Hidden Markov Models</Title>
  <Section position="5" start_page="460" end_page="460" type="metho">
    <SectionTitle>
2 n-Type Approximation
</SectionTitle>
    <Paragraph position="0"> This section presents a method that approximates a (1st order) HMM by a transducer, called n-type approximation.³
³ Name given by the author.</Paragraph>
    <Paragraph position="1"> As in an HMM, we take into account initial probabilities π, transition probabilities a and class (i.e. observation symbol) probabilities b. We do not, however, estimate probabilities over paths. The tag of the first word is selected based on its initial and class probability. The next tag is selected based on its transition probability given the first tag, and on its class probability, etc. Unlike in an HMM, once a decision on a tag has been made, it influences the following decisions but is itself irreversible.</Paragraph>
    <Paragraph position="2"> A transducer encoding this behaviour can be generated as sketched in figure 1. In this example we have a set of three classes: c1 with the two tags t11 and t12, c2 with the three tags t21, t22 and t23, and c3 with one tag t31. Different classes may contain the same tag, e.g. t12 and t23 may refer to the same tag.</Paragraph>
    <Paragraph position="3"> For every possible pair of a class and a tag (e.g. c1:t12 or [ADJ,NOUN]:NOUN) a state is created and labelled with this same pair (fig. 1). An initial state which does not correspond to any pair is also created. All states are final, marked by double circles. For every state, as many outgoing arcs are created as there are classes (three in fig. 1). Each such arc for a particular class points to the most probable pair of this same class. If the arc comes from the initial state, the most probable pair of a class and a tag (destination state) is estimated by:
  argmax_k p1(ci, tik) = π(tik) b(ci|tik)     (2)
If the arc comes from a state other than the initial state, the most probable pair is estimated by:
  argmax_k p2(ci, tik) = a(tik|t_previous) b(ci|tik)     (3)</Paragraph>
    <Paragraph position="6"> In the example (fig. 1), c1:t12 is the most likely pair of class c1, and c2:t23 the most likely pair of class c2 when coming from the initial state, and c2:t21 the most likely pair of class c2 when coming from the state of c3:t31.</Paragraph>
    <Paragraph position="8"> Every arc is labelled with the same symbol pair as its destination state, with the class symbol in the upper language and the tag symbol in the lower language. E.g. every arc leading to the state of c1:t12 is labelled with c1:t12.</Paragraph>
    <Paragraph position="9"> Finally, all state labels can be deleted since the behaviour described above is encoded in the arc labels and the network structure. The network can be minimized and determinized.</Paragraph>
    <Paragraph position="10"> We call the model an n1-type model, the resulting FST an n1-type transducer, and the algorithm leading from the HMM to this transducer an n1-type approximation of a 1st order HMM.</Paragraph>
    <Paragraph position="11"> Adapted to a 2nd order HMM, this algorithm would give an n2-type approximation. Adapted to a zero order HMM, which means using only class probabilities b, the algorithm would give an n0-type approximation.</Paragraph>
    <Paragraph position="12"> n-Type transducers have deterministic states only.</Paragraph>
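    <Paragraph> As a concrete illustration of this construction (a minimal Python sketch, not part of the original paper; the probability tables pi, a, b and the class inventory are invented toy values), the following code picks, for every state and every class, the arc destination according to eq. (2) and (3):
# Toy tables (assumptions for illustration only).
pi = {"DET": 0.4, "ADJ": 0.1, "NOUN": 0.5}                  # initial probs
a = {("DET", "ADJ"): 0.5, ("DET", "NOUN"): 0.5,
     ("ADJ", "ADJ"): 0.2, ("ADJ", "NOUN"): 0.8,
     ("NOUN", "ADJ"): 0.1, ("NOUN", "NOUN"): 0.9}           # transition probs a(t'|t)
b = {("ADJ,NOUN", "ADJ"): 0.4, ("ADJ,NOUN", "NOUN"): 0.6,
     ("DET", "DET"): 1.0}                                   # class probs b(c|t)
classes = {"DET": ["DET"], "ADJ,NOUN": ["ADJ", "NOUN"]}     # class -> its tags

def best_pair(c, prev_tag=None):
    """Most probable tag of class c: eq. (2) from the initial state,
    eq. (3) from any other state (prev_tag is that state's tag)."""
    def score(t):
        if prev_tag is None:
            return pi[t] * b[(c, t)]                        # eq. (2)
        return a.get((prev_tag, t), 0.0) * b[(c, t)]        # eq. (3)
    return max(classes[c], key=score)

# One state per class:tag pair plus an initial state (None); one arc per
# class from every state, labelled like its destination state.
states = [None] + [(c, t) for c in classes for t in classes[c]]
arcs = {(s, c): (c, best_pair(c, s[1] if s else None))
        for s in states for c in classes}
</Paragraph>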
  </Section>
  <Section position="6" start_page="460" end_page="462" type="metho">
    <SectionTitle>
3 s-Type Approximation
</SectionTitle>
    <Paragraph position="0"> This section presents a method that approximates an HMM by a transducer, called s-type approximation.⁴</Paragraph>
    <Paragraph position="1"> Tagging a sentence based on a 1st order HMM means finding the most probable tag sequence T given the class sequence C of the sentence. The joint probability of C and T can be estimated by:
  p(C,T) = π(t1) b(c1|t1) ∏_{i=2..n} [ a(ti|ti-1) b(ci|ti) ]     (4)</Paragraph>
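    <Paragraph> As an aside (our illustration, not the paper's code), eq. (4) translates directly into a few lines of Python; log probabilities are used to avoid numeric underflow on long sentences, and the tables pi, a, b are assumed to be dictionaries as in the toy sketch above:
import math

def joint_log_prob(C, T, pi, a, b):
    """log p(C,T) = log[pi(t1) b(c1|t1)] + sum_i log[a(ti|ti-1) b(ci|ti)].
    Assumes every needed table entry exists and is nonzero."""
    lp = math.log(pi[T[0]]) + math.log(b[(C[0], T[0])])
    for i in range(1, len(C)):
        lp += math.log(a[(T[i - 1], T[i])]) + math.log(b[(C[i], T[i])])
    return lp
</Paragraph>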
    <Paragraph position="3"> The decision on a tag of a particular word cannot be made separately from the other tags. Tags can influence each other over a long distance via transition probabilities. Often, however, it is unnecessary to decide on the tags of the whole sentence at once.</Paragraph>
    <Paragraph position="4"> In the case of a 1st order HMM, unambiguous classes (containing one tag only), plus the sentence beginning and end positions, constitute barriers to the propagation of HMM probabilities. Two tags with one or more barriers in between do not influence each other.</Paragraph>
    <Section position="1" start_page="461" end_page="461" type="sub_section">
      <SectionTitle>
3.1 s-Type Sentence Model
</SectionTitle>
      <Paragraph position="0"> To tag a sentence, one can split its class sequence at the barriers into subsequences, then tag them separately and concatenate them again. The result is equivalent to the one obtained by tagging the sentence as a whole.</Paragraph>
      <Paragraph position="1"> We distinguish between initial and middle subsequences. The final subsequence of a sentence is equivalent to a middle one if we assume that the sentence end symbol (. or ! or ?) always corresponds to an unambiguous class cu. This allows us to ignore the role of the sentence end position as an HMM barrier, because this role is taken over by the unambiguous class cu at the sentence end.</Paragraph>
      <Paragraph position="2"> An initial subsequence Ci starts at the sentence initial position, has any number (incl. zero) of ambiguous classes ca and ends with the first unambiguous class cu of the sentence. It can be described by the regular expression⁵:
  Ci = ca* cu     (5)
⁵ Regular expression operators used in this section are explained in the annex.</Paragraph>
      <Paragraph position="4"> The joint probability of an initial class subsequence Ci of length r, together with an initial tag subsequence Ti, can be estimated by:
  p(Ci,Ti) = π(t1) b(c1|t1) ∏_{j=2..r} [ a(tj|tj-1) b(cj|tj) ]     (6)</Paragraph>
      <Paragraph position="6"> A middle subsequence Cm starts immediately after an unambiguous class cu, has any number (incl. zero) of ambiguous classes ca and ends with the following unambiguous class cu:
  Cm = ca* cu     (7)</Paragraph>
      <Paragraph position="9"> For correct probability estimation we have to include the immediately preceding unambiguous class cu, which actually belongs to the preceding subsequence Ci or Cm. We thereby obtain an extended middle subsequence:
  Cm^e = cu ca* cu     (8)</Paragraph>
      <Paragraph position="11"> The joint probability of an extended middle class subsequence Cm^e of length s, together with a tag subsequence Tm^e, can be estimated by:
  p(Cm^e,Tm^e) = ∏_{j=2..s} [ a(tj|tj-1) b(cj|tj) ]     (9)</Paragraph>
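      <Paragraph> A minimal sketch of this split (ours, not from the paper; a class is taken to be unambiguous iff it contains exactly one tag, and the sentence is assumed to end with an unambiguous class):
def split_at_barriers(class_seq, classes):
    """Return (Ci, [Cme, ...]): the initial subsequence per eq. (5) and the
    extended middle subsequences per eq. (8), each including its leading cu."""
    barriers = [i for i, c in enumerate(class_seq) if len(classes[c]) == 1]
    Ci = class_seq[: barriers[0] + 1]                        # ca* cu, eq. (5)
    middles = [class_seq[barriers[k - 1]: barriers[k] + 1]   # cu ca* cu, eq. (8)
               for k in range(1, len(barriers))]
    return Ci, middles

classes = {"DET": ["DET"], "ADJ,NOUN": ["ADJ", "NOUN"], "SENT": ["SENT"]}
seq = ["ADJ,NOUN", "DET", "ADJ,NOUN", "ADJ,NOUN", "SENT"]
Ci, mids = split_at_barriers(seq, classes)
# Ci   = ['ADJ,NOUN', 'DET']
# mids = [['DET', 'ADJ,NOUN', 'ADJ,NOUN', 'SENT']]
</Paragraph>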
    </Section>
    <Section position="2" start_page="461" end_page="462" type="sub_section">
      <SectionTitle>
3.2 Construction of an s-Type Transducer
</SectionTitle>
      <Paragraph position="0"> To build an s-type transducer, a large number of initial class subsequences Ci and extended middle class subsequences Cm^e are generated in one of the following two ways: (a) Extraction from a corpus. Based on a lexicon and a guesser, we annotate an untagged training corpus with class labels. From every sentence, we extract the initial class subsequence Ci that ends with the first unambiguous class cu (eq. 5), and all extended middle subsequences Cm^e ranging from any unambiguous class cu in the sentence to the following unambiguous class (eq. 8).</Paragraph>
      <Paragraph position="1"> A frequency constraint (threshold) may be imposed on the subsequence selection, so that the only subsequences retained are those that occur at least a certain number of times in the training corpus.⁶ (b) Generation of possible subsequences. Based on the set of classes, we generate all possible initial and extended middle class subsequences, Ci and Cm^e (eq. 5, 8), up to a defined length.
⁶ This prevents the encoding of rare subsequences which would increase the size of the transducer without contributing much to the tagging accuracy.</Paragraph>
      <Paragraph position="2"> Every class subsequence Ci or Cm^e is first disambiguated based on a 1st order HMM, using the Viterbi algorithm (Viterbi, 1967; Rabiner, 1990) for efficiency, and then linked to its most probable tag subsequence Ti or Tm^e by means of the cross product operation⁵:
  Si = Ci .x. Ti = c1:t1 c2:t2 ... cn:tn     (10)
  Sm^e = Cm^e .x. Tm^e = c1:t1 c2:t2 ... cn:tn     (11)</Paragraph>
      <Paragraph position="3"> In all extended middle subsequences Sm^e, e.g.:
  Sm^e =  [DET]  [ADJ,NOUN]  [ADJ,NOUN]  [NOUN]     (12)
           DET      ADJ          ADJ      NOUN
the first class symbol on the upper side and the first tag symbol on the lower side will be marked as an extension that does not really belong to the middle subsequence but is necessary to disambiguate it correctly. Example (12) becomes:
  Sm^0 =  0.[DET]  [ADJ,NOUN]  [ADJ,NOUN]  [NOUN]     (13)
           0.DET      ADJ          ADJ      NOUN</Paragraph>
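      <Paragraph> The following sketch (ours, reusing the toy tables pi, a, b and classes from the earlier illustrations) disambiguates one class subsequence with Viterbi and builds the marked pair sequence of example (13):
def viterbi(C, pi, a, b, classes):
    """Most probable tag sequence for class sequence C under a 1st order HMM."""
    best = {t: (pi[t] * b[(C[0], t)], [t]) for t in classes[C[0]]}
    for c in C[1:]:
        nxt = {}
        for t in classes[c]:
            p, path = max((best[s][0] * a.get((s, t), 0.0), best[s][1])
                          for s in best)
            nxt[t] = (p * b[(c, t)], path + [t])
        best = nxt
    return max(best.values())[1]

def to_marked_pairs(C, T):
    """Cross product C .x. T, with the leading cu marked as in example (13)."""
    pairs = [f"{c}:{t}" for c, t in zip(C, T)]
    pairs[0] = "0." + pairs[0]
    return pairs
</Paragraph>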
      <Paragraph position="0"> We then build the union uSi of all initial subsequences Si and the union uSm^0 of all marked extended middle subsequences Sm^0, and formulate a preliminary sentence model:
  uS^0 = uSi uSm^0*     (14)
in which all middle subsequences Sm^0 are still marked and extended, in the sense that every occurrence of every unambiguous class is mentioned twice: once unmarked, as cu at the end of a sequence Ci or Cm^0, and a second time marked, as cu^0 at the beginning of the following sequence Cm^0. The upper side of the sentence model uS^0 describes the complete (but extended) class sequences of possible sentences, and the lower side of uS^0 describes the corresponding (extended) tag sequences.</Paragraph>
      <Paragraph position="2"> To ensure a correct concatenation of initial and middle subsequences, we formulate a concatenation constraint Rc for the classes (eq. 15), stating that every middle subsequence must begin with the same marked unambiguous class cu^0 (e.g. 0.[DET]) that occurs unmarked, as cu (e.g. [DET]), at the end of the preceding subsequence, since both symbols refer to the same occurrence of this unambiguous class.</Paragraph>
      <Paragraph position="6"> Having ensured correct concatenation, we delete all marked classes on the upper side of the relation and all marked tags on the lower side by means of two deletion operations (eq. 16 and 17). By composing these relations with the preliminary sentence model, we obtain the final sentence model S (eq. 18).</Paragraph>
      <Paragraph position="8"> We call the model an s-type model, the corresponding FST an s-type transducer, and the whole algorithm leading from the HMM to the transducer an s-type approximation of an HMM.</Paragraph>
      <Paragraph position="9"> The s-type transducer tags any corpus which contains only known subsequences in exactly the same way, i.e. with the same errors, as the corresponding HMM tagger does. However, since an s-type transducer is incomplete, it cannot tag sentences containing one or more class subsequences that are not in the union of the initial or middle subsequences.</Paragraph>
    </Section>
    <Section position="2" start_page="462" end_page="463" type="sub_section">
      <SectionTitle>
3.3 Completion of an s-Type Transducer
</SectionTitle>
      <Paragraph position="0"> An incomplete s-type transducer S can be completed with subsequences from an auxiliary, complete n-type transducer N as follows: First, we extract the union uSi^s of initial subsequences and the union uSm^e,s of extended middle subsequences from the primary s-type transducer S, and the corresponding unions uSi^n and uSm^e,n from the auxiliary n-type transducer N. To extract the union of initial subsequences we use the following filter:
  F_Si = [\cu]* cu [?:[]]*     (19)
  uSi^n = [ N.1L .o. F_Si ].l     (20)</Paragraph>
      <Paragraph position="2"> Here the transducer N is first converted into 1-level format⁷ and then composed with the filter F_Si (eq. 19). We extract the lower side of this composition, where every sequence of N.1L remains unchanged from the beginning up to the first occurrence of an unambiguous class cu, and every following symbol is mapped to the empty string by means of [?:[]]. Finally, the extracted lower side is again converted into 2-level format⁷.
⁷ 1-level and 2-level format are explained in the annex.</Paragraph>
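      <Paragraph> In plain Python terms (our rendering of what the filter of eq. (19) does, not Xerox calculus), the extraction keeps each path of N unchanged up to the first unambiguous class and deletes the rest:
def extract_initial(pairs, classes):
    """pairs: one path of N in 1-level format, e.g. ['DET:DET', 'ADJ,NOUN:ADJ']."""
    out = []
    for p in pairs:
        c = p.split(":")[0]
        out.append(p)
        if len(classes[c]) == 1:   # first unambiguous class cu reached
            break                  # everything after it is deleted, as by [?:[]]
    return out
</Paragraph>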
      <Paragraph position="5"> The extraction of the union uSm^e of extended middle subsequences is performed in a similar way.</Paragraph>
      <Paragraph position="6"> We then make the joint unions of initial and extended middle subsequences⁵:
  uSi = uSi^s | [ ~[uSi^s.u] .o. uSi^n ]     (21)
  uSm^e = uSm^e,s | [ ~[uSm^e,s.u] .o. uSm^e,n ]     (22)</Paragraph>
      <Paragraph position="8"> In both cases (eq. 21 and 22) we union all subsequences from the principal model S with all those subsequences from the auxiliary model N that are not in S.</Paragraph>
      <Paragraph position="9"> Finally, we generate the completed s+n-type transducer from the joint unions of subsequences uSi and uSm^e, as described above (eq. 14-18).</Paragraph>
      <Paragraph position="10"> A transducer completed in this way disambiguates all subsequences known to the principal incomplete s-type model exactly as the underlying HMM does, and all other subsequences as the auxiliary n-type model does.</Paragraph>
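      <Paragraph> Viewed with the finite-state machinery stripped away, the completion amounts to a union with priority, which a dictionary keyed by class sequences makes explicit (a sketch under that simplification, ours):
def complete(s_subseqs, n_subseqs):
    """Both arguments map a class-sequence tuple to its tag sequence.
    Entries of the primary s-type model S override those of the
    auxiliary n-type model N, as in eq. (21) and (22)."""
    joint = dict(n_subseqs)     # start from the auxiliary model N
    joint.update(s_subseqs)     # S wins wherever both models know the key
    return joint
</Paragraph>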
    </Section>
  </Section>
  <Section position="9" start_page="463" end_page="463" type="metho">
    <SectionTitle>
4 An Implemented Finite-State Tagger
</SectionTitle>
      <Paragraph position="0"> The implemented tagger requires three transducers which represent a lexicon, a guesser and one of the above-mentioned approximations of an HMM.</Paragraph>
      <Paragraph position="1"> All three transducers are sequential, i.e. deterministic on the input side.</Paragraph>
      <Paragraph position="2"> Both the lexicon and guesser unambiguously map a surface form of any word that they accept to the corresponding class of tags (fig. 2, col. 1 and 2). First, the word is looked up in the lexicon. If this fails, it is looked up in the guesser. If this also fails, the word gets the label [UNKNOWN], which associates it with the tag class of unknown words. Tag probabilities in this class are approximated by the tags of words that appear only once in the training corpus. As soon as an input token gets labelled with the tag class of sentence end symbols (fig. 2: [SENT]), the tagger stops reading words from the input. At this point, the tagger has read and stored the words of a whole sentence (fig. 2, col. 1) and generated the corresponding sequence of classes (fig. 2, col. 2). The class sequence is now deterministically mapped to a tag sequence (fig. 2, col. 3) by means of the HMM transducer. The tagger outputs the stored word and tag sequence of the sentence, and continues in the same way with the remaining sentences of the corpus.</Paragraph>
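      <Paragraph> The control loop just described can be sketched as follows (ours; lexicon and guesser are assumed to behave like dictionaries from surface forms to class labels, and hmm_transducer like a function from a class sequence to a tag sequence):
def tag_corpus(tokens, lexicon, guesser, hmm_transducer):
    words, class_seq = [], []
    for word in tokens:
        # lexicon first, then guesser, then the class of unknown words
        cls = lexicon.get(word) or guesser.get(word) or "[UNKNOWN]"
        words.append(word)
        class_seq.append(cls)
        if cls == "[SENT]":                     # sentence end symbol read
            tags = hmm_transducer(class_seq)    # classes -> tags
            for w, t in zip(words, tags):
                print(w, t)
            words, class_seq = [], []           # continue with next sentence
</Paragraph>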
  </Section>
</Paper>