<?xml version="1.0" standalone="yes"?> <Paper uid="W98-1205"> <Title>Look-Back and Look-Ahead in the Conversion of Hidden Markov Models into Finite State Transducers</Title> <Section position="4" start_page="0" end_page="11" type="metho"> <SectionTitle> 2 b-Type Approximation </SectionTitle> <Paragraph position="0"> This section presents a method that approximates a (first order) Hidden Markov Model (HMM) by a finite-state transducer (FST), called b-type approximation3. Regular expression operators used in this section are explained in the annex.</Paragraph> <Paragraph position="1"> Looking up the word sequence of a sentence in a lexicon produces a unique sequence of ambiguity classes. Tagging the sentence by means of a (first order) HMM consists of finding the most probable tag sequence T given this class sequence C (eq. 1, fig. 1). The joint probability of the sequences C and T can be estimated by: p(C, T) = p(c_1 ... c_n, t_1 ... t_n) =</Paragraph> <Paragraph position="3"> p(t_1) p(c_1|t_1) ∏_{i=2}^{n} p(t_i|t_{i-1}) p(c_i|t_i) (2)</Paragraph> <Section position="1" start_page="11" end_page="11" type="sub_section"> <SectionTitle> 2.1 Basic Idea </SectionTitle> <Paragraph position="0"> The determination of the tag of a particular word cannot be made separately from the other tags. Tags can influence each other over a long distance via transition probabilities.</Paragraph> <Paragraph position="1"> In this approach, an ambiguity class is disambiguated with respect to a context. A context consists of a sequence of ambiguity classes limited at both ends by some selected tag4. For the left context of length β we use the term look-back, and for the right context of length α we use the term look-ahead. \[Figure 1: disambiguation of classes between two selected tags, here with a look-back distance of β = 2 and a look-ahead distance of α = 2.\] Actually, the two selected tags t_{i-2} and t_{i+2} allow not only the disambiguation of the class c_i but of all classes in between, i.e.
c_{i-1}, c_i and c_{i+1}.</Paragraph> <Paragraph position="2"> We approximate the tagging of a whole sentence by tagging subsequences with selected tags at both ends (fig. 1), and then overlapping them. The most probable paths in the tag space of a sentence, i.e. valid paths according to this approach, can be found as sketched in figure 2.</Paragraph> <Paragraph position="3"> A valid path consists of an ordered set of overlapping sequences in which each member overlaps with its neighbour except for the first or last tag. There can be more than one valid path in the tag space of a sentence (fig. 2). Sets of sequences that do not overlap in such a way are incompatible according to this model, and do not constitute valid paths (fig. 3). In figure 1, the tag t_i can be selected from the class c_i because it is between two selected tags, which are t_{i-2} at a look-back distance of β = 2 and t_{i+2} at a look-ahead distance of α = 2. 3Name given by the author, to distinguish the algorithm from n-type and s-type approximation (Kempe, 1997). 4The algorithm is explained for a first order HMM. In the case of a second order HMM, b-type sequences must begin and end with two selected tags rather than one.</Paragraph> </Section> <Section position="2" start_page="11" end_page="11" type="sub_section"> <SectionTitle> 2.2 b-Type Sequences </SectionTitle> <Paragraph position="0"> Given a length β of look-back and a length α of look-ahead, we generate for every class c_0, every look-back sequence t_{-β} c_{-β+1} ... c_{-1}, and every look-ahead sequence c_1 ... c_{α-1} t_α, a b-type sequence: t_{-β} c_{-β+1} ... c_{-1} c_0 c_1 ... c_{α-1} t_α (3) Kempe 30 Look-Back and Look-Ahead in the Conversion of HMMs</Paragraph> <Paragraph position="2"/> </Section> </Section> <Section position="5" start_page="11" end_page="11" type="metho"> <SectionTitle> CONJ \[DET,PRON\] \[ADJ,NOUN,VERB\] \[NOUN,VERB\] VERB (4) </SectionTitle> <Paragraph position="0"> Each such original b-type sequence (eq. 3, 4; fig.
4) is disambiguated based on a first order HMM. Here we use the Viterbi algorithm (Viterbi, 1967; Rabiner, 1990) for efficiency.</Paragraph> <Paragraph position="1"> \[Figure 4: an original b-type sequence, with look-back positions -β ... -1, middle position 0 and look-ahead positions 1 ... α; arcs labelled a denote transition probabilities, arcs labelled b class probabilities.\] For an original b-type sequence, the joint probability of its class sequence C with its tag sequence T (fig. 4) can be estimated by:</Paragraph> <Paragraph position="3"> p(C, T) = p(t_α|t_{α-1}) · ∏_{i=-β+1}^{α-1} p(t_i|t_{i-1}) p(c_i|t_i) (5) At every position in the look-back sequence and in the look-ahead sequence, a boundary # may occur, i.e. a sentence beginning or end. No look-back (β = 0) or no look-ahead (α = 0) is also allowed.</Paragraph> <Paragraph position="4"> The above probability estimation (eq. 5) can then be expressed more generally (fig. 4) as:</Paragraph> <Paragraph position="6"> When the most likely tag sequence is found for an original b-type sequence, the class c_0 in the middle position (eq. 3) is associated with its most likely tag t_0. We formulate constraints for the other tags t_{-β} and t_α and classes c_{-β+1} ... c_{-1} and c_1 ... c_{α-1} of the original b-type sequence. Thus we obtain a tagged b-type sequence5:</Paragraph> <Paragraph position="7"> B_i = t_{-β}^{Bβ} c_{-β+1}^{B(β-1)} ... c_{-1}^{B1} c_0:t_0 c_1^{A1} ... c_{α-1}^{A(α-1)} t_α^{Aα} (14) stating that t_0 is the most probable tag in the class c_0 if it is preceded by t^{Bβ} c^{B(β-1)} ... c^{B2} c^{B1} and followed by c^{A1} c^{A2} ... c^{A(α-1)} t^{Aα}. In expression 14 the subscripts -β, -β+1, ..., 0, ..., α-1, α denote the position of the tag or class in the b-type sequence, and the superscripts Bβ, B(β-1), ..., B1 and A1, ..., A(α-1), Aα express constraints for preceding and following tags and classes which are part of other b-type sequences.
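The disambiguation step just described can be sketched in Python. This is an illustrative reimplementation under assumed toy probabilities, not the paper's code: the function name viterbi_b_type and all numbers are made up; it only shows how the Viterbi algorithm picks the most likely tags between the two fixed boundary tags of an original b-type sequence (eq. 3, 5).

```python
from collections import defaultdict

def viterbi_b_type(left_tag, classes, right_tag, p_trans, p_class):
    """Most likely tag per ambiguity class between two fixed boundary tags."""
    delta = {left_tag: 1.0}   # best probability of a path ending in a given tag
    backptr = []              # one back-pointer table per ambiguity class
    for cls in classes:
        new_delta, ptr = {}, {}
        for tag in cls:
            best_prev, best_p = None, 0.0
            for prev, p in delta.items():
                # transition probability times class (emission) probability
                cand = p * p_trans[(prev, tag)] * p_class[(cls, tag)]
                if cand > best_p:
                    best_prev, best_p = prev, cand
            new_delta[tag], ptr[tag] = best_p, best_prev
        delta, backptr = new_delta, backptr + [ptr]
    # close the sequence with the fixed right boundary tag ...
    best_last = max(delta, key=lambda t: delta[t] * p_trans[(t, right_tag)])
    # ... and follow the back-pointers to recover the tag sequence
    tags = [best_last]
    for ptr in reversed(backptr[1:]):
        tags.append(ptr[tags[-1]])
    return list(reversed(tags))

# Made-up probabilities for example (4): CONJ [DET,PRON] [ADJ,NOUN,VERB] [NOUN,VERB] VERB
p_trans = defaultdict(lambda: 0.1)
p_trans.update({("CONJ", "DET"): 0.4, ("DET", "ADJ"): 0.3,
                ("ADJ", "NOUN"): 0.4, ("NOUN", "VERB"): 0.5})
p_class = defaultdict(lambda: 0.2)  # p(class | tag), uniform here

tags = viterbi_b_type("CONJ",
                      [("DET", "PRON"), ("ADJ", "NOUN", "VERB"), ("NOUN", "VERB")],
                      "VERB", p_trans, p_class)
# With these numbers the middle class is disambiguated to ADJ: ["DET", "ADJ", "NOUN"]
```

Only the middle tag (here ADJ) is kept for the class c_0; the remaining tags and classes of the sequence become the constraints of eq. 14.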
In the example:</Paragraph> <Paragraph position="9"> CONJ^{B2} \[DET,PRON\]^{B1} \[ADJ,NOUN,VERB\]:ADJ \[NOUN,VERB\]^{A1} VERB^{A2} (21) ADJ is the most likely tag in the class \[ADJ,NOUN,VERB\] if it is preceded by the tag CONJ two positions back (B2), by the class \[DET,PRON\] one position back (B1), and followed by the class \[NOUN,VERB\] one position ahead (A1) and by the tag VERB two positions ahead (A2).</Paragraph> <Paragraph position="10"> Boundaries are denoted by a particular symbol, #, and can occur at the edge of the look-back and look-ahead sequences.</Paragraph> <Paragraph position="12"> 5Regular expression operators used in this article are explained in the annex.</Paragraph> <Paragraph position="15"> Note that look-back of length β and look-ahead of length α also include all sequences shorter than β or α, respectively, that are limited by #.</Paragraph> <Paragraph position="16"> For a given length β of look-back and a length α of look-ahead, we generate every possible original b-type sequence (eq. 3), disambiguate it statistically (eq. 5-13), and encode the tagged b-type sequence B_i (eq. 14) as an FST. All sequences B_i are then unioned, B = B_1 | B_2 | ... ,</Paragraph> <Paragraph position="18"> and we generate a preliminary tagger model B'', B'' = B* (24),</Paragraph> <Paragraph position="20"> where all sequences B_i can occur in any order and number (including zero times) because no constraints have yet been applied.</Paragraph> <Section position="1" start_page="11" end_page="11" type="sub_section"> <SectionTitle> 2.3 Concatenation Constraints </SectionTitle> <Paragraph position="0"> To ensure a correct concatenation of sequences B_i, we have to make sure that every B_i is preceded and followed by other B_i according to what is encoded in the look-back and look-ahead constraints.
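The overlap requirement behind these constraints (sec. 2.1, fig. 2 and 3) can be sketched as follows. The representation is my own illustration, not the paper's FST encoding: each b-type sequence is reduced to the tag path it spans, and two neighbouring sequences are compatible when they agree everywhere except on the first tag of the left one and the last tag of the right one.

```python
def overlaps(left_seq, right_seq):
    """True if right_seq, shifted one position to the right, agrees with
    left_seq on all shared positions (everything except left_seq's first
    and right_seq's last tag, which lie outside the overlap)."""
    return left_seq[1:] == right_seq[:-1]

def is_valid_path(sequences):
    """A valid path is an ordered set of sequences in which every member
    overlaps with its neighbour in the above sense."""
    return all(overlaps(a, b) for a, b in zip(sequences, sequences[1:]))

# Two compatible tag sequences (beta = alpha = 2), shifted by one position:
s1 = ["CONJ", "DET", "ADJ", "NOUN", "VERB"]
s2 = ["DET", "ADJ", "NOUN", "VERB", "SENT"]
# An incompatible one: it disagrees with s1 inside the overlap region
s3 = ["DET", "NOUN", "NOUN", "VERB", "SENT"]
```

Here is_valid_path([s1, s2]) holds, while is_valid_path([s1, s3]) does not; the latter corresponds to the incompatible sets of sequences in figure 3.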
E.g.</Paragraph> <Paragraph position="1"> the sequence in example (21) must be preceded by a sentence beginning, #, and the class \[DET,PRON\], and followed by the class \[NOUN,VERB\] and the tag VERB.</Paragraph> <Paragraph position="2"> We create constraints for preceding and following tags, classes and sentence boundaries. For the look-back, a particular tag t_i or class c_j is required at a particular distance of δ ≤ -1, by:</Paragraph> <Paragraph position="4"> for δ ≤ -1, with \t and \c being the union of all tags and all classes respectively.</Paragraph> <Paragraph position="5"> A sentence beginning, #, is required at a particular look-back distance of δ ≤ -1, on the side of the tags, by a similar regular expression R(#) (eq. 27). In the case of look-ahead, we require for a particular distance of δ ≥ 1 a particular tag t_i or class c_j or a sentence end, #, on the side of the tags, in a similar way by:</Paragraph> <Paragraph position="7"> All tags t_i are required for the look-back only at the distance of δ = -β and for the look-ahead only at the distance of δ = α. All classes c_j are required for distances of δ ∈ \[-β+1, -1\] and δ ∈ \[1, α-1\]. Sentence boundaries, #, are required for distances of δ ∈ \[-β, -1\] and δ ∈ \[1, α\].</Paragraph> <Paragraph position="8"> We create the intersection Rt of all tag constraints, the intersection Rc of all class constraints, and the intersection R# of all sentence boundary constraints:</Paragraph> <Paragraph position="10"> All constraints are enforced by composition with the preliminary tagger model B'' (eq. 24). The class constraint Rc is composed on the upper side of B'', which is the side of the classes (eq. 14), and both the tag constraint Rt and the boundary constraint6 R# are composed on the lower side of B'', which is the side of the tags: B''' = Rc .o. B'' .o. Rt .o.
R# (34) Having ensured correct concatenation, we delete all symbols r that have served to constrain tags, classes and boundaries. 6The boundary constraint R# could equally be computed for and composed on the side of the classes. The transducer which encodes R# would then, however, be bigger because the number of classes is bigger than the number of tags.</Paragraph> <Paragraph position="13"> By composing B''' (eq. 34) on the lower side with Dr, a transducer that deletes all constraint symbols r, and on the upper side with the inverted relation Dr.i, we obtain the final tagger model B:</Paragraph> <Paragraph position="15"> We call the model a b-type model, the corresponding FST a b-type transducer, and the whole algorithm leading from the HMM to the transducer, a b-type approximation of an HMM.</Paragraph> </Section> <Section position="2" start_page="11" end_page="11" type="sub_section"> <SectionTitle> 2.4 Properties of b-Type Transducers </SectionTitle> <Paragraph position="0"> There are two groups of b-type transducers with different properties: FSTs without look-back and/or without look-ahead (β·α = 0) and FSTs with both look-back and look-ahead (β·α > 0). Both accept any sequence of ambiguity classes.</Paragraph> <Paragraph position="1"> b-Type FSTs with β·α = 0 are always sequential.</Paragraph> <Paragraph position="2"> They map a class sequence that corresponds to the word sequence of a sentence always to exactly one tag sequence. Their tagging accuracy and similarity with the underlying HMM increase with growing β + α. A b-type FST with β = 0 and α = 0 is equivalent to an n0-type FST, and with β = 1 and α = 0 it is equivalent to an n1-type FST (Kempe, 1997).</Paragraph> <Paragraph position="3"> b-Type FSTs with β·α > 0 are in general not sequential. For a class sequence they deliver a set of different tag sequences, which means that the tagging results are ambiguous.
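The sequential case can be illustrated with a left-to-right sketch in the spirit of an n1-type approximation (β = 1, α = 0; Kempe, 1997). The function and all probabilities below are assumptions for illustration, not the actual transducer: each class is mapped to exactly one tag, given only the previously chosen tag.

```python
from collections import defaultdict

def n1_tag(classes, p_trans, p_class, start_tag="SENT"):
    """Deterministically map a class sequence to a single tag sequence."""
    prev, tags = start_tag, []
    for cls in classes:
        # per class, keep the tag maximizing p(class|tag) * p(tag|previous tag)
        prev = max(cls, key=lambda t: p_class[(cls, t)] * p_trans[(prev, t)])
        tags.append(prev)
    return tags

# Made-up probabilities: the lexical preference for DET and the
# transition DET -> NOUN drive both decisions.
p_trans = defaultdict(lambda: 0.1, {("DET", "NOUN"): 0.5})
p_class = defaultdict(lambda: 0.2, {(("DET", "PRON"), "DET"): 0.6})

result = n1_tag([("DET", "PRON"), ("NOUN", "VERB")], p_trans, p_class)
# One and only one output per input class sequence: ["DET", "NOUN"]
```

Because exactly one tag survives per class, the resulting mapping is a function, i.e. sequential; with β·α > 0 the constraining look-ahead tag is not yet known when a class is read, which is where the ambiguity of the second group comes from.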
This set is never empty, and the most probable tag sequence according to the underlying HMM is always in this set. The longer the look-back distance β and the look-ahead distance α are, the larger the FST and the smaller the set of resulting tag sequences. For sufficiently large β + α, this set may always contain only one tag sequence.</Paragraph> <Paragraph position="4"> In this case the FST is equivalent to the underlying HMM. For reasons of size, however, this FST may not be computable for particular HMMs (sec. 4).</Paragraph> </Section> </Section> <Section position="6" start_page="11" end_page="11" type="metho"> <SectionTitle> 3 An Implemented Finite-State Tagger </SectionTitle> <Paragraph position="0"> The implemented tagger requires three transducers which represent a lexicon, a guesser and an approximation of an HMM as described above.</Paragraph> <Paragraph position="1"> Both the lexicon and the guesser are sequential, i.e. deterministic on the input side.</Paragraph> <Paragraph position="2"> They both unambiguously map a surface form of any word that they accept to the corresponding ambiguity class (fig. 5, col. 1 and 2): First of all, the word is looked for in the lexicon. 7For efficiency reasons, we actually do not delete the constraint symbols r by composition. We rather traverse the network, and overwrite every symbol r with the empty string symbol ε. In the following determinization of the network, all ε are eliminated.</Paragraph> <Paragraph position="3"> If this fails, it is looked for in the guesser. If this equally fails, it gets the label \[UNKNOWN\] which denotes the ambiguity class of unknown words. Tag probabilities in this class are approximated by tags of words that appear only once in the training corpus. As soon as an input token gets labeled with the tag class of sentence end symbols (fig. 5: \[SENT\]), the tagger stops reading words from the input. At this point, the tagger has read and stored the words of a whole sentence (fig. 5, col.
1) and generated the corresponding sequence of classes (fig. 5, col. 2).</Paragraph> <Paragraph position="4"> The class sequence is now mapped to a tag sequence (fig. 5, col. 3) using the HMM transducer. A b-type FST is not sequential in general (sec. 2.4), so to obtain a unique tagging result, the finite-state tagger can be run in a special mode where only the first result found is retained and the tagger does not look for other results. Since paths through an FST have no particular order, the result retained is random.</Paragraph> <Paragraph position="5"> The tagger outputs the stored word and tag sequence of the sentence, and continues in the same way with the remaining sentences of the corpus.</Paragraph> <Paragraph position="6"> The tagger can also be run in a statistical mode, where the number of tag sequences found per sentence is counted. These numbers give an overview of the degree of non-sequentiality of the concerned b-type transducer (sec. 2.4).</Paragraph> </Section> </Paper>