<?xml version="1.0" standalone="yes"?>
<Paper uid="J03-1002">
  <Title>A Systematic Comparison of Various Statistical Alignment Models</Title>
  <Section position="5" start_page="23" end_page="29" type="metho">
    <SectionTitle>
2. Alignment Models
    </SectionTitle>
    <Paragraph position="0"> The Viterbi alignment of a sentence pair (f_1^J, e_1^I) is the alignment with the highest probability under the trained model: â_1^J = argmax_{a_1^J} p_θ(f_1^J, a_1^J | e_1^I). (For the sake of simplicity, we shall drop the index θ if it is not explicitly needed.) Later in the article, we evaluate the quality of this Viterbi alignment by comparing it to a manually produced reference alignment. The parameters of the statistical alignment models are optimized with respect to a maximum-likelihood criterion, which is not necessarily directly related to alignment quality. Optimizing directly for alignment quality, however, would require training with manually defined alignments, which is not done in the research presented in this article. Experimental evidence shows (Section 6) that the statistical alignment models using this parameter estimation technique do indeed obtain good alignment quality.</Paragraph>
    <Paragraph position="1"> In this paper, we use Models 1 through 5 described in Brown, Della Pietra, Della Pietra, and Mercer (1993), the hidden Markov alignment model described in Vogel, Ney, and Tillmann (1996) and Och and Ney (2000), and a new alignment model, which we call Model 6. All these models use a different decomposition of the probability  use a function of the similarity between the types of the two languages (Smadja, Mc-Keown, and Hatzivassiloglou 1996; Ker and Chang 1997; Melamed 2000). Frequently, variations of the Dice coefficient (Dice 1945) are used as this similarity function. For each sentence pair, a matrix including the association scores between every word at every position is then obtained: dice(i, j)=</Paragraph>
    <Paragraph position="3"> Computational Linguistics Volume 29, Number 1 C(e, f) denotes the co-occurrence count of e and f in the parallel training corpus. C(e) and C(f) denote the count of e in the target sentences and the count of f in the source sentences, respectively. From this association score matrix, the word alignment is then obtained by applying suitable heuristics. One method is to choose as alignment a</Paragraph>
    <Paragraph position="5"> A refinement of this method is the competitive linking algorithm (Melamed 2000).</Paragraph>
    <Paragraph position="6"> In a first step, the highest-ranking word position (i, j) is aligned. Then, the corresponding row and column are removed from the association score matrix. This procedure is iteratively repeated until every source or target language word is aligned. The advantage of this approach is that indirect associations (i.e., words that co-occur often but are not translations of each other) occur less often. The resulting alignment contains only one-to-one alignments and typically has a higher precision than the heuristic model defined in equation (7).</Paragraph>
    <Section position="1" start_page="24" end_page="24" type="sub_section">
      <SectionTitle>
2.1.3 A Comparison of Statistical Models and Heuristic Models
</SectionTitle>
      <Paragraph position="0"> tage of the heuristic models is their simplicity. They are very easy to implement and understand. Therefore, variants of the heuristic models described above are widely used in the word alignment literature.</Paragraph>
      <Paragraph position="1"> One problem with heuristic models is that the use of a specific similarity function seems to be completely arbitrary. The literature contains a large variety of different scoring functions, some including empirically adjusted parameters. As we show in Section 6, the Dice coefficient results in a worse alignment quality than the statistical models.</Paragraph>
      <Paragraph position="2"> In our view, the approach of using statistical alignment models is more coherent. The general principle for coming up with an association score between words results from statistical estimation theory, and the parameters of the models are adjusted such that the likelihood of the models on the training corpus is maximized.</Paragraph>
    </Section>
    <Section position="2" start_page="24" end_page="26" type="sub_section">
      <SectionTitle>
2.2 Statistical Alignment Models
</SectionTitle>
      <Paragraph position="0"/>
      <Paragraph position="2"> Using this decomposition, we obtain three different probabilities: a length probability  ). In the hidden Markov alignment model, we assume a first-order dependence for the alignments a j and that the lexicon probability depends only on the word at position a</Paragraph>
      <Paragraph position="4"> Och and Ney Comparison of Statistical Alignment Models Later in the article, we describe a refinement with a dependence on e a j[?]1 in the alignment model. Putting everything together and assuming a simple length model</Paragraph>
      <Paragraph position="6"> with the alignment probability p(i  |i prime , I) and the translation probability p(f  |e). To make the alignment parameters independent of absolute word positions, we assume that the alignment probabilities p(i  |i prime , I) depend only on the jump width (i[?]i prime ). Using a set of non-negative parameters {c(i[?]i prime )}, we can write the alignment probabilities in the form</Paragraph>
      <Paragraph position="8"> This form ensures that the alignment probabilities satisfy the normalization constraint for each conditioning word position i</Paragraph>
      <Paragraph position="10"> = 1,..., I. This model is also referred to as a homogeneous HMM (Vogel, Ney, and Tillmann 1996). A similar idea was suggested by Dagan, Church, and Gale (1993).</Paragraph>
      <Paragraph position="11"> In the original formulation of the hidden Markov alignment model, there is no empty word that generates source words having no directly aligned target word. We introduce the empty word by extending the HMM network by I empty words e</Paragraph>
      <Paragraph position="13"> has a corresponding empty word e i+I (i.e., the position of the empty word encodes the previously visited target word). We enforce the following constraints on the transitions in the HMM network (i [?] I, i prime [?] I) involving the empty word e</Paragraph>
      <Paragraph position="15"> is the probability of a transition to the empty word, which has to be optimized on held-out data. In our experiments, we set p  = 0.2.</Paragraph>
      <Paragraph position="16"> Whereas the HMM is based on first-order dependencies p(i = a</Paragraph>
      <Paragraph position="18"> Hence, the word order does not affect the alignment probability.</Paragraph>
      <Paragraph position="20"> ) is the Kronecker function, which is one if i = i prime and zero otherwise.</Paragraph>
      <Paragraph position="21">  Computational Linguistics Volume 29, Number 1 To reduce the number of alignment parameters, we ignore the dependence on J in the alignment model and use a distribution p(a</Paragraph>
      <Paragraph position="23"/>
    </Section>
    <Section position="3" start_page="26" end_page="29" type="sub_section">
      <SectionTitle>
2.3 Fertility-Based Alignment Models
</SectionTitle>
      <Paragraph position="0"> In the following, we give a short description of the fertility-based alignment models of Brown, Della Pietra, Della Pietra, and Mercer (1993). A gentle introduction can be found in Knight (1999b).</Paragraph>
      <Paragraph position="1"> The fertility-based alignment models (Models 3, 4, and 5) (Brown, Della Pietra, Della Pietra, and Mercer 1993) have a significantly more complicated structure than the simple Models 1 and 2. The fertility ph</Paragraph>
      <Paragraph position="3"> in position i is defined as the number of aligned source words:</Paragraph>
      <Paragraph position="5"> The fertility-based alignment models contain a probability p(ph  |e) that the target word e is aligned to ph words. By including this probability, it is possible to explicitly describe the fact that for instance the German word &amp;quot;ubermorgen produces four English words (the day after tomorrow). In particular, the fertility ph = 0 is used for prepositions or articles that have no direct counterpart in the other language.</Paragraph>
      <Paragraph position="6"> To describe the fertility-based alignment models in more detail, we introduce, as an alternative alignment representation, the inverted alignments, which define a mapping from target to source positions rather than the other way around. We allow several positions in the source language to be covered; that is, we consider alignments B of the form</Paragraph>
      <Paragraph position="8"> [?]{1,..., j,..., J}. (20) An important constraint for the inverted alignment is that all positions of the source sentence must be covered exactly once; that is, the B i have to form a partition of the set {1,..., j,..., J}. The number of words ph</Paragraph>
      <Paragraph position="10"> refers to the kth element of B i in ascending order. The inverted alignments B</Paragraph>
      <Paragraph position="12"> As might be seen from this equation, we have tacitly assumed that the set B  of words aligned with the empty word is generated only after the nonempty positions have 2 The original description of the fertility-based alignment models in Brown, Della Pietra, Della Pietra, and Mercer (1993) includes a more refined derivation of the fertility-based alignment models.  Och and Ney Comparison of Statistical Alignment Models been covered. The distribution p(B</Paragraph>
      <Paragraph position="14"> ) is different for Models 3, 4, and 5: * In Model 3, the dependence of B</Paragraph>
      <Paragraph position="16"> We obtain an (inverted) zero-order alignment model p(j  |i, J).</Paragraph>
      <Paragraph position="17"> * In Model 4, every word is dependent on the previous aligned word and on the word classes of the surrounding words. First, we describe the dependence of alignment positions. (The dependence on word classes is for now ignored and will be introduced later.) We have two (inverted) first-order alignment models: p  ([?]j |***) and p  ([?]j |***). The difference between this model and the first-order alignment model in the HMM lies in the fact that here we now have a dependence along the j-axis instead of a dependence along the i-axis. The model p</Paragraph>
      <Paragraph position="19"> denotes the average of all elements in B r(i) .</Paragraph>
      <Paragraph position="20"> * Both Model 3 and Model 4 ignore whether or not a source position has been chosen. In addition, probability mass is reserved for source positions outside the sentence boundaries. For both of these reasons, the probabilities of all valid alignments do not sum to unity in these two models. Such models are called deficient (Brown, Della Pietra, Della Pietra, and Mercer 1993). Model 5 is a reformulation of Model 4 with a suitably refined alignment model to avoid deficiency. (We omit the specific formula. We note only that the number of alignment parameters for Model 5 is significantly larger than for Model 4.) Models 3, 4, and 5 define the probability p(B  is associated with the number of words that are aligned with the empty word. There are ph  ! ways to order the ph  words produced by the empty word, and hence, the alignment model of the empty word is nondeficient. As we will  Computational Linguistics Volume 29, Number 1 see in Section 3.2, this creates problems for Models 3 and 4. Therefore, we modify Models 3 and 4 slightly by replacing ph  As a result of this modification, the alignment models for both nonempty words and the empty word are deficient.</Paragraph>
      <Paragraph position="21"> 2.3.1 Model 6. As we shall see, the alignment models with a first-order dependence (HMM, Models 4 and 5) produce significantly better results than the other alignment models. The HMM predicts the distance between subsequent source language positions, whereas Model 4 predicts the distance between subsequent target language positions. This implies that the HMM makes use of locality in the source language, whereas Model 4 makes use of locality in the target language. We expect to achieve better alignment quality by using a model that takes into account both types of dependencies. Therefore, we combine HMM and Model 4 in a log-linear way and call the resulting model Model 6:  Here, the interpolation parameter a is employed to weigh Model 4 relative to the hidden Markov alignment model. In our experiments, we use Model 4 instead of Model 5, as it is significantly more efficient in training and obtains better results. In general, we can perform a log-linear combination of several models p</Paragraph>
      <Paragraph position="23"> are determined in such a way that the alignment quality on held-out data is optimized.</Paragraph>
      <Paragraph position="24"> We use a log-linear combination instead of the simpler linear combination because the values of Pr(f, a  |e) typically differ by orders of magnitude for HMM and Model 4. In such a case, we expect the log-linear combination to be better than a linear combination.</Paragraph>
      <Paragraph position="25">  5, it is straightforward to extend the alignment parameters to include a dependence on the word classes of the surrounding words (Och and Ney 2000). In the hidden Markov alignment model, we allow for a dependence of the position a j on the class of the preceding target word C(e</Paragraph>
      <Paragraph position="27"> )). Similarly, we can include dependencies on source and target word classes in Models 4 and 5 (Brown, Della Pietra, Della Pietra, and Mercer 1993). The categorization of the words into classes (here: 50 classes) is performed automatically by using the statistical learning procedure described in Kneser and Ney (1993).</Paragraph>
      <Paragraph position="28"> 2.3.3 Overview of Models. The main differences among the statistical alignment models lie in the alignment model they employ (zero-order or first-order), the fertility model they employ, and the presence or absence of deficiency. In addition, the models differ with regard to the efficiency of the E-step in the EM algorithm (Section 3.1). Table 1 offers an overview of the properties of the various alignment models.</Paragraph>
    </Section>
    <Section position="4" start_page="29" end_page="29" type="sub_section">
      <SectionTitle>
2.4 Computation of the Viterbi Alignment
</SectionTitle>
      <Paragraph position="0"> We now develop an algorithm to compute the Viterbi alignment for each alignment model. Although there exist simple polynomial algorithms for the baseline Models 1 and 2, we are unaware of any efficient algorithm for computing the Viterbi alignment for the fertility-based alignment models.</Paragraph>
      <Paragraph position="1"> For Model 2 (also for Model 1 as a special case), we obtain  different alignments decomposes into J maximizations of (I + 1) lexicon probabilities. Similarly, the Viterbi alignment for Model 2 can be computed with a complexity of O(I * J).</Paragraph>
      <Paragraph position="2"> Finding the optimal alignment for the HMM is more complicated than for Model 1 or Model 2. Using a dynamic programming approach, it is possible to obtain the Viterbi alignment for the HMM with a complexity of O(I  *J) (Vogel, Ney, and Tillmann 1996).</Paragraph>
      <Paragraph position="3"> For the refined alignment models, however, namely, Models 3, 4, 5, and 6, maximization over all alignments cannot be efficiently carried out. The corresponding search problem is NP-complete (Knight 1990a). For short sentences, a possible solution could be an A* search algorithm (Och, Ueffing, and Ney 2001). In the work presented here, we use a more efficient greedy search algorithm for the best alignment, as suggested in Brown, Della Pietra, Della Pietra, and Mercer (1993). The basic idea is to compute the Viterbi alignment of a simple model (such as Model 2 or HMM). This alignment is then iteratively improved with respect to the alignment probability of the refined alignment model. (For further details on the greedy search algorithm, see Brown, Della Pietra, Della Pietra, and Mercer [1993].) In the Appendix, we present methods for performing an efficient computation of this pseudo-Viterbi alignment.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="29" end_page="32" type="metho">
    <SectionTitle>
3. Training
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="29" end_page="31" type="sub_section">
      <SectionTitle>
3.1 EM Algorithm
</SectionTitle>
      <Paragraph position="0"> In this section, we describe our approach to determining the model parameters th.</Paragraph>
      <Paragraph position="1"> Every model has a specific set of free parameters. For example, the parameters th for</Paragraph>
      <Paragraph position="3"> To train the model parameters th, we use a maximum-likelihood approach, as described in equation (4), by applying the EM algorithm (Baum 1972). The different models are trained in succession on the same data; the final parameter values of a simpler model serve as the starting point for a more complex model.</Paragraph>
      <Paragraph position="4"> In the E-step of Model 1, the lexicon parameter counts for one sentence pair (e, f) are calculated:</Paragraph>
      <Paragraph position="6"> Here, N(e, f) is the training corpus count of the sentence pair (f, e). In the M-step, the lexicon parameters are computed:</Paragraph>
      <Paragraph position="8"> Similarly, the alignment and fertility probabilities can be estimated for all other alignment models (Brown, Della Pietra, Della Pietra, and Mercer 1993). When bootstrapping from a simpler model to a more complex model, the simpler model is used to weigh the alignments, and the counts are accumulated for the parameters of the more complex model.</Paragraph>
      <Paragraph position="9"> In principle, the sum over all (I+1) J alignments has to be calculated in the E-step.</Paragraph>
      <Paragraph position="10"> Evaluating this sum by explicitly enumerating all alignments would be infeasible. Fortunately, Models 1 and 2 and HMM have a particularly simple mathematical form such that the EM algorithm can be implemented efficiently (i.e., in the E-step, it is possible to efficiently evaluate all alignments). For the HMM, this is referred to as the Baum-Welch algorithm (Baum 1972).</Paragraph>
      <Paragraph position="11"> Since we know of no efficient way to avoid the explicit summation over all alignments in the EM algorithm in the fertility-based alignment models, the counts are collected only over a subset of promising alignments. For Models 3 to 6, we perform the count collection only over a small number of good alignments. To keep the training fast, we consider only a small fraction of all alignments. We compare three different methods for using subsets of varying sizes: * The simplest method is to perform Viterbi training using only the best alignment found. As the Viterbi alignment computation itself is very time consuming for Models 3 to 6, the Viterbi alignment is computed only approximately, using the method described in Brown, Della Pietra, Della Pietra, and Mercer (1993).</Paragraph>
      <Paragraph position="12"> * Al-Onaizan et al. (1999) suggest using as well the neighboring alignments of the best alignment found. (For an exact definition of the neighborhood of an alignment, the reader is referred to the Appendix.) * Brown, Della Pietra, Della Pietra, and Mercer (1993) use an even larger set of alignments, including also the pegged alignments, a large set of alignments with a high probability Pr(f  Och and Ney Comparison of Statistical Alignment Models In Section 6, we show that by using the HMM instead of Model 2 in bootstrapping the fertility-based alignment models, the alignment quality can be significantly improved. In the Appendix, we present an efficient training algorithm of the fertility-based alignment models.</Paragraph>
    </Section>
    <Section position="2" start_page="31" end_page="31" type="sub_section">
      <SectionTitle>
3.2 Is Deficiency a Problem?
</SectionTitle>
      <Paragraph position="0"> When using the EM algorithm on the standard versions of Models 3 and 4, we observe that during the EM iterations more and more words are aligned with the empty word.</Paragraph>
      <Paragraph position="1"> This results in a poor alignment quality, because too many words are aligned to the empty word. This progressive increase in the number of words aligned with the empty word does not occur when the other alignment models are used. We believe that this is due to the deficiency of Model 3 and Model 4.</Paragraph>
      <Paragraph position="2"> The use of the EM algorithm guarantees that the likelihood increases for each iteration. This holds for both deficient and nondeficient models. For deficient models, however, as the amount of deficiency in the model is reduced, the likelihood increases.</Paragraph>
      <Paragraph position="3"> In Models 3 and 4 as defined in Brown, Della Pietra, Della Pietra, and Mercer (1993), the alignment model for nonempty words is deficient, but the alignment model for the empty word is nondeficient. Hence, the EM algorithm can increase likelihood by simply aligning more and more words with the empty word.</Paragraph>
      <Paragraph position="4">  Therefore, we modify Models 3 and 4 slightly, such that the empty word also has a deficient alignment model. The alignment probability is set to p(j  |i, J)=1/J for each source word aligned with the empty word. Another remedy, adopted in Och and Ney (2000), is to choose a value for the parameter p  of the empty-word fertility and keep it fixed.</Paragraph>
    </Section>
    <Section position="3" start_page="31" end_page="32" type="sub_section">
      <SectionTitle>
3.3 Smoothing
</SectionTitle>
      <Paragraph position="0"> To overcome the problem of overfitting on the training data and to enable the models to cope better with rare words, we smooth the alignment and fertility probabilities. For the alignment probabilities of the HMM (and similarly for Models 4 and 5), we perform an interpolation with a uniform distribution p(i  |j, I)=1/I using an interpolation</Paragraph>
      <Paragraph position="2"> For the fertility probabilities, we assume that there is a dependence on the number of letters g(e) of e and estimate a fertility distribution p(ph  |g) using the EM algorithm.</Paragraph>
      <Paragraph position="3"> Typically, longer words have a higher fertility. By making this assumption, the model can learn that the longer words usually have a higher fertility than shorter words.</Paragraph>
      <Paragraph position="4"> Using an interpolation parameter b, the fertility distribution is then computed as</Paragraph>
      <Paragraph position="6"> Here, n(e) denotes the frequency of e in the training corpus. This linear interpolation ensures that for frequent words (i.e., n(e) greatermuch b), the specific distribution p(ph  |e) dominates, and that for rare words (i.e., n(e) lessmuch b), the general distribution p(ph  |g(e)) dominates.</Paragraph>
      <Paragraph position="7"> The interpolation parameters a and b are determined in such a way that the</Paragraph>
    </Section>
    <Section position="4" start_page="32" end_page="32" type="sub_section">
      <SectionTitle>
3.4 Bilingual Dictionary
</SectionTitle>
      <Paragraph position="0"> A conventional bilingual dictionary can be considered an additional knowledge source that can be used in training. We assume that the dictionary is a list of word strings (e, f). The entries for each language can be a single word or an entire phrase.</Paragraph>
      <Paragraph position="1"> To integrate a dictionary into the EM algorithm, we compare two different methods: * Brown, Della Pietra, Della Pietra, Goldsmith, et al. (1993) developed a multinomial model for the process of constructing a dictionary (by a human lexicographer). By applying suitable simplifications, the method boils down to adding every dictionary entry (e, f) to the training corpus with an entry-specific count called effective multiplicity, expressed as u(e, f):</Paragraph>
      <Paragraph position="3"> In this section, l(e) is an additional parameter describing the size of the sample that is used to estimate the model p(f  |e). This count is then used instead of N(e, f) in the EM algorithm as shown in equation (35).</Paragraph>
      <Paragraph position="4"> * Och and Ney (2000) suggest that the effective multiplicity of a dictionary entry be set to a large value u + greatermuch 1 if the lexicon entry actually occurs in one of the sentence pairs of the bilingual corpus and to a low value  As a result, only dictionary entries that indeed occur in the training corpus have a large effect in training. The motivation behind this is to avoid a deterioration of the alignment as a result of out-of-domain dictionary entries. Every entry in the dictionary that does co-occur in the training corpus can be assumed correct and should therefore obtain a high count. We set u</Paragraph>
      <Paragraph position="6"/>
    </Section>
  </Section>
  <Section position="7" start_page="32" end_page="32" type="metho">
    <SectionTitle>
4. Symmetrization
</SectionTitle>
    <Paragraph position="0"> In this section, we describe various methods for performing a symmetrization of our directed statistical alignment models by applying a heuristic postprocessing step that combines the alignments in both translation directions (source to target, target to source).</Paragraph>
    <Paragraph position="1"> The baseline alignment model does not allow a source word to be aligned with more than one target word. Therefore, lexical correspondences like that of the German compound word Zahnarzttermin with the English dentist's appointment cause problems, because a single source word must be mapped to two or more target words. Therefore, the resulting Viterbi alignment of the standard alignment models has a systematic loss in recall.</Paragraph>
    <Paragraph position="2"> To solve this problem, we perform training in both translation directions (source to target, target to source). As a result, we obtain two alignments a</Paragraph>
    <Paragraph position="4"> denote the sets of alignments in the two Viterbi alignments. To increase the quality of the alignments, we combine A  horizontal and vertical neighbors.</Paragraph>
    <Paragraph position="5"> Obviously, the intersection of the two alignments yields an alignment consisting of only one-to-one alignments with a higher precision and a lower recall than either one separately. The union of the two alignments yields a higher recall and a lower precision of the combined alignment than either one separately. Whether a higher precision or a higher recall is preferred depends on the final application for which the word alignment is intended. In applications such as statistical machine translation (Och, Tillmann, and Ney 1999), a higher recall is more important (Och and Ney 2000), so an alignment union would probably be chosen. In lexicography applications, we might be interested in alignments with a very high precision obtained by performing an alignment intersection.</Paragraph>
  </Section>
  <Section position="8" start_page="32" end_page="32" type="metho">
    <SectionTitle>
5. Evaluation Methodology
</SectionTitle>
    <Paragraph position="0"> In the following, we present an annotation scheme for single-word-based alignments and a corresponding evaluation criterion.</Paragraph>
    <Paragraph position="1"> It is well known that manually performing a word alignment is a complicated and ambiguous task (Melamed 1998). Therefore, in performing the alignments for the research presented here, we use an annotation scheme that explicitly allows for ambiguous alignments. The persons conducting the annotation are asked to specify alignments of two different kinds: an S (sure) alignment, for alignments that are unambiguous, and a P (possible) alignment, for ambiguous alignments. The P label is used especially to align words within idiomatic expressions and free translations and missing function words (S [?] P).</Paragraph>
    <Paragraph position="2"> The reference alignment thus obtained may contain many-to-one and one-to-many relationships. Figure 2 shows an example of a manually aligned sentence with S and P labels.</Paragraph>
    <Paragraph position="3"> The quality of an alignment A = {(j, a</Paragraph>
    <Paragraph position="5"> redefined precision and recall measures:  A manual alignment with S (filled squares) and P (unfilled squares) connections. These definitions of precision, recall and the AER are based on the assumption that a recall error can occur only if an S alignment is not found and a precision error can occur only if the found alignment is not even P.</Paragraph>
    <Paragraph position="6"> The set of sentence pairs for which the manual alignment is produced is randomly selected from the training corpus. It should be emphasized that all the training of the models is performed in a completely unsupervised way (i.e., no manual alignments are used). From this point of view, there is no need to have a test corpus separate from the training corpus.</Paragraph>
    <Paragraph position="7"> Typically, the annotation is performed by two human annotators, producing sets  . To increase the quality of the resulting reference alignment, the annotators are presented with the mutual errors and asked to improve their alignments where possible. (Mutual errors of the two annotators A and B are the errors in the alignment of annotator A if we assume the alignment of annotator B as reference and the errors in the alignment of annotator B if we assume the alignment of annotator A as reference.) From these alignments, we finally generate a reference alignment that contains only those S connections on which both annotators agree and all P connections from both annotators. This can be accomplished by forming the intersection of the sure alignments (S = S  ), respectively. By generating the reference alignment in this way, we obtain an alignment error rate of 0 percent when we compare the S alignments of every single annotator with the combined reference alignment.</Paragraph>
  </Section>
class="xml-element"></Paper>