File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/w05-0904_metho.xml
Size: 7,857 bytes
Last Modified: 2025-10-06 14:10:01
<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0904">
<Title>Syntactic Features for Evaluation of Machine Translation</Title>
<Section position="3" start_page="25" end_page="27" type="metho">
<SectionTitle> 2 Evaluating Machine Translation with Syntactic Features </SectionTitle>
<Paragraph position="0"> In order to give a clear and direct evaluation of the fluency of a sentence, syntax trees are used to generate metrics based on the similarity of the MT hypothesis's tree and those of the references. We cannot expect the whole syntax tree of the hypothesis to appear in the references; our approach is therefore based on the fractions of its subtrees which also appear in the reference syntax trees. This idea is derived from BLEU, but because sparse subtrees can lead to zero fractions, we average the fractions with an arithmetic mean instead of the geometric mean used in BLEU. For each hypothesis, the fractions of subtrees of different depths are calculated, and their arithmetic mean is computed as the syntax-tree-based metric, which we denote as the subtree metric STM:</Paragraph>
\[ \mathrm{STM} = \frac{1}{D} \sum_{n=1}^{D} \frac{\sum_{t \in \mathrm{subtrees}_n(\mathrm{hyp})} \mathrm{count}_{\mathrm{clip}}(t)}{\sum_{t \in \mathrm{subtrees}_n(\mathrm{hyp})} \mathrm{count}(t)} \]
<Paragraph position="2"> where D is the maximum depth of subtrees considered, count(t) denotes the number of times subtree t appears in the candidate's syntax tree, and count_clip(t) denotes the clipped number of times t appears in the references' syntax trees. Clipped here means that, for a given subtree, the count computed from the hypothesis syntax tree cannot exceed the maximum number of times the subtree occurs in any single reference's syntax tree. A simple example with one hypothesis and one reference is shown in Figure 2. Setting the maximum depth to 3, we go through the hypothesis syntax tree and compute the fraction of subtrees of each depth. For the 1-depth subtrees, we get S, NP, VP, PRON, V, NP, which also appear in the reference syntax tree.</Paragraph>
<Paragraph position="3"> Since PRON only occurs once in the reference, its clipped count is 1 rather than 2. We therefore get 6 out of 7 for the 1-depth subtrees. For the 2-depth subtrees, we get S-NP VP, NP-PRON, and VP-V NP, which also appear in the reference syntax tree. For the same reason, the subtree NP-PRON can only be counted once, so we get 3 out of 4 for the 2-depth subtrees. Similarly, the fraction of 3-depth subtrees is 1 out of 2. Therefore, the final STM score is (6/7 + 3/4 + 1/2)/3 = 0.702.</Paragraph>
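A minimal Python sketch of this computation is given below, assuming a simple (label, children) tuple representation for parse trees and at least one reference; the representation and helper names are illustrative rather than taken from the paper.

```python
from collections import Counter

# A tree node is a (label, children) pair; children is a list of nodes.

def height(node):
    label, children = node
    return 1 + max((height(c) for c in children), default=0)

def truncate(node, depth):
    """Return the subtree rooted at `node`, cut to `depth` levels."""
    label, children = node
    if depth == 1:
        return (label,)
    return (label, tuple(truncate(c, depth - 1) for c in children))

def subtree_counts(tree, depth):
    """Count every depth-`depth` subtree rooted at a node of sufficient height."""
    counts = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        if height(node) >= depth:
            counts[truncate(node, depth)] += 1
        stack.extend(node[1])  # visit the children as well
    return counts

def stm(hyp_tree, ref_trees, max_depth=3):
    """Subtree metric: arithmetic mean over depths of clipped subtree precision."""
    fractions = []
    for d in range(1, max_depth + 1):
        hyp_counts = subtree_counts(hyp_tree, d)
        total = sum(hyp_counts.values())
        if total == 0:          # hypothesis has no subtrees of this depth
            fractions.append(0.0)
            continue
        ref_counts = [subtree_counts(r, d) for r in ref_trees]
        # Clip each hypothesis count by its maximum count in any single reference.
        clipped = sum(min(c, max(rc[t] for rc in ref_counts))
                      for t, c in hyp_counts.items())
        fractions.append(clipped / total)
    return sum(fractions) / max_depth
```

With the hypothesis and reference trees of Figure 2 and max_depth set to 3, this procedure reproduces the (6/7 + 3/4 + 1/2)/3 computation above.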
<Paragraph position="4"> While the subtree overlap metric defined above considers only subtrees of a fixed depth, subtrees of other configurations may be important for discriminating good hypotheses. For example, we may want to look for the subtree:</Paragraph>
(S NP (VP V NP))
<Paragraph position="6"> to find sentences with transitive verbs, while ignoring the internal structure of the subject noun phrase.</Paragraph>
<Paragraph position="7"> In order to include subtrees of all configurations in our metric, we turn to convolution kernels on our trees. Using H(x) to denote the vector of counts of all subtrees found in tree x, for two trees T1 and T2 the inner product H(T1) . H(T2) counts the number of matching pairs of subtrees of T1 and T2. Collins and Duffy (2001) describe a method for efficiently computing this dot product without explicitly computing the vectors H, which have dimensionality exponential in the size of the original tree. In order to derive a similarity measure ranging from zero to one, we use the cosine of the vectors H:</Paragraph>
\[ \cos(T_1, T_2) = \frac{H(T_1) \cdot H(T_2)}{\sqrt{(H(T_1) \cdot H(T_1))\,(H(T_2) \cdot H(T_2))}} \]
<Paragraph position="9"> Because each of these inner products can be computed with the tree kernel, we can compute the cosine similarity using the kernel method, without ever computing the entire vector of counts H. Our kernel-based subtree metric TKM is then defined as the maximum of the cosine measure over the references:</Paragraph>
\[ \mathrm{TKM} = \max_{r \in \mathrm{refs}} \cos(\mathrm{hyp}, r) \]
<Paragraph position="11"> The advantage of using the tree kernel is that it can capture the similarity of subtrees of different shapes; the weak point is that it can only use the reference trees one by one, while STM can use them simultaneously. The dot product also weights individual features differently than our other measures, which compute overlap in the same way as BLEU does. For example, if the same subtree occurs 10 times in both the hypothesis and the reference, this contributes a term of 100 to the dot product, rather than the 10 contributed by the clipped count used by BLEU and by our subtree metric STM.</Paragraph>
<Section position="1" start_page="26" end_page="27" type="sub_section">
<SectionTitle> 2.1 Dependency-Based Metrics </SectionTitle>
<Paragraph position="0"> Dependency trees consist of trees of head-modifier relations with a word at each node, rather than just at the leaves. Dependency trees were found by Fox (2002) to correspond better across translation pairs than constituent trees, and they form the basis of the machine translation systems of Alshawi et al. (2000) and Lin (2004). We derived dependency trees from the constituent trees by applying the deterministic headword extraction rules used by the parser of Collins (1999). For the example of the reference syntax tree in Figure 2, the whole tree with the root S represents a sentence, and the subtree NP-ART N represents a noun phrase. For every node in the syntax tree, we can determine its headword from its syntactic structure; from the subtree NP-ART N, for example, the headword selection rules choose the headword of NP to be the word corresponding to the POS N in the subtree, and the other child, which corresponds to ART, is the modifier of the headword.</Paragraph>
<Paragraph position="1"> The dependency tree is thus a structure built of headwords, in which every subtree represents the modifier information for its root headword. For example, the dependency tree of the sentence I have a red pen is shown below:
have
  I
  pen
    a
    red</Paragraph>
<Paragraph position="2"> The dependency tree contains both lexical and syntactic information, which inspires us to use it for MT evaluation.</Paragraph>
<Paragraph position="3"> Noticing that in a dependency tree the child nodes are the modifiers of their parent, we propose a dependency-tree-based metric that extracts headword chains from both the hypothesis and the reference dependency trees. A headword chain is a sequence of words which corresponds to a path in the dependency tree. Taking the dependency tree in Figure 2 as an example, the 2-word headword chains include have I, have pen, pen a, and pen red. Before using the headword chains, we need to extract them from the dependency trees. Figure 3 gives an algorithm which recursively extracts the headword chains in a dependency tree from short to long.</Paragraph>
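Figure 3 itself is not reproduced here, so the following Python sketch illustrates the extraction, assuming a (word, children) tuple representation for dependency nodes; the function names are illustrative, and the hwcm scorer anticipates the clipped-precision definition given in the next paragraph rather than quoting it.

```python
from collections import Counter

# A dependency node is a (word, children) pair; children is a list of nodes.

def chains_of_length(node, length):
    """All downward paths of exactly `length` words starting at `node`."""
    word, children = node
    if length == 1:
        return [(word,)]
    paths = []
    for child in children:
        for tail in chains_of_length(child, length - 1):
            paths.append((word,) + tail)
    return paths

def chain_counts(tree, length):
    """Count headword chains of `length` words rooted at every node of the tree."""
    counts = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        for chain in chains_of_length(node, length):
            counts[chain] += 1
        stack.extend(node[1])
    return counts

def hwcm(hyp_tree, ref_trees, max_len=4):
    """Headword chain metric: mean over chain lengths of clipped chain precision."""
    fractions = []
    for n in range(1, max_len + 1):
        hyp_counts = chain_counts(hyp_tree, n)
        total = sum(hyp_counts.values())
        if total == 0:
            fractions.append(0.0)
            continue
        ref_counts = [chain_counts(r, n) for r in ref_trees]
        clipped = sum(min(c, max(rc[g] for rc in ref_counts))
                      for g, c in hyp_counts.items())
        fractions.append(clipped / total)
    return sum(fractions) / max_len

# The dependency tree of "I have a red pen" yields exactly the 2-word chains
# listed in the text: have I, have pen, pen a, pen red.
dep = ("have", [("I", []), ("pen", [("a", []), ("red", [])])])
print(sorted(chain_counts(dep, 2)))
# [('have', 'I'), ('have', 'pen'), ('pen', 'a'), ('pen', 'red')]
```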
<Paragraph position="4"> Having obtained the headword chains, the headword-chain-based metric is computed in a manner similar to BLEU, but using n-grams of dependency chains rather than n-grams in the linear order of the sentence. For every hypothesis, the fractions of headword chains which also appear in the reference dependency trees are averaged as the final score. Using HWCM to denote the headword-chain-based metric, it is computed as follows:</Paragraph>
\[ \mathrm{HWCM} = \frac{1}{D} \sum_{n=1}^{D} \frac{\sum_{g \in \mathrm{chains}_n(\mathrm{hyp})} \mathrm{count}_{\mathrm{clip}}(g)}{\sum_{g \in \mathrm{chains}_n(\mathrm{hyp})} \mathrm{count}(g)} \]
<Paragraph position="5"> where D is chosen as the maximum length of chain considered.</Paragraph>
<Paragraph position="6"> We may also wish to consider dependency relations over more than two words that are contiguous but not in a single ancestor chain in the dependency tree. For this reason, the two methods described in section 3.1 are used to compute the similarity of dependency trees between the MT hypothesis and its references, and the corresponding metrics are denoted DSTM for dependency subtree metric and DTKM for dependency tree kernel metric.</Paragraph>
</Section>
</Section>
</Paper>