<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1410">
  <Title>Machine Translation with Grammar Association: Some Improvements and the Loco C Model</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
Pr(x | y) of
</SectionTitle>
    <Paragraph position="0"> the input sentence x, given the output one y. In practice, it is computed by using a statistical model of the reverse translation process. This decomposition has the advantage of modularity in the modelling. An ad hoc statistical language model encapsulates the features that are inherent to the output language, while the reverse translation model can focus on relations between input and output words, assigning scores to sentence pairs without taking into account whether the output sentence is well-formed (note that model behaviour for syntactically incorrect input sentences is not important, because the input sentence is known and the search is only over the output language). An alternative, direct statistical approach, with a single model for computing Pr(y | x), seems to require that model to be complex enough to assign high scores only to pairs where the output sentence satisfies two conditions: it is well-formed and it means the same as the input one. Hence, for the sake of simplified modelling, Bayes' decomposition has become a typical choice in Machine Translation.</Paragraph>
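Written out as a sketch in the reconstructed notation (x for the input sentence, y for the output one), the decomposition just described is

\[
\hat{y}(x) \;=\; \arg\max_{y} \Pr(y \mid x) \;=\; \arg\max_{y} \Pr(y)\,\Pr(x \mid y),
\]

where Pr(x) can be dropped from the maximization because it does not depend on y; Pr(y) is the output language model and Pr(x | y) the reverse translation model.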
    <Paragraph position="1"> However, in the Grammar Association context, when the basic equations of the system presented in (Vidal et al., 1993) are developed using Bayes' decomposition, it is said that the reverse model for Pr(x | y) "does not seem to admit a simple factorization which is also correct and convenient", so "crude heuristics" were adopted in the mathematical development of the expression to be maximized. We are going to show that, by means of direct modelling, Grammar Association can be set into a rigorous statistical framework without renouncing a factorization that makes the search for the optimal translation efficient. Moreover, the main advantage of Bayes' decomposition, modularity, is inherently present in Grammar Association systems: relations between input and output are mainly modelled by a (direct) statistical association model, and structural features of the output language are modelled by a grammar, which restricts the search space for the best translation.</Paragraph>
    <Paragraph position="2"> Let us begin by assuming that there are unambiguous grammars G_I and G_O describing, respectively, the input language L_I and the output one L_O. Thus, there is a one-to-one correspondence in each language relating sentences to their derivations, and we can write</Paragraph>
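The displayed equation referred to here is missing; a plausible reconstruction, given the one-to-one correspondence just stated, is

\[
\Pr(y \mid x) \;=\; \Pr\!\big(d_{G_O}(y) \,\big|\, d_{G_I}(x)\big),
\]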
    <Paragraph position="4"> where d_G(z) denotes the only derivation of sentence z in grammar G. Moreover, let us suppose that the output grammar is context-free and that the probability of rewriting an output non-terminal using a certain rule is independent of which other output rules have been employed in the output derivation. Then, it follows that the probability of an output derivation d_O, given an input one d_I, can be expressed as</Paragraph>
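The display itself is missing; since the next paragraph speaks of "a term in the sum" for each rule participation, a plausible reconstruction is a product of rule probabilities, or equivalently a sum of their logarithms:

\[
\Pr(d_O \mid d_I) \;=\; \prod_{r_O \in d_O} \Pr\!\big(r_O \,\big|\, \mathrm{lhs}(r_O),\, d_I\big)
\qquad\Longleftrightarrow\qquad
\log \Pr(d_O \mid d_I) \;=\; \sum_{r_O \in d_O} \log \Pr\!\big(r_O \,\big|\, \mathrm{lhs}(r_O),\, d_I\big),
\]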
    <Paragraph position="6"> with a term in the sum for each participation of a rule r_O in the derivation d_O, and lhs(r_O) denoting the left-hand side non-terminal of that rule. So, finally, we can find the most probable translation ŷ(x) of an input sentence x as the sentence associated to the output derivation given by</Paragraph>
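Again the display itself is missing; a plausible reconstruction of the search criterion is

\[
\hat{d}_O(x) \;=\; \arg\max_{d_O \in D(G_O)} \Pr\!\big(d_O \,\big|\, d_{G_I}(x)\big)
\;=\; \arg\max_{d_O \in D(G_O)} \prod_{r_O \in d_O} \Pr\!\big(r_O \,\big|\, \mathrm{lhs}(r_O),\, d_{G_I}(x)\big),
\]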
    <Paragraph position="8"> where D(G_O) stands for the set of all possible derivations in G_O.</Paragraph>
    <Paragraph position="9"> In practice, input and output grammars will be approximations inferred from samples and, more specifically, they will be acyclic finite-state automata. The restriction from context-free grammars to regular ones is due to the wide availability of inference techniques for these formal machines and to computational convenience. On the other hand, the output grammar has to be acyclic because of a more subtle point: the most probable derivation in the grammar will never make use of a cycle (no matter how high its probability is, avoiding the cycle always makes the derivation more probable, because each pass around the cycle multiplies in additional rule probabilities, which can only decrease the overall probability of the derivation). Hence, if we allowed the inference algorithm to model some features of the output language using cycles, system translations would never exhibit such features. Finally, for the sake of homogeneity, we choose to force the input grammar to be acyclic too.</Paragraph>
    <Paragraph position="10"> We can conclude this section by saying that, once deterministic and acyclic finite-state automata have been inferred, if we are able to learn association models for estimating, for each output rule, the probability of using that rule conditioned on having employed its left-hand side and on the identity of the input derivation, then an efficient Dynamic Programming search for the optimal output derivation³ can be used to provide the most probable translation.</Paragraph>
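As a minimal sketch of how such a Dynamic Programming search could be organized (an illustrative reconstruction, not the paper's implementation: the automaton representation and the rule_score callback supplied by the association model are assumptions), the optimal derivation in an acyclic output automaton can be found by visiting its states in topological order:

# Minimal sketch: Viterbi-style search for the most probable derivation in an
# acyclic output automaton.  Transitions play the role of output rules; their
# scores Pr(rule | source state, input derivation) come from an association
# model, abstracted here as the hypothetical rule_score callback.
from collections import defaultdict
import math


def best_output_derivation(states, transitions, initial, finals, rule_score):
    """states must be given in topological order; transitions are
    (source, output_word, target) triples; rule_score maps a transition to its
    conditional probability given the (fixed) input derivation."""
    out_edges = defaultdict(list)
    for source, word, target in transitions:
        out_edges[source].append((source, word, target))

    # best[state] = (log-probability, output words) of the best partial derivation
    best = {initial: (0.0, [])}
    for state in states:
        if state not in best:
            continue                      # unreachable from the initial state
        logp, words = best[state]
        for source, word, target in out_edges[state]:
            p = rule_score((source, word, target))
            if p <= 0.0:
                continue                  # rule ruled out by the association model
            cand = logp + math.log(p)
            if target not in best or cand > best[target][0]:
                best[target] = (cand, words + [word])

    # The translation is the word sequence of the best derivation reaching a final state.
    candidates = [best[f] for f in finals if f in best]
    return max(candidates) if candidates else None

Because the automaton is acyclic and states are visited in topological order, every transition is relaxed exactly once, so the search is linear in the number of transitions once the rule scores are available.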
  </Section>
</Paper>