<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1006"> <Title>Automatic Acquisition of Hierarchical Transduction Models for Machine Translation</Title> <Section position="3" start_page="0" end_page="42" type="intro"> <SectionTitle> 2 Overview </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="41" type="sub_section"> <SectionTitle> 2.1 Lexical head transducers </SectionTitle> <Paragraph position="0"> In our training method, we follow the simple lexical head transduction model described by Alshawi (1996b), which can be regarded as a type of statistical dependency grammar transduction. This type of transduction model consists of a collection of head transducers; the purpose of a particular transducer is to translate a specific source word w into a target word v, and further to translate the sequences of dependent words to the left and right of w into sequences of dependents to the left and right of v. When applied recursively, a set of such transducers effects a hierarchical transduction of the source string into the target string.</Paragraph> <Paragraph position="1"> A distinguishing property of head transducers, as compared to 'standard' finite state transducers, is that they perform a transduction outwards from a 'head' word in the input string rather than by traversing the input string from left to right. A head transducer for translating source word w to target word v consists of a set of states $q_0(w{:}v), q_1(w{:}v), q_2(w{:}v), \ldots$ and transitions of the form $$(q_i(w{:}v),\; q_j(w{:}v),\; w_d,\; v_d,\; \alpha,\; \beta)$$ where the transition is from state $q_i(w{:}v)$ to state $q_j(w{:}v)$, reading the next source dependent $w_d$ at position $\alpha$ relative to w and writing a target dependent $v_d$ at position $\beta$ relative to v. Positions left of a head (in the source or target) are indicated with negative integers, while those right of the head are indicated with positive integers.</Paragraph> <Paragraph position="2"> The head transducers we use also include the following probability parameters for start, transition, and stop events:</Paragraph> <Paragraph position="3"> $$P(\mathrm{start}, q \mid w{:}v), \qquad P(q_j, w_d, v_d, \alpha, \beta \mid q_i, w{:}v), \qquad P(\mathrm{stop} \mid q, w{:}v)$$ </Paragraph> <Paragraph position="4"> In the present work, when a model is applied to translate a source sentence, the chosen derivation of the target string is the derivation that maximizes the product of the above transducer event probabilities. The transduction search algorithm we use to apply the translation model is a bottom-up dynamic programming algorithm similar to the analysis algorithm for relational head acceptors described by Alshawi (1996a).</Paragraph> </Section>
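As a concrete illustration of the model just described, here is a minimal Python sketch of a head transducer and the scoring of a single derivation. The class layout, the field names, and the exact form of the start and stop parameters are assumptions made for exposition; this is not the authors' implementation.

```python
# Minimal sketch of a lexical head transducer, assuming the start,
# transition, and stop parameterization of Section 2.1. All names
# here are illustrative, not from the paper.
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    """One transition (q_i, q_j, w_d, v_d, alpha, beta) with its probability."""
    q_i: int      # source state of the transducer for w:v
    q_j: int      # destination state
    w_d: str      # source dependent read
    v_d: str      # target dependent written
    alpha: int    # position of w_d relative to head w (negative = left of w)
    beta: int     # position of v_d relative to head v (negative = left of v)
    prob: float   # transition probability

@dataclass
class HeadTransducer:
    """Head transducer translating source head w into target head v."""
    w: str
    v: str
    start: dict[int, float]       # q -> P(start, q | w:v)
    transitions: list[Transition]
    stop: dict[int, float]        # q -> P(stop | q, w:v)

    def derivation_prob(self, path: list[Transition]) -> float:
        """Product of the start, transition, and stop event probabilities
        for one derivation; decoding keeps the derivation maximizing this."""
        if not path:
            return 0.0
        p = self.start.get(path[0].q_i, 0.0)
        for t in path:
            p *= t.prob
        return p * self.stop.get(path[-1].q_j, 0.0)

# Toy usage: one transition reading a dependent right of w and writing
# its translation left of v, scored as start * transition * stop.
t = Transition(0, 1, "wd", "vd", alpha=1, beta=-1, prob=0.5)
ht = HeadTransducer("w", "v", start={0: 1.0}, transitions=[t], stop={1: 1.0})
print(ht.derivation_prob([t]))   # 0.5
```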
<Section position="2" start_page="41" end_page="42" type="sub_section"> <SectionTitle> 2.2 Training method </SectionTitle> <Paragraph position="0"> The training method is organized into two main stages, an alignment stage followed by a transducer construction stage, as shown in Figure 1.</Paragraph> <Paragraph position="1"> [Figure 2: partitioning of the source around a head w with respect to f] </Paragraph> <Paragraph position="2"> The single input to the training process is a bitext corpus, constructed by taking each utterance in a corpus of transcribed speech and having it manually translated. We use the term bitext in what follows to refer to a pair consisting of the transcription of a single utterance and its translation.</Paragraph> <Paragraph position="3"> The steps in the training procedure are as follows: 1. For each bitext, compute an alignment function f from source words to target words, using the method described in Section 3.</Paragraph> <Paragraph position="4"> 2. Partition the source into a head word w and substrings to the left and right of w (as shown in Figure 2). The extents of the partitions projected onto the target by f must not overlap. Any selection of the head satisfying this constraint is valid, but the selection method used influences accuracy (Section 5).</Paragraph> <Paragraph position="5"> 3. Continue partitioning the left and right substrings recursively around sub-heads wl and wr. 4. Trace hypothesized head-transducer transitions that would output the translations of the left and right dependents of w (i.e. wl and wr) at the appropriate positions in the target string, indicated by f. This step is described in more detail in Section 4.</Paragraph> <Paragraph position="6"> 5. Apply step 4 recursively to partitions headed by wl and wr, and then their dependents, until all left and right partitions have at most one word.</Paragraph> <Paragraph position="7"> 6. Aggregate hypothesized transitions to form the counts of a maximum likelihood head transduction model.</Paragraph> <Paragraph position="8"> The recursive partitioning of the source and target strings gives the hierarchical decomposition for head transduction; a schematic sketch of steps 2 through 6 is given below. In step 2, the constraint on target partitions ensures that the transduction hypothesized in training does not contain crossing dependency structures in the target.</Paragraph>
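The following Python sketch illustrates steps 2 through 6 of the training procedure on a single bitext. The function names, the representation of the alignment f as a dict from source to target word indices, the choose_head stand-in for the head-selection method of Section 5, and the simplified +/-1 relative positions are all assumptions for exposition, not the authors' implementation.

```python
# Schematic sketch of the recursive partitioning in steps 2-6,
# under the assumptions stated above.
from collections import Counter

def hypothesize(span, f, counts, choose_head):
    """Recursively partition a source span (a sorted list of word indices)
    around heads, recording hypothesized transition events (steps 2-5).
    Returns the head of the span, or None for an empty span."""
    if not span:
        return None
    if len(span) == 1:
        return span[0]
    h = choose_head(span, f)          # step 2: any head whose left and right
                                      # partitions project under f to
                                      # non-overlapping target extents
    left = [i for i in span if i < h]
    right = [i for i in span if i > h]
    w_l = hypothesize(left, f, counts, choose_head)    # step 3: recurse around
    w_r = hypothesize(right, f, counts, choose_head)   # sub-heads w_l and w_r
    for d, sign in ((w_l, -1), (w_r, +1)):
        if d is not None:
            # step 4: hypothesize a transition reading dependent d on this
            # side of h and writing its translation at the position f gives
            # (real positions are signed offsets; +/-1 is a simplification)
            counts[(h, d, sign, f[d])] += 1
    return h

# Step 6 in miniature: counts aggregated over all bitexts would be
# normalized into maximum likelihood transducer probabilities.
f = {0: 1, 1: 0, 2: 2}     # toy alignment from source to target word indices
counts = Counter()
hypothesize([0, 1, 2], f, counts, lambda span, f: span[len(span) // 2])
print(counts)
```
</Section> </Section> </Paper>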