Using Information about Multi-word Expressions for the Word-Alignment Task (W06-1204)

4 Alignment algorithm

In this section, we describe the algorithm for aligning verbs and their dependents in the source language sentence with words in the target language. Let V be the number of verbs and A the number of dependents, and let the number of words in the target language sentence be N. If we explored all the ways in which the $V + A$ words in the source sentence can be aligned with words in the target language before choosing the best alignment, the total number of possibilities would be $N^{V+A}$, which is computationally very expensive. Hence, we use a beam-search algorithm to obtain the K-best alignments.

Our algorithm has three main steps.

1. Populate the beam: Use the local features (which largely capture the co-occurrence information between a source word and a target word) to determine the K-best alignments of verbs and their dependents with words in the target language.

2. Re-order the beam: Re-order the above alignments using more complex features (which include the global features and the compositionality-based feature(s)).

3. Post-processing: Extend the alignment(s) of the verb(s) on the source side to include words which can be part of the verbal unit on the target side.

For a source sentence, let the verbs and dependents be denoted by $s_{ij}$. Here $i$ is the index of the verb ($1 \le i \le V$). The variable $j$ is the index of the dependent ($0 \le j \le A$), with $j = 0$ used to represent the verb itself. Let the source sentence be denoted by $S = \{s_{ij}\}$ and the words of the target sentence by $t_k$ ($1 \le k \le N$).

4.1 Populate the Beam

The task in this step is to obtain the K-best candidate alignments using local features. The local features mainly capture co-occurrence information between a source and a target word and are independent of other alignment links or words in the sentences. Let the local feature vector be denoted $f_{local}$. The score of a particular alignment link is computed by taking the dot product of the weight vector $W$ with the local feature vector of the words connected by the alignment link. Hence, the local score is

  $score_{local}(s_{ij}, t_k) = W \cdot f_{local}(s_{ij}, t_k)$

The total score of an alignment configuration $\bar{a}$ is computed by adding the scores of the individual links in the configuration. Hence, the alignment score is

  $score_{local}(\bar{a}) = \sum_{(s_{ij}, t_k) \in \bar{a}} W \cdot f_{local}(s_{ij}, t_k)$

We propose an algorithm of order $O((V + A)\,N \log(N) + K)$ to compute the K-best alignment configurations. First, the local scores of each verb and its dependents are computed for each word in the target sentence and stored in a local beam denoted by $b_{ij}$.
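To make the populate step concrete, here is a minimal Python sketch (not the authors' code): it scores each candidate link as the dot product $W \cdot f_{local}$ and builds one sorted local beam per verb or dependent. The names populate_local_beams and local_score and the feature_fn callback are illustrative assumptions.

```python
import numpy as np

def local_score(weights, features):
    # Score of one alignment link: dot product of the weight vector
    # with the local feature vector of the (source, target) word pair.
    return float(np.dot(weights, features))

def populate_local_beams(source_units, target_words, feature_fn, weights):
    # For every verb/dependent s_ij, score every target word t_k and sort
    # the candidate links by local score, best first. beams[s][0] is then
    # the best purely local link for s.
    beams = {}
    for s in source_units:
        scored = [(local_score(weights, feature_fn(s, t)), t) for t in target_words]
        scored.sort(key=lambda link: link[0], reverse=True)
        beams[s] = scored
    return beams
```

Summing the top slot of every local beam gives the configuration that initially occupies the top of the alignment beam, before the boundary described next is expanded.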
The local beams corresponding to all the verbs and dependents are then sorted. This operation has complexity $O((V + A)\,N \log(N))$.

The goal now is to pick the K-best configurations of alignment links. A single slot in a local beam corresponds to one alignment link. We define a boundary which partitions each local beam into two sets of slots. The slots above the boundary have already been explored by the algorithm, while the slots below the boundary are still to be explored. Figure 3 shows the boundary, which cuts across the local beams.

We keep modifying the boundary until all K slots in the alignment beam are filled with the K-best configurations. At the beginning of the algorithm, the boundary is a straight line passing through the top of all the local beams. The top slot of the alignment beam at this point represents the combination of alignment links with the best local scores.

The next slot $b_{ij}[p]$ (from the set of unexplored slots) to be included in the boundary is the slot which has the least difference in score from the slot at the top of its local beam. That is, we pick the slot $b_{ij}[p]$ for which $score_{local}(b_{ij}[1]) - score_{local}(b_{ij}[p])$ is smallest. This procedure ensures that the alignment configurations are K-best and are sorted according to the scores obtained using local features.

4.2 Re-order the beam

We now use global features to re-order the beam. The global features look at properties of the entire alignment configuration rather than at alignment links locally.

The global score is defined as the dot product of the weight vector and the global feature vector:

  $score_{global}(\bar{a}) = W \cdot f_{global}(\bar{a})$

The overall score is calculated by adding the local score and the global score:

  $score(\bar{a}) = score_{local}(\bar{a}) + score_{global}(\bar{a})$

The beam is now sorted based on the overall score of each alignment. The alignment configuration at the top of the beam is the best possible alignment between the source sentence and the target sentence.

4.3 Post-processing

The first two steps of our alignment algorithm compute alignments in which one verb or dependent on the source language side is aligned with only one word on the target side. But in the case of compound verbs in Hindi, the verb in English should be aligned to all the words which make up the compound verb in Hindi. For example, in Figure 3 (Hindi sentence: 'mainee Shyam ki kitaaba khoo dii'), the verb "lost" is aligned to both 'khoo' and 'dii', whereas our alignment algorithm would have aligned "lost" only to 'khoo'. Hence, we look at the window of words after the word which is aligned to the source verb and check whether any of them is a verb which has not been aligned with any word in the source sentence. If this condition is satisfied, we align the source verb to these words too.
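As an illustration of the post-processing step, the sketch below extends an English verb's alignment to adjacent, still-unaligned Hindi verbs (e.g. the light verb 'dii' in 'khoo dii'). The dictionary-of-lists alignment representation, the POS-prefix test, and the two-word window are assumptions made for the example, not details taken from the paper.

```python
def extend_verb_alignment(alignment, verb, target_pos, window=2):
    # alignment: dict mapping each source verb/dependent to a list of target
    # word indices (possibly empty). target_pos: POS tags of the target words.
    # Returns the alignment with the verb's links extended where appropriate.
    if not alignment.get(verb):
        return alignment
    already_aligned = {k for links in alignment.values() for k in links}
    last = max(alignment[verb])
    for k in range(last + 1, min(last + 1 + window, len(target_pos))):
        # Attach a target-side verb that no source word has claimed yet.
        if target_pos[k].startswith("V") and k not in already_aligned:
            alignment[verb].append(k)
    return alignment

# 'lost' initially aligned only to 'khoo' (index 4); 'dii' (index 5) is a verb.
hindi_pos = ["PRP", "NNP", "PSP", "NN", "VM", "VAUX"]
print(extend_verb_alignment({"lost": [4]}, "lost", hindi_pos))  # {'lost': [4, 5]}
```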
5 Parameters

As the number of training examples (294 sentences) is small, we choose to use very representative features. Some of the features which we used in this experiment are as follows.

5.1 Local features ($f_{local}$)

The local features which we consider are mainly co-occurrence features. These features estimate the likelihood of a source word aligning to a target word based on co-occurrence information obtained from a large sentence-aligned corpus (150K sentence pairs, originally collected as part of the TIDES MT project and later refined at IIIT-Hyderabad, India).

1. DiceWords: The Dice coefficient of the source word $s_{ij}$ and the target word, computed from the number of sentence pairs in the parallel corpus in which the target word was present in the translation of a sentence containing the word $s_{ij}$.

2. DiceRoots: The Dice coefficient of the lemmatized forms of the source and target words. It is important to consider this feature because the English-Hindi parallel corpus is not large, and co-occurrence information can be learnt effectively only after we lemmatize the words.

3. Dict: Whether there exists a dictionary entry from the source word $s_{ij}$ to the target word. For English-Hindi, we used a dictionary available at IIIT-Hyderabad, India.

4. Null: Whether the source word $s_{ij}$ is aligned to nothing in the target language.
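For concreteness, DiceWords can be computed from sentence-level co-occurrence counts roughly as follows; DiceRoots is the same computation run over lemmatized sentence pairs. The 2·C(s,t)/(C(s)+C(t)) form is the standard Dice coefficient (the paper does not spell the formula out), and the count-collection helper is an illustrative assumption.

```python
from collections import Counter

def dice_tables(parallel_corpus):
    # parallel_corpus: iterable of (source_words, target_words) sentence pairs.
    # Count, per word and per word pair, the number of sentence pairs it occurs in.
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in parallel_corpus:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update((s, t) for s in src_set for t in tgt_set)
    return src_count, tgt_count, pair_count

def dice(s, t, src_count, tgt_count, pair_count):
    # Standard Dice coefficient: 2 * C(s, t) / (C(s) + C(t)).
    denom = src_count[s] + tgt_count[t]
    return 2.0 * pair_count[(s, t)] / denom if denom else 0.0
```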
5.2 Global features

The following are the four global features which we have considered.

- AvgDist: The average distance between the words in the target language sentence which are aligned to the verbs in the source language sentence. AvgDist is normalized by dividing it by the number of words in the target language sentence. If the average distance is small, the verbs in the source sentence are aligned to target words which lie relatively close to one another, relative to the length of the target sentence. This feature expresses the distribution of predicates in the target language.

- Overlap: This feature stores the count of pairs of verbs in the source language sentence which align with the same word in the target language sentence. Overlap is normalized by dividing it by the total number of pairs of verbs. This feature is used to discourage overlaps among the words which are the alignments of verbs in the source language sentence.

- MergePos: This feature can be considered a compositionality-based feature. The part-of-speech tag of a dependent is essential for determining how likely the dependent is to align with the same word in the target language sentence as the word to which its verb is aligned. This binary feature is active when the alignment links of a dependent and its verb merge. For example, in Figure 5, the feature 'merge RP' is active (that is, merge RP = 1).

- MergeMI: This is a compositionality-based feature which associates point-wise mutual information (in addition to the POS information) with the cases where a dependent has the same alignment in the target language as its verb. This feature, which records the compositionality value (represented by point-wise mutual information in our experiments), is active if the alignment links of a dependent and its verb merge. The mutual information (MI) is classified into three groups depending on its absolute value, rounded to the nearest integer: values in the range 0-2 are considered LOW, values in the range 3-5 are considered MEDIUM, and values above 5 are considered HIGH. The feature "merge RP HIGH" is active in the example shown in Figure 6.

6 Online large margin training

For parameter optimization, we have used an online large margin algorithm called MIRA (McDonald et al., 2005; Crammer and Singer, 2003). We describe the training algorithm we used only briefly. Our training set is an English-Hindi word-aligned parallel corpus. We obtain the verb-based expressions in English by running a dependency parser (Shen, 2006). Let the number of sentence pairs in the training data be m. On each training instance, the weight vector is updated so that the score of the gold alignment exceeds the score of each of the K-best predictions by a margin equal to the number of mistakes in that prediction when compared to the gold alignment. While computing the number of mistakes, the mistakes due to the mis-alignment of the head verb could be given greater weight, thus prompting the optimization algorithm to give greater importance to verb-related mistakes and thereby improving overall performance.

Step 4 of the online training algorithm (the per-instance weight update) can be substituted by the following optimization:

  minimize $\|W^{(t+1)} - W^{(t)}\|$
  subject to $score(\bar{a}_{gold}) - score(\bar{a}) \ge L(\bar{a}, \bar{a}_{gold})$ for every $\bar{a}$ in the K-best predictions,

where the scores are taken under $W^{(t+1)}$ and $L(\bar{a}, \bar{a}_{gold})$ is the number of mistakes in $\bar{a}$ with respect to the gold alignment. The above optimization problem is converted to the dual form using one Lagrangian multiplier for each constraint. In the dual form, the Lagrangian multipliers are solved for using Hildreth's algorithm. Here, predicting the K-best alignments is similar to predicting the K-best classes in a multi-class classification problem. Ideally, we would need to consider all possible classes and impose a margin constraint for every class, but the number of such classes is exponential, and thus we restrict ourselves to the K-best classes.
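The flavour of this update can be sketched as follows. This is a simplification: it applies the closed-form single-constraint update to each of the K-best predictions in turn, whereas the paper solves all K margin constraints jointly in the dual with Hildreth's algorithm; the k_best_decode, feature_vec, and num_mistakes helpers are assumed interfaces, not the authors' code.

```python
import numpy as np

def mira_style_update(weights, gold_feats, pred_feats, n_mistakes):
    # Require score(gold) - score(prediction) >= number of mistakes in the
    # prediction; if violated, take the smallest-norm step that restores it.
    # (Single-constraint closed form, in place of the joint K-constraint solve.)
    diff = gold_feats - pred_feats
    violation = n_mistakes - float(np.dot(weights, diff))
    norm_sq = float(np.dot(diff, diff))
    if violation <= 0.0 or norm_sq == 0.0:
        return weights
    return weights + (violation / norm_sq) * diff

def train(data, k_best_decode, feature_vec, num_mistakes, dim, epochs=10):
    # data: (sentence_pair, gold_alignment) tuples; k_best_decode returns the
    # K highest-scoring alignment configurations under the current weights.
    w = np.zeros(dim)
    for _ in range(epochs):
        for pair, gold in data:
            for pred in k_best_decode(pair, w):
                w = mira_style_update(w, feature_vec(pair, gold),
                                      feature_vec(pair, pred),
                                      num_mistakes(pred, gold))
    return w
```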