<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-1040">
  <Title>Bilingual Knowledge Acquisition from Korean-English Parallel Corpus Using Alignment Method ( Korean-English Alignment at Word and Phrase Level )</Title>
  <Section position="2" start_page="0" end_page="230" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Studies on parallel corpora consisting of multilingual texts are often guided by the purpose of obtaining linguistic resources such as bilingual dictionaries, bilingual grammars (Wu 1995) and translation examples. Parallel texts have proved to be useful not only in the development of statistical machine translation but also in other applications such as word sense disambiguation (Brown et al. 1991) and bilingual lexicography (Klavans and Tzoukermann 1990). As parallel corpora become more and more accessible, many studies based on bilingual corpora that were once considered impractical are now encouraged.</Paragraph>
    <Paragraph position="1"> Alignment, as a study of parallel corpora, refers to the process of establishing the correspondences between matching elements in a parallel corpus.</Paragraph>
    <Paragraph position="2"> Alignment methods tend to approach the problem differently according to the alignment units they adopt. Of the various alignment options, the alignment of word units computes a sequence of the matching pairs of words in a parallel corpus. Figure 1 shows the aligned results of a parallel corpus that was originally paired at the sentence level. In figure 1, the right-hand side of each pair-wise alignment gives the corresponding Korean words. The parentheses to the right of each Korean word give the corresponding English meaning and syntactic function of the word.</Paragraph>
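
As a minimal illustration of word-level alignment as a sequence of matching word pairs, the sketch below stores an alignment as a list of index links. The sentence pair, the romanized Korean tokens, and the helper names `aligned_pairs` and `is_one_to_one` are hypothetical and are not taken from figure 1.

```python
# Toy sentence pair (hypothetical; Korean is romanized and already split
# into words and functional morphemes).
english = ["I", "go", "to", "school"]
korean = ["na", "neun", "hakgyo", "e", "ga", "nda"]

# A word alignment as a list of (english_index, korean_index) links.
alignment = [(0, 0), (3, 2), (2, 3), (1, 4)]


def aligned_pairs(src, tgt, links):
    """Turn index links into readable (source word, target word) pairs."""
    return [(src[i], tgt[j]) for i, j in links]


def is_one_to_one(links):
    """True if no source index and no target index is linked twice."""
    src = [i for i, _ in links]
    tgt = [j for _, j in links]
    return len(set(src)) == len(src) and len(set(tgt)) == len(tgt)


pairs = aligned_pairs(english, korean, alignment)
```

Adding a second link for one English word, for example `(1, 5)`, turns the mapping into a one-to-many alignment, which `is_one_to_one` then rejects.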
    <Paragraph position="3"> The existing methods for the alignment of Indo-European language pairs such as English and French take words as aligning units and restrict the correspondences between words to be one of the functional mappings (one-to-one, one-to-many)</Paragraph>
    <Paragraph position="5"> (Brown et al. 1993, Smadja 1992). These methods made extensive use of the position information of words in matching pairs of sentences, which turned out useful (Brown et al. 1993). The structural similarity in word order and units between English and French must be one of the major factors in the success of the methods.</Paragraph>
    <Paragraph position="6"> The alignment of pairs of structurally dissimilar languages such as Korean and English requires a different strategy to compensate for the lack of structural information such as word order and to handle the difference of alignment units.</Paragraph>
    <Paragraph position="7"> An early attempt to align Asian and Indo-European language pairs is found in the work by Wu and Xia (1994). Their result is promising, demonstrating high accuracy in learning a bilingual lexicon between English and Chinese for frequently used words without the consideration of word order. The Chinese-English alignment consists of segmenting an input Chinese sentence and aligning the segmented sentence with the candidate English sentence. The generation of segments to be aligned is an additional problem to the decision of aligning units before the alignment takes place. Wu and Xia (1994) used a bilingual dictionary to segment the sentence, but the selection of segment candidates is hard to make with reliable accuracy. Bilingual dictionaries are not always available, and take considerable resources to build.</Paragraph>
    <Paragraph position="8"> The method we suggest integrates the procedures to solve the two critical problems, deciding aligning units and aligning the candidates of different word orders, and accomplishes the alignment without using any dictionary.</Paragraph>
    <Paragraph position="9"> The proposed alignment method assumes a preprocessing step before iterative applications of the alignment step, as is illustrated in figure 2. Part-of-speech tagging is done before the actual alignment so that the word-phrases (a spacing unit in Korean) may be decomposed into proper words and functional morphemes and the Korean and English words may be assigned appropriate tags.</Paragraph>
    <Paragraph position="10"> The proposed alignment is done first for phrase pairs and then for word pairs, which eventually induces the bilingual dictionary. The alignment method is realized through the reestimation of its probabilistic parameters from the aligned sentences. In particular, the parameters account for the cooccurrence probabilities of bilingual word pairs and phrase pairs. The repetitive application of the alignment and reestimation leads to a convergent stationary state where the training stops.</Paragraph>
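
The loop of alignment and reestimation described above can be sketched as an expectation-maximization procedure. This section does not give the paper's actual parameterization, so the sketch below substitutes a generic IBM Model 1 style estimator over word pairs only, ignoring phrase pairs and part-of-speech tags. The function name `reestimate`, the tolerance `tol`, and the romanized toy corpus are assumptions made for illustration.

```python
from collections import defaultdict


def reestimate(pairs, max_iters=50, tol=1e-6):
    """EM reestimation of t[(e, k)] = P(English word e | Korean word k).

    Stand-in estimator (IBM Model 1 style), not the paper's own model.
    """
    e_vocab = {e for es, _ in pairs for e in es}
    uniform = 1.0 / len(e_vocab)
    t = defaultdict(lambda: uniform)  # uniform start
    for _ in range(max_iters):
        count = defaultdict(float)  # expected cooccurrence counts
        total = defaultdict(float)  # expected counts per Korean word
        # Alignment step: distribute each English word over the Korean
        # words of the same sentence pair, in proportion to current t.
        for es, ks in pairs:
            for e in es:
                norm = sum(t[(e, k)] for k in ks)
                for k in ks:
                    frac = t[(e, k)] / norm
                    count[(e, k)] += frac
                    total[k] += frac
        # Reestimation step: renormalize the expected counts.
        new_t = {(e, k): c / total[k] for (e, k), c in count.items()}
        delta = max(abs(p - t[key]) for key, p in new_t.items())
        t = defaultdict(float, new_t)
        if tol > delta:  # stationary state reached, training stops
            break
    return dict(t)


# Hypothetical toy corpus: (English words, romanized Korean words).
corpus = [
    (["red", "book"], ["ppalgan", "chaek"]),
    (["red", "apple"], ["ppalgan", "sagwa"]),
    (["book"], ["chaek"]),
]
probs = reestimate(corpus)
```

In this toy run the probability mass for `chaek` concentrates on `book`, so a bilingual lexicon can be read off by taking, for each Korean word, the English word with the highest estimated probability.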
    <Paragraph position="11"> In the following section, our proposed method for aligning Korean-English sentences is described and the parameter reestimation algorithm is explained. Section 3 summarizes the results of experiments, and the conclusion is given in section 4.</Paragraph>
  </Section>
</Paper>