<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1064">
  <Title>Structural Feature Selection For English-Korean Statistical Machine Translation</Title>
  <Section position="4" start_page="439" end_page="440" type="metho">
    <SectionTitle>
3 Problem Setting
</SectionTitle>
    <Paragraph position="0"> In tiffs ctlat)ter, we describe how the features are related to the training data. Let tc be an English tag sequence and tk be a Korean tag sequence. Let Ts be the set of all possible tag sequence niapI)ings in a aligned sentence, S. We define a feature function (or a feature) as follows:</Paragraph>
    <Paragraph position="2"> It indicates co-occurrence information l)etween tags appeared in Ts. f(tC/,tk) expresses the information for predicting that te maps into ta.. A feature means a sort of inforination for predicting something. In our model, co-occurrence information on the same aligned sentence is used for a feature, while context is used as a feature in Inost of systems using maximum entropy. It can be less informative than context. Hence, we considered an initial supervision and feature selection.</Paragraph>
    <Paragraph position="3"> Our model starts with initial seed(active) features for mapI)ing extracted by SUl)ervision. In the next step, thature pool is constructed from training samples fro:n filtering and oifly features with a large gain to the model are added into active feature set. The final outputs of our model are the set of active t'eatures, their gain values, and conditional probabilities of features which maximize the model. Tim results can be embedded in parameters of statistical machine translation and hell) to construct structural bilingual text.</Paragraph>
    <Paragraph position="4"> Most alignment algorithm consists of two steps:  (1) estimate translation probabilities.</Paragraph>
    <Paragraph position="5"> (2) use these probabilities to search for most t)roba null ble alignment path.</Paragraph>
    <Paragraph position="6"> Our study is focused on (1), especially the part of tag string alignments.</Paragraph>
    <Paragraph position="7"> Next, we will explain the concept of the model.</Paragraph>
    <Paragraph position="8"> We are concerned with an ot)timal statistical inodel which can generate the traiifing samples. Nmnely, our task is to construct a stochastic model that pro- null (1) duces outl)ut tag sequenc0, &amp;quot;~k, given a tag sequence ~+~,-~.  To The l)roblem of interest is to use Salnt/les of * --J~\,What .... tagged sentences to observe the/)charier of the ran- u~,,~ (loin t)roeess. 'rile model p estinmtes tile conditional tt'2,Y probability that tile process will outlmt t,~, given t~.. ~,~/~ ,o~!! It is chosen out of a set of all allowed probability o~,~ ~ e}~..C/0,, me (tistributions ....</Paragraph>
    <Paragraph position="9"> The fbllowing steps are emt)loyed for ()tit&amp;quot; model, v~ / Input: a set L of POS-labeled bilingual aligned sentences.</Paragraph>
    <Paragraph position="10"> I. Make a set ~: of corresl)ondence pairs of tag sequences, (t~, tk) from a small portion of L by supervision.</Paragraph>
    <Paragraph position="11">  2. Set 2F into a set of active features, A.</Paragraph>
    <Paragraph position="12"> 3. Maximization of 1)arameters, A of at:tire features 1)y IIS(hnproved Iterative Sealing) algorithm. null 4. Create a feature pool set ?9 of all possible alignnmnts a(t(,, tk) from tag seqllellces of samples. 5. Filter 7 ) using frequency and sintilarity with M. 6. Coml)ute the atit)roximate gains of tkmtm:es in &amp;quot;p. 7. Select new features(A/') with a large gain vahle, and add A.</Paragraph>
    <Paragraph position="13">  Outt)ut: p(tklt~,)whcrc(t(,, t~.) C M and their Ai. We I)egan with training samples comi)osed of English-Korean aligned sentence t)airs, (e,k). Since they included long sentences, w(', 1)roke them into shorter ones. The length of training senl;en(:es was limited to ml(h',r 14 on the basis of English. It is reasona,bh; \])(',(:&amp;llSe we are interested in not lexical alignments lint tag sequence aliglmients. The samples were tagged using brill's tagger and qVIorany' that we iml)lenmnted as a Korean tagger. Figure \] shows the POS tags we considered. For simplicity, we adjusted some part of Brill's tag set.</Paragraph>
    <Paragraph position="14"> In the, sut)ervision step, 700 aligned sentences were used to construct the tag sequences mal)I)ings wlfich are referred to as an active feature set A. As Figure 2 shows, there are several ways in constructing the corresl)ondem;es. We chose the third mapping although (1) can be more useflll to explain Korean with I)redieate-argunmnt structure. Since a subject of a English sentence is always used for a subject tbrln in Korean, we exlcuded a subject case fi'onl argulnents of a l/redicate. For examl)le, 'they' is only used for a subject form, whereas 'me' is used for a object form and a dative form.</Paragraph>
    <Paragraph position="15"> II1 tile next step, training events, (t,:, It.) are constructed to make a feature 1)eel froln training sampies. The event consists of a tag string t,, of a English  phrase level 1)OS-tagged sentence and a tag string tL~ of the corresponding Korean POS-tagged sentence and it Call be represented with indicator functions fi(t~, tk). For a given sequence, the features were drawn fl'om all adjacent i)ossible I)airs and sonic interrupted pairs. Only features (tci, tfii ) out of the feature pool that meet the following conditions are extracted.</Paragraph>
    <Paragraph position="16"> * #(l, ei,t~:i) _&gt; 3, # is count * there exist H:.~,, where (t(,i,tt.~.) in A and the similarity(sanle tag; colin|;) of lki an(1 tkx _&gt; 0.6 Table \] shows possible tL'atures, for a given aligned sentence , 'take her out - g'mdCOrcul baggcuro dcrfleoflara'.</Paragraph>
    <Paragraph position="17"> Since the set of the structural ti;atm'es for alignment modeling is vast, we constructed a maximum entrol)y model for p(tkltc) by the iterative model growing method.</Paragraph>
  </Section>
  <Section position="5" start_page="440" end_page="442" type="metho">
    <SectionTitle>
4 Maximum Entropy
</SectionTitle>
    <Paragraph position="0"> To explain our method, we l)riefly des(:ribe the con(:ept of maximum entrol)y. Recently, many al)lnoaches l)ased on the maximum entroi)y lnodel have t)een applied to natural language processing (Berger eL al., \]994; Berger et al., 1996; Pietra et al., 1997).</Paragraph>
    <Paragraph position="1"> Suppose a model p which assigns a probability to a random variable. If we don't have rely knowledge, a reasonal)le solution for p is the most unifbrnl distribution. As some knowledge to estilnate the model p are added, tile solution st)ace of p are more constrained and the model would lie close to the ol\]timal probability model.</Paragraph>
    <Paragraph position="2"> For the t)url/ose of getting tile optimal 1)robability model, we need to maxi\]nize the unifl)rnlity under some constraints we have. ltere, the constraints are related with features. A feature, fi is usually rel/re sented with a binary indicator funct, ion. The inlportahoe of a feature, fi can be identified by requiring that the model accords with it.</Paragraph>
    <Paragraph position="3"> As a (:onstraint, the expected vahle of fi with respect to tile model P(fi) is supposed to be the same as tile exl)ected value of fi with respect to empiri(:al distril)ution in training saml)les, P(fi).</Paragraph>
    <Paragraph position="4">  interjection verb, past tense verb, past participle WH-pronoun, possessive not be verb, past tense be verb, present participle have verb, past participle do verb, past tense  numeral, cardinal existential there NNIN1 preposition, subordinating NNIN2 adjective, comparative NNDE1 NNDE2 list item marker PN noun, common NU noun, proper, plural VBMA genitive marker AJMA pronoun, possessive CO adverb, comparative AX particle ADCO to or infinitive marker APSE verb, present tense CJ verb, present participle ANCO WH-determiner ANDE WH-adverb ANNU be verb. present tense EX be verb, past participle LQ have verb, present tense RQ do verb, present tense SY do verb, past participle  In sun1, the maxilnunl entropy fralnework finds the model which has highest entropy(most uniform)~ given constraints. It is related to the constrained optimization. To select a model from a constrained set, C of allowed l)rol)ability distributions, the model p, C C with maximum entropy H(p) is chosen.</Paragraph>
    <Paragraph position="5"> In general, for the constrained optimization problem, Lagrange inultipliers of the number of features can be used. However, it was proved that the model with maximum entropy is equivalent to the model that maximizes the log likelihood of the training samples like (2) if we can assume it as an exponential model.</Paragraph>
    <Paragraph position="6"> hi (2), the left side is Lagrangian of the condi-. tional entropy and the right side is inaxilnlHn log-. likelihood. We use the right side equation of (2) to select I. for the best model p,.</Paragraph>
    <Paragraph position="7"> ~,g,,~..~,(- ~.,,. ~(~)v(yl~)logv(vlx)+~,,(v(f,)-~(/,))) (2) :a,'9,,,ax~, ~.,,. ~(x,v)lo.~n,(ylx) Since t, cannot be tbund analytically, we use the tbllowing improved iterative scaling algorithm to colnpute I, of n active features in .4 in total sam-ples. null  1. Start with li = 0 for all i 6 {1,2,...,n} 2. Do for ca.oh i ~ {\],2,...,n} : (a) Let AAi be the solution to the log likelihood null (b) Update the value of Ai into li + A,h, ~. ..... ~(.,,v)A(:~,v) where AAi = log ~,:, ~i.~)v~(?11.~)/~(.~,v)</Paragraph>
    <Paragraph position="9"> 3. Stop if not all the Ai have converged, otherwise go to step 2  Tile exponential model is represented as (3). Here, li is the weight of feature fi. In ore&amp;quot; model, since only one feature is applied to each pair of x and y, it can be represented as (4) and fi is the feature related with x and y.</Paragraph>
    <Paragraph position="11"/>
  </Section>
  <Section position="6" start_page="442" end_page="442" type="metho">
    <SectionTitle>
5 Feature selection
</SectionTitle>
    <Paragraph position="0"> Only a small subset of features will 1)e emph)yed in a model by sele(:ting useflfl feal;m'es from (;tie flmture 1)ool 7 ). Let 1).,4 lie (;tie optimal mo(lel constrained by a set of active features M and A U J'i 1)e ,/lfi. Le(; PAf~ be the ot)timal model in the space of l)rol)ability distribution C(Afi). The optimal model can be tel)resented as (5). Here, the optimal model means a maxilmnn entropy nlodd.</Paragraph>
    <Paragraph position="2"> The imi)rovement of l;he model regarding the addition of a single feature fi can be estiumted by measuring the difference of maximmn log-likelihood between L(pAf~) and L(pA). We denote the gain of t~ature fitiy A(~lfi) an(l it can be r(!t/resented in</Paragraph>
    <Paragraph position="4"> Note that a model PA has a, set of t)arameters A which means weights of teatures. The m(idel P.Afl contains the l)ara.lnetc.rs an(I the new \[)a.l'a, lllCi;('~r (11 with l'eSl)ect t() the t'eal;ure fi. W'hen adding a new feature to A, the optimal values (if all parame(ers of probability (listril)u(,ion change. To make th(; (:omi)utation of feature selection tractal)le, we al)l)roximate that the addition of a feature fi affec(;s only the single 1)aranxeter a, as shown in (5).</Paragraph>
    <Paragraph position="5"> qShe following a.lgoritlnn is used for (;omputing the gain of the model with rest)ect to fi. We referred to the studies of (Berger et al., 1996; Pietra e.t al., 1997). We skip tile detailed contents and 1)root~.</Paragraph>
    <Paragraph position="7"> 2. Set a0 = 0 3. Repeat the following until GAff(%,) has con-</Paragraph>
    <Paragraph position="9"> This algorittun is iteratively comtmted using Net-Well'S method. \Y=e cmt recognize the iml)ortance of a fl;ature with the gain value. As mentioned above, it means how much the feature accords with the model.</Paragraph>
    <Paragraph position="10"> We viewed the feature as tile information that Q. and t, occur together.</Paragraph>
  </Section>
class="xml-element"></Paper>