File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/02/c02-1072_intro.xml
Size: 8,443 bytes
Last Modified: 2025-10-06 14:01:22
<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1072">
<Title>A Comparative Evaluation of Data-driven Models in Translation Selection of Machine Translation</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle>2 Data-Driven Model</SectionTitle>
<Paragraph position="0"> As data-driven models that require no additional human knowledge for acquiring information, Latent Semantic Analysis (LSA) and Probabilistic LSA (PLSA) are applied to estimate semantic similarity among words. The next two subsections explain how LSA and PLSA are adopted for measuring semantic similarity.</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle>2.1 Latent Semantic Analysis</SectionTitle>
<Paragraph position="0"> The basic idea of LSA is that the aggregate of all the word contexts in which a given word does and does not appear provides a set of mutual constraints that largely determines the similarity of meaning of words and sets of words to each other (Landauer et al., 1998; Gotoh and Renals, 1997). LSA also extracts and infers relations of expected contextual usage of words in passages of discourse. It uses no human-made dictionaries, knowledge bases, semantic thesauri, syntactic parsers, or the like. Only raw text parsed into unique character strings is needed as its input data.</Paragraph>
<Paragraph position="1"> The first step is to represent the text as a matrix in which each row stands for a unique word and each column stands for a text passage or other context. Each cell contains the occurrence frequency of a word in the text passage. Next, LSA applies singular value decomposition (SVD) to the matrix. SVD is a form of factor analysis and is defined as

    A = U Σ V^T,    (1)

where Σ is a diagonal matrix whose entries are the square roots of the r nonzero eigenvalues of AA^T (or A^T A), and U and V are the orthogonal eigenvectors associated with the r nonzero eigenvalues of AA^T and A^T A, respectively. One component matrix (U) describes the original row entities as vectors of derived orthogonal factor values, another (V) describes the original column entities in the same way, and the third (Σ) is a diagonal matrix containing scaling values; when the three components are matrix-multiplied, the original matrix is reconstructed.</Paragraph>
<Paragraph position="2"> The singular vectors corresponding to the k (k ≤ r) largest singular values are then used to define the k-dimensional document space. Using these vectors, the m × k and n × k matrices U_k and V_k may be redefined along with the k × k singular value matrix Σ_k. It is known that A_k = U_k Σ_k V_k^T is the closest rank-k matrix to the original matrix.</Paragraph>
<Paragraph position="3"> LSA can represent words of similar meaning in similar ways. This is supported by the fact that words can be compared through the similarity of their vectors as derived from large text corpora. The term-to-term similarity is based on the inner products between two row vectors of A, AA^T = U Σ^2 U^T. One might think of the rows of UΣ as defining coordinates for terms in the latent space. To calculate the similarity of two term coordinates V_1 and V_2, the cosine is used:

    sim(V_1, V_2) = (V_1 · V_2) / (||V_1|| ||V_2||).    (2)
</Paragraph>
</Section>
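As an illustration of the LSA procedure just described (not part of the original paper), the following is a minimal Python/NumPy sketch: it builds a toy word-by-passage count matrix, keeps the k largest singular values, and compares two term coordinates (rows of UΣ) with the cosine of Eq. (2). The vocabulary, counts, and the value of k are invented for the example.

    import numpy as np

    # Toy word-by-passage count matrix A: rows = words, columns = passages.
    # The vocabulary and counts are invented purely for illustration.
    vocab = ["plant", "factory", "car", "company", "build"]
    A = np.array([
        [2, 0, 1, 0],   # plant
        [1, 0, 2, 0],   # factory
        [0, 3, 0, 1],   # car
        [0, 1, 0, 2],   # company
        [1, 1, 1, 1],   # build
    ], dtype=float)

    # SVD: A = U diag(s) Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Keep only the k largest singular values (k <= rank of A).
    k = 2
    U_k, s_k = U[:, :k], s[:k]

    # Rows of U_k * Sigma_k serve as term coordinates in the latent space.
    term_coords = U_k * s_k

    def cosine(v1, v2):
        """Cosine similarity between two coordinate vectors (Eq. 2)."""
        return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

    i, j = vocab.index("plant"), vocab.index("factory")
    print(cosine(term_coords[i], term_coords[j]))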
<Section position="2" start_page="0" end_page="0" type="sub_section">
<SectionTitle>2.2 Probabilistic Latent Semantic Analysis</SectionTitle>
<Paragraph position="0"> Probabilistic latent semantic analysis (PLSA) is a statistical technique for the analysis of two-mode and co-occurrence data, and has produced meaningful results in such applications as language modelling (Gildea and Hofmann, 1999) and document indexing in information retrieval (Hofmann, 1999b). PLSA is based on the aspect model, in which each observation of the co-occurrence data is associated with a latent class variable z ∈ Z = {z_1, z_2, ..., z_K} (Hofmann, 1999a). For text documents, the observation is an occurrence of a word w ∈ W in a document d ∈ D, and each possible state z of the latent class represents one semantic topic.</Paragraph>
<Paragraph position="1"> A word-document co-occurrence event (d, w) is modelled probabilistically and parameterized as

    P(d, w) = Σ_{z∈Z} P(z) P(w|z) P(d|z).

Here, w and d are assumed to be conditionally independent given a specific z, and P(w|z) and P(d|z) are the topic-specific word distribution and document distribution, respectively. This three-way decomposition of the co-occurrence data is similar to the SVD in LSA, but the objective function of PLSA, unlike that of LSA, is the likelihood function of multinomial sampling. The parameters P(z), P(w|z), and P(d|z) are estimated by maximizing the log-likelihood

    L = Σ_{d∈D} Σ_{w∈W} n(d, w) log P(d, w),

where n(d, w) denotes the frequency of word w in document d. This maximization is performed with the EM algorithm, as for most latent variable models; details on the parameter estimation are given in (Hofmann, 1999a). To compute the similarity of w_1 and w_2, Σ_k P(z_k|w_1) P(z_k|w_2) is computed, with P(z_k|w) derived from

    P(z|w) = P(z) P(w|z) / Σ_{z'∈Z} P(z') P(w|z').

We can thus evaluate similarities with this low-dimensional representation in the semantic space.</Paragraph>
</Section>
<Section position="3" start_page="0" end_page="0" type="sub_section">
<SectionTitle>3.1 Grammatical Relationship</SectionTitle>
<Paragraph position="0"> We used grammatical relations, stored in the form of a dictionary, for the translation of words. The structure of the dictionary is as follows:

    T(Cooc(S_i, S_j)) ⇒ T_j,

where Cooc(S_i, S_j) denotes the grammatical co-occurrence of source words S_i and S_j, of which one is the input word to be translated and the other is an argument word used in the translation, T_j is the translation result of the source word, and T(·) denotes the translation process.</Paragraph>
<Paragraph position="1"> Table 1 shows a grammatical relationship dictionary for the English verb S_i = 'build' and its object nouns, which serve as the input word and the argument words, respectively. The dictionary shows that 'build' is translated into five different Korean words, depending on the context. For example, 'build' is translated into 'geon-seol-ha-da' ('construct') when its object noun is 'plant' (= 'factory'), into 'che-chak-ha-da' ('produce') when co-occurring with the object noun 'car', and into 'seol-lip-ha-da' ('establish') in the context of the object noun 'company' (Table 2).</Paragraph>
<Paragraph position="2"> One of the fundamental difficulties in co-occurrence-based approaches to word sense disambiguation (translation selection in this case) is the problem of data sparseness, i.e., unseen words. For example, for an object noun such as 'vehicle' that is not registered in the dictionary, the correct translation of the verb cannot be selected using the dictionary described above.</Paragraph>
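To make the data-sparseness problem concrete, here is a minimal sketch (not from the paper) of a co-occurrence dictionary lookup; the (verb, object noun) entries mirror the 'build' examples above, while the function name and the fallback behaviour are illustrative assumptions.

    from typing import Optional

    # Hypothetical grammatical co-occurrence dictionary:
    # (input word S_i, argument word S_j) -> translation T_j of S_i.
    cooc_dict = {
        ("build", "plant"):   "geon-seol-ha-da",  # 'construct'
        ("build", "car"):     "che-chak-ha-da",   # 'produce'
        ("build", "company"): "seol-lip-ha-da",   # 'establish'
    }

    def translate(verb: str, obj: str) -> Optional[str]:
        """Look up T(Cooc(S_i, S_j)); return None when the pair is unregistered."""
        return cooc_dict.get((verb, obj))

    print(translate("build", "plant"))    # 'geon-seol-ha-da'
    print(translate("build", "vehicle"))  # None -- unseen noun, the sparseness problem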
<Paragraph position="3"> In the next subsection, we present a k-nearest neighbor method that resolves this problem.</Paragraph>
</Section>
<Section position="4" start_page="0" end_page="0" type="sub_section">
<SectionTitle>3.2 k-Nearest Neighbor Learning for Translation Selection</SectionTitle>
<Paragraph position="0"> The similarity between two words in the latent semantic space is required when performing a k-NN search to select the translation of a word. The nearest instance of a given word is decided by selecting the word with the highest similarity to the given word.</Paragraph>

Table 2. Example translations of 'build' depending on its object noun:
    'build a plant'   ⇒ 'gong-jang-eul geon-seol-ha-da'   ('construct')
    'build a car'     ⇒ 'ja-dong-cha-reul che-chak-ha-da' ('produce')
    'build a company' ⇒ 'hoi-sa-reul seol-lip-ha-da'      ('establish')

<Paragraph position="1"> The k-nearest neighbor learning algorithm (Cover and Hart, 1967; Aha et al., 1991) assumes that all instances correspond to points in the n-dimensional space R^n; here, each instance is the n-dimensional vector of a word. The nearest neighbors of an instance are defined in terms of the standard Euclidean distance. The distance between two instances x_i and x_j, D(x_i, x_j), is defined to be

    D(x_i, x_j) = sqrt( Σ_{r=1}^{n} (a_r(x_i) − a_r(x_j))^2 ),

where a_r(x_i) denotes the value of the r-th attribute of instance x_i; the distance is computed over word vectors, analogously to the cosine computation between two vectors.</Paragraph>
<Paragraph position="2"> Let us consider learning discrete-valued target functions of the form f : R^n → V, where V is the finite set {v_1, ..., v_s}. The k-nearest neighbor algorithm for approximating such a discrete-valued target function is given in a separate table. The value f̂(x_q) returned by this algorithm as its estimate of f(x_q) is simply the most common value of f among the k training examples nearest to x_q.</Paragraph>
</Section>
</Section>
</Paper>
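Appended illustration (not part of the original paper): a minimal sketch of how k-NN selection over latent-space word vectors could back off from an unregistered noun to the translation used for its nearest registered neighbors. The vectors, the value of k, and the helper names are invented; the actual system derives word vectors from LSA or PLSA rather than by hand.

    import numpy as np
    from collections import Counter

    # Hypothetical latent-space vectors for registered object nouns (values invented).
    word_vecs = {
        "plant":   np.array([0.90, 0.10]),
        "car":     np.array([0.20, 0.80]),
        "company": np.array([0.10, 0.70]),
    }

    # Known translations of 'build' indexed by object noun (from Tables 1 and 2).
    translations = {
        "plant":   "geon-seol-ha-da",   # 'construct'
        "car":     "che-chak-ha-da",    # 'produce'
        "company": "seol-lip-ha-da",    # 'establish'
    }

    def knn_translate(query_vec: np.ndarray, k: int = 1) -> str:
        """Return the most common translation among the k registered nouns
        closest to query_vec under the standard Euclidean distance."""
        nearest = sorted(word_vecs, key=lambda w: np.linalg.norm(word_vecs[w] - query_vec))
        labels = [translations[w] for w in nearest[:k]]
        return Counter(labels).most_common(1)[0][0]

    # 'vehicle' is not in the dictionary; with an (invented) vector close to 'car',
    # the 1-NN fallback selects the translation used for 'car'.
    vehicle_vec = np.array([0.25, 0.75])
    print(knn_translate(vehicle_vec, k=1))  # 'che-chak-ha-da' ('produce')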