<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2248"> <Title>Target Word Selection as Proximity in Semantic Space</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> When should Spanish detener translate to English arrest and when to stop? This paper explores the problem of lexical selection in machine translation (MT): a given source language (SL) word can often be translated into different target language (TL) words, depending on the context.</Paragraph> <Paragraph position="1"> Translation is difficult because the conceptual mapping between languages is generally not one-to-one; e.g. Spanish reloj maps to both watch and clock. An SL word might be translatable by more than one TL option, where the choice is based on stylistic or pragmatic rather than semantic criteria. Alternative TL choices also exist for SL words that are ambiguous from the monolingual point of view; e.g. English firm can be translated by Spanish firme, estricto, sólido or compañía.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.1 Semantic Space Models </SectionTitle> <Paragraph position="0"> In this paper I take a statistical approach to lexical selection, under the working assumption that the translated linguistic context can provide sufficient information for choosing the appropriate target. I define the appropriate target as the candidate &quot;closest&quot; in meaning to the local TL context, where local context refers to a window of words centered on the &quot;missing&quot; TL item.</Paragraph> <Paragraph position="1"> To estimate the similarity in meaning between a word and the bag of words forming a context, the semantic properties of words are first represented as their patterns of co-occurrence in a large corpus.
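As a rough illustration of this representation (a sketch, not the paper's implementation: the toy corpus, the window size of 2, and the function names are assumptions), a word's co-occurrence vector can be collected from a corpus and compared with a standard cosine metric:

```python
# Sketch: co-occurrence vectors and cosine similarity over a toy corpus.
# Corpus, window size, and helper names are illustrative assumptions.
import math
from collections import defaultdict

def cooccurrence_vectors(tokens, window=2):
    """Map each word to its co-occurrence counts within +/- `window` tokens."""
    vectors = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Standard vector similarity metric over sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

tokens = ("the police will arrest the suspect and stop the car "
          "the police stop traffic and arrest drivers").split()
vecs = cooccurrence_vectors(tokens)
print(cosine(vecs["arrest"], vecs["stop"]))
```

In this setting, the similarity between a candidate target and a context would compare the candidate's corpus-derived vector against a vector built from the context words.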
Viewing a word as a vector in high-dimensional &quot;semantic space&quot; allows distributional similarity (or &quot;semantic distance&quot;) to be measured using a standard vector similarity metric. The assumption that distributional similarity corresponds to the psychological concept of semantic relatedness has proved useful in NLP (e.g. Schütze, 1992), and for psycholinguistic modelling (e.g. Landauer & Dumais, 1997).</Paragraph> <Paragraph position="2"> One way to estimate the semantic distance between a local discourse context and a target word is to measure the proximity between the centroid vector created from the words in the context and the target word vector. This approach was used successfully by Schütze (1992) in a small-scale word sense disambiguation experiment. However, in this approach the distributional properties of the words making up the local context are not taken into account.</Paragraph> <Paragraph position="3"> The centroid method establishes one position (the mean) on each dimension to use in the distance estimate, without considering the variability of the values across dimensions. If the context contains a large amount of noise (semantically irrelevant words), the centroid is influenced as much by these words as by words relevant to the correct target. Weighting the dimensions of the space according to their variability allows a semantic distance measure to be influenced less by irrelevant dimensions (Kozima & Ito, 1995).</Paragraph> <Paragraph position="4"> It is clear that this method relies on the hypothesis that the region of semantic space defined by the translated context &quot;overlaps&quot; to a greater degree with the preferred target than with the alternative choices. The main purpose of the present investigation was to determine the extent to which this hypothesis is supported.</Paragraph> </Section> </Section> </Paper>