<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1014">
  <Title>Automatic Augmentation of Translation Dictionary with Database Terminologies in Multilingual Query Interpretation</Title>
  <Section position="2" start_page="1" end_page="2" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In interpreting multilingual queries to databases with domain information such as objects, table names, and attribute names that are described in a particular language, we must address the problem of word sense disambiguation. For example, if we wish to interpret a query in English to a database with domain information described in Korean, lexical items in English must be disambiguated to the matching senses in Korean.</Paragraph>
    <Paragraph position="1"> This problem is similar to that of lexical selection in the machine translation domain (Lee et al., [...] [Footnote: This work was supported by the Korea Science and Engineering Foundation (KOSEF) through AITrc.]</Paragraph>
    <Paragraph position="2"> Table 1. Table 'ip-ta' (columns): person | body | color | ...</Paragraph>
    <Paragraph position="3"> Mary | oy-twu | kal-sayk | ...</Paragraph>
    <Paragraph position="4"> ... | ... | ... | ...</Paragraph>
    <Paragraph position="5"> Table 'sin-ta' (columns): person | foot | status | ...</Paragraph>
    <Paragraph position="6"> John | sin-bal | nalk-ta | ...</Paragraph>
    <Paragraph position="7"> ... | ... | ... | ...</Paragraph>
    <Paragraph position="8"> Table 'sa-ta' (columns): person | object | status | ...</Paragraph>
    <Paragraph position="9"> John | ca-tong-cha | nalk-ta | ...</Paragraph>
    <Paragraph position="10"> Mary | sin-bal | nalk-ta | ...</Paragraph>
    <Paragraph position="11"> Manny | ko-yang-i | nulk-ta | ...</Paragraph>
    <Paragraph position="12"> ... | ... | ... | ...</Paragraph>
    <Paragraph position="13"> [...] is different in the sense that one is a formal query language and the other is another natural language. This difference prompts us to use database information, such as domain database objects, table names, and attribute names, instead of general semantic classifications (Palmer et al., 1999) to disambiguate the senses of lexical items in the query. Example queries are shown below: (1) (a) Which shoes does Mary buy? (b) Who wears a brown coat? (c) Who wears old shoes and buys an old car? Query 1a is made up of unambiguous words, each with a unique target interpretation. In 1b, however, wears has several possible interpretations in Korean, such as 'ip-ta', 'ssu-ta', 'sin-ta', and 'tti-ta' (cf. Table 3), and old in query 1c also has several senses. If we assume a simple database made up of tables such as 'ip-ta' (to put on the body), 'sin-ta' (to put on the foot), and 'sa-ta' (buy) in Table 1, wears in 1b can be disambiguated by the lexical item 'coat' and its target 'oy-twu', since 'oy-twu' appears only in the table for 'ip-ta'. In 1c, wears is likewise restricted by 'shoes', but 'shoes' appears in the tables for both 'sin-ta' and 'sa-ta'. These senses can be disambiguated with the translation dictionary: since 'sa-ta' (buy) is not registered there as a translation of wears, it is simply discarded. In contrast, old in query 1c can be interpreted as either 'nalk-ta' (not new) or 'nulk-ta' (not young), because both senses appear in the entries of the same table, 'sa-ta'. Since it is difficult to disambiguate these senses with database information alone, we may use co-occurrence information between collocated words such as (old, shoes) and (old, car) (Park and Cho, 2000; Lee et al., 1999). [Footnote: We notate the Korean alphabet in Yale form.]</Paragraph>
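    <Paragraph> The walk-through for queries 1b and 1c can be illustrated with a minimal sketch. It is not the system proposed in this paper, only an assumption-laden toy: the names TRANSLATION_DICT, DATABASE, tables_containing, disambiguate_verb, and modifier_senses are hypothetical, the rows are copied from Table 1, and the entry mapping 'shoes' to 'sin-bal' is inferred from that table rather than stated in the text.

```python
# Hypothetical translation dictionary: English word -> candidate Korean senses.
# The senses of "wears" and "old" are those named in the text; the entry for
# "shoes" is an assumption inferred from Table 1.
TRANSLATION_DICT = {
    "wears": ["ip-ta", "ssu-ta", "sin-ta", "tti-ta"],
    "coat": ["oy-twu"],
    "shoes": ["sin-bal"],
    "old": ["nalk-ta", "nulk-ta"],
}

# Toy database mirroring the rows shown in Table 1: table name -> rows.
DATABASE = {
    "ip-ta": [{"person": "Mary", "body": "oy-twu", "color": "kal-sayk"}],
    "sin-ta": [{"person": "John", "foot": "sin-bal", "status": "nalk-ta"}],
    "sa-ta": [
        {"person": "John", "object": "ca-tong-cha", "status": "nalk-ta"},
        {"person": "Mary", "object": "sin-bal", "status": "nalk-ta"},
        {"person": "Manny", "object": "ko-yang-i", "status": "nulk-ta"},
    ],
}


def tables_containing(value):
    """Return the names of all tables in which `value` occurs as a cell value."""
    return {
        name
        for name, rows in DATABASE.items()
        if any(value in row.values() for row in rows)
    }


def disambiguate_verb(verb, obj):
    """Keep the dictionary senses of `verb` that name a table containing
    a translation of `obj`. Tables that are not registered as senses of
    `verb` (such as 'sa-ta' for "wears") are discarded by the filter."""
    related = set()
    for target in TRANSLATION_DICT[obj]:
        related |= tables_containing(target)
    return [sense for sense in TRANSLATION_DICT[verb] if sense in related]


def modifier_senses(modifier, table):
    """Return the dictionary senses of `modifier` that occur in `table`."""
    values = {v for row in DATABASE[table] for v in row.values()}
    return [sense for sense in TRANSLATION_DICT[modifier] if sense in values]


print(disambiguate_verb("wears", "coat"))
print(disambiguate_verb("wears", "shoes"))
print(modifier_senses("old", "sa-ta"))
```

 Under these assumptions, database information alone resolves wears in both queries, while old keeps two candidate senses in the 'sa-ta' table, which is the case where co-occurrence information would be needed.</Paragraph>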
    <Paragraph position="14"> In this paper, we propose a disambiguation method that uses database information and co-occurrence information (Park and Cho, 2000; Palmer et al., 1999) for the interpretation of natural language queries (Lee and Park, 2001) in a multilingual setting. Although we propose to construct the system without an intermediate representation language, we show that our Combinatory Categorial Grammar (CCG) framework is compatible with approaches that use an intermediate representation (Nelken and Francez, 2000; Androutsopoulos et al., 1998; Klein et al., 1998). We also discuss the advantages and disadvantages of these two approaches. The rest of the paper is organized as follows. Section 2 gives a brief introduction to CCGs and natural language database interfaces (NLDBs). Section 3 shows the translation process with and without an intermediate representation using CCG. Sections 4 and 5 describe the proposed system with multilingual translation.</Paragraph>
  </Section>
</Paper>