<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0603">
  <Title>Representation and Processing of Chinese Nominals and Compounds</Title>
  <Section position="2" start_page="0" end_page="20" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In this paper, we present the results of a theoretical and applied investigation, within a knowledge-base framework, into the building and processing of computational semantic lexicons, as reflected by experiments on Spanish, English and Chinese, with a large-scale application to Spanish. The multilingual dictionary-making process (Viegas and Raskin, 1998) has been tested and validated for Mikrokosmos, a machine translation system (Nirenburg et al., 1996) from Spanish and Chinese to English. Here, we focus on the representation and processing of Chinese nominals and compounds.</Paragraph>
    <Paragraph position="1"> In Section 2, we briefly present the information carried in Mikrokosmos lexicons. In Section 3, we show how a semantics-based transcategorial approach is best suited to account for nominalisations and their derived forms. Formally, we use the conceptual tool of lexical rules as described in (Viegas et al., 1996). In Section 4, we address the translation of Chinese nominal compounds into English using semantic and word-order information. We show the advantage of a transcategorial approach to lexicon representation and investigate some trade-offs between interlingua and transfer approaches to nominal compounding.</Paragraph>
    <Paragraph position="2"> 1 This work has been supported in part by DoD under contract number MDA-904-92-C-5189.</Paragraph>
    <Section position="1" start_page="0" end_page="20" type="sub_section">
      <SectionTitle>
of Mikrokosmos Lexicons
</SectionTitle>
      <Paragraph position="0"> In Mikrokosmos, the lexical information is distributed among various levels, relevant to phonology, orthography, morphology, syntax, semantics, syntax-semantic linking, stylistics, paradigmatic and syntagmatic information, and also database type management information. 3 Each entry consists of a list of words, stored in the lexicon independently of their POS (the verb and noun form of walk are under the same superentry).</Paragraph>
      <Paragraph position="1"> Each word meaning is identified by a unique identifier, or lexeme (Onyshkevych and Nirenburg, 1994). Homonyms and all meaning shifts of polysemous words are listed under a single superentry. In Figure 1, we illustrate the aspects of a lexicon entry relevant to this paper by describing two senses of the Chinese word ~ (activity): WorkActivity and Exercise, which are well-defined symbols, or concepts, in the Mikrokosmos ontology described in (Mahesh, 1996).</Paragraph>
      <Paragraph position="2"> Word meanings in Mikrokosmos are represented partly in the lexicon and partly in the ontology. We have strived to achieve an intermediate grain size of meaning representation in both the lexicon and the ontology: many word senses have direct mappings to concepts in the ontology; many others must be decomposed and mapped indirectly through composition and modification of ontological concepts. We have developed a set of guidelines and a training methodology that results in acceptable quality and uniformity in lexical and ontological representations (Mahesh, 1996; Viegas and Raskin, 1998). In principle, the separation between ontology and lexicon is as follows: language-neutral meanings are stored in the former; language-specific information in the latter.</Paragraph>
      <Paragraph position="3"> 3 Details on these zones can be found in (Viegas and Raskin, 1998; Meyer et al., 1990).</Paragraph>
      <Paragraph position="4"> We keep the number of concepts well below the number of lexical items for a given language, such that, for instance, the concept Ingest can be lexicalised as ~; (eat) or ~ (drink) according to the constraints put in the lexicon on the theme: Food for eat and Liquid for drink respectively.</Paragraph>
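      <Paragraph position="5"> The Ingest example can be made concrete with a minimal sketch. This is an illustration only and not the Mikrokosmos implementation: the names Lexeme, IS_A, is_a and lexicalise, and the toy concepts Ingestible, Rice and Water, are hypothetical, and English glosses stand in for the Chinese words. It shows one concept being lexicalised by different lexemes depending on the constraint each lexeme places on the theme.

```python
from dataclasses import dataclass

# Hypothetical toy ontology fragment (child concept -> parent concept).
# Only Ingest, Food and Liquid come from the text; the rest is assumed.
IS_A = {
    "Food": "Ingestible",
    "Liquid": "Ingestible",
    "Rice": "Food",
    "Water": "Liquid",
}


@dataclass(frozen=True)
class Lexeme:
    gloss: str    # English gloss standing in for the Chinese word
    concept: str  # ontological concept the lexeme maps to
    theme: str    # constraint the lexicon places on the theme


# Two lexemes sharing one concept, distinguished by their theme constraint.
LEXICON = [
    Lexeme("eat", "Ingest", "Food"),
    Lexeme("drink", "Ingest", "Liquid"),
]


def is_a(concept, ancestor):
    """Return True if concept equals ancestor or descends from it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def lexicalise(concept, theme):
    """Return glosses of lexemes for concept whose theme constraint is met."""
    return [
        lexeme.gloss
        for lexeme in LEXICON
        if lexeme.concept == concept and is_a(theme, lexeme.theme)
    ]
```

Under these assumptions, lexicalise("Ingest", "Rice") selects eat and lexicalise("Ingest", "Water") selects drink, so the single concept Ingest covers both words and the ontology stays smaller than the lexicon.</Paragraph>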
    </Section>
  </Section>
</Paper>