<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1401">
  <Title>Metonymy as a Cross-lingual Phenomenon</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. Regular polysemy across languages
</SectionTitle>
    <Paragraph position="0"> The question whether regular polysemy is a cross-linguistic phenomenon has until now only been approached by small scale analyses.</Paragraph>
    <Paragraph position="1"> For instance, Kamei and Wakao (Kamei, 1992) approached the question from the perspective of machine translation, conducting a comparative survey, based on 25 test sentences, of the acceptability of metonymic expressions in English, Chinese and Japanese. Their results show that in some cases English and Japanese share metonymic patterns to the exclusion of Chinese, whereas in others English and Chinese pattern together.</Paragraph>
    <Paragraph position="2"> Seto (1996) studied the lexicalization of the container-content schema in various languages (Japanese, Korean, Mongolian, Javanese, Turkish, Italian, Germanic and English). This pattern is lexicalized in English by 'kettle': (1) a metal pot for stewing or boiling, usually with a lid; (2) the quantity a kettle will hold. His observation was that the pattern is found in all of these languages and can be considered cross-linguistic. This small study seems to indicate that the regular polysemic pattern extends over language family boundaries to such an extent that it almost seems universal. This could suggest that the pattern is rooted in general human conceptualisation, and reflects an important non-arbitrary semantic relation between concepts or objects in the world. Indeed, if we describe the relation between container and content in terms of Aristotle's qualia structure (Pustejovsky 1995), we see that it is the function of a container to hold an object or substance (telic role) and that a container is normally brought into existence for this purpose.</Paragraph>
    <Paragraph position="3"> Further small-scale studies of this kind have been performed, mostly relying on introspection and small-scale dictionary analysis. A limited number of patterns that are valid in more than one language have been identified, such as container/content and producer/product (Peters 2000). With the availability of WordNet and EuroWordNet it has become possible to investigate the cross-linguistic nature of metonymy on a large scale.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3. EuroWordNet
</SectionTitle>
    <Paragraph position="0"> EuroWordNet (EWN) (Vossen 1997; Peters 1998) is a multilingual thesaurus incorporating wordnets for eight languages: English, Italian, Dutch, German, Spanish, French, Czech and Estonian. The wordnets have been built in various ways.</Paragraph>
    <Paragraph position="1"> Some of them have been created on the basis of language specific resources and matched onto the original Princeton WordNet (Fellbaum 1998) when the interlingual relations were created. They therefore reflect the language specific lexicalization patterns and semantic organization. Others have been built from the start on the basis of a match between WordNet and bilingual dictionaries.</Paragraph>
    <Paragraph position="2"> In this case the conceptual structure is less language specific but can be regarded as the conceptual overlap between the structure of the English WordNet and the ontological structure associated with that particular language.</Paragraph>
    <Paragraph position="3"> EuroWordNet gives us for the first time the opportunity to examine the question of the language independence of regular polysemy in a more systematic and automatic way.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4. Methodology
</SectionTitle>
    <Paragraph position="0"> We followed a three-step methodology. First, the hierarchy of WordNet 1.6 was analysed in order to obtain English candidates for regular polysemic patterns (section 4.1). Then a process we call lexical triangulation was applied to these data within EuroWordNet (section 4.2). Finally, the results were manually evaluated.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Automatic candidate selection
</SectionTitle>
      <Paragraph position="0"> A technique was developed (Peters 2000) for identifying sense combinations in WordNet where the senses involved potentially display a regular polysemic relation, i.e. where the senses involved are candidates for systematic relatedness.</Paragraph>
      <Paragraph position="1"> In order to obtain these candidate patterns, WordNet (WN) has been automatically analysed by exploiting its hierarchical structure. Wherever two or more words each have a sense in one part of the hierarchy and another sense in a second part of the hierarchy, we have a candidate pattern of regular polysemy. The patterns are only candidates because the evidence is an observed regularity across two or more words, which still has to be confirmed as a systematic relation.</Paragraph>
      <Paragraph position="2"> This follows the definition of (Apresjan 1973) mentioned in the introduction.</Paragraph>
      <Paragraph position="3"> An example can be found in Figure 1 below.</Paragraph>
      <Paragraph position="4"> [Figure 1: Hypernym combination 'fabric' (something made by weaving or felting or knitting or crocheting natural or synthetic fibers) and 'covering' (a natural object that covers or envelops); words whose senses occur under both hypernyms: fleece.]  We have restricted our experiments to cases where the related meanings are of the same syntactic class (nouns). The procedure does not discover all regular polysemy relations, because its outcome depends heavily on how consistently these regularities are encoded in WordNet.</Paragraph>
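      <Paragraph position="5"> The candidate selection described in this section can be illustrated with a minimal sketch. It is not the implementation of (Peters 2000): the hypernym links, the sense identifiers, the extra word 'wool' and the function names are all hypothetical toy data chosen for illustration, and real WordNet access is replaced by two small dictionaries.

```python
from collections import defaultdict
from itertools import combinations

# Toy hypernym links (child -> parent); hypothetical data, not real WordNet.
HYPERNYM = {
    "fleece#fabric": "fabric",
    "fleece#coat": "covering",
    "wool#fabric": "fabric",
    "wool#coat": "covering",
    "kettle#pot": "container",
    "kettle#amount": "quantity",
    "fabric": "artifact",
    "covering": "natural_object",
    "container": "artifact",
}

# Toy lexicon (assumed for illustration): word -> its noun senses.
SENSES = {
    "fleece": ["fleece#fabric", "fleece#coat"],
    "wool": ["wool#fabric", "wool#coat"],
    "kettle": ["kettle#pot", "kettle#amount"],
}


def ancestors(node):
    """Return all hypernyms of a node, nearest first."""
    chain = []
    while node in HYPERNYM:
        node = HYPERNYM[node]
        chain.append(node)
    return chain


def candidate_patterns(senses, min_words=2):
    """Map each hypernym pair to the words that have a sense under both."""
    words_by_pair = defaultdict(set)
    for word, word_senses in senses.items():
        for sense_a, sense_b in combinations(word_senses, 2):
            for hyper_a in ancestors(sense_a):
                for hyper_b in ancestors(sense_b):
                    if hyper_a != hyper_b:
                        pair = tuple(sorted((hyper_a, hyper_b)))
                        words_by_pair[pair].add(word)
    # A pattern is only a candidate if two or more words instantiate it.
    return {
        pair: sorted(words)
        for pair, words in words_by_pair.items()
        if len(words) >= min_words
    }


patterns = candidate_patterns(SENSES)
print(patterns[("covering", "fabric")])
```

In this toy setting the pair fabric/covering is kept because two words instantiate it, while container/quantity is dropped because only one word does. The sketch reports every ancestor level of a sense pair; choosing the most informative level is a separate decision that it does not attempt.</Paragraph>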
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Lexical triangulation
</SectionTitle>
      <Paragraph position="0"> In order to determine whether regular polysemy is indeed a cross-linguistic phenomenon, one needs to compare languages, preferably from different language families.</Paragraph>
      <Paragraph position="1"> Data will depend heavily on vocabulary coverage in the various languages, and until the advent of EuroWordNet no serious lexical data sets were available for analysis. The EuroWordNet database is the most comprehensive multilingual thesaurus to date. It not only provides an appropriate amount of lexical information in terms of vocabulary coverage, but also has two further advantages: its taxonomic building blocks are identical for all languages involved, and the language-specific concepts are all linked to an interlingua, the interlingual index (ILI), which is based on the full set of the original Princeton WordNet (version 1.5).</Paragraph>
      <Paragraph position="2"> We started with a comparative analysis of Germanic and Romance languages. The main reason for this is that the size of the corresponding wordnets is large enough to yield significant results. For our analysis we used three languages: English, Dutch and Spanish, hence the term for this process: lexical triangulation.</Paragraph>
      <Paragraph position="3"> Singling out areas where three language-specific lexicalization patterns converge enabled us to identify metonymic patterns that supported the hypothesis that certain metonymic relationships have a higher degree of universality.</Paragraph>
      <Paragraph position="4"> We extracted the sense combinations of Spanish and Dutch words that participate in any of the potential regular polysemic patterns from the initial large set described in section 4.1. In other words, we concentrate here on lexicalization patterns in three different languages: sense combinations that are lexicalized by one language-specific word in English, Spanish and Dutch.</Paragraph>
      <Paragraph position="5"> The first step in this process was the reduction of the search space for regular polysemic patterns in EuroWordNet. First we determined the conceptual overlap for nouns between the English, Dutch and Spanish wordnets. Table 1 below shows the number of nouns in the three wordnets involved.</Paragraph>
      <Paragraph position="6"> Table 1: Conceptual coverage of the English, Dutch and Spanish wordnets.  The conceptual overlap between these wordnets is computed simply by determining the intersection of ILI noun concepts covered by each of the wordnets.</Paragraph>
      <Paragraph position="7"> The total overlap is 17007 ILI concepts.</Paragraph>
      <Paragraph position="8"> There are 920 English polysemous nouns with two or more senses within synsets linked to this set of ILI concepts. Their senses have identical language-specific lexicalizations in Spanish and Dutch. For example, the English word church has one sense that is a building and another that is an institution. The same sense distinctions apply to the Spanish iglesia and the Dutch kerk. The senses in the different wordnets are linked through the ILI concepts by means of equivalence synonymy or near-synonymy relations (Vossen 1997).</Paragraph>
      <Paragraph position="9"> The second step was to map these noun senses onto the results from the wordnet analysis described in section 4.1, and then to evaluate the cross-linguistic validity of the regular polysemic patterns that have been projected from the English monolingual wordnet onto the Dutch and Spanish wordnets.</Paragraph>
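      <Paragraph position="10"> The first step of lexical triangulation (the ILI overlap and the search for sense pairs lexicalized by a single word in each language) can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run on EuroWordNet: the three toy wordnets, the ILI identifiers such as ili:building and the function names are hypothetical, and each wordnet is reduced to a mapping from a word to the set of ILI concepts it lexicalizes.

```python
from itertools import combinations

# Hypothetical toy wordnets: word -> set of ILI concept ids it lexicalizes.
ENGLISH = {
    "church": {"ili:building", "ili:institution"},
    "kettle": {"ili:pot", "ili:potful"},
    "dog": {"ili:dog"},
}
SPANISH = {
    "iglesia": {"ili:building", "ili:institution"},
    "olla": {"ili:pot"},
    "perro": {"ili:dog"},
}
DUTCH = {
    "kerk": {"ili:building", "ili:institution"},
    "ketel": {"ili:pot", "ili:potful"},
    "hond": {"ili:dog"},
}


def covered_concepts(wordnet):
    """All ILI concepts lexicalized by at least one word of a wordnet."""
    return set().union(*wordnet.values())


def words_for_pair(wordnet, pair):
    """Words of a wordnet that lexicalize both concepts of the pair."""
    return sorted(w for w, ilis in wordnet.items() if pair.issubset(ilis))


def triangulate(english, spanish, dutch):
    """Return the ILI overlap and the sense pairs shared by all three."""
    overlap = covered_concepts(english).intersection(
        covered_concepts(spanish), covered_concepts(dutch)
    )
    results = {}
    for word, ilis in english.items():
        # Only senses inside the three-way overlap are considered.
        shared = sorted(ilis.intersection(overlap))
        for pair in combinations(shared, 2):
            es_words = words_for_pair(spanish, set(pair))
            nl_words = words_for_pair(dutch, set(pair))
            if es_words and nl_words:
                results[(word, pair)] = (es_words, nl_words)
    return overlap, results


overlap, results = triangulate(ENGLISH, SPANISH, DUTCH)
print(len(overlap), results)
```

In this toy data only church, iglesia and kerk converge on the building/institution pair. The second sense of kettle falls outside the overlap because the toy Spanish wordnet does not cover it, so no pair is reported for that word. The second step, mapping the surviving sense pairs onto the candidate patterns of section 4.1, is not shown.</Paragraph>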
    </Section>
  </Section>
</Paper>