<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1608"> <Title>Extending the Coverage of a Valency Dictionary</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> One of the most severe problems facing machine translation between Asian languages is the lack of suitable language resources. Even when word-lists or simple bilingual dictionaries exist, it is rare for them to include detailed information about the syntax and meaning of words.</Paragraph> <Paragraph position="1"> In this paper we present a method of adding new entries to a bilingual valency dictionary.</Paragraph> <Paragraph position="2"> New entries are based on existing entries, so they have the same amount of detailed information.</Paragraph> <Paragraph position="3"> The method bootstraps from an initial hand-built lexicon, and allows new entries to be added cheaply and effectively. Although we will use Japanese and English as examples, the algorithm is not tied to any particular language pair or dictionary. The core idea is to take Japanese-English pairs from a plain bilingual dictionary (without detailed information about valency or selectional restrictions) and build new valency dictionary entries for them based on existing entries.</Paragraph> <Paragraph position="4"> It is well known that detailed information about verb valency (subcategorization) and selectional restrictions is useful both for monolingual parsing and for selecting appropriate translations in machine translation. As well as being useful for resolving parsing ambiguities, verb valency information is particularly important for complicated processing such as identification and supplementation of zero pronouns. However, this information is not encoded in normal human-readable dictionaries, and is hard to enter manually. 
Shirai (1999) estimates that at least 27,000 valency entries are needed to cover around 80% of Japanese verbs in a typical newspaper, and we expect this to be true of any language. Various methods of creating detailed entries have been suggested, such as the extraction of candidates from corpora (Manning, 1993; Utsuro et al., 1997; Kawahara and Kurohashi, 2001), and the automatic and semi-automatic induction of semantic constraints (Akiba et al., 2000). However, the automatic construction of monolingual entries is still far from reaching the quality of hand-constructed resources. Further, large-scale bilingual resources are still rare enough that it is much harder to automatically build bilingual entries.</Paragraph> <Paragraph position="5"> Our work differs from corpus-based work such as Manning (1993) or Kawahara and Kurohashi (2001) in that we are using existing lexical resources rather than a corpus. Thus our method will work for rare words, so long as we can find them in a bilingual dictionary, and know the English translation. It does not, however, learn new frames from usage examples.</Paragraph> <Paragraph position="6"> In order to demonstrate the utility of the valency information, we give an example of a sentence translated with the system default information (basically a choice between transitive and intransitive), and the full valency information. The verb is kamei-suru &quot;order&quot;, which takes a sentential complement. In (1) the underlined part is the sentential complement. 
The verb valency entry is the same as joushin-suru &quot;report&quot; [NP-ga Cl-to V], except with the clause marked as to-infinitival. The translation with the valency information is far from perfect, but it is comprehensible.</Paragraph> <Paragraph position="7"> Without the valency information the translation is incomprehensible.</Paragraph> <Paragraph position="9"> &quot;The king ordered his follower to sally forth.&quot; with: The king ordered a follower that sallied forth.</Paragraph> <Paragraph position="10"> without: * ordered to a follower that the king, sallied forth.</Paragraph> <Paragraph position="11"> In general, translation tends to simplify text, because the target language will not be able to represent exactly the same shades of meaning as the source text, so there is some semantic loss. Therefore, in many cases, a single target language entry is the translation of many similar source patterns. For example, there are 23 Japanese predicates linked to the English entry report in the valency dictionary used by the Japanese-to-English machine translation system ALT-J/E (Ikehara et al., 1991).</Paragraph> </Section> </Paper>