<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2174"> <Title>Practical Glossing by Prioritised Tiling</Title> <Section position="2" start_page="0" end_page="1061" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In a lexicalist MT framework such as Shake-and-Bake (Whitelock, 1994), translation equivalence is defined between collections of (suitably constrained) lexical material in the two languages. Such an approach has been shown to be effective in the description of many types of complex bilingual equivalence.</Paragraph> <Paragraph position="1"> However, the complexity of the associated parsing and generation phases leaves a system of this type some way from commercial exploitation. The parsing phase that is needed to establish adequate constraints on the words is of cubic complexity, while the most general generation algorithm, needed to order the words in the target text, is O(n^4) (Poznanski et al., 1996). In this paper, we show how a novel application domain, glossing, can be explored within such a framework, by omitting generation entirely and replacing syntactic parsing by a simple combination of morphological analysis and tagging. The poverty of constraints established in this way, and the consequent inaccuracy in translation, is mitigated by providing a menu of alternatives for each gloss. The gloss is automatically updated in the light of user choices. While the availability of alternatives is generally desirable in automatic translation, it is the limitation to glossing which makes it feasible to manage the consistency maintenance required.</Paragraph> <Paragraph position="2"> Glossing as a technique for elucidating the grammar and lexis of a second language text is well-known from the linguistics literature.</Paragraph> <Paragraph position="3"> Each morpheme in the object language is provided with its meta-language equivalent aligned beneath it. 
Such a glosser may be used as a tool for second-language improvement (Nerbonne and Smit, 1996), and thus provide an educational alternative to the passive consumption of a (usually low quality) translation. We envisage the glosser's primary use as a tool for cross-language information gathering, and thus think it best not to display grammatical information. Our glosser improves on the use of printed or even on-line dictionaries in several ways:</Paragraph> <Paragraph position="5"> [Figure 1: example English-Japanese gloss. The glossed English passage includes "... This indifference stemmed in part from the inability of the worldwide intellectual property system to match compilations of data to the basic subject matter categories ..."] The glosser attempts to find all plausible equivalents for the words and multi-word expressions that constitute a text, displaying the most appropriate consistent subset as its first choice and the remainder within menus.</Paragraph> <Paragraph position="6"> Consistency is maintained by treating source language lexical material as resources that are consumed by the matching of equivalences, so that the latter partially tile the text (see footnote 1). Our model has much in common with that of Alshawi (1996), though our linguistic representations are relatively impoverished. Our aim is not true translation but the use of large existing bilingual lexicons for very wide-coverage glossing. 
We have discovered that the effect of tiling with a large ordered set of detailed equivalences is to provide a close approximation to richer schemes for syntactic analysis.</Paragraph> <Paragraph position="7"> Footnote 1: Equivalences are not only consumers of source language resources but also producers of target language ones. In glossing, the production of target language resources need not be complete: every word needs a translation, but not every word needs a gloss. Tiling thus need only be partial.</Paragraph> <Paragraph position="8"> An example English-Japanese gloss as produced by our system is shown in Figure 1. Multi-word collocations are underlined, and discontinuous ones are also given a number (and colour) to facilitate identification. Note how stemmed ... from is a discontinuous collocation surrounding the continuous collocation in part.</Paragraph> <Paragraph position="9"> The pop-up menu shows the alternatives for fruit, by sense at the top level with run-offs to synonyms, and at the bottom an option to access the machine-readable version of 'Genius', a published English-Japanese dictionary.</Paragraph> <Paragraph position="10"> The structure of this paper is as follows. In Section 2.1 we outline the basic operation of the system, introducing our representation of natural language collocations as key descriptors, and give a probabilistic interpretation for these in Section 2.2. Section 3 describes the algorithm for tiling a sentence using key descriptors, and goes on to describe a series of heuristics which approximate the full probabilistic model. Section 4 presents the results of a preliminary evaluation of the glosser's performance. Finally, in Section 5 we give our conclusions and make some suggestions for future improvements to the system.</Paragraph> </Section> </Paper>