<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0616">
  <Title>An Analogical Learner for Morphological Analysis</Title>
  <Section position="3" start_page="0" end_page="120" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Analogical learning (Gentner et al., 2001) is based on a two-step inductive process. The first step consists in the construction of a structural mapping between a new instance of a problem and solved instances of the same problem. Once this mapping is established, solutions for the new instance can be induced, based on one or several analogs. The implementation of this kind of inference process requires techniques for searching for, and reasoning with, structural mappings, hence the need to properly define the notion of analogical relationships and to efficiently implement their computation.</Paragraph>
    <Paragraph position="1"> In Natural Language Processing (NLP), the typical dimensionality of databases, which are made up of hundreds of thousands of instances, makes the search for complex structural mappings a very challenging task. It is, however, possible to take advantage of the specific nature of linguistic data to work around this problem. Formal (surface) analogical relationships between linguistic representations are often a good sign of deeper analogies: a surface similarity between the word strings write and writer denotes a deeper (semantic) similarity between the related concepts. Surface similarities can of course be misleading. In order to minimize such confusion, one can take advantage of other specificities of linguistic data: (i) their systemic organization in (pseudo-)paradigms, and (ii) their high level of redundancy. In a large lexicon, we can indeed expect to find many instances of pairs like write-writer: for instance read-reader, review-reviewer...</Paragraph>
    <Paragraph position="2"> Complementing surface analogies with statistical information thus has the potential to make the search problem tractable, while still providing many good analogs. Attempts have been made to use surface analogies in various contexts: automatic word pronunciation (Yvon, 1999), morphological analysis (Lepage, 1999a; Pirrelli and Yvon, 1999) and syntactic analysis (Lepage, 1999b). These experiments have mainly focused on linear representations of linguistic data, taking the form of finite sequences of symbols, using a restrictive and sometimes ad-hoc definition of the notion of an analogy. The first contribution of this paper is to propose a general definition of formal analogical proportions for algebraic structures commonly used in NLP: attribute-value vectors, words on finite alphabets and labeled trees. The second contribution is to show how these formal definitions can be used within an instance-based learning framework to learn morphological regularities.</Paragraph>
    <Paragraph position="3"> This paper is organized as follows. In Section 2, our interpretation of analogical learning is introduced and related to other models of analogical learning and reasoning. Section 3 presents a general algebraic framework for defining analogical proportions as well as its instantiation to the case of words and labeled trees. This section also discusses the algorithmic complexity of the inference procedure.</Paragraph>
    <Paragraph position="4"> Section 4 reports the results of experiments aimed at demonstrating the flexibility of this model and at assessing its generalization performance. We conclude by discussing current limitations of this model and by suggesting possible extensions.</Paragraph>
  </Section>
</Paper>