<?xml version="1.0" standalone="yes"?>
<Paper uid="J96-3002">
  <Title>Machine Learning Comprehension Grammars for Ten Languages</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Our approach to machine learning of language combines psychological, linguistic, and logical concepts. We believe that the five central features of our approach--probabilistic association of words and meanings, grammatical and semantical form generalization, grammar computations, congruence of meaning, and dynamical assignment of denotational value to a word--are either new, or are new in their present combination. An overview of these concepts and related ones is given in Section 2.1. Two prior papers describing this approach, first presented at two conferences in 1991, are Suppes, Liang, and Böttner (1992) and Suppes, Böttner, and Liang (1995).</Paragraph>
    <Paragraph position="1"> Using the theory embodying the concepts just listed, we report on the application of our machine learning program to corpora from ten natural languages. Following our earlier work, we use a robotic framework. The computer program based on the theory learns a natural language from examples, which are commands occurring in mechanical assembly tasks (e.g., Go to the screw, Pick up a nut, Put the black screw into the round hole). A major improvement here, in comparison to Suppes, Böttner, and Liang (1995), is that the association relation is generalized from a unique correspondence between words and the program's internal representation of their meaning to a many-to-one relation, which permits different words to be associated with the same internal representation.</Paragraph>
    <Paragraph position="2"> This change is particularly important for the purpose of capturing case variation in word forms in inflecting languages such as Russian or German.</Paragraph>
    <Paragraph position="3"> The robotic framework and the associated corpora we test our program on are certainly restricted, although we have implemented our learning program on Robotworld, a standard robot used in academic settings for development purposes. In the present paper, however, we have deliberately formulated the general learning axioms of our theory so they do not depend on the robotic framework. The axioms are meant to apply to many kinds of systematic language use, more or less in the sense of sublanguages (see Kittredge and Lehrberger [1982]). We are already deep into our next area of application, machine understanding of physics word problems, and we have not needed to change the general formulation of our theory to accommodate this quite different problem of language learning. * CSLI, Ventura Hall, Stanford CA 94305-4115. © 1996 Association for Computational Linguistics. Computational Linguistics, Volume 22, Number 3.</Paragraph>
    <Paragraph position="4"> In this paper, we first describe our theory of machine learning of natural language (Section 2), and then describe the corpora in ten languages that we used for experimental purposes (Section 3). The languages are: English, Dutch, German, French, Spanish, Catalan, Russian, Chinese, Korean, Japanese. In Section 4 we describe some empirical results, especially the comprehension grammars generated from learning the languages. Finally, in Section 5 we discuss related work and the most pressing unsolved problems.</Paragraph>
  </Section>
</Paper>