<?xml version="1.0" standalone="yes"?>
<Paper uid="A94-1012">
  <Title>Combination of Symbolic and Statistical Approaches for Grammatical Knowledge Acquisition</Title>
  <Section position="3" start_page="0" end_page="72" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Current natural language processing technologies are not yet mature enough for general-purpose systems to be applicable to any domain; therefore, rapid customization of linguistic knowledge to the sublanguage of an application domain is vital for the development of practical systems. In currently working systems, such customization has been carried out manually by linguists or lexicographers, which is a time-consuming effort.</Paragraph>
    <Paragraph position="1"> We have previously proposed a mechanism that acquires sublanguage-specific linguistic knowledge from parsing failures and that can be used as a tool for linguistic knowledge customization (Kiyono and Tsujii, 1993; Kiyono and Tsujii, 1994). Our approach is characterized by a mixture of symbolic and statistical approaches to grammatical knowledge acquisition. *Also a staff member of Matsushita Electric Industrial Co., Ltd., Shinagawa, Tokyo, Japan.</Paragraph>
    <Paragraph position="2"> Unlike probabilistic parsing as proposed by (Fujisaki et al., 1989; Briscoe and Carroll, 1993), which assumes the prior existence of comprehensive linguistic knowledge, our system can suggest new pieces of knowledge, including CFG rules, subcategorization frames, and other lexical features. It also differs from previous proposals for lexical acquisition using statistical measures (Church et al., 1991; Brent, 1991; Brown et al., 1993), which either deny the prior existence of linguistic knowledge or use linguistic knowledge in ad hoc ways.</Paragraph>
    <Paragraph position="3"> Our system consists of two components: (1) the rule-based component, which detects incompleteness in the existing knowledge and generates a set of hypotheses about new knowledge, and (2) the corpus-based component, which selects plausible hypotheses on the basis of their statistical behaviour. Since the rule-based component has been described in our previous papers, this paper focuses on the corpus-based component.</Paragraph>
    <Paragraph position="4"> After briefly explaining the framework, we describe a data structure called the Hypothesis Graph, which plays a crucial role in the corpus-based process. We then introduce two statistical measures of hypotheses, Global Plausibility and Local Plausibility, which are determined iteratively in order to select a set of plausible hypotheses. We also report an experiment that demonstrates the effectiveness of our method.</Paragraph>
  </Section>
</Paper>