<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-1511">
  <Title>Exploiting Contextual Information in Hypothesis Selection for Grammar Refinement</Title>
  <Section position="3" start_page="0" end_page="78" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> One of the essential tasks in building an efficient natural language processing system is constructing a broad-coverage, highly accurate grammar. In most currently working systems, such grammars have been written manually by linguists or lexicographers. Unfortunately, this task requires time-consuming skilled effort, and in most cases the resulting grammars are not completely satisfactory and frequently fail to cover unseen sentences. To address these problems, there have been several attempts to learn grammars automatically, using a rule-based approach (Ootani and Nakagawa, 1995), a corpus-based approach (Brill, 1992; Mori and Nagao, 1995), or a hybrid approach (Kiyono and Tsujii, 1994b; Kiyono and Tsujii, 1994a).</Paragraph>
    <Paragraph position="1"> Unlike previous work, we have introduced a new framework for grammar development that combines rule-based and corpus-based approaches and can exploit contextual information. In this framework, a whole grammar is not acquired from scratch, as in (Mori and Nagao, 1995), nor does an initial grammar need to be assumed, as in (Kiyono and Tsujii, 1994a). Instead, a rough but effective grammar is first learned from a large corpus by a corpus-based method and is later refined by a combination of rule-based and corpus-based methods. We call the former step of the framework partial grammar acquisition and the latter grammar refinement. For partial grammar acquisition, in our previous work we proposed a mechanism for automatically acquiring a partial grammar from a bracketed corpus based on local contextual information (Theeramunkong and Okumura, 1996) and showed the effectiveness of the derived grammar (Theeramunkong and Okumura, 1997). Through preliminary experiments, we found that it is difficult to learn grammar rules that are seldom used in the corpus.</Paragraph>
    <Paragraph position="2"> This is because rarely used rules occur in too few events for us to capture their properties. Therefore, in the first step, only grammar rules with relatively high frequency are learned.</Paragraph>
    <Paragraph position="3"> In this paper, we focus on the second step, grammar refinement, in which new rules are added to the current grammar so that it can accept previously unparsable sentences. This task is carried out by two components: (1) the rule-based component, which detects incompleteness in the current grammar and generates a set of hypotheses for new rules, and (2) the corpus-based component, which selects plausible hypotheses based on local contextual information.</Paragraph>
    <Paragraph position="4"> This paper also describes a stochastic parsing model that finds the most likely parse of a sentence and is then used to evaluate hypothesis selection based on that parse.</Paragraph>
    <Paragraph position="5"> In the rest of this paper, we first explain our framework and then describe the grammar refinement process and hypothesis selection based on local contextual information. Next, we describe a stochastic parsing model that exploits contextual information.</Paragraph>
    <Paragraph position="6"> Finally, we show the effectiveness of our approach through experiments that examine the correctness of the selected hypotheses and the parsing accuracy.</Paragraph>
  </Section>
</Paper>