<?xml version="1.0" standalone="yes"?> <Paper uid="J00-1003"> <Title>Practical Experiments with Regular Approximation of Context-Free Languages</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Several methods of regular approximation of context-free languages have been proposed in the literature. For some, the regular language is a superset of the context-free language, and for others it is a subset. We have implemented a large number of methods, and where necessary, refined them with an analysis of the grammar. We also propose a number of new methods.</Paragraph> <Paragraph position="1"> The analysis of the grammar is based on a sufficient condition for context-free grammars to generate regular languages. For an arbitrary grammar, this analysis identifies sets of rules that need to be processed in a special way in order to obtain a regular language. The nature of this processing differs for the respective approximation methods. For other parts of the grammar, no special treatment is needed and the grammar rules are translated to the states and transitions of a finite automaton without affecting the language.</Paragraph> <Paragraph position="2"> Few of the published articles on regular approximation have discussed the application in practice. In particular, little attention has been given to the following two questions: First, what happens when a context-free grammar grows in size? What is then the increase of the sizes of the intermediate results and the obtained minimal deterministic automaton? Second, how &quot;precise&quot; are the approximations? That is, how much larger than the original context-free language is the language obtained by a superset approximation, and how much smaller is the language obtained by a subset approximation? (How we measure the &quot;sizes&quot; of languages in a practical setting will become clear in what follows.) 
Some considerations with regard to theoretical upper bounds on the sizes of the intermediate results and the finite automata have already been discussed in Nederhof (1997). In this article we will try to answer the above two questions in a practical setting, using practical linguistic grammars and sentences taken from a spoken-language corpus.</Paragraph> <Paragraph position="3"> * DFKI, Stuhlsatzenhausweg 3, D-66123 Saarbrücken, Germany. E-mail: nederhof@dfki.de. © 2000 Association for Computational Linguistics. Computational Linguistics, Volume 26, Number 1. The structure of this paper is as follows: In Section 2 we recall some standard definitions from language theory. Section 3 investigates a sufficient condition for a context-free grammar to generate a regular language. We also present the construction of a finite automaton from such a grammar. In Section 4, we discuss several methods to approximate the language generated by a grammar if the sufficient condition mentioned above is not satisfied. These methods can be enhanced by a grammar transformation presented in Section 5. Section 6 compares the respective methods, which leads to conclusions in Section 7.</Paragraph> </Section> </Paper>