<?xml version="1.0" standalone="yes"?> <Paper uid="P02-1015"> <Title>Parsing Non-Recursive Context-Free Grammars</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Preliminaries </SectionTitle> <Paragraph position="0"> In this section we briefly recall some standard notions from formal language theory. For more details we refer the reader to textbooks such as (Harrison, 1978).</Paragraph> <Paragraph position="1"> A context-free grammar is a 4-tuple $(\Sigma, N, S, R)$, where $\Sigma$ is a finite set of terminals, called the alphabet, $N$ is a finite set of nonterminals, including the start symbol $S$, and $R$ is a finite set of rules having the form $A \to \alpha$ with $A \in N$ and $\alpha \in (\Sigma \cup N)^*$. Throughout the paper we assume the following conventions: $A, B, \ldots$ denote nonterminals, $a, b, \ldots$ denote terminals, $\alpha, \beta, \gamma$ are strings in $(\Sigma \cup N)^*$, and $v, w$ are strings in $\Sigma^*$. We also assume that each CFG is reduced, i.e., no CFG contains nonterminals that do not occur in any derivation of a string in the language. Furthermore, we assume that the input grammars do not contain epsilon rules and that there is only one rule $S \to \alpha$ defining the start symbol $S$.² Finally, in Section 3 we will consider parsing grammars in Chomsky normal form (CNF), i.e., grammars with rules of the form $A \to BC$ or $A \to a$. (Footnote 2: Strictly speaking, the assumption about the absence of epsilon rules is not without loss of generality, since without epsilon rules the language cannot contain the empty string. However, this has no practical consequence.)</Paragraph> <Paragraph position="2">
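The grammar conventions above can be illustrated with a small sketch. It assumes a hypothetical Python encoding, not taken from the paper, in which a rule $A \to \alpha$ is a pair of its left-hand side and a tuple of right-hand-side symbols; the names CFG, useful_nonterminals, satisfies_assumptions and is_cnf are likewise illustrative only. The sketch checks the standing assumptions (the grammar is reduced, has no epsilon rules, and has exactly one rule for $S$) and whether a grammar is in CNF. Reducedness is tested with the standard two-step method: first compute the productive nonterminals, then keep those reachable from $S$ through rules whose symbols are all terminals or productive nonterminals.

```python
from dataclasses import dataclass

# Hypothetical encoding: a rule A -> alpha is a pair (lhs, rhs), rhs a tuple.
Rule = tuple[str, tuple[str, ...]]


@dataclass(frozen=True)
class CFG:
    """A CFG as the 4-tuple (Sigma, N, S, R)."""
    terminals: frozenset[str]      # Sigma, the alphabet
    nonterminals: frozenset[str]   # N, including the start symbol
    start: str                     # S
    rules: frozenset[Rule]         # R


def useful_nonterminals(g: CFG) -> set[str]:
    """Nonterminals occurring in some derivation of a string in the language."""
    # Step 1: productive nonterminals, i.e. those deriving some terminal string.
    productive: set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in g.rules:
            if lhs not in productive and all(
                x in g.terminals or x in productive for x in rhs
            ):
                productive.add(lhs)
                changed = True

    def usable(rhs: tuple[str, ...]) -> bool:
        # A rule is usable only if every symbol on its right side can yield terminals.
        return all(x in g.terminals or x in productive for x in rhs)

    # Step 2: nonterminals reachable from S through usable rules only.
    if g.start not in productive:
        return set()
    reachable = {g.start}
    agenda = [g.start]
    while agenda:
        current = agenda.pop()
        for lhs, rhs in g.rules:
            if lhs == current and usable(rhs):
                for x in rhs:
                    if x in g.nonterminals and x not in reachable:
                        reachable.add(x)
                        agenda.append(x)
    return reachable


def satisfies_assumptions(g: CFG) -> bool:
    """Reduced, no epsilon rules, and exactly one rule S -> alpha."""
    reduced = useful_nonterminals(g) == set(g.nonterminals)
    no_epsilon = all(len(rhs) > 0 for _, rhs in g.rules)
    start_rules = sum(1 for lhs, _ in g.rules if lhs == g.start)
    return reduced and no_epsilon and start_rules == 1


def is_cnf(g: CFG) -> bool:
    """True if every rule has the form A -> B C or A -> a."""
    for _, rhs in g.rules:
        binary = len(rhs) == 2 and all(x in g.nonterminals for x in rhs)
        unary = len(rhs) == 1 and rhs[0] in g.terminals
        if not (binary or unary):
            return False
    return True


# Example grammar: S -> A B, A -> a, B -> b.
g = CFG(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"S", "A", "B"}),
    start="S",
    rules=frozenset({("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))}),
)
print(satisfies_assumptions(g), is_cnf(g))  # True True
```

Adding a rule such as $C \to c$ for a nonterminal $C$ that never occurs on a right-hand side would leave $C$ productive but unreachable from $S$, so the grammar would no longer count as reduced.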
Instead of working with non-recursive CFGs directly, it will be more convenient in the specification of our algorithms to encode such a grammar as a push-down automaton (PDA) whose stack size is bounded by some constant.</Paragraph> <Paragraph position="3"> Unlike many textbooks, we assume that PDAs do not have states; this is without loss of generality, since states can be encoded in the symbols that occur top-most on the stack.</Paragraph> <Paragraph position="5"> Thus, a PDA is a 5-tuple $(\Sigma, \Gamma, X_{\mathit{init}}, X_{\mathit{final}}, \Delta)$, where $\Sigma$ is the alphabet as above, $\Gamma$ is a finite set of stack symbols including the initial stack symbol $X_{\mathit{init}}$ and the final stack symbol $X_{\mathit{final}}$, and $\Delta$ is the set of transitions, having one of the following three forms: $X \mapsto XY$ (a push transition), $XY \mapsto Z$ (a pop transition), or $X \stackrel{a}{\mapsto} Y$ (a transition scanning symbol $a$).</Paragraph> <Paragraph position="7"> Throughout this paper we use the following conventions: $W, X, Y, Z$ denote stack symbols and $\delta, \delta_1, \delta_2$ are strings in $\Gamma^*$ representing stacks. We remark that in our notation stacks grow from left to right, i.e., the top-most stack symbol is found at the right end.</Paragraph> <Paragraph position="8"> Configurations of the PDA have the form $(\delta, w)$, where $\delta \in \Gamma^*$ is a stack and $w \in \Sigma^*$ is the remaining input. We define the binary relation $\vdash$ on configurations by $(\delta\delta_1, vw) \vdash (\delta\delta_2, w)$ if and only if there is a transition in $\Delta$ of the form $\delta_1 \mapsto \delta_2$, where $v = \varepsilon$, or of the form $\delta_1 \stackrel{a}{\mapsto} \delta_2$, where $v = a$.</Paragraph> <Paragraph position="10"> The relation $\vdash^*$ denotes the reflexive and transitive closure of $\vdash$. An input string $w$ is recognized by the PDA if and only if $(X_{\mathit{init}}, w) \vdash^* (X_{\mathit{final}}, \varepsilon)$.</Paragraph> </Section> </Paper>