<?xml version="1.0" standalone="yes"?> <Paper uid="C94-1068"> <Title>Sample Runs</Title> <Section position="2" start_page="0" end_page="417" type="metho"> <SectionTitle> 1 Parser Generator for NLP </SectionTitle> <Paragraph position="0"> Yacc [4] was designed for unambiguous programming languages. Thus, Yacc cannot elegantly handle a script language with a natural language flavor, i.e., Yacc forces a grammar writer to use tricks for handling ambiguities. To remedy this situation we have developed NLyacc, which can handle arbitrary context-free grammars and allows a grammar writer to write natural rules and semantic actions. Although there are several parsing algorithms for general context-free languages, such as ATN, CYK, and Earley, &quot;the generalized LR parsing algorithm [2]&quot; would be the best in terms of its compatibility with Yacc and its efficiency.</Paragraph> <Paragraph position="1"> An ambiguous grammar causes a conflict in the parsing table, that is, an entry which has more than one action. Generalized LR parsing proceeds in exactly the same way as the standard one except when it encounters a conflict. The standard deterministic LR parser chooses only one action in this situation. The generalized LR parser, on the other hand, performs all the actions in the multiple entry by splitting the parse stack for each action. The parser merges the divided stack branches only when they have the same top state. This merge operation is important for efficiency. As a result, the stack becomes a graph instead of a simple linear state sequence. (*This work was done while Ishii stayed at Dept. of ...)</Paragraph> <Paragraph position="2"> There is already a generalized LR parser for natural language processing developed at Carnegie Mellon University [3]. 
NLyacc differs from CMU's system in the following points.</Paragraph> <Paragraph position="3"> * NLyacc is written in C, while CMU's is written in Lisp.</Paragraph> <Paragraph position="4"> * CMU's cannot handle ε rules, while NLyacc can. ε rules are helpful for writing natural rules.</Paragraph> <Paragraph position="5"> * The way semantic actions are executed differs. CMU's evaluates an LFG-like semantic action attached to each rule when a reduce action is performed on that rule.</Paragraph> <Paragraph position="6"> NLyacc executes semantic actions at two levels: one is performed during parsing for syntactic control and the other is performed on each successful final parse. We will describe the details of NLyacc's approach in the next section.</Paragraph> <Paragraph position="7"> NLyacc is upward-compatible with Yacc. NLyacc consists of three modules: a reader, a parsing table constructor, and a driver routine for generalized LR parsing. The reader accepts grammar rules in the Yacc format. The table constructor produces a generalized LR parsing table instead of the standard LR parsing table. We describe the details of the parser in the next section.</Paragraph> </Section> <Section position="3" start_page="417" end_page="418" type="metho"> <SectionTitle> 2 Execution of Semantic Actions </SectionTitle> <Paragraph position="0"> NLyacc differs from Yacc mainly in the execution process of semantic actions attached to each grammar rule. Namely, Yacc evaluates a semantic action as it parses the input. Here we examine whether this evaluation mechanism is suitable for generalized LR parsing. If we can assume that there is only one final parse, the parser can evaluate semantic actions whenever only one branch exists on top of the stack. 
Although having only one final parse is often the case in practical applications, the constraint of being unambiguous is too strong in general.</Paragraph> <Section position="1" start_page="417" end_page="417" type="sub_section"> <SectionTitle> 2.1 Handling Side Effects </SectionTitle> <Paragraph position="0"> Next, we examine what would happen if semantic actions were executed during parsing. When a reduce action is performed, the parser evaluates the action attached to the current rule.</Paragraph> <Paragraph position="1"> As described in the previous section, the parse stack grows into a graph. Thus, when the action contains side effects, such as an assignment to a variable shared by different actions, those side effects must not propagate to the other paths in the graph.</Paragraph> <Paragraph position="2"> If an environment, which is a set of values of variables, is prepared for each path of the parse branches, such side effects can be encapsulated.</Paragraph> <Paragraph position="3"> When a stack splits, a copy of the environment should be created for each branch. When the parse branches are merged, however, the environments cannot be merged. Instead, the merged state must hold all the environments.</Paragraph> <Paragraph position="4"> Thus, the number of environments grows exponentially as parsing proceeds, and this approach decreases the parsing efficiency drastically. Moreover, this costly operation would be in vain when the parse fails in the middle. To sum up, although this approach retains compatibility with Yacc, it sacrifices too much efficiency.</Paragraph> </Section> <Section position="2" start_page="417" end_page="418" type="sub_section"> <SectionTitle> 2.2 Two Kinds of Semantic Actions </SectionTitle> <Paragraph position="0"> We therefore take another approach to handling semantic actions in NLyacc. Namely, the parser just keeps a list of actions to be executed, and performs all the actions after parsing is done. 
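A minimal Python sketch of this deferred execution, given here as an illustration and not as NLyacc's actual C implementation: each parse branch only records the actions it would run, and the recorded actions are executed after parsing, for successful parses only. The names `Branch`, `record`, `split`, `run_deferred`, and `add` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types for illustration; not part of NLyacc.
Action = Callable[[dict], None]


@dataclass
class Branch:
    """One parse branch: it records actions instead of executing them."""
    pending: list = field(default_factory=list)
    failed: bool = False

    def record(self, action: Action) -> None:
        # Nothing is evaluated during parsing, so no side effect can
        # leak into another branch of the graph-structured stack.
        self.pending.append(action)

    def split(self) -> "Branch":
        # Splitting copies only the list of pending actions, not an
        # environment of variable values.
        return Branch(pending=list(self.pending), failed=self.failed)


def run_deferred(branches: list) -> list:
    """Run recorded actions after parsing, for successful branches only."""
    results = []
    for branch in branches:
        if branch.failed:
            continue  # actions of failed parses are never executed
        env = {"total": 0}  # fresh environment per final parse
        for action in branch.pending:
            action(env)
        results.append(env)
    return results


def add(n: int) -> Action:
    """Build a toy semantic action that adds n to env['total']."""
    def action(env: dict) -> None:
        env["total"] += n
    return action


first = Branch()
first.record(add(1))
second = first.split()   # a conflict splits the stack
first.record(add(10))
second.record(add(100))
second.failed = True     # this branch later fails syntactically

print(run_deferred([first, second]))  # [{'total': 11}]
```

Only the surviving branch contributes a result, and the action recorded on the failed branch is never run.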
This method avoids the problem above during parsing. After parsing is done, the semantic action evaluator performs the actions as it traces all the history paths one by one. This approach retains parsing efficiency and avoids executing useless semantic actions. A drawback of this approach is that semantic actions cannot control the syntactic parsing, because actions are not evaluated until the parsing is done. To compensate for this drawback, we have introduced a new kind of semantic action, enclosed in [ ], which enables a user to discard semantically incorrect parses in the middle of parsing.</Paragraph> <Paragraph position="1"> Namely, there are two types of semantic actions: * An action enclosed in [ ] is executed during parsing, just as in Yacc. If 'return 0;' is executed in the action, the partial parse that invoked this action fails and is discarded.</Paragraph> <Paragraph position="2"> * An action enclosed in { } is executed after the syntactic parsing.</Paragraph> <Paragraph position="3"> In the example below, the bracketed action checks whether the subtraction result is negative and, if so, discards its partial parse.</Paragraph> <Paragraph position="5"/> </Section> <Section position="3" start_page="418" end_page="418" type="sub_section"> <SectionTitle> 2.3 Keeping Parse History </SectionTitle> <Paragraph position="0"> Our generalized LR parsing algorithm differs from the original one [2] in that our algorithm keeps a history of parse actions in order to execute semantic actions after the syntactic parsing. 
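The history-keeping idea of this subsection can be sketched in Python as follows. This is an illustrative assumption rather than NLyacc's C code: `shift_history` and `reduce_history` are hypothetical helper names. A shift yields a one-element list, and a reduce concatenates the histories of the right-hand-side symbols and then appends the rule number. The sketch tags entries as ('s', state) and ('r', rule) so they can be told apart when printed; that tagging is a choice made here, not something the paper specifies.

```python
# Hypothetical helpers for illustration; the names are not NLyacc's.

def shift_history(state):
    """History for 'shift s': a list holding the single element s."""
    return [("s", state)]


def reduce_history(rule, child_histories):
    """History for 'reduce r': append(H1, ..., Hn, [r])."""
    history = []
    for child in child_histories:
        history.extend(child)       # concatenate H1 ... Hn in order
    history.append(("r", rule))     # then the rule number r
    return history


# Toy grammar (an assumption): rule 1 is E -> E '+' t, rule 2 is E -> t,
# where t is a terminal. The states 3, 5 and 7 are arbitrary placeholders.
h_t1 = shift_history(3)                          # first t
h_e = reduce_history(2, [h_t1])                  # E -> t
h_plus = shift_history(5)                        # '+'
h_t2 = shift_history(7)                          # second t
h_sum = reduce_history(1, [h_e, h_plus, h_t2])   # E -> E '+' t

print(h_sum)  # [('s', 3), ('r', 2), ('s', 5), ('s', 7), ('r', 1)]
```

Reading the resulting list from left to right gives the order in which shift and reduce steps, and therefore the attached semantic actions, would be replayed.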
The original algorithm uses a packed forest representation for the stack, whereas our algorithm uses a list representation.</Paragraph> <Paragraph position="1"> The algorithm for keeping the parse history is as follows.</Paragraph> <Paragraph position="2"> 1) If the next action is &quot;shift s&quot;, then make &lt;s&gt; the history, where &lt;s&gt; is a list of only one element s.</Paragraph> <Paragraph position="3"> 2) If the next action is &quot;reduce r : A → B1 B2 ... Bn&quot;, then make append(H1, H2, ..., Hn, [r]) the history, where Hi is the history of Bi, r is the rule number of the production &quot;A → B1 B2 ... Bn&quot;, and the function 'append' concatenates multiple lists and returns the result.</Paragraph> <Paragraph position="4"> Now we describe how to execute semantic actions using the parse history. First, before starting to parse, the parser calls the &quot;yyinit&quot; function to initialize variables used in the semantic actions. Our system requires the user to define &quot;yyinit&quot; to set initial values for the variables. Next, the parser starts parsing, performs a shift action or a reduce action according to the parse history, and evaluates the appropriate semantic actions.</Paragraph> </Section> <Section position="4" start_page="418" end_page="418" type="sub_section"> <SectionTitle> 2.4 Efficient Memory Management </SectionTitle> <Paragraph position="0"> We use a list structure to implement the parse stack, because the stack becomes a complex graph structure as described previously. Because the parser discards failed branches of the stack, the system reclaims the memory allocated for the discarded parses using the &quot;mark and sweep garbage collection algorithm [1]&quot; to use memory efficiently. 
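As a rough illustration of mark-and-sweep reclamation over a graph-shaped stack, the Python sketch below marks every node reachable from the live stack tops and then sweeps the rest. It is a simplified assumption, not NLyacc's C implementation: `Node`, `mark`, and `sweep` are hypothetical names, and a C implementation would more plausibly return swept cells to a free list than filter a list as done here.

```python
class Node:
    """A stack node; 'preds' links to the nodes below it in the graph."""

    def __init__(self, state, preds=()):
        self.state = state
        self.preds = list(preds)
        self.marked = False


def mark(roots):
    """Mark phase: flag every node reachable from the live stack tops."""
    worklist = list(roots)
    while worklist:
        node = worklist.pop()
        if node.marked:
            continue                 # already visited via another path
        node.marked = True
        worklist.extend(node.preds)


def sweep(heap):
    """Sweep phase: keep marked nodes, drop the rest, clear the marks."""
    survivors = [node for node in heap if node.marked]
    for node in survivors:
        node.marked = False          # reset for the next collection
    return survivors


bottom = Node(0)
live_top = Node(1, [bottom])
dead_top = Node(2, [bottom])   # top of a branch the parser discarded
heap = [bottom, live_top, dead_top]

mark([live_top])               # only the surviving branch is a root
heap = sweep(heap)
print([node.state for node in heap])  # [0, 1]
```

The shared bottom node survives because the live branch still reaches it, while the node that belonged only to the discarded branch is reclaimed.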
This garbage collection is triggered only when memory is exhausted in our current implementation.</Paragraph> </Section> </Section> <Section position="4" start_page="418" end_page="418" type="metho"> <SectionTitle> 3 Distribution </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="418" end_page="418" type="sub_section"> <SectionTitle> Portability </SectionTitle> <Paragraph position="0"> Currently, NLyacc runs on UNIX workstations and DOS personal computers.</Paragraph> </Section> <Section position="2" start_page="418" end_page="418" type="sub_section"> <SectionTitle> Debugging Grammars </SectionTitle> <Paragraph position="0"> For grammar debugging, NLyacc provides parse trace information such as a history of shift/reduce actions and execution information for '[ ] actions'. When NLyacc encounters an error state, the &quot;yyerror&quot; function is called just as in Yacc. Distribution: NLyacc is distributed through e-mail (please contact nlyacc@nak.math.keio.ac.jp). The distribution package includes all the source code, a manual, and some sample grammars.</Paragraph> </Section> </Section> </Paper>