<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2156"> <Title>An alternative LR algorithm for TAGs</Title> <Section position="7" start_page="951" end_page="951" type="concl"> <SectionTitle> 7 Implementation </SectionTitle> <Paragraph position="0"> We have implemented the parser generator, with the extensions from the previous section.</Paragraph> <Paragraph position="1"> We have assumed that each set Adjunct(N), if it is not {nil}, depends only on the nonterminal label of N. This allows more compact storage of the entries goto+-(q,M): for a fixed state q and nonterminal B, several such entries where M has B as label can be collapsed into a single entry goto~(q,B). The goto function for tree substitution is represented similarly.</Paragraph> <Paragraph position="2"> We have constructed the LR table for the English grammar developed by the XTAG project at the University of Pennsylvania. This grammar contains 286 initial trees and 316 auxiliary trees, which together have 5950 nodes. There are 9 nonterminals that allow adjunction, and 10 that allow substitution. There are 21 symbols that function as terminals.</Paragraph> <Paragraph position="3"> We find that for a grammar of this size the LR table is prohibitively large. Represented as a collection of unit clauses in Prolog, the table takes over 46 MB of storage. Most of this space is needed to represent the three goto functions, which together require over 2.5 million entries, almost 99% of which are consumed by goto, and the remainder by gotox and the goto function for tree substitution. The reduction functions require almost 80 thousand entries. There are 5610 LR states. 
The size of the automata for recognizing the sets CS(Rt, E) and CS + (N, E) is negligible: together they contain just over 15 thousand transitions.</Paragraph> <Paragraph position="4"> The time requirements for generation of the table were acceptable: approximately 25 minutes were needed on a standard mainframe under moderate load.</Paragraph> <Paragraph position="5"> Another obstacle to practical use is the equivalent of hidden left recursion known from traditional LR parsing (Nederhof and Sarbo, 1996), which we have shown to be present in the grammar for English. This phenomenon precludes realization of nondeterminism by means of backtracking. Tabular realization was investigated by Nederhof (1998) and will be the subject of further research.</Paragraph> </Section> </Paper>