<?xml version="1.0" standalone="yes"?> <Paper uid="A00-1028"> <Title>Experiments with Corpus-based LFG Specialization</Title> <Section position="6" start_page="206" end_page="207" type="relat"> <SectionTitle> 5 Related Work </SectionTitle> <Paragraph position="0"> The work presented in the current article is related to previous work on corpus-based grammar specialization as presented in (Rayner, 1988; Samuelsson and Rayner, 1991; Rayner and Carter, 1996; Samuelsson, 1994; Srinivas and Joshi, 1995; Neumann, 1997).</Paragraph> <Paragraph position="1"> The line of work described in (Rayner, 1988; Samuelsson and Rayner, 1991; Rayner and Carter, 1996; Samuelsson, 1994) deals with unification-based grammars that already have a purely concatenative context-free backbone, and is more concerned with a different form of specialization, consisting in the application of explanation-based learning (EBL). Here, the central idea is to collect the most frequently occurring subtrees in a treebank and use them as atomic units for parsing. The cited works differ mainly in the criteria adopted for selecting subtrees from the treebank. In (Rayner, 1988; Samuelsson and Rayner, 1991; Rayner and Carter, 1996) these criteria are handcoded: all subtrees satisfying some properties are selected, and a new grammar rule is created by flattening each such subtree, i.e., by taking the root as left-hand side and the yield as right-hand side, and in the process performing all unifications corresponding to the thus removed internal nodes. 
Experiments carried out on a corpus of 15,000 trees from the ATIS domain using a version of the SRI Core Language Engine resulted in a speedup of about 3.4 at a cost of 5% in grammatical coverage, which, however, was compensated by an increase in parsing accuracy.</Paragraph> <Paragraph position="2"> Finding suitable tree-cutting criteria requires a considerable amount of work, and must be repeated for each new grammar and for each new domain to which the grammar is to be specialized. Samuelsson (Samuelsson, 1994) proposes a technique to automatically select which subtrees to retain. The selection of appropriate subtrees is done by choosing a subset of nodes at which to cut trees. Cutnodes are determined by computing the entropy of each node, and selecting only those nodes whose entropy exceeds a given threshold. Intuitively, nodes with low entropy indicate locations in the trees where a given symbol was expanded using a predictable set of rules, at least most of the time, so that the loss of coverage that derives from ignoring the remaining cases is low. Nodes with high entropy, on the other hand, indicate positions in which there is a high uncertainty in what rule was used to expand the symbol, so that it is better to preserve all alternatives. Several schemas are proposed to compute entropies, each leading to a different trade-off between coverage reduction and speedup. In general, results are not quite as good as those obtained using handcoded criteria, though of course the specialized grammar is obtained fully automatically, and thus with much less effort.</Paragraph> <Paragraph position="3"> When ignoring issues related to the elimination of complex operators from the RHS of rule schemata, the grammar-pruning strategy described in the current article is equivalent to explanation-based learning where all nodes have been selected as cutnodes. 
Conversely, EBL can be viewed as higher-order grammar pruning, removing not grammar rules, but grammar-rule combinations.</Paragraph> <Paragraph position="4"> Some of the work done on data-oriented parsing (DOP) (Bod, 1993; Bod and Scha, 1996; Bod and Kaplan, 1998; Sima'an, 1999) can also be considered related to our work, as it can be seen as a way to specialize in an EBL-like way the (initially unknown) grammar implicitly underlying a treebank.</Paragraph> <Paragraph position="5"> (Srinivas and Joshi, 1995) and (Neumann, 1997) apply EBL to speed up parsing with tree-adjoining grammars and sentence generation with HPSGs respectively, though they do so by introducing new components in their systems rather than by modifying the grammars they use.</Paragraph> </Section> </Paper>