<?xml version="1.0" standalone="yes"?> <Paper uid="A00-1028"> <Title>Experiments with Corpus-based LFG Specialization</Title> <Section position="3" start_page="0" end_page="204" type="intro"> <SectionTitle> 2 LFG and Grammar Pruning </SectionTitle> <Paragraph position="0"> The LFG formalism (Kaplan and Bresnan, 1982) allows the right-hand side (RHS) of a grammar rule to consist of a regular expression over grammar symbols. It is therefore more appropriate to refer to such rules as rule schemata, since each RHS can potentially be expanded into a (possibly infinite) number of distinct sequences of grammar symbols, each corresponding to a traditional phrase-structure rule. As can easily be imagined, regular-expression operators such as Kleene star and complementation may introduce a considerable amount of spurious ambiguity. Moreover, the LFG formalism provides operators that, although they do not increase its theoretical expressive power, allow rules to be written more concisely. Examples of such operators are the ignore operator, which allows skipping any sequence of grammar symbols that matches a given pattern; the shuffle operator, which allows a set of grammar symbols to occur in any order; and the linear precedence operator, which allows the order of grammar symbols to be partially specified.</Paragraph> <Paragraph position="1"> The pruning method we propose consists of eliminating complex operators from the grammar description by considering how they were actually instantiated when parsing a corpus. In LFGs, each rule scheme corresponds to a particular grammar symbol, its left-hand side (LHS), since different expansions of the same symbol are expressed as alternatives in the regular expression on its RHS. We can define a specific path through the RHS of a rule scheme by the choices made when matching it against some sequence of grammar symbols. Our training data allows us to derive, for each training example, the choices made at each rule expansion.
By applying these choices to the rule scheme in isolation, we can derive a phrase-structure rule from it.</Paragraph> <Paragraph position="2"> The grammar is specialized, or pruned, by retaining all and only those phrase-structure rules that correspond to a path taken through a rule scheme when expanding some node in some training example. Since the grammar formalism requires that each LHS occur in only one rule scheme in the grammar, extracted rules with the same LHS symbol are merged into a single rule scheme with a disjunction operator at its top level. For instance, if the only paths taken through the rule scheme for A in the training data yield the sequences C, B C, and B D, then that scheme will be replaced by a rule scheme with the structure A → {C | B C | B D}. The same approach is taken to replace all regular-expression operators other than concatenation with the actual sequences of grammar symbols that are matched against them. A more realistic example, taken from the actual data, is shown in Figure 1: none of the optional alternative portions following the V is ever used in any correct parse in the corpus. Moreover, the ADVP preceding the V occurs at most once in correct parses.</Paragraph> <Paragraph position="3"> Like other unification-based formalisms, lexical functional grammars allow grammar rules to be annotated with sets of feature-based constraints, here called &quot;functional descriptions&quot;. Their purpose is both to enforce additional constraints on rule applicability and to build an enriched predicate-argument structure called &quot;f-structure&quot;, which, together with the parse tree, constitutes the output of the parsing process. Since these constraints are carried over verbatim to the specialized version of the rule scheme, they pose no problem for this form of grammar pruning.</Paragraph> </Section> </Paper>
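The pruning step described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Rule schemes are encoded as hypothetical Python regular expressions over space-terminated symbol names rather than in LFG notation. Only concatenation, disjunction, and Kleene star are covered, so the ignore, shuffle, and linear precedence operators are out of scope. Functional descriptions are omitted. The training data is assumed to arrive as (LHS, daughters) pairs read off correct parses. The helper names `matches_scheme`, `specialize`, and `show` are hypothetical.

```python
import re
from collections import defaultdict


def matches_scheme(rhs_regex: str, daughters: list[str]) -> bool:
    """Return True if the daughter sequence is an expansion licensed by the scheme.

    Hypothetical encoding: every grammar symbol is followed by one space, so
    the regex "(B )*(C |D )" stands for the RHS  B* {C | D}.
    """
    text = "".join(sym + " " for sym in daughters)
    return re.fullmatch(rhs_regex, text) is not None


def specialize(schemes: dict[str, str],
               observed: list[tuple[str, list[str]]]) -> dict[str, list[tuple[str, ...]]]:
    """Keep only the expansions actually observed in training parses.

    Each observed expansion is replaced by the concrete symbol sequence it
    matched, and all sequences sharing an LHS become the alternatives of one
    top-level disjunction. LHS symbols that are never observed get no scheme.
    """
    kept: dict[str, set[tuple[str, ...]]] = defaultdict(set)
    for lhs, daughters in observed:
        # Sanity check: the original (unpruned) scheme must license the expansion.
        if not matches_scheme(schemes[lhs], daughters):
            raise ValueError(f"{lhs} -> {' '.join(daughters)} is not licensed")
        kept[lhs].add(tuple(daughters))
    # Sorting the alternatives only makes the output deterministic.
    return {lhs: sorted(seqs) for lhs, seqs in kept.items()}


def show(lhs: str, alternatives: list[tuple[str, ...]]) -> str:
    """Render a specialized scheme as  LHS -> {alt1 | alt2 | ...}."""
    body = " | ".join(" ".join(seq) for seq in alternatives)
    return f"{lhs} -> {{{body}}}"


if __name__ == "__main__":
    schemes = {"A": r"(B )*(C |D )"}  # hypothetical scheme A -> B* {C | D}
    observed = [("A", ["C"]), ("A", ["B", "C"]), ("A", ["B", "D"]), ("A", ["C"])]
    pruned = specialize(schemes, observed)
    print(show("A", pruned["A"]))  # A -> {B C | B D | C}
```

With the hypothetical scheme A → B* {C | D} and the observed expansions C, B C, and B D, the sketch yields a specialized scheme with exactly those three alternatives, in line with the example in the text. Duplicate observations collapse into one alternative because each LHS keeps a set of sequences.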