<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1041">
  <Title>Long-Distance Dependency Resolution in Automatically Acquired Wide-Coverage PCFG-Based LFG Approximations</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Automatic F-Structure Annotation
</SectionTitle>
    <Paragraph position="0"> The Penn-II treebank employs CFG trees with additional &quot;functional&quot; node annotations (such as -LOC, -TMP, -SBJ, -LGS, ...) as well as traces and coindexation (to indicate LDDs) as basic data structures.</Paragraph>
    <Paragraph position="1"> (Footnote 3) LFGs may also involve morphological and semantic levels of representation.</Paragraph>
    <Paragraph position="2"> The f-structure annotation algorithm of (Cahill et al., 2002) exploits configurational, categorial, Penn-II &quot;functional&quot;, local head and trace information to annotate nodes with LFG feature-structure equations. A slightly adapted version of (Magerman, 1994)'s scheme automatically head-lexicalises the Penn-II trees. This partitions local subtrees of depth one (corresponding to CFG rules) into left and right contexts (relative to the head). The annotation algorithm is modular, with four components (Figure 2): (i) left-right (L-R) annotation principles (e.g. the leftmost NP to the right of the V head of a VP-type rule is likely to be an object); (ii) coordination annotation principles (separating these out simplifies the other components of the algorithm); (iii) traces (translating traces and coindexation in trees into corresponding reentrancies in f-structure, cf. the index 1 in Figure 3); and (iv) catch-all and clean-up.</Paragraph>
    <Paragraph position="3"> Lexical information is provided via macros for POS tag classes.</Paragraph>
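    <Paragraph> The left-right principles can be illustrated with a minimal sketch. The head table, the three annotation rules and the ASCII notation for equations (up, down) are assumptions made for this illustration only; the actual principles are far richer.

```python
# Illustrative head table and annotation rules; both are assumptions made for
# this sketch and are far simpler than the principles used in the paper.
HEAD_CANDIDATES = {
    "VP": ["VBD", "VBZ", "VB", "VP"],
    "S": ["VP"],
    "NP": ["NN", "NNS", "NP"],
}


def find_head(mother, daughters):
    """Return the index of the head daughter of a depth-one subtree."""
    for cat in HEAD_CANDIDATES.get(mother, []):
        if cat in daughters:
            return daughters.index(cat)
    return len(daughters) - 1  # fallback: rightmost daughter


def annotate(mother, daughters):
    """Pair each daughter category with a (pseudo) f-structure equation."""
    head = find_head(mother, daughters)
    object_seen = False
    annotated = []
    for i, cat in enumerate(daughters):
        if i == head:
            eq = "up=down"                # head shares the mother's f-structure
        elif mother == "VP" and i > head and cat == "NP" and not object_seen:
            eq = "up.OBJ=down"            # leftmost NP in the right context
            object_seen = True
        elif mother == "S" and head > i and cat == "NP":
            eq = "up.SUBJ=down"           # NP in the left context of the VP head
        else:
            eq = "down in up.ADJUNCT"     # catch-all default
        annotated.append((cat, eq))
    return annotated


print(annotate("VP", ["VBD", "NP", "PP"]))
print(annotate("S", ["NP", "VP"]))
```

    Splitting each rule at its head keeps the principles local: a rule only needs to know on which side of the head a category sits.</Paragraph>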
    <Paragraph position="4"> The f-structure annotations are passed to a constraint solver to produce f-structures. Annotation is evaluated in terms of coverage and quality, summarised in Table 1. Coverage is near complete, with 99.82% of the 48K Penn-II sentences receiving a single, connected f-structure. Annotation quality is measured in terms of precision and recall (P&amp;R) against the DCU 105. The algorithm achieves an F-score of 96.57% for full f-structures and 94.3% for preds-only f-structures. Full f-structures measure all attribute-value pairs, including &quot;minor&quot; features such as person, number etc. The stricter preds-only measure captures only paths ending in PRED:VALUE.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 PCFG-Based LFG Approximations
</SectionTitle>
    <Paragraph position="0"> Based on these resources, (Cahill et al., 2002) developed two parsing architectures. Both generate PCFG-based approximations of LFG grammars.</Paragraph>
    <Paragraph position="1"> In the pipeline architecture a standard PCFG is extracted from the &quot;raw&quot; treebank to parse unseen text. The resulting parse trees are then annotated by the automatic f-structure annotation algorithm and resolved into f-structures.</Paragraph>
    <Paragraph position="2"> In the integrated architecture the treebank is first annotated with f-structure equations.</Paragraph>
    <Paragraph position="3"> An annotated PCFG is then extracted where each non-terminal symbol in the grammar has been augmented with LFG f-equations:</Paragraph>
    <Paragraph position="5"> Each category symbol followed by its annotations is treated as a monadic category for grammar extraction and parsing.</Paragraph>
    <Paragraph position="6"> Post-parsing, equations are collected from parse trees and resolved into f-structures.</Paragraph>
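    <Paragraph> The integrated architecture can be sketched as follows: because each annotated symbol is an opaque string, a standard relative-frequency PCFG extractor works unchanged. The tuple encoding of trees, the bracketed ASCII annotations and the function names are assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def rules(tree):
    """Yield (lhs, rhs) for every local subtree of a (label, children) tree."""
    label, children = tree
    if children and isinstance(children[0], tuple):   # skip preterminal nodes
        yield label, tuple(child[0] for child in children)
        for child in children:
            yield from rules(child)


def extract_pcfg(trees):
    """Relative-frequency estimate of P(rhs | lhs) over all local subtrees."""
    counts = Counter(rule for tree in trees for rule in rules(tree))
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


# Toy annotated tree: the annotation is part of the category name, so the two
# NP symbols below are distinct non-terminals for the grammar.
toy = ("S", [
    ("NP[up.SUBJ=down]", [("NNP[up=down]", ["U.N."])]),
    ("VP[up=down]", [
        ("VBZ[up=down]", ["signs"]),
        ("NP[up.OBJ=down]", [("NN[up=down]", ["treaty"])]),
    ]),
])

for (lhs, rhs), prob in extract_pcfg([toy]).items():
    print(lhs, "->", " ".join(rhs), prob)
```

    After parsing, the equations can be read back off the symbol names and handed to the constraint solver.</Paragraph>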
    <Paragraph position="7"> Both architectures parse raw text into &quot;proto&quot; f-structures with LDDs unresolved, resulting in incomplete argument structures as in Figure 4.</Paragraph>
    <Paragraph position="8">  Theoretically, LDDs can span unbounded amounts of intervening linguistic material, as in [U.N. signs treaty]_1 the paper claimed ... a source said [ ]_1. In LFG, LDDs are resolved at the f-structure level using functional uncertainty (FU) equations, obviating the need for empty productions and traces in trees (Dalrymple, 2001). FUs are regular expressions specifying paths in f-structure between a source (where linguistic material is encountered) and a target (where linguistic material is interpreted semantically). To account for the fronted sentential constituents in Figures 3 and 4, an FU equation of the form ↑TOPIC = ↑COMP* COMP would be required.</Paragraph>
    <Paragraph position="9"> The equation states that the value of the TOPIC attribute is token-identical with the value of the final COMP argument on a path through the immediately enclosing f-structure that passes through zero or more COMP attributes. This FU equation is annotated to the topicalised sentential constituent in the relevant CFG rules as follows</Paragraph>
    <Paragraph position="11"> and generates the LDD-resolved proper f-structure in Figure 3 for the traceless tree in Figure 4, as required. In addition to FU equations, subcategorisation information is a crucial ingredient in LFG's account of LDDs. As an example, for a topicalised constituent to be resolved as the argument of a local predicate as specified by the FU equation, (i) the local predicate must subcategorise for the argument in question and (ii) the argument in question must not already be filled. Subcategorisation requirements are provided lexically in terms of semantic forms (subcat lists) and coherence and completeness conditions (all GFs specified must be present, and no others may be present) on f-structure representations. Semantic forms specify which grammatical functions (GFs) a predicate requires locally. For our example in Figures 3 and 4, the relevant lexical entries are:</Paragraph>
    <Paragraph position="13"> FU equations and subcategorisation requirements together ensure that LDDs can only be resolved at suitable f-structure locations.</Paragraph>
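    <Paragraph> The interplay of the FU path COMP* COMP with subcategorisation can be sketched on a toy proto f-structure. The nested-dict encoding, the subcat lists and the function name are assumptions made for this illustration; they are not the lexical entries of the paper.

```python
import re

FU_TOPIC = re.compile(r"(COMP:)*COMP")  # the path expression COMP* COMP
# Hypothetical subcat lists for the toy example.
SUBCAT = {"claim": {"SUBJ", "COMP"}, "say": {"SUBJ", "COMP"}, "sign": {"SUBJ", "OBJ"}}


def resolution_sites(fs, path=()):
    """Yield (path, host) pairs where the FU path ends in a licensed, unfilled COMP."""
    candidate = path + ("COMP",)
    licensed = "COMP" in SUBCAT.get(fs.get("PRED"), set())
    if FU_TOPIC.fullmatch(":".join(candidate)) and licensed and "COMP" not in fs:
        yield candidate, fs
    if isinstance(fs.get("COMP"), dict):
        yield from resolution_sites(fs["COMP"], candidate)


# Toy proto f-structure: TOPIC is present but not yet linked to an argument slot.
proto = {
    "TOPIC": {"PRED": "sign", "SUBJ": {"PRED": "U.N."}, "OBJ": {"PRED": "treaty"}},
    "PRED": "claim",
    "SUBJ": {"PRED": "paper"},
    "COMP": {"PRED": "say", "SUBJ": {"PRED": "source"}},
}

sites = list(resolution_sites(proto))
path, host = sites[0]
host["COMP"] = proto["TOPIC"]   # token identity: both attributes share one object
print(path)
```

    The outer predicate already has its COMP filled, so only the embedded predicate qualifies as a resolution site.</Paragraph>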
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Acquiring Lexical and LDD Resources
</SectionTitle>
    <Paragraph position="0"> In order to model the LFG account of LDD resolution we require subcat frames (i.e. semantic forms) and LDD resolution paths through f-structure. Traditionally, such resources were hand-coded. Here we show how they can be acquired from f-structure annotated treebank resources.</Paragraph>
    <Paragraph position="1"> LFG distinguishes between governable (arguments) and nongovernable (adjuncts) grammatical functions (GFs). If the automatic f-structure annotation algorithm outlined in Section 3 generates high-quality f-structures, reliable semantic forms can be extracted (reverse-engineered): for each f-structure generated, and for each level of embedding, we determine the local PRED value and collect the governable (i.e. subcategorisable) grammatical functions present at that level of embedding. For the proper f-structure in Figure 3 we obtain sign([subj,obj]) and say([subj,comp]). We extract frames from the full WSJ section of the Penn-II Treebank with 48K trees.</Paragraph>
    <Paragraph position="2"> Unlike many other approaches, our extraction process does not predefine frames, fully reflects LDDs in the source data structures (cf. Figure 3), discriminates between active and passive frames, computes GF-based, GF:CFG category pair-based as well as CFG category-based subcategorisation frames, and associates conditional probabilities with frames. Given a lemma l and an argument list s, the probability of s given l is estimated as:</Paragraph>
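    <Paragraph> A relative-frequency estimate over the extracted counts is one natural reading of this definition. As a sketch under that assumption, with count(l, s) the number of times lemma l occurs with argument list s and s_1 ... s_n the argument lists observed with l (this notation is an assumption):

```latex
P(s \mid l) = \frac{\mathrm{count}(l, s)}{\sum_{i=1}^{n} \mathrm{count}(l, s_i)}
```
    </Paragraph>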
    <Paragraph position="4"> Table 2 summarises the results. We extract 3586 verb lemmas and 10969 unique verbal semantic form types (lemma followed by non-empty argument list). Including prepositions associated with the subcategorised OBLs and particles, this number goes up to 14348. The number of unique frame types (without lemma) is 38 without specific prepositions and particles, 577 with. F-structure annotations allow us to distinguish passive and active frames. Table 3 shows the most frequent semantic forms for accept. Passive frames are marked p. We carried out a comprehensive evaluation of the automatically acquired verbal semantic forms against the COMLEX Resource (Macleod et al., 1994) for the 2992 active verb lemmas that both resources have in common. We report on the evaluation of GF-based frames for the full frames with complete prepositional and particle information. We use relative conditional probability thresholds (1% and 5%) to filter the selection of semantic forms (Table 4). (O'Donovan et al., 2004) provide a more detailed description of the extraction and evaluation of semantic forms.</Paragraph>
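    <Paragraph> The extraction of semantic forms and their filtering by probability threshold can be sketched as follows. The nested-dict encoding of f-structures, the inventory of governable GFs and the reading of the threshold as a simple cut-off on the conditional probability are assumptions made for this illustration.

```python
from collections import Counter, defaultdict

GOVERNABLE = {"SUBJ", "OBJ", "OBJ2", "OBL", "COMP", "XCOMP"}  # assumed inventory


def semantic_forms(fs, seen=None):
    """Yield (pred, governable GFs) for every level of embedding, once per node."""
    seen = set() if seen is None else seen
    if not isinstance(fs, dict) or id(fs) in seen:
        return                       # atomic value, or reentrant node already visited
    seen.add(id(fs))
    gfs = tuple(sorted(attr for attr in fs if attr in GOVERNABLE))
    if "PRED" in fs and gfs:
        yield fs["PRED"], gfs
    for value in fs.values():
        yield from semantic_forms(value, seen)


def frame_probabilities(f_structures):
    """Relative-frequency estimate of the probability of a frame given a lemma."""
    counts = defaultdict(Counter)
    for fs in f_structures:
        for lemma, frame in semantic_forms(fs):
            counts[lemma][frame] += 1
    return {lemma: {frame: n / sum(c.values()) for frame, n in c.items()}
            for lemma, c in counts.items()}


def filter_frames(probs, threshold=0.01):
    """Keep only frames whose conditional probability reaches the threshold."""
    return {lemma: {fr: p for fr, p in frames.items() if p >= threshold}
            for lemma, frames in probs.items()}


# Toy LDD-resolved f-structure: TOPIC and COMP share one sub-f-structure.
sign = {"PRED": "sign", "SUBJ": {"PRED": "U.N."}, "OBJ": {"PRED": "treaty"}}
say = {"PRED": "say", "SUBJ": {"PRED": "source"}, "COMP": sign, "TOPIC": sign}
print(filter_frames(frame_probabilities([say])))
```

    Tracking visited nodes matters because a reentrant sub-f-structure would otherwise contribute its frame twice.</Paragraph>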
    <Paragraph position="5">  We further acquire finite approximations of FU equations by extracting paths between co-indexed material occurring in the automatically generated f-structures from sections 02-21 of the Penn-II treebank. We extract 26 unique TOPIC, 60 TOPIC-REL and 13 FOCUS path types (with a total of 14,911 token occurrences), each with an associated probability. We distinguish between two types of TOPIC-REL paths: those that occur in wh-less constructions, and all other types (cf. Table 5). Given a path p and an LDD type t (either TOPIC, TOPIC-REL or FOCUS), the probability of p given t is estimated as:</Paragraph>
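    <Paragraph> By analogy with the frame probabilities, a relative-frequency estimate is one natural reading here. As a sketch under that assumption, with count(t, p) the number of path tokens p observed for LDD type t and p_1 ... p_n the path types observed for t (this notation is an assumption):

```latex
P(p \mid t) = \frac{\mathrm{count}(t, p)}{\sum_{i=1}^{n} \mathrm{count}(t, p_i)}
```
    </Paragraph>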
    <Paragraph position="7"> In order to get a first measure of how well the approximation models the data, we compute the path types in section 23 not covered by those extracted from 02-21: 23/(02-21). There are 3 such path types (Table 6), each occurring exactly once. Given that the total number of path tokens in section 23 is 949, the finite approximation extracted from 02-21 covers 99.69% of all LDD paths in section 23.</Paragraph>
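    <Paragraph> Path probabilities and the coverage measure can be sketched on toy data. The representation of a path token as a (type, path) pair, the example paths and the function names are assumptions made for this illustration.

```python
from collections import Counter


def path_probabilities(observations):
    """Relative-frequency estimate of P(path | type) from (type, path) tokens."""
    counts = Counter(observations)
    totals = Counter()
    for (ldd_type, _path), n in counts.items():
        totals[ldd_type] += n
    return {(t, p): n / totals[t] for (t, p), n in counts.items()}


def token_coverage(train, test):
    """Share of test path tokens whose (type, path) was also seen in training."""
    known = set(train)
    test = list(test)
    return sum(1 for obs in test if obs in known) / len(test)


# Toy path tokens standing in for those extracted from the training sections.
train = [("TOPIC", "COMP")] * 3 + [("TOPIC", "COMP:COMP"), ("FOCUS", "COMP:OBJ")]
held_out = [("TOPIC", "COMP"), ("TOPIC", "XCOMP:COMP")]

print(path_probabilities(train))
print(token_coverage(train, held_out))
```

    Coverage is counted over tokens rather than types, so a rare unseen path type costs little.</Paragraph>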
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
7 Resolving LDDs in F-Structure
</SectionTitle>
    <Paragraph position="0"> Given a set of semantic forms s with probabilities P(s|l) (where l is a lemma), a set of paths p with P(p|t) (where t is either TOPIC, TOPIC-REL or FOCUS) and an f-structure f, the core of the algorithm to resolve LDDs recursively traverses f to:
      find a TOPIC|TOPIC-REL|FOCUS:g pair;
      retrieve the TOPIC|TOPIC-REL|FOCUS paths;
      for each path p with GF1 : ... : GFn : GF, traverse f along GF1 : ... : GFn to sub-f-structure h;
      retrieve the local PRED:l;
      add GF:g to h iff h together with GF is locally complete and coherent with respect to a semantic form s for l;
      rank the resolution by P(s|l) × P(p|t).
    The algorithm supports multiple, interacting TOPIC, TOPIC-REL and FOCUS LDDs. We use P(s|l) × P(p|t) to rank a solution, depending on how likely it is that the PRED takes semantic frame s, and how likely it is that the TOPIC, FOCUS or TOPIC-REL is resolved using path p. The algorithm also supports resolution of LDDs where no overt linguistic material introduces a source TOPIC-REL function (e.g. in reduced relative clause constructions). We distinguish between passive and active constructions, using the relevant semantic frame type when resolving LDDs.</Paragraph>
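    <Paragraph> The ranking step can be sketched for a single level of embedding. This is a simplified illustration, not the full algorithm: the recursive traversal, interacting LDDs and the active/passive distinction are omitted, and the nested-dict encoding, the inventory of governable GFs and all probabilities below are assumptions. The check that GF is not already present at h follows condition (ii) of Section 4.

```python
GOVERNABLE = {"SUBJ", "OBJ", "OBJ2", "OBL", "COMP", "XCOMP"}  # assumed inventory
LDD_TYPES = ("TOPIC", "TOPIC-REL", "FOCUS")


def rank_resolutions(f, paths, frames):
    """Rank candidate resolution sites for LDD functions found at the top of f.

    paths:  {ldd_type: {path tuple: P(p|t)}}
    frames: {lemma: {frozenset of GFs: P(s|l)}}
    """
    ranked = []
    for t in LDD_TYPES:
        if t not in f:
            continue
        for path, p_path in paths.get(t, {}).items():
            *prefix, gf = path
            h = f
            for step in prefix:                      # traverse GF1 : ... : GFn
                h = h.get(step) if isinstance(h, dict) else None
            if not isinstance(h, dict) or gf in h:   # no such site, or GF filled
                continue
            local = frozenset(a for a in h if a in GOVERNABLE) | {gf}
            p_frame = frames.get(h.get("PRED"), {}).get(local)
            if p_frame:                              # complete and coherent
                ranked.append((p_frame * p_path, t, path))
    return sorted(ranked, key=lambda c: c[0], reverse=True)


# Toy proto f-structure with hypothetical path and frame probabilities.
f = {
    "TOPIC": {"PRED": "sign", "SUBJ": {"PRED": "U.N."}, "OBJ": {"PRED": "treaty"}},
    "PRED": "claim",
    "SUBJ": {"PRED": "paper"},
    "COMP": {"PRED": "say", "SUBJ": {"PRED": "source"}},
}
paths = {"TOPIC": {("COMP",): 0.7, ("COMP", "COMP"): 0.2, ("COMP", "OBJ"): 0.1}}
frames = {
    "claim": {frozenset({"SUBJ", "COMP"}): 0.9},
    "say": {frozenset({"SUBJ", "COMP"}): 0.6, frozenset({"SUBJ", "OBJ"}): 0.3},
}

for score, ldd_type, path in rank_resolutions(f, paths, frames):
    print(ldd_type, ":".join(path), round(score, 4))
```

    The most probable path overall is rejected here because its target is already filled, which shows why path probabilities alone do not decide the resolution.</Paragraph>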
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
8 Experiments and Evaluation
</SectionTitle>
    <Paragraph position="0"> We ran experiments with grammars in both the pipeline and the integrated parsing architectures.</Paragraph>
    <Paragraph position="1"> The first grammar is a basic PCFG, while A-PCFG includes the f-structure annotations. We apply a parent transformation to each grammar (Johnson, 1999) to give P-PCFG and PA-PCFG. We train on sections 02-21 (grammar, lexical extraction and LDD paths) of the Penn-II Treebank and test on section 23. The only pre-processing of the trees that we do is to remove empty nodes, and remove all Penn-II functional tags in the integrated model. We evaluate the parse trees using evalb. Following (Riezler et al., 2002), we convert f-structures into dependency triple format. Using their software we evaluate the f-structure parser output against:  3. A subset of 560 dependency structures of the PARC 700 Dependency Bank following (Kaplan et al., 2004). The results are given in Table 7. The parent-transformed grammars perform best in both architectures. In all cases, there is a marked improvement (2.07-6.36%) in the f-structures after LDD resolution. We achieve between 73.78% and 80.97% preds-only and 83.79% to 87.04% all-GFs f-score, depending on the gold standard. We achieve between 77.68% and 80.24% against the PARC 700 following the experiments in (Kaplan et al., 2004). For details on how we map the f-structures produced by our parsers to a format similar to that of the PARC 700 Dependency Bank, see (Burke et al., 2004). Table 8 shows the evaluation result broken down by individual GF (preds-only) for the integrated model PA-PCFG against the DCU 105. In order to measure how many of the LDD reentrancies in the gold-standard f-structures are captured correctly by our parsers, we developed evaluation software for f-structure LDD reentrancies (similar to Johnson's (2002) evaluation to capture traces and their antecedents in trees). Table 9 shows the results, with the integrated model achieving more than 76% correct LDD reentrancies.</Paragraph>
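    <Paragraph> Evaluation over dependency triples can be sketched as follows. This is not the evaluation software used in the experiments; the nested-dict encoding, the preds-only triple format (GF, head PRED, dependent PRED) and the function names are assumptions made for this illustration.

```python
def triples(fs):
    """Flatten a nested-dict f-structure into preds-only (GF, head, dependent) triples."""
    out = set()
    for attr, value in fs.items():
        if isinstance(value, dict) and "PRED" in value:
            out.add((attr, fs.get("PRED"), value["PRED"]))
            out |= triples(value)
    return out


def prf(gold, test):
    """Precision, recall and f-score over two sets of triples."""
    gold, test = set(gold), set(test)
    correct = len(gold.intersection(test))
    precision = correct / len(test) if test else 0.0
    recall = correct / len(gold) if gold else 0.0
    f_score = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f_score


# Toy gold and parser f-structures that disagree on one grammatical function.
gold = {"PRED": "sign", "SUBJ": {"PRED": "U.N."}, "OBJ": {"PRED": "treaty"}}
parsed = {"PRED": "sign", "SUBJ": {"PRED": "U.N."}, "OBL": {"PRED": "treaty"}}
print(prf(triples(gold), triples(parsed)))
```

    The f-score is the harmonic mean of precision and recall, so a precision of 12/12 with a recall of 12/13 gives 24/25, i.e. 96%.</Paragraph>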
  </Section>
  <Section position="9" start_page="0" end_page="0" type="metho">
    <SectionTitle>
9 Related Work
</SectionTitle>
    <Paragraph position="0"> (Collins, 1999)'s Model 3 is limited to wh-traces in relative clauses (it doesn't treat topicalisation, focus etc.). Johnson's (2002) work is closest to ours in spirit. Like our approach he provides a finite approximation of LDDs. Unlike our approach, however, he works with tree fragments in a post-processing approach to add empty nodes and their antecedents to parse trees, while we present an approach to LDD resolution on the level of f-structure.</Paragraph>
    <Paragraph> Table 8: Preds-only results by GF for the integrated model PA-PCFG against the DCU 105
      DEP.       PRECISION       RECALL          F-SCORE
      adjunct    717/903 = 79    717/947 = 76    78
      app        14/15 = 93      14/19 = 74      82
      comp       35/43 = 81      35/65 = 54      65
      coord      109/143 = 76    109/161 = 68    72
      det        253/264 = 96    253/269 = 94    95
      focus      1/1 = 100       1/1 = 100       100
      obj        387/445 = 87    387/461 = 84    85
      obj2       0/1 = 0         0/2 = 0         0
      obl        27/56 = 48      27/61 = 44      46
      obl2       1/3 = 33        1/2 = 50        40
      obl_ag     5/11 = 45       5/12 = 42       43
      poss       69/73 = 95      69/81 = 85      90
      quant      40/55 = 73      40/52 = 77      75
      relmod     26/38 = 68      26/50 = 52      59
      subj       330/361 = 91    330/414 = 80    85
      topic      12/12 = 100     12/13 = 92      96
      topicrel   35/42 = 83      35/52 = 67      74
      xcomp      139/160 = 87    139/146 = 95    91
    </Paragraph>
    <Paragraph position="1"> It seems that the f-structure-based approach is more abstract (99 LDD path types against approximately 9,000 tree-fragment types in (Johnson, 2002)) and more fine-grained in its use of lexical information (subcat frames). In contrast to Johnson's approach, our LDD resolution algorithm is not biased. It computes all possible complete resolutions and rank-orders them using LDD path and subcat frame probabilities. It is difficult to provide a satisfactory comparison between the two methods, but we have carried out an experiment that compares them at the f-structure level. We take the output of Charniak's parser (Charniak, 1999) and, using the pipeline f-structure annotation model, evaluate against the DCU 105, both before and after LDD resolution.</Paragraph>
    <Paragraph position="2"> Using the software described in (Johnson, 2002) we add empty nodes to the output of Charniak's parser, pass these trees to our automatic annotation algorithm and evaluate against the DCU 105. The results are given in Table 10. Our method of resolving LDDs at f-structure level results in a preds-only f-score of 80.97%. Using (Johnson, 2002)'s method of adding empty nodes to the parse-trees results in an f-score of 79.75%.</Paragraph>
    <Paragraph position="3"> (Hockenmaier, 2003) provides CCG-based models of LDDs. Some of these involve extensive clean-up of the underlying Penn-II treebank resource prior to grammar extraction. In contrast, in our approach we leave the treebank as is and only add (but never correct) annotations. Earlier HPSG work (Tateisi et al., 1998) is based on independently constructed hand-crafted XTAG resources. In contrast, we acquire our resources from treebanks and achieve substantially wider coverage.</Paragraph>
    <Paragraph position="4"> Our approach provides wide-coverage, robust, and - with the addition of LDD resolution - &quot;deep&quot; or &quot;full&quot;, PCFG-based LFG approximations. Crucially, we do not claim to provide fully adequate statistical models. It is well known (Abney, 1997) that PCFG-type approximations to unification grammars can yield inconsistent probability models due to loss of probability mass: the parser successfully returns the highest ranked parse tree but the constraint solver cannot resolve the f-equations (generated in the pipeline or &quot;hidden&quot; in the integrated model) and the probability mass associated with that tree is lost. This case, however, is surprisingly rare for our grammars: only 0.18% (85 out of 48424) of the original Penn-II trees (without FRAGs) fail to produce an f-structure due to inconsistent annotations (Table 1), and for parsing section 23 with the integrated model (A-PCFG), only 9 sentences do not receive a parse because no f-structure can be generated for the highest ranked tree (0.4%). Parsing with the pipeline model, all sentences receive one complete f-structure. Research on adequate probability models for unification grammars is important. (Miyao et al., 2003) present a Penn-II treebank-based HPSG with log-linear probability models. They achieve coverage of 50.2% on section 23, as against 99% in our approach. (Riezler et al., 2002; Kaplan et al., 2004) describe how a carefully hand-crafted LFG is scaled to the full Penn-II treebank with log-linear based probability models.</Paragraph>
    <Paragraph position="5"> They achieve 79% coverage (full parse) and 21% fragment/skimmed parses. By the same measure, full parse coverage is around 99% for our automatically acquired PCFG-based LFG approximations.</Paragraph>
    <Paragraph position="6"> Against the PARC 700, the hand-crafted LFG grammar reported in (Kaplan et al., 2004) achieves an f-score of 79.6%. For the same experiment, our best automatically-induced grammar achieves an f-score of 80.24%.</Paragraph>
  </Section>
</Paper>