<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1035">
  <Title>Inside-Outside Estimation of a Lexicalized PCFG for German</Title>
  <Section position="8" start_page="274" end_page="275" type="concl">
    <SectionTitle>
7 Conclusion
</SectionTitle>
    <Paragraph position="0"> Our principal result is that scrambling-style free-er phrase order, case morphology and subcategorization, and NP-internal gender, number and case agreement can be dealt with in a head-lexicalized PCFG formalism by means of carefully designed categories and rules which limit the size of the packed parse forest and give desirable pooling of parameters. Hedging this, we point out that we made compromises in the grammar (notably, in not enforcing nominative-verb agreement) in order to control the number of categories, rules, and parameters.</Paragraph>
    <Paragraph position="1"> A second result is that iterative lexicalized inside-outside estimation appears to be beneficial, although the precision/recall increments are small. We believe this is the first substantial investigation of the utility of iterative lexicalized inside-outside estimation of a lexicalized probabilistic grammar involving a carefully built grammar where parses can be evaluated by linguistic criteria.</Paragraph>
    <Paragraph position="2"> A third result is that using too many unlexicalized iterations (more than two) is detrimental. A criterion using cross-entropy overtraining on held-out data dictates many more unlexicalized iterations, and this criterion is therefore inappropriate. Finally, we have clear cases of lexicalized EM estimation being stuck in linguistically bad states. As far as we know, the model which gave the best results could also be stuck in a comparatively bad state. We plan to experiment with other lexicalized training regimes, such as ones which alternate between different training corpora. The experiments are made possible by improvements in parser and hardware speeds, the carefully built grammar, and evaluation tools.</Paragraph>
    <Paragraph position="3"> In combination, these provide a unique environment for investigating training regimes for lexicalized PCFGs. Much work remains to be done in this area, and we feel that we are just beginning to develop understanding of the time course of parameter estimation, and of the general efficacy of EM estimation of lexicalized PCFGs as evaluated by linguistic criteria.</Paragraph>
    <Paragraph position="4"> We believe our current grammar of German could be extended to a robust free-text chunk/phrase grammar in the style of the English grammar of Carroll and Rooth (1998) with about a month's work, and to a free-text grammar treating verb-second clauses and additional complementation structures (notably extraposed clausal complements) with about one year of additional grammar development and experiment. These increments in the grammar could easily double the number of rules. However, this would probably not pose a problem for the parsing and estimation software.</Paragraph>
  </Section>
</Paper>