File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/98/j98-4004_concl.xml
Size: 3,656 bytes
Last Modified: 2025-10-06 13:58:03
<?xml version="1.0" standalone="yes"?> <Paper uid="J98-4004"> <Title>PCFG Models of Linguistic Tree Representations</Title> <Section position="8" start_page="629" end_page="631" type="concl"> <SectionTitle> 6. Conclusion </SectionTitle> <Paragraph position="0"> This paper has presented theoretical and empirical evidence that the choice of tree representation can make a significant difference to the performance of a PCFG-based parsing system. What makes a tree representation a good choice for PCFG modeling seems to be quite different to what makes it a good choice for a representation of a linguistic theory. [Figure caption: The effects of selective application of the Parent transform. Each point corresponds to a PCFG induced after selective application of the Parent transform. The point labeled All corresponds to the PCFG induced after applying the Parent transform to all nonroot nonterminal nodes, as before. Points labeled with a single category A correspond to PCFGs induced after applying the Parent transform to just those nodes labeled A, while points labeled with a pair of categories A^B correspond to PCFGs induced after applying the Parent transform to nodes labeled A with parents labeled B. (Some labels are elided to make the remaining labels legible.) The x-axis shows the difference in the number of productions between the PCFG induced after the selective Parent transform and the untransformed treebank PCFG, and the y-axis shows the difference in the average of the precision and recall scores.]</Paragraph> <Paragraph position="1"> In conventional linguistic theories the choice of rules, and hence trees, is usually influenced by considerations of parsimony; thus the Chomsky adjunction representation of PP modification may be preferred because it requires only a single context-free rule, rather than a rule schema abbreviating a potentially unbounded number of rules that would be required in flat tree representations of adjunction. 
But in a PCFG model the additional nodes required by the Chomsky adjunction representation represent independence assumptions that seem not to be justified. In general, in selecting a tree structure one faces a bias/variance trade-off, in that tree structures with fewer nodes and/or richer node labels reduce bias, but possibly at the expense of an increase in variance. A tree transformation/detransformation methodology for empirically evaluating the effect of different tree representations on parsing systems was developed in this paper. The results presented earlier show that the tree representations that incorporated weaker independence assumptions performed significantly better in the empirical studies than the more linguistically motivated Chomsky adjunction structures.</Paragraph> <Paragraph position="2"> Of course, there is nothing special about the particular tree transformations studied in this paper: other transforms could, and should, be studied in exactly the same manner. For example, I am currently using this methodology to study the interaction between tree structure and a &quot;slash category&quot; node labeling in tree representations with empty categories (Gazdar et al. 1985). While the work presented here focussed on PCFG parsing models, it seems that the general transformation/detransformation approach can be applied to a wider range of problems. For example, it would be interesting to know to what extent the performance of more sophisticated parsing systems, such as those described by Collins (1996) and Charniak (1997), depends on the particular tree representations they are trained on.</Paragraph> <!-- Running header: Computational Linguistics Volume 24, Number 4 --> </Section> </Paper>