<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1022"> <Title>Multilevel Coarse-to-fine PCFG Parsing</Title> <Section position="6" start_page="173" end_page="174" type="concl"> <SectionTitle> 5 Conclusion and Future Research </SectionTitle> <Paragraph position="0"> We have presented a novel parsing algorithm based upon the coarse-to-fine processing model.</Paragraph> <Paragraph position="1"> Several aspects of the method recommend it.</Paragraph> <Paragraph position="2"> First, unlike methods that depend on best-first search, the method is &quot;holistic&quot; in its evaluation of constituents. For example, consider the impact of parent labeling. It has been repeatedly shown to improve parsing accuracy (Johnson, 1998; Charniak, 2000; Klein and Manning, 2003b), but it is difficult if not impossible to integrate with best-first search in bottom-up chart parsing (as in Charniak et al. (1998)). The reason is that when working bottom up it is difficult to determine if, say, s^sbar is any more or less likely than s^s, as the evidence, working bottom up, is negligible. Since our method computes the exact outside probability of constituents (albeit at a coarser level), all of the top-down information is available to the system. Or again, another very useful feature in English parsing is the knowledge that a constituent ends at the right boundary (minus punctuation) of a string.</Paragraph> <Paragraph position="3"> This can be included only in an ad-hoc way when working bottom up, but could easily be added here.</Paragraph> <Paragraph position="4"> Many aspects of the current implementation are far from optimal. 
It seems clear to us that extracting the maximum benefit from our pruning would involve taking the unpruned constituents and specifying them in all possible ways allowed by the next level of granularity.</Paragraph> <Paragraph position="5"> What we actually did was to propose all possible constituents at the next level and immediately rule out those lacking a corresponding constituent remaining at the previous level. This choice was dictated by ease of implementation. Before using mlctf parsing in a production parser, the other method should be evaluated to see if our intuitions of greater efficiency are correct.</Paragraph> <Paragraph position="6"> It is also possible to combine mlctf parsing with queue-reordering methods. The best-first search method of Charniak et al. (1998) estimates Equation 1. Working bottom up, estimating the inside probability is easy (we just sum the probabilities of all the trees found that build the constituent). All the cleverness goes into estimating the outside probability. Quite clearly, the current method could be used to provide a more accurate estimate of the outside probability, namely the outside probability at the coarser level of granularity.</Paragraph> <Paragraph position="7"> There is one more future-research topic to add before we stop, possibly the most interesting of all. The particular tree of coarser-to-finer constituents that governs our mlctf algorithm (Figure 1) was created by hand after about 15 minutes of reflection and survives, except for typos, with only two modifications. There is no reason to think it is anywhere close to optimal. It should be possible to define &quot;optimal&quot; formally and search for the best mlctf constituent tree.</Paragraph> <Paragraph position="8"> This would be a clustering problem, and, fortunately, one thing statistical NLP researchers know how to do is cluster.</Paragraph> </Section> </Paper>