<?xml version="1.0" standalone="yes"?>
<Paper uid="J01-2004">
<Title>Probabilistic Top-Down Parsing and Language Modeling</Title>
<Section position="8" start_page="271" end_page="273" type="concl">
<SectionTitle> 6. Conclusion and Future Directions </SectionTitle>
<Paragraph position="0"> The empirical results presented above are quite encouraging, and the potential of this kind of approach, both for parsing and for language modeling, seems very promising.</Paragraph>
<Paragraph position="1"> [Figure 8: Increase in average precision/recall error, model perplexity, interpolated perplexity, and efficiency (i.e., decrease in rule expansions per word) as the base beam factor decreases; the x-axis is log10 of the base beam factor.]</Paragraph>
<Paragraph position="2"> With a simple conditional probability model and simple statistical search heuristics, we were able to find very accurate parses efficiently and, as a side effect, to assign word probabilities that yield a perplexity improvement over previous results. These perplexity improvements are particularly promising because the parser provides information that is, in some sense, orthogonal to the information provided by a trigram model, as evidenced by the robust improvements over the baseline trigram when the two models are interpolated.</Paragraph>
<Paragraph position="3"> There are several important directions for future work in this area. First, there is reason to believe that some of the conditioning information is not uniformly useful, and we would benefit from finer distinctions. For example, the probability of a preposition is presumably more dependent on a c-commanding head than the probability of a determiner is, yet in the current model both are conditioned on that head, as leftmost constituents of their respective phrases. Second, there are advantages to top-down parsing that have not been examined to date, e.g., empty categories. A top-down parser, in contrast to a standard bottom-up chart parser, has enough information to predict empty categories only where they are likely to occur. By including these nodes (which are in the original annotation of the Penn Treebank), we may be able to bring certain long-distance dependencies into a local focus. In addition, as mentioned above, we would like to test our language model further in speech recognition tasks, to determine whether the perplexity improvement we have seen can lead to significant reductions in word error rate.</Paragraph>
<Paragraph position="4"> Other parsing approaches might also be used in the way that we have used a top-down parser. Earley and left-corner parsers, as mentioned in the introduction, also have rooted derivations that can be used to calculate generative string prefix probabilities incrementally. In fact, left-corner parsing can be simulated by a top-down parser by transforming the grammar, as was done in Roark and Johnson (1999), so an approach very similar to the one outlined here could be used in that case. Perhaps some compromise between the fully connected structures and extreme underspecification will yield an efficiency improvement.
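To make the grammar transform mentioned just above concrete, the following is a minimal sketch of the standard left-corner transform schemata, under which top-down derivations in the transformed grammar mirror left-corner derivations in the original. This is an illustrative reconstruction in Python, not the code of Roark and Johnson (1999); the rule representation and helper names are assumptions made for the example.

    # Illustrative reconstruction of the basic left-corner grammar transform;
    # not the implementation used in Roark and Johnson (1999).
    def left_corner_transform(rules, nonterminals, terminals):
        """rules: list of (lhs, rhs) pairs, where rhs is a tuple of symbols.

        Returns rules over new categories (A, X), read as "an A whose
        left corner X has already been found".
        """
        new_rules = []
        for a in nonterminals:
            # Shift: a terminal begins the left-corner chain for A.
            for t in terminals:
                new_rules.append((a, (t, (a, t))))
            # Project: having found the left corner X of B -> X beta,
            # parse the rest of beta, then connect B further up toward A.
            for lhs, rhs in rules:
                x, beta = rhs[0], rhs[1:]
                new_rules.append(((a, x), beta + ((a, lhs),)))
            # Stop: the left-corner chain has reached A itself.
            new_rules.append(((a, a), ()))
        return new_rules

    # Tiny example grammar: S -> NP VP, NP -> 'dog', VP -> 'barks'
    lc_rules = left_corner_transform(
        rules=[("S", ("NP", "VP")), ("NP", ("dog",)), ("VP", ("barks",))],
        nonterminals=["S", "NP", "VP"],
        terminals=["dog", "barks"],
    )

A practical version would restrict the shift rules to terminals that can actually begin a constituent of the given category (left-corner filtering); the schematic form above overgenerates harmless but useless rules.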
Also, the advantages of head-driven parsers may outweigh their inability to interpolate with a trigram, and lead to better off-line language models than those that we have presented here.</Paragraph>
<Paragraph position="5"> Does a parsing model capture exactly what we need for informed language modeling? The answer to that is no. Some information is simply not structural in nature (e.g., topic), and we might expect other kinds of models to handle such nonstructural dependencies better. The improvement that we derived from interpolating the different models above indicates that using multiple models may be the most fruitful path in the future. In any case, a parsing model of the sort that we have presented here should be viewed as an important potential source of key information for speech recognition. Future research will show whether this early promise can be fully realized.</Paragraph>
</Section>
</Paper>
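As a concrete illustration of the model combination discussed in the final paragraph, here is a minimal sketch of linearly interpolating per-word probabilities from a parsing model and a trigram model, with perplexity computed from the combined estimates. The probability values and the weight LAMBDA are hypothetical placeholders, not the experimental setup reported above.

    # Sketch: interpolating per-word probabilities from two language models
    # and scoring a test string by perplexity. The inputs and LAMBDA below
    # are hypothetical; in practice the weight is tuned on held-out data.
    import math

    LAMBDA = 0.5  # weight on the parsing model

    def interpolate(p_parser, p_trigram, lam=LAMBDA):
        # P(w | history) = lam * P_parser(w | history) + (1 - lam) * P_trigram(w | history)
        return lam * p_parser + (1.0 - lam) * p_trigram

    def perplexity(word_probs):
        # 2 ** (average negative log2 probability per word)
        neg_log = -sum(math.log2(p) for p in word_probs)
        return 2.0 ** (neg_log / len(word_probs))

    # Hypothetical per-word probabilities for a three-word test string.
    parser_probs = [0.020, 0.150, 0.090]
    trigram_probs = [0.050, 0.080, 0.110]
    combined = [interpolate(p, q) for p, q in zip(parser_probs, trigram_probs)]
    print(perplexity(combined))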