<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2002">
<Title>Robust Models of Human Parsing</Title>
<Section position="2" start_page="0" end_page="0" type="metho">
<SectionTitle> 2 Probabilistic Parsing Models </SectionTitle>
<Paragraph position="0"> In computational linguistics, probabilistic approaches to language processing play a central role. Significant advances toward robust, broad-coverage parsing models have been made based on probabilistic techniques such as maximum likelihood estimation or expectation maximization (for an overview, see Manning and Schütze, 1999).</Paragraph>
<Paragraph position="1"> A simple example of a probabilistic parsing model is the probabilistic context-free grammar (PCFG), which extends the formalism of context-free grammars (CFGs) by annotating each rule with a probability. PCFGs constitute an efficient, well-understood technique for assigning probabilities to the analyses produced by a context-free grammar.</Paragraph>
<Paragraph position="2"> They are commonly used for broad-coverage grammars, as CFGs large enough to parse unrestricted text are typically highly ambiguous, i.e., a single sentence will receive a large number of parses. The probabilistic component of the grammar can then be used to rank the analyses a sentence might receive, and improbable ones can be eliminated.</Paragraph>
<Paragraph position="3"> In the computational linguistics literature, a number of highly successful extensions to the basic PCFG model have been proposed. Of particular interest are lexicalized parsing models such as the ones developed by Collins (1996, 1997) and Carroll and Rooth (1998).</Paragraph>
<Paragraph position="4"> In the human parsing literature, a PCFG-based model has been proposed by Jurafsky (1996) and Narayanan and Jurafsky (1998). This model shows how different sources of probabilistic information (such as subcategorization information and rule frequencies) can be combined using Bayesian inference. The model accounts for a range of disambiguation phenomena in linguistic processing. However, the model is only small scale, and it is not clear whether it can be extended to provide robustness and coverage of unrestricted text.</Paragraph>
<Paragraph position="5"> This problem is addressed by Brants and Crocker (2000) and Crocker and Brants (2000), who propose a broad-coverage model of human parsing based on PCFGs. This model is incremental, i.e., it makes word-by-word predictions, thus mimicking the behavior of the human parser. Also, Brants and Crocker's (2000) model imposes memory restrictions on the parser that are inspired by findings from the human sentence processing literature.</Paragraph>
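To make the ranking idea concrete, here is a minimal sketch in Python (the toy grammar, rule probabilities, and derivations are invented for illustration and are not taken from any of the models discussed above): the probability of a derivation is the product of the probabilities of the rules it uses, so two competing analyses of a PP-attachment ambiguity can be ranked directly.

```python
# Toy PCFG sketch (illustrative only): the probability of a derivation is the
# product of the probabilities of the rules it uses, so competing parses of an
# ambiguous sentence can be ranked.  Lexical rules (Det -> the, N -> man, ...)
# are omitted, since both derivations below would use the same ones.
from math import prod

# Rule probabilities; for each left-hand side, the expansions sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("NP", "PP")): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("PP", ("P", "NP")): 1.0,
}

def derivation_prob(rules_used):
    """P(derivation) = product of the probabilities of the rules used."""
    return prod(RULES[r] for r in rules_used)

# Two competing analyses of "the man saw the dog with the telescope".
vp_attachment = [  # the PP modifies the verb phrase
    ("S", ("NP", "VP")), ("NP", ("Det", "N")),
    ("VP", ("VP", "PP")), ("VP", ("V", "NP")), ("NP", ("Det", "N")),
    ("PP", ("P", "NP")), ("NP", ("Det", "N")),
]
np_attachment = [  # the PP modifies the object noun phrase
    ("S", ("NP", "VP")), ("NP", ("Det", "N")),
    ("VP", ("V", "NP")), ("NP", ("NP", "PP")), ("NP", ("Det", "N")),
    ("PP", ("P", "NP")), ("NP", ("Det", "N")),
]

for name, deriv in [("VP attachment", vp_attachment),
                    ("NP attachment", np_attachment)]:
    print(f"{name}: {derivation_prob(deriv):.5f}")
# With these invented probabilities, NP attachment wins (0.4 vs. 0.3 on the
# rules where the derivations differ); the lower-ranked analysis could be pruned.
```

A full parser would, of course, enumerate the derivations licensed by the grammar and compute their probabilities efficiently (e.g., with the Viterbi or inside algorithm); the ranking principle, however, is the same.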
</Section>
<Section position="3" start_page="0" end_page="0" type="metho">
<SectionTitle> 3 Robust Models of Human Parsing </SectionTitle>
<Paragraph position="0"> The main weakness of both the Narayanan/Jurafsky and the Crocker/Brants models (discussed in the previous section) is that they have not been evaluated systematically. The authors only describe the performance of their models on a small set of hand-picked example sentences. No attempt is made to test the models against a full set of experimental materials and the corresponding reading times, even though a large amount of suitable data is available in the literature. This makes it very hard to obtain a realistic estimate of how well these models achieve the aim of providing robust, broad-coverage models of human parsing. This can only be assessed by testing the models against realistic samples of unrestricted text or speech obtained from corpora.</Paragraph>
<Paragraph position="1"> In this talk, we will present work that aims to perform such an evaluation. We train a series of increasingly sophisticated probabilistic parsing models on an identical training set (the Penn Treebank). These models include a standard unlexicalized PCFG parser, a head-lexicalized parser (Collins, 1997), and a maximum-entropy-inspired parser (Charniak, 2000). We test all three models on the Embra corpus, a corpus of newspaper texts annotated with eye-tracking data from 23 subjects (McDonald and Shillcock, 2003). A series of regression analyses is conducted to determine whether per-sentence reading time measures correlate with the sentence probabilities predicted by the parsing models.</Paragraph>
<Paragraph position="2"> Three baseline models are also included in the evaluation: word frequency, bigram and trigram probability (as predicted by a language model), and part-of-speech (POS) probability (as predicted by a POS tagger). Models based on n-grams have already been used successfully to model eye-tracking data, both on a word-by-word basis (McDonald and Shillcock, 2003) and for whole sentences (Keller, 2004).</Paragraph>
<Paragraph position="3"> Our results show that for all three parsing models, sentence probability is significantly correlated with reading time measures. However, the models differ as to whether they predict early or late measures: the PCFG and the Collins model significantly predict late reading time measures (total time and gaze duration), but not early measures (first fixation time and skipping rate). The Charniak model is able to significantly predict both early and late measures.</Paragraph>
<Paragraph position="4"> An analysis of the baseline models shows that word frequency and POS probability only predict early measures, while bigram and trigram probability only predict late measures. This indicates that the Charniak model is able to predict both early and late measures because it successfully combines lexical information (word frequencies and POS probabilities) with phrasal information (as modeled by a PCFG). This finding is in line with Charniak's own analysis, which shows that the high performance of his model is due to the fact that it combines a third-order Markov grammar with sophisticated phrasal and lexical features (Charniak, 2000).</Paragraph>
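To illustrate the general shape of such an evaluation, here is a hedged sketch in Python (not the authors' analysis code; the input file and column names are hypothetical, and the exact measures and controls used in the study may differ): per-sentence log probabilities from a parsing model are correlated with a per-sentence reading-time measure, optionally controlling for sentence length.

```python
# Sketch of a per-sentence correlation/regression analysis (illustrative only;
# the CSV file and column names below are hypothetical, not the Embra data).
import csv
import numpy as np
from scipy.stats import pearsonr

with open("sentence_measures.csv") as f:          # hypothetical input file
    rows = list(csv.DictReader(f))

# One value per sentence: a reading-time measure, a model score, and a control.
total_time = np.array([float(r["total_reading_time_ms"]) for r in rows])
logprob    = np.array([float(r["model_logprob"]) for r in rows])  # from a parser
length     = np.array([float(r["num_words"]) for r in rows])

# Does the model's sentence probability predict the (late) reading-time measure?
r, p = pearsonr(logprob, total_time)
print(f"total time vs. log P(sentence): r = {r:.3f}, p = {p:.4f}")

# Longer sentences are both less probable and slower to read, so a sensible
# variant controls for length, e.g. by correlating residuals after regressing
# each variable on sentence length.
def residualize(y, x):
    """Residuals of y after an ordinary least-squares fit on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

r_res, p_res = pearsonr(residualize(logprob, length),
                        residualize(total_time, length))
print(f"controlling for length:         r = {r_res:.3f}, p = {p_res:.4f}")
```

The same template applies to the baseline predictors (word frequency, n-gram probability, POS probability) by substituting the corresponding per-sentence score for the parser's log probability.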
</Section>
<Section position="4" start_page="0" end_page="0" type="metho">
<SectionTitle> 4 Implications </SectionTitle>
<Paragraph position="0"> The results reported in the previous section have interesting theoretical implications. Firstly, there is a methodological lesson here: simple baseline models based on n-gram or POS probabilities perform surprisingly well as robust, broad-coverage models of human language processing. This is an important point that has not been recognized in the literature, as previous models have not been tested on realistic corpus samples, and have not been compared to plausible baselines.</Paragraph>
<Paragraph position="1"> A second point concerns the role of lexical information in human parsing. We found that the best-performing model was Charniak's maximum-entropy-inspired parser, which combines lexical and phrasal information, and manages to predict both early and late eye-tracking measures. A number of existing theories of human parsing incorporate lexical information (MacDonald et al., 1994; MacDonald, 1994), but have so far failed to demonstrate how the use of such information can be scaled up to yield robust, broad-coverage parsing models that can be tested on realistic data such as the Embra eye-tracking corpus.</Paragraph>
<Paragraph position="2"> Finally, a major challenge that remains is the crosslinguistic aspect of human parsing. Virtually all existing computational models have only been implemented and tested on English data. However, a wide range of interesting problems arises for other languages. One example is head-final languages, in which the probabilistic information associated with the head becomes available only at the end of the phrase; this poses a potential problem for incremental parsing models. Some initial results on a limited dataset have been obtained by Baldewein and Keller (2004) for head-final constructions in German.</Paragraph>
</Section>
</Paper>