<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1007"> <Title>Sydney, July 2006. c(c)2006 Association for Computational Linguistics A Finite-State Model of Human Sentence Processing</Title> <Section position="5" start_page="49" end_page="51" type="metho"> <SectionTitle> 3 Experiments </SectionTitle> <Paragraph position="0"> In the current study, Corley and Crocker's model is further tested on a wider range of so-called structural ambiguity types. A Hidden Markov Model POS tagger based on bigrams was used.</Paragraph> <Paragraph position="1"> We made our own implementation to be sure of getting as close as possible to the design of Corley and Crocker (2000). Given a word string, $w_0, w_1, \ldots, w_n$, the tagger calculates the probability of every possible tag path, $t_0, \ldots, t_n$. Under the Markov assumption, the joint probability of the given word sequence and each possible POS sequence can be approximated as a product of conditional probability and transition probability as shown in (1).</Paragraph> <Paragraph position="2"> (1) $P(w_0, w_1, \ldots, w_n, t_0, t_1, \ldots, t_n) \approx \prod_{i=1}^{n} P(w_i \mid t_i) \, P(t_i \mid t_{i-1})$</Paragraph> <Paragraph position="3"> Using the Viterbi algorithm (Viterbi, 1967), the tagger finds the most likely POS sequence for a given word string as shown in (2).</Paragraph> <Paragraph position="4"> (2) $\arg\max_{t_0, \ldots, t_n} P(t_0, t_1, \ldots, t_n \mid w_0, w_1, \ldots, w_n, \mu)$.</Paragraph> <Paragraph position="5"> This is known technology (see Manning and Schütze, 1999), but the particular use we make of it is unusual. The tagger takes a word string as input and outputs the most likely POS sequence together with its final probability. Additionally, it reports the accumulated probability at each word break and any probability re-ranking. Note that the running probability at the beginning of a sentence will be 1, and will keep decreasing at each word break since it is a product of conditional probabilities.</Paragraph> <Paragraph position="6"> We tested the model's predictions against empirical reading data using two measures: the probability decrease and the presence or absence of probability re-ranking. 
Adopting the standard experimental design used in human sentence processing studies, where word-by-word reading time or eye-fixation time is compared between an experimental sentence and its control sentence, this study compares probability at each word break between a pair of sentences. A comparatively faster or larger drop in probability is expected to be a good indicator of comparative processing difficulty. Probability re-ranking, which is a simplified model of the reanalysis process assumed in many human studies, is also tested as another indicator of the garden-path effect. Given a word string, all the possible POS sequences compete with each other based on their probability. Probability re-ranking occurs when an initially dispreferred POS sub-sequence becomes the preferred candidate later in the parse, because it fits in better with later words.</Paragraph> <Paragraph position="7"> The model parameters, $P(w_i \mid t_i)$ and $P(t_i \mid t_{i-1})$, are estimated from a small section (970,995 tokens, 47,831 distinct words) of the British National Corpus (BNC), which is a 100 million-word collection of British English, both written and spoken, developed by Oxford University Press (Burnard, 1995). The BNC was chosen for training the model because it is a POS-annotated corpus, which allows supervised training. In the implementation we use log probabilities to avoid underflow, and we report log probabilities in what follows.</Paragraph> <Section position="1" start_page="49" end_page="50" type="sub_section"> <SectionTitle> 3.1 Hypotheses </SectionTitle> <Paragraph position="0"> If the human sentence processing mechanism (HSPM) is affected by frequency information, we can assume that it will be easier to process events with higher frequency or probability than those with lower frequency or probability. Under this general assumption, the overall difficulty of a sentence is expected to be measured or predicted by the mean size of probability decrease. 
That is, probability will drop faster in garden-path sentences than in control sentences (e.g. unambiguous sentences or ambiguous but non-garden-path sentences).</Paragraph> <Paragraph position="1"> More importantly, the probability decrease pattern at disambiguating regions will predict the trends in the reading time data. All other things being equal, we might expect a reading time penalty when the probability decrease at the disambiguating region is greater in garden-path sentences than in the control sentences. This is a simple and intuitive assumption that can be easily tested. We could have formed the sum over all possible POS sequences in association with the word strings, but for the present study we simply used the Viterbi path, on the grounds that it is the best single-path approximation to the joint probability.</Paragraph> <Paragraph position="2"> Lastly, re-ranking of POS sequences is expected to predict reanalysis of lexical categories. This is because re-ranking in the tagger is parallel to reanalysis in human subjects, which is known to be cognitively costly.</Paragraph> </Section> <Section position="2" start_page="50" end_page="51" type="sub_section"> <SectionTitle> 3.2 Materials </SectionTitle> <Paragraph position="0"> In this study, five different types of ambiguity were tested: Lexical Category ambiguity, Reduced Relative ambiguity (RR ambiguity), Prepositional Phrase Attachment ambiguity (PP ambiguity), Direct-Object/Sentential-Complement ambiguity (DO/SC ambiguity), and Clausal Boundary ambiguity. The following are example sentences for each ambiguity type, shown with the ambiguous region italicized and the disambiguating region bolded. 
All of the example sentences are garden-path sentences.</Paragraph> <Paragraph position="1"> (3) Lexical Category ambiguity The foreman knows that the warehouse prices the beer very modestly.</Paragraph> <Paragraph position="2"> (4) RR ambiguity The horse raced past the barn fell.</Paragraph> <Paragraph position="3"> (5) PP ambiguity Katie laid the dress on the floor onto the bed. (6) DO/SC ambiguity He forgot Pam needed a ride with him.</Paragraph> <Paragraph position="4"> (7) Clausal Boundary ambiguity Though George kept on reading the story really bothered him.</Paragraph> <Paragraph position="5"> There are two types of control sentences: unambiguous sentences and ambiguous but non-garden-path sentences as shown in the examples below. Again, the ambiguous region is italicized and the disambiguating region is bolded.</Paragraph> <Paragraph position="6"> The horse that was raced past the barn fell. Note that the garden-path sentence (8) and its ambiguous control sentence (9) share exactly the same word sequence except for the disambiguating region. This allows direct comparison of probability at the critical region (i.e. disambiguating region) between the two sentences. Test materials used in experimental studies are constructed in this way in order to control extraneous variables such as word frequency. We use these sentences in the same form as the experimentalists so we inherit their careful design.</Paragraph> <Paragraph position="7"> In this study, a total of 76 sentences were tested: 10 for lexical category ambiguity, 12 for RR ambiguity, 20 for PP ambiguity, 16 for DO/SC ambiguity, and 18 for clausal boundary ambiguity. This set of materials is, to our knowledge, the most comprehensive yet subjected to this type of study. 
The sentences are directly adopted from various psycholinguistic studies (Frazier, 1978; Trueswell, 1996; Frazier and Clifton, 1996; Ferreira and Clifton, 1986; Ferreira and Henderson, 1986).</Paragraph> <Paragraph position="8"> As a baseline test case of the tagger, the well-established asymmetry between subject- and object-relative clauses was tested as shown in (11). (11) a. The editor who kicked the writer fired the entire staff. (Subject-relative) b. The editor who the writer kicked fired the entire staff. (Object-relative) The reading time advantage of subject-relative clauses over object-relative clauses is robust in English (Traxler et al., 2002) as well as other languages (Mak et al., 2002; Homes et al., 1981). For this test, materials from Traxler et al. (2002) (96 sentences) are used.</Paragraph> </Section> </Section> <Section position="6" start_page="51" end_page="53" type="metho"> <SectionTitle> 4 Results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="51" end_page="51" type="sub_section"> <SectionTitle> 4.1 The Probability Decrease per Word </SectionTitle> <Paragraph position="0"> Unambiguous sentences are usually longer than garden-path sentences. To compare sentences of different lengths, the joint probability of the whole sentence and tags was divided by the number of words in the sentence. The result showed that the average probability decrease was greater in garden-path sentences compared to their unambiguous control sentences. 
This indicates that garden-path sentences are more difficult than unambiguous sentences, which is consistent with empirical findings.</Paragraph> <Paragraph position="1"> Probability decreased faster in object-relative sentences than in subject relatives as predicted.</Paragraph> <Paragraph position="2"> In the psycholinguistics literature, the comparative difficulty of object-relative clauses has been explained in terms of verbal working memory (King and Just, 1991), distance between the gap and the filler (Bever and McElree, 1988), or perspective shifting (MacWhinney, 1982). However, the test results in this study provide a simpler account for the effect. That is, the comparative difficulty of an object-relative clause might be attributed to its less frequent POS sequence. This account is particularly convincing since the two sentences in each pair contain exactly the same set of words and differ only in word order.</Paragraph> </Section> <Section position="2" start_page="51" end_page="52" type="sub_section"> <SectionTitle> 4.2 Probability Decrease at the Disambiguating Region </SectionTitle> <Paragraph position="0"> A total of 30 pairs, each consisting of a garden-path sentence and its ambiguous, non-garden-path control, were tested for a comparison of the probability decrease at the disambiguating region. In 80% of the cases, the probability drops more sharply in garden-path sentences than in control sentences at the critical word. The test results are presented in (12) with the number of test sets for each ambiguity type and the number of cases where the model correctly predicted the reading-time penalty of garden-path sentences. The two graphs in Figure 1 illustrate the comparison of probability decrease between a pair of sentences. The y-axis of both graphs in Figure 1 is log probability. The first graph compares the probability drop for the prepositional phrase (PP) attachment ambiguity (Katie put the dress on the floor and/onto the bed ...). 
The empirical result for this type of ambiguity shows that a reading time penalty is observed when the second PP, onto the bed, is introduced, and there is no such effect for the other sentence. Indeed, the sharper probability drop indicates that the additional PP is less likely, which predicts comparative processing difficulty. The second graph shows the probability comparison for the DO/SC ambiguity.</Paragraph> <Paragraph position="1"> The verb forget is a DO-biased verb and thus processing difficulty is observed when it has a sentential complement. Again, this effect was replicated here.</Paragraph> <Paragraph position="2"> The results showed that the disambiguating word given the previous context is more difficult in garden-path sentences than in control sentences. There are two possible explanations for the processing difficulty. One is that the POS sequence of a garden-path sentence is less probable than that of its control sentence. The other account is that the disambiguating word in a garden-path sentence is a lower-frequency word than that of its control sentence.</Paragraph> <Paragraph position="3"> For example, slower reading time was observed in (13a) and (14a) compared to (13b) and (14b) at the disambiguating region that is bolded.</Paragraph> <Paragraph position="4"> (13) Different POS at the Disambiguating Region a. Katie laid the dress on the floor onto (-57.80) the bed.</Paragraph> <Paragraph position="5"> b. Katie laid the dress on the floor after (-55.77) her mother yelled at her.</Paragraph> <Paragraph position="6"> (14) Same POS at the Disambiguating Region a. The umpire helped the child on (-42.77) third base.</Paragraph> <Paragraph position="7"> b. The umpire helped the child to (-42.23) third base.</Paragraph> <Paragraph position="8"> The log probability at each disambiguating word is given in parentheses. 
As expected, the probability at the disambiguating region in (13a) and (14a) is lower than in (13b) and (14b) respectively. The disambiguating words in (13) have different POSs: Preposition in (13a) and Conjunction in (13b). This suggests that the probabilities of different POS sequences can account for different reading times at the region. In (14), however, both disambiguating words are the same POS (i.e. Preposition) and the POS sequences for both sentences are identical. Instead, &quot;on&quot; and &quot;to&quot; have different frequencies and this information is reflected in the conditional probability P(wordi|state). Therefore, the slower reading time in (14a) might be attributable to the lower frequency of the disambiguating word &quot;on&quot; compared to &quot;to&quot;.</Paragraph> </Section> <Section position="3" start_page="52" end_page="53" type="sub_section"> <SectionTitle> 4.3 Probability Re-ranking </SectionTitle> <Paragraph position="0"> The probability re-ranking reported in Corley and Crocker (2000) was replicated. The tagger successfully resolved the ambiguity by reanalysis when the ambiguous word was immediately followed by the disambiguating word (e.g. Without her he was lost.). If the disambiguating word did not immediately follow the ambiguous region (e.g. Without her contributions would be very inadequate.), the ambiguity was sometimes incorrectly resolved.</Paragraph> <Paragraph position="1"> When revision occurred, probability dropped more sharply at the revision point and at the disambiguation region compared to the control sentences. When the ambiguity was not correctly resolved, the probability comparison correctly modeled the comparative difficulty of the garden-path sentences. Of particular interest in this study is RR ambiguity resolution. The tagger predicted the processing difficulty of the RR ambiguity with probability re-ranking. 
That is, the tagger initially favors the main-verb interpretation for the ambiguous -ed form, and later it makes a repair when the ambiguity is resolved as a past participle.</Paragraph> <Paragraph position="2"> In the first graph of Figure 2, &quot;chased&quot; is resolved as a past participle, again with a revision, since the disambiguating word &quot;by&quot; immediately follows it. When revision occurred, probability dropped more sharply at the revision point and at the disambiguation region compared to the control sentences. When the disambiguating word does not immediately follow the ambiguous word, as in the second graph of Figure 2, the ambiguity was not resolved correctly, but the probability decrease at the disambiguating region correctly predicts that the garden-path sentence would be harder.</Paragraph> <Paragraph position="3"> The RR ambiguity is often categorized as a syntactic ambiguity, but the results suggest that the ambiguity can be resolved locally and its processing difficulty can be detected by a finite state model. This suggests that we should be cautious in assuming that a structural explanation is needed for the RR ambiguity resolution, and it could be that similar cautions are in order for other ambiguities usually seen as syntactic.</Paragraph> <Paragraph position="4"> Although the probability re-ranking reported in the previous studies (Corley and Crocker, 2000; Frazier, 1978) is correctly replicated, the tagger sometimes made undesired revisions. For example, the tagger did not make a repair for the sentence The friend accepted by the man was very impressed (Trueswell, 1996) because accepted is biased as a past participle. This result is compatible with the findings of Trueswell (1996). However, the bias towards the past participle produces a repair in the control sentence, which is unexpected. 
For the sentence The friend accepted the man who was very impressed, the tagger showed a repair since it initially preferred a past-participle analysis for accepted and later had to reanalyze. This is a limitation of our model, and does not match any previous empirical finding.</Paragraph> </Section> </Section> <Section position="7" start_page="53" end_page="53" type="metho"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> The current study further explores how well Corley and Crocker's (2000) model accounts for human sentence processing data seen in empirical studies. Although there have been studies evaluating a POS tagger as a potential cognitive module of lexical category disambiguation, there has been little work that tests it as a modeling tool for the processing of syntactically ambiguous sentences.</Paragraph> <Paragraph position="1"> The findings here suggest that a statistical POS tagging system is more informative than Corley and Crocker demonstrated. It can predict processing delay not only for lexically ambiguous sentences but also for structurally garden-pathed sentences. This model is attractive since it is computationally simple and requires few statistical parameters. More importantly, it is clearly defined what predictions can and cannot be made by this model. This allows systematic testability and refutability of the model, unlike some other probabilistic frameworks. Also, the model training and testing is transparent and observable, and true probabilities rather than transformed weights are used, all of which makes it easy to understand the mechanism of the proposed model.</Paragraph> <Paragraph position="2"> Although the model we used in the current study is not novel, the current work differs substantially from the previous study in the scope of the data used and in the interpretation of the model for human sentence processing. 
Corley and Crocker clearly state that their model is strictly limited to lexical ambiguity resolution, and their test of the model was restricted to the noun-verb ambiguity. However, the findings in the current study play out differently. The experiments conducted in this study are parallel to empirical studies with regard to the design of the experimental method and the test material. The garden-path sentences used in this study are authentic: most of them are selected from the cited literature, not conveniently coined by the authors. The word-by-word probability comparison between garden-path sentences and their controls is parallel to the experimental design widely adopted in empirical studies in the form of region-by-region reading or eye-gaze time comparison.</Paragraph> <Paragraph position="3"> In the word-by-word probability comparison, the model is tested on whether it correctly predicts the comparative processing difficulty at the garden-path region. Contrary to the major claim made in previous empirical studies, which is that the garden-path phenomena are modeled either by syntactic principles or by structural frequency, the findings here show that the same phenomena can be predicted without such structural information.</Paragraph> <Paragraph position="4"> Therefore, the work is neither a mere extended application of Corley and Crocker's work to a broader range of data, nor does it simply confirm earlier observations that finite state machines might accurately account for psycholinguistic results to some degree. 
The current study provides more concrete answers about which finite state machine is relevant to which kinds of processing difficulty, and to what extent.</Paragraph> </Section> <Section position="8" start_page="53" end_page="54" type="metho"> <SectionTitle> 6 Future Work </SectionTitle> <Paragraph position="0"> Even though comparative analysis is a widely adopted research design in experimental studies, a sound scientific model should be independent of this comparative nature and should be able to make systematic predictions. Currently, probability re-ranking is one way to make systematic module-internal predictions about the garden-path effect. This brings up the issue of encoding more information in lexical entries and increasing ambiguity so that other ambiguity types also can be disambiguated in a similar way via lexical category disambiguation. This idea has been explored as one of the lexicalist approaches to sentence processing (Kim et al., 2002; Bangalore and Joshi, 1999).</Paragraph> <Paragraph position="1"> Kim et al. (2002) suggest the feasibility of modeling structural analysis as lexical ambiguity resolution. They developed a connectionist neural network model of word recognition, which takes orthographic information, semantic information, and the previous two words as its input and outputs a SuperTag for the current word. A SuperTag is an elementary syntactic tree, or simply a structural description composed of features like POS, the number of complements, category of each complement, and the position of complements. In their view, structural disambiguation is simply another type of lexical category disambiguation, i.e. SuperTag disambiguation. When applied to DO/SC ambiguous fragments, such as &quot;The economist decided ...&quot;, their model showed a general bias toward the NP-complement structure. 
This NP-complement bias was overcome by lexical information from high-frequency S-biased verbs, meaning that if the S-biased verb was a high-frequency word, it was correctly tagged, but if the verb had low frequency, then it was more likely to be tagged as an NP-complement verb. This result is also reported in other constraint-based model studies (e.g. Juliano and Tanenhaus (1994)), but the difference between the previous constraint-based studies and Kim et al. is that the result of the latter is based on training of the model on noisier data (sentences that were not tailored to the specific research purpose). The implementation of SuperTag advances the formal specification of the constraint-based lexicalist theory. However, the scope of their sentence processing model is limited to the DO/SC ambiguity, and the description of their model is not clear. In addition, their model is far beyond a simple statistical model: the interaction of different sources of information is not transparent. Nevertheless, Kim et al. (2002) provide a future direction for the current study and a starting point for considering what information should be included in the lexicon.</Paragraph> <Paragraph position="2"> The fundamental goal of the current research is to explore a model that takes the most restrictive position on the number of parameters until additional parameters are demanded by data. Equally important, the quality of architectural simplicity should be maintained. Among the different sources of information manipulated by Kim et al., the so-called elementary structural information is considered a reasonable and ideal parameter for addition to the current model. The implementation and the evaluation of the model will be exactly the same as for a statistical POS tagger provided with a large parsed corpus from which elementary trees can be extracted.</Paragraph> </Section> </Paper>
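<!--
The word-by-word procedure described in Section 3 (a bigram HMM tagger that reports the running log probability of the best tag path at each word break and flags probability re-ranking) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the three-tag set (D, N, V) and all probability values are hypothetical and invented for the example, and unseen events are simply given zero probability rather than being smoothed.

```python
import math

NEG_INF = float("-inf")  # assumed score for unseen events (no smoothing)


def viterbi_trace(words, tags, log_init, log_trans, log_emit):
    """Return one (log probability, tag path) pair per word break.

    Each pair describes the best tag path for the words read so far.
    """
    # best[t] = (log probability, path) of the best path ending in tag t
    best = {
        t: (log_init.get(t, NEG_INF) + log_emit.get((t, words[0]), NEG_INF), [t])
        for t in tags
    }
    trace = [max(best.values(), key=lambda sp: sp[0])]
    for word in words[1:]:
        new_best = {}
        for t in tags:
            # Choose the best predecessor tag for a path that ends in t.
            score, path = max(
                ((best[p][0] + log_trans.get((p, t), NEG_INF), best[p][1]) for p in tags),
                key=lambda sp: sp[0],
            )
            new_best[t] = (score + log_emit.get((t, word), NEG_INF), path + [t])
        best = new_best
        trace.append(max(best.values(), key=lambda sp: sp[0]))
    return trace


def reranking_points(trace):
    """Word indices at which the best path revises a tag chosen earlier."""
    return [i for i in range(1, len(trace)) if trace[i][1][:-1] != trace[i - 1][1]]


def to_log(table):
    """Convert a table of probabilities to natural-log probabilities."""
    return {key: math.log(p) for key, p in table.items()}


# Toy parameters, invented for illustration only (not estimated from the BNC).
TAGS = ["D", "N", "V"]
LOG_INIT = to_log({"D": 0.8, "N": 0.15, "V": 0.05})
LOG_TRANS = to_log({
    ("D", "N"): 0.9, ("D", "V"): 0.05, ("D", "D"): 0.05,
    ("N", "N"): 0.3, ("N", "V"): 0.6, ("N", "D"): 0.1,
    ("V", "D"): 0.6, ("V", "N"): 0.3, ("V", "V"): 0.1,
})
LOG_EMIT = to_log({
    ("D", "the"): 0.5, ("N", "warehouse"): 0.01,
    ("N", "prices"): 0.02, ("V", "prices"): 0.005, ("N", "beer"): 0.01,
})

WORDS = "the warehouse prices the beer".split()
TRACE = viterbi_trace(WORDS, TAGS, LOG_INIT, LOG_TRANS, LOG_EMIT)
for word, (logp, path) in zip(WORDS, TRACE):
    print(f"{word:10s} {logp:8.2f} {' '.join(path)}")
print("re-ranking at word index:", reranking_points(TRACE))
```

With these toy numbers the best path after "prices" tags it as a noun, and the following "the" forces a revision to a verb, so re-ranking is reported at word index 3. The running log probability decreases at every word break, which is the quantity compared between a garden-path sentence and its control.
-->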