Nonlocal Language Modeling based on Context Co-occurrence Vectors

1 Introduction

Human pattern recognition rarely handles isolated or independent objects. We recognize objects in various spatiotemporal circumstances, such as an object in a scene or a word in an utterance. These circumstances work as conditions, eliminating ambiguities and enabling robust recognition. The most challenging questions in machine pattern recognition are in what representation and to what extent those circumstances are utilized.

In language processing, a context, that is, the portion of the utterance or text preceding the object, is an important circumstance.

One way of representing a context is a statistical language model, which provides a word sequence probability P(w_1^n), where w_i^j denotes the sequence w_i ... w_j. In other words, such a model provides the conditional probability of a word given the preceding word sequence, P(w_i | w_1^{i-1}), which expresses the prediction of a word in a given context.

The most common language models in use today are N-gram models, based on an (N-1)-th order Markov process: event predictions depend on at most the (N-1) preceding events. They therefore make the following approximation:

    P(w_1^n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-N+1}^{i-1})

A common value for N is 2 (bigram language model) or 3 (trigram language model); only a short local context of one or two words is considered.

Even such a local context is effective in some cases. For example, in Japanese, after the word kokumu 'state affairs', words such as daijin 'minister' and shou 'department' are likely to follow, while kaijin 'monster' and shou 'prize' are not. After dake de 'only at', one often finds wa (topic marker), but hardly ever ga (nominative marker) or wo (accusative marker). These examples show that the behavior of compound nouns and function word sequences is handled well by bigram and trigram models. Such models are exploited in several applications, including speech recognition, optical character recognition, and morphological analysis.

Local language models, however, cannot predict much in some cases. For instance, the word probability distribution after de wa 'at (topic marker)' is very flat. Yet even when a local language model's distribution is flat, the probabilities of daijin 'minister' and kaijin 'monster' must be very different in documents concerning politics. Bigram and trigram models are obviously powerless against this kind of nonlocal, long-distance lexical dependency.

This paper presents a nonlocal language model. The important information concerning long-distance lexical dependencies is word co-occurrence information. For example, words such as politics, government, administration, and department tend to co-occur with daijin 'minister'. It is easy to measure co-occurrences of word pairs in a training corpus; the problem is how to utilize them as a representation of context.
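Before turning to the co-occurrence representation, a minimal sketch may make the local N-gram baseline above concrete. This is an illustrative example only: the toy corpus and the unsmoothed maximum-likelihood estimate are assumptions, not the models evaluated in this paper.

    from collections import Counter

    def train_bigram(corpus):
        """Estimate P(w_i | w_{i-1}) by relative frequency (no smoothing)."""
        unigram = Counter()
        bigram = Counter()
        for sentence in corpus:
            tokens = ["<s>"] + sentence + ["</s>"]
            unigram.update(tokens[:-1])                  # context counts
            bigram.update(zip(tokens[:-1], tokens[1:]))  # adjacent pairs
        def prob(word, prev):
            return bigram[(prev, word)] / unigram[prev] if unigram[prev] else 0.0
        return prob

    # Toy corpus: after kokumu 'state affairs', daijin 'minister' is likely,
    # kaijin 'monster' is not.
    corpus = [["kokumu", "daijin"], ["kokumu", "daijin"], ["kokumu", "shou"]]
    p = train_bigram(corpus)
    print(p("daijin", "kokumu"))   # 0.666...
    print(p("kaijin", "kokumu"))   # 0.0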
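The co-occurrence counts mentioned above are likewise easy to collect. The following sketch assumes document-level co-occurrence and raw counts; the weighting actually used in this work is defined later in the paper.

    from collections import Counter, defaultdict

    def cooccurrence_vectors(documents):
        """For each word, count how often every other word appears in the same document."""
        vectors = defaultdict(Counter)
        for doc in documents:
            types = set(doc)
            for w in types:
                for v in types:
                    if v != w:
                        vectors[w][v] += 1
        return vectors

    # Hypothetical documents: political text where daijin 'minister'
    # co-occurs with politics-related words.
    docs = [["politics", "government", "daijin"],
            ["administration", "daijin", "department"]]
    vecs = cooccurrence_vectors(docs)
    print(vecs["daijin"])
    # Counter({'politics': 1, 'government': 1, 'administration': 1, 'department': 1})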
We present a vector representation of word co-occurrence information, and show that the context can be represented as the sum of the word co-occurrence vectors in a document and incorporated into a nonlocal language model.
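As a preview of this idea, here is a hedged sketch: it builds a context vector as the sum of the co-occurrence vectors of the words seen so far in a document, and scores candidate words by cosine similarity to that context. The cosine scoring and the toy vectors are illustrative assumptions, not the model derived in this paper.

    import math
    from collections import Counter

    def context_vector(cooc, words):
        """Context = sum of the co-occurrence vectors of the words seen so far."""
        ctx = Counter()
        for w in words:
            ctx.update(cooc.get(w, Counter()))
        return ctx

    def cosine(u, v):
        dot = sum(u[k] * v[k] for k in u if k in v)
        norm_u = math.sqrt(sum(x * x for x in u.values()))
        norm_v = math.sqrt(sum(x * x for x in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    # Hypothetical co-occurrence vectors (counts from some training corpus).
    cooc = {
        "politics":   Counter({"daijin": 3, "government": 4}),
        "government": Counter({"daijin": 2, "politics": 4, "administration": 1}),
        "daijin":     Counter({"politics": 3, "government": 2, "administration": 2}),
        "kaijin":     Counter({"movie": 3, "monster": 2}),
    }

    # In a document about politics, the flat local context after de wa
    # 'at (topic marker)' is sharpened by the accumulated context vector.
    ctx = context_vector(cooc, ["politics", "government"])
    for cand in ["daijin", "kaijin"]:
        print(cand, round(cosine(ctx, cooc.get(cand, Counter())), 3))
    # daijin gets a positive score; kaijin scores 0.0

In practice such a similarity score would have to be combined with the local N-gram probability (for example, multiplicatively with renormalization); the paper develops its own formulation of this combination.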