File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/96/j96-4004_abstr.xml
Size: 5,129 bytes
Last Modified: 2025-10-06 13:48:40
<?xml version="1.0" standalone="yes"?> <Paper uid="J96-4004"> <Title>A Statistically Emergent Approach for Language Processing: Application to Modeling Context Effects in Ambiguous Chinese Word Boundary Perception</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> This paper suggests that the language understanding process can be effectively modeled as the statistical outcome of a large number of independent activities occurring in parallel. There is no global controller deciding which processes to run next. All processing is done locally by many simple, independent agents that make their decisions stochastically. The system is self-organizing, with coherent behavior being a statistically emergent property of the system as a whole. The model, in a nutshell, simulates language understanding as a crystallization process. This process consists of a series of hierarchical, structure-building activities in which high-level linguistic structures are formed from their constituents and get properly hooked up to each other as the process converges.</Paragraph> <Paragraph position="1"> The essential features of the model are: * The process of sentence analysis is a series of computational activities that determine how various constituents in a sentence can be meaningfully related.</Paragraph> <Paragraph position="2"> * Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong † Department of Computer Information Science, University of Pennsylvania, Philadelphia, PA 19104-6389 ‡ Department of Information Systems & Computer Science, National University of Singapore, Lower Kent Ridge Road, Singapore 119260, Republic of Singapore © 1996 Association for Computational Linguistics Computational Linguistics Volume 22, Number 4 * All computational activities are carried out by a large number of procedures known as
codelets.</Paragraph> <Paragraph position="3"> * A linguistic structure is not built by a single codelet. Rather, it is constructed by a sequence of codelets. The execution of this sequence of codelets is interleaved with other codelets that are responsible for building other structures.</Paragraph> <Paragraph position="4"> * The order in which structures are built is not explicitly programmed, but is an emergent outcome of chains of codelets working in an asynchronous parallel mode.</Paragraph> <Paragraph position="5"> * Computational activities are a combination of top-down and bottom-up activities.</Paragraph> <Paragraph position="6"> * Computational activities are indirectly guided by a semantic network of linguistic concepts, which ensures that these activities do not operate independently of the system's representation of the context of a sentence. * Decision making is stochastic, with the amount of randomness being controlled by a parameter known as the computational temperature.</Paragraph> <Paragraph position="7"> We have applied our model to the task of capturing the effect of context on the perception of ambiguous word boundaries in Chinese sentences (Gan 1993). Our approach differs from existing work on Chinese word segmentation (Liang 1983; Wang, Wang, and Bai 1991; Fan and Tsai 1988; Chang, Chen, and Chen 1991; Chiang et al. 1992; Sproat and Shih 1990; Wu and Su 1993; Lua and Gan 1994; Lai et al. 1992; Sproat et al. 1994; Sproat et al. 1996) primarily in that our system performs sentence interpretation, in addition to word boundary identification. Our system figures out where the word boundaries of a sentence are by determining how various constituents in a sentence can be meaningfully related. The relations the system builds represent its interpretation of the sentence. In the initial stage of a run, the system constructs relations between characters of a sentence.
Through a spreading activation mechanism, the system gradually shifts to the construction of words and of relations between words. Later, the system progresses to identifying and constructing chunks (in other words, phrases), and to establishing connections between chunks. Note that there is no top-level executive that decides the order of these activities. At any given time, the system stochastically selects one action to execute. Efforts toward building different structures are therefore interleaved, sometimes cooperating and sometimes competing, and the system's high-level behavior arises from its low-level stochastic actions. We give a detailed description of this application in this paper. In Section 2, we introduce the problem of ambiguous Chinese word boundary perception, and follow, in Section 3, with a summary of current practices in Chinese word identification. We describe our model in Section 4 and show a sample run of our program in Section 5 to illustrate the model's behavior. We discuss the model in Section 6, compare it with others in Section 7, and explore areas for future research in Section 8.</Paragraph> </Section> </Paper>
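<!--
Illustrative note: a minimal Python sketch of the temperature-controlled stochastic selection that the introduction describes. The Codelet class, the urgency field, the codelet names, and the weighting rule urgency ** (1 / temperature) are all assumptions made for illustration; they are not the authors' implementation. The introduction states only that one action is selected stochastically at any given time and that the amount of randomness is controlled by the computational temperature.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Codelet:
    name: str        # hypothetical label, for illustration only
    urgency: float   # hypothetical positive priority score
    action: Callable[[List[str]], None]


def selection_weights(codelets, temperature):
    """Return unnormalized selection weights for a pool of codelets.

    Assumed rule (not taken from the paper):
        weight = urgency ** (1 / temperature)
    A high temperature flattens the weights, so choices are close to uniform.
    A low temperature sharpens them, so the most urgent codelet dominates.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return [c.urgency ** (1.0 / temperature) for c in codelets]


def run(codelets, temperature, steps, rng):
    """Select and run one codelet per step; no global scheduler is involved."""
    trace = []
    for _ in range(steps):
        weights = selection_weights(codelets, temperature)
        chosen = rng.choices(codelets, weights=weights, k=1)[0]
        chosen.action(trace)
    return trace


def make_codelet(name, urgency):
    # The action only records the codelet's name; a real codelet would
    # build or modify a linguistic structure instead.
    return Codelet(name, urgency, lambda trace: trace.append(name))


pool = [
    make_codelet("link-characters", 1.0),
    make_codelet("build-word", 2.0),
    make_codelet("build-chunk", 4.0),
]
rng = random.Random(0)
hot = run(pool, temperature=10.0, steps=1000, rng=rng)
cold = run(pool, temperature=0.1, steps=1000, rng=rng)
```

Under these assumptions, the hot run interleaves all three codelet types at nearly equal rates, while the cold run is dominated by the most urgent codelet.
-->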