<?xml version="1.0" standalone="yes"?>
<Paper uid="P92-1029">
  <Title>Association-based Natural Language Processing with Neural Networks</Title>
  <Section position="3" start_page="0" end_page="224" type="metho">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> To date, most practical applications in natural language processing (NLP) have been realized via symbolic manipulation engines, such as grammar parsers. However, the current trend (and focus of research) is shifting to consider aspects of semantics and discourse as part of NLP. This can be seen in the emergence of new theories of language, such as Situation Theory \[Barwise 83\] and Discourse Representation Theory \[Kamp 84\].</Paragraph>
    <Paragraph position="1"> While these theories provide an excellent theoretical framework for natural language understanding, the practical treatment of context dependency within the language can also be improved by enhancing underlying component technologies, such as knowledge based systems. In particular, alternate approaches to symbolic manipulation provided by connectionist models \[Rumelhart 86\] have emerged. Connectionist approaches enable the extraction of processing knowledge from examples, instead of building knowledge bases manually.</Paragraph>
    <Paragraph position="2"> The model described here represents the unification of the connectionist approach and conventional symbolic manipulation; its most valuable feature is the use of word associations using neural network technology.</Paragraph>
    <Paragraph position="3"> Word and concept associations appear to be central in human cognition \[Minsky 88\].</Paragraph>
    <Paragraph position="4"> Therefore, simulating word associations contributes to semantic disambiguation in the computational process of interpreting sentences, by placing a strong preference on expected words (meanings).</Paragraph>
    <Paragraph position="5"> This paper describes NLP reinforced by association of concepts and words via a connectionist network. The model is employed within an NLP application system for kana-kanji conversion (1). Finally, an evaluation of the system and its advantages over conventional systems are presented.</Paragraph>
  </Section>
  <Section position="4" start_page="224" end_page="224" type="metho">
    <SectionTitle>
2 A brief overview of kana-kanji conversion
</SectionTitle>
    <Paragraph position="0"> Japanese has several interesting features in its variety of scripts. In particular, the existence of several thousand kanji (characters of Chinese origin) made typing a hard task before the invention of kana-kanji conversion \[Amano 79\]. It has since become the standard method for inputting Japanese into computers. It is also used in word processors and is familiar to those who are not computer experts, owing to the simplicity of its operation: the user types sentences only in the phonetic script of Japanese (kana), and the kana-kanji converter automatically converts the kana into meaningful expressions (kanji). The simplified mechanism of kana-kanji conversion can be described as two stages of processing: morphological analysis and homonym selection.</Paragraph>
  </Section>
  <Section position="5" start_page="224" end_page="224" type="metho">
    <SectionTitle>
* Morphological Analysis
</SectionTitle>
    <Paragraph position="0"> Kana-inputted (fragments of) sentences are morphologically analyzed through dictionary lookup, using both lexicons and grammars. There are many ambiguities in word division due to the agglutinative nature of Japanese (Japanese text has no spaces between words). Each partitioning of the kana is then further open to being a possible interpretation of several alternate kanji.</Paragraph>
    <Paragraph position="1"> The spoken word douki, for example, can mean motivation, pulsation, synchronization, or copperware. All of these are spelt identically in kana (どうき) but are written with different kanji characters (動機, 動悸, 同期, and 銅器, respectively). (1) Many commercial products use kana-kanji conversion technology in Japan, including the TOSHIBA Tosword series of Japanese word processors.</Paragraph>
    <Paragraph position="2"> Some kana words have 10 or more possible meanings. The stage of homonym selection is therefore indispensable to kana-kanji conversion for the reduction of homonyms.</Paragraph>
    <Section position="1" start_page="224" end_page="224" type="sub_section">
      <SectionTitle>
Homonym Selection
</SectionTitle>
      <Paragraph position="0"> Preferable semantic homonyms are selected according to the co-occurrence restrictions and selectional restrictions.</Paragraph>
      <Paragraph position="1"> The frequency of use of each word is also taken into account. Usually, the selection is further reinforced by a simple context-holding mechanism: when homonyms appear in earlier discourse and the user chooses one of them, the chosen word is automatically memorized by the system, much like a cache. When the same homonyms appear again, the memorized word is selected as the most preferred candidate and shown to the user.</Paragraph>
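The cache-style context-holding mechanism described above can be sketched as follows. The class and method names are invented for illustration and are not from the paper's system.

```python
# Remember the user's last kanji choice per kana reading, and promote it to
# the front of the candidate list the next time the same homonyms appear.
class HomonymCache:
    def __init__(self):
        self._last_choice = {}  # kana reading -> kanji chosen last time

    def choose(self, kana, kanji):
        """Record the user's selection for this reading."""
        self._last_choice[kana] = kanji

    def rank(self, kana, candidates):
        """Return candidates with the cached choice (if any) moved to the front."""
        preferred = self._last_choice.get(kana)
        if preferred in candidates:
            return [preferred] + [c for c in candidates if c != preferred]
        return list(candidates)

cache = HomonymCache()
cache.choose("douki", "synchronization")
ranked = cache.rank("douki", ["motivation", "pulsation", "synchronization", "copperware"])
```

As the paper notes, this mechanism cannot distinguish between different contexts for the same homonym: the single cached choice wins regardless of topic.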
    </Section>
  </Section>
  <Section position="6" start_page="224" end_page="226" type="metho">
    <SectionTitle>
3 Association-based kana-kanji conversion
</SectionTitle>
    <Paragraph position="0"> The above mechanisms are simple and effective when the kana-kanji converter is regarded as a typing aid. However, the abundance of homonyms in Japanese creates many ambiguities, and the user is forced to choose the desired kanji from many candidates. A variety of techniques are available to reduce homonym ambiguities; however, these tend to be limited from a semantic disambiguation perspective. To use word co-occurrence restrictions, it is necessary to collect a large amount of co-occurrence data, a practically impossible task. To use selectional restrictions, an appropriate thesaurus is necessary, but defining the conceptual hierarchy is known to be difficult work \[Lenat 89\]\[EDR 90\]. Techniques for storing previous kanji selections (cache) are too simple to disambiguate between possible previous selections for the same homonym with respect to the context, or between context switches.</Paragraph>
    <Paragraph position="1"> To avoid these problems without increasing computational costs, we propose the use of the associative functionality of neural networks. The use of association is a natural extension to the conventional context holding mechanism. The idea is summarized as follows. There are two stages of processing: network generation and kana-kanji conversion.</Paragraph>
    <Paragraph position="2"> A network representing the strength of word associations is automatically generated from real documents. Real documents can be considered training data because they consist of correctly converted kanji. Each node in the network corresponds uniquely to a word entry in the dictionary of the kana-kanji converter, and each node has an activation level. Links between nodes are weighted and represent the strength of association between words. The network is a Hopfield-type network \[Hopfield 84\]: links are bidirectional and the network is single-layered.</Paragraph>
    <Paragraph position="3"> When the user chooses a word from the homonym candidates, a certain value is input to the node corresponding to the chosen word, and the node is activated. The nodes connected to the activated node are then activated in turn. In this manner, activation spreads over the network through the links, and the active part of the network can be considered the set of associative words in that context. During kana-kanji conversion, the converter decides the preference order of homonyms in the given context by comparing the activation levels of the homonyms' nodes. An example of the method is shown in Figure 1.</Paragraph>
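The spreading-activation behaviour can be sketched with a toy network. The words, weights, and parameter values below are invented, and the update (inertia plus weighted neighbour input plus external input, squashed by a sigmoid) follows the paper's description only loosely.

```python
import math

WEIGHTS = {  # symmetric association strengths (invented values)
    ("clock", "synchronization"): 0.8,
    ("signal", "synchronization"): 0.7,
    ("clock", "signal"): 0.5,
}

def weight(a, b):
    """Bidirectional link weight; unlisted pairs are unconnected."""
    return WEIGHTS.get((a, b), WEIGHTS.get((b, a), 0.0))

def spread(activation, chosen, alpha=0.5, external=1.0):
    """One synchronous update step over all nodes."""
    new = {}
    for j in activation:
        net = (alpha * activation[j]
               + sum(weight(i, j) * activation[i] for i in activation if i != j)
               + (external if j == chosen else 0.0))
        new[j] = 1.0 / (1.0 + math.exp(-net))  # sigmoidal output
    return new

act = dict.fromkeys(["clock", "signal", "synchronization", "copperware"], 0.0)
act = spread(act, chosen="clock")   # the user typed "clock" earlier in the text
act = spread(act, chosen="signal")  # then "signal"
```

After these two choices, the node for synchronization is more active than the node for copperware, which is how the converter would prefer 同期 for the reading douki in a hardware context.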
    <Paragraph position="4"> Assume the network has already been built from certain documents, and a user is inputting a text whose topic is related to computer hardware. In the example, words like clock (クロック) and signal (信号) have already appeared in the previous context, so their activation levels are relatively high. When the word douki (どうき) is input in kana and the conversion starts, the activation level of synchronization (同期) is higher than that of the other candidates due to its relationship to clock and signal. The input douki is then correctly converted into synchronization (同期).</Paragraph>
    <Paragraph position="5"> The advantages of our method are: * The method enables kanji to be selected based on a preference related to the current context. Alternative kanji selections are not discarded but are simply given a lower context weighting. Should the context switch, the other possible selections will obtain a stronger context preference; this strategy allows the system to handle context change capably.</Paragraph>
    <Paragraph position="6"> * Word preferences of a user are reflected in the network.</Paragraph>
    <Paragraph position="7"> * The correctness of the conversion is improved without high-cost computation  such as semantic/discourse analyses.</Paragraph>
  </Section>
  <Section position="7" start_page="226" end_page="226" type="metho">
    <SectionTitle>
4 Implementation
</SectionTitle>
    <Paragraph position="0"> The system was built on a Toshiba AS-4000 workstation (a Sun4-compatible machine) in C. The system configuration is shown in Figure 2.</Paragraph>
    <Paragraph position="1"> The left-hand side of the dashed line represents the off-line network-building process. The right-hand side represents the kana-kanji conversion process reinforced with a neural network handler. The network is used by the neural network handler, and word associations are computed in parallel with kana-kanji conversion. The kana-kanji converter receives kana sequences from a user. It searches the dictionary for lexical and grammatical information and creates a list of possible homonym candidates. The neural network handler is then queried for the activation levels of the homonyms.</Paragraph>
    <Paragraph position="2"> After the preferred homonyms are selected, the candidates are shown to the user in kanji. When the user chooses the desired one, the chosen word information is sent to the neural network handler through a homonym choice interface, and the corresponding node is activated.</Paragraph>
    <Paragraph position="3"> The roles and functions of the main components are described as follows.</Paragraph>
  </Section>
  <Section position="8" start_page="226" end_page="228" type="metho">
    <SectionTitle>
* Neural Network Generator
</SectionTitle>
    <Paragraph position="0"> Several real documents are analyzed, and the network nodes and link weights are decided automatically. The documents consist of a mixture of kana and kanji; the correct kanji for homonyms within the given context are thus provided. The documents can therefore be seen as training data for the neural network. The analysis proceeds through the following steps.</Paragraph>
    <Paragraph position="1">  1. Analyze the documents morphologically and convert them into sequences of words. Note that particles and demonstratives are ignored because they carry no useful information for word association.</Paragraph>
    <Paragraph position="2">  2. Count the frequency of every pair of words that co-appears within a paragraph, and memorize the counts as connection strengths. A paragraph is recognized only from the format information of the documents.</Paragraph>
    <Paragraph position="6">  3. Sum up the strength of connection for each word-pair.</Paragraph>
    <Paragraph position="7"> 4. Regularize the training data; this involves removing low-frequency occurrences (noise) and partitioning the frequency range in order to obtain a monotonically decreasing (in frequency) training set.</Paragraph>
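Steps 1-3 above can be sketched as follows. Whitespace splitting stands in for morphological analysis, and a small English stop-word list stands in for dropping particles and demonstratives; both are simplifications for illustration only.

```python
# Count every pair of content words that co-appears within a paragraph and
# accumulate the counts as link (connection) strengths.
from collections import Counter
from itertools import combinations

STOP_WORDS = {"the", "a", "of", "is", "this", "that"}

def build_links(paragraphs):
    links = Counter()
    for text in paragraphs:
        # Unique content words of the paragraph, sorted so each pair has one key.
        words = sorted({w for w in text.lower().split() if w not in STOP_WORDS})
        for pair in combinations(words, 2):
            links[pair] += 1
    return links

paragraphs = [
    "the clock signal drives synchronization",
    "clock skew breaks synchronization",
]
links = build_links(paragraphs)
```

Step 4 (noise removal and frequency partitioning) would then prune and bucket these counts before they become network weights.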
    <Paragraph position="8"> Although the network data contain only positive links and not all nodes are connected, non-connected nodes are assumed to be connected by negative weights so that the Hopfield conditions \[Hopfield 84\] are satisfied. As described above, the technique used here is a morphological and statistical analysis: in effect, this module learns the patterns of words that co-appear within a paragraph.</Paragraph>
    <Paragraph position="9"> The idea behind this approach is that words appearing together in a paragraph have some sort of associative connection. By accumulating such pairs, pairs without genuine relationships are statistically rejected.</Paragraph>
    <Paragraph position="10"> From a practical point of view, automated network generation is indispensable. Since human word associations differ by individual, creating a general-purpose associative network is not realistic. Because the training data for the network are supposed to be supplied by the users' own documents in our system, an automatic network generation mechanism is necessary even if the generated network is somewhat inaccurate. * Neural Network Handler The role of this module is to recall the total pattern of co-appearing words in a paragraph from the partial pattern of the current paragraph given by the user.</Paragraph>
    <Paragraph position="11"> The output value Oj for each node j is calculated by the following equations:</Paragraph>
    <Paragraph position="13"> Oj = f(nj),   nj = a*nj' + SUM_i (wji * Oi) + Ij,   where f: a sigmoidal function; a: a real number representing the inertia of the network (0 &lt; a &lt; 1); nj: the input value to node j (nj' denotes its previous value); Ij: the external input value to node j; wji: the weight of the link from node i to node j, with wji = wij and wii = 0.</Paragraph>
    <Paragraph position="14"> The external input value Ij takes a certain positive value when the word corresponding to node j is chosen by the user, and zero otherwise.</Paragraph>
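A direct reading of these equations can be sketched as follows. The three-node weight matrix, the inertia value, and the external input are invented for illustration; the weights are symmetric with a zero diagonal, matching the stated conditions wji = wij and wii = 0.

```python
import math

def f(x):
    """Sigmoidal output function."""
    return 1.0 / (1.0 + math.exp(-x))

def step(n, O, W, I, alpha=0.5):
    """One synchronous update: n_j <- alpha*n_j + sum_i W[j][i]*O_i + I_j."""
    size = len(n)
    n_new = [alpha * n[j] + sum(W[j][i] * O[i] for i in range(size)) + I[j]
             for j in range(size)]
    return n_new, [f(x) for x in n_new]

# Symmetric weights with zero diagonal (invented values).
W = [[0.0, 0.6, 0.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.4, 0.0]]
n = [0.0, 0.0, 0.0]
O = [f(x) for x in n]
I = [1.0, 0.0, 0.0]  # the word at node 0 was just chosen by the user
n, O = step(n, O, W, I)
```

After one step, node 0 (the chosen word) is the most active, its neighbour node 1 comes next, and node 2, reached only indirectly, is least active.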
    <Paragraph position="15"> Although the module is implemented in software, it is fast enough to follow the typing speed of a user. (2) ... for the sparseness of the network. The basic conversion algorithm is almost the same as the conventional one. The difference is that homonym candidates are sorted by the activation levels of the corresponding nodes in the network, except when local constraints such as word co-occurrence restrictions are applicable to the candidates. The associative information also affects the preference decision for grammatical ambiguities.</Paragraph>
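The modified selection step can be sketched as follows: candidates are sorted by the activation level of their corresponding node, with a default level for words that have no node. The activation values here are invented.

```python
def rank_by_activation(candidates, activation, default=0.0):
    """Sort homonym candidates by node activation, highest first."""
    return sorted(candidates, key=lambda w: activation.get(w, default), reverse=True)

# Invented activation levels; "pulsation" has no node and gets the default.
activation = {"synchronization": 0.77, "motivation": 0.31, "copperware": 0.12}
ranked = rank_by_activation(
    ["motivation", "pulsation", "synchronization", "copperware"], activation)
```

In the full system, local constraints such as word co-occurrence restrictions would override this ordering when they apply.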
  </Section>
</Paper>