<?xml version="1.0" standalone="yes"?>
<Paper uid="W93-0103">
  <Title>Lexical Concept Acquisition From Collocation Map</Title>
  <Section position="3" start_page="22" end_page="24" type="metho">
    <SectionTitle>
2 Definition of Lexical Concept
</SectionTitle>
    <Paragraph position="0"> Whenever we think of a word, we are immediately reminded of some form of meaning of the word. The reminded structure can be very diverse in size and the type of the information that the structure delivers. Though it is not very clear at this point what the structure is and how it is derived, we are sure that at least some type of the reminded structure is readily converted to the verbal representation. Then the content of vebral form must be a clue to the reminded structure. The reminded structure is commonly referred to as the meaning of a word. Still the verbal representation can be arbitrarily complex, yet the representation is made up of words. Thus the words in the clue to the meaning of a word seem to be an important element of the meaning.</Paragraph>
    <Paragraph position="1"> Now define the concept of a word as Definition 1 The lexical concept of a word is a set of associated words that are weighted by their associativeness.</Paragraph>
    <Paragraph position="2"> The notion of association is rather broadly defined. A word is associated with another word when the one word is likely to occur in the clue of the reminded structure of the other word in some relations. The association by its definition can be seen as a probabilistic function of two words. Some words are certainly more likely to occur in association with a particular word. The likeliness may be deterministically explained by some formal  theories, but we believe it is more of inductive(experimental) process. Now define the concept a of word w as a probabilistic distribution of its associated words.</Paragraph>
    <Paragraph position="4"> pi&gt;T.</Paragraph>
    <Paragraph position="5"> Thus the set of associated words consists of those whose probability is above threshold value T. The probabilistic distribution of words may exist independently of the influence of relations among words though it is true that relations in fact can affect the distribution. But in this paper we do not take other information into the model. If we do so, the model will have the complexity and sophistication of knowledge representation. Such an approach is exemplified by the work of Goldman and Charniak (1992).</Paragraph>
    <Paragraph position="6"> Equation 1 can be further elaborated in several ways. It seems that the concept of a word as in Equation 1 may not be sufficient. That is, Equation 1 is about the direct association of a given word. Indirect association can also contribute to the meaning of a word. Now define the expanded concept of a word as</Paragraph>
    <Paragraph position="8"> If the indirect association is repeated for several depths a class of words in particular aspects can be obtained. A potential application of Equation 3 and 4 is the automatic thesaurus construction. Subsumption relation between words may be computed by carefully expanding the meaning of the words. The subsumption relation, however, may not be based on the meaning of the words, but it rather be defined in statistical nature.</Paragraph>
    <Paragraph position="9"> The definition of lexical meaning as we defined is simple, and yet powerful in many ways. For instance, the distance between words can be easily computed from the representation. The probabilistic elements of the representation make the acquisition an experimental process and base the meaning of words on more consistent foundation. The computation of Equation 1, however, is not simple. In the next section we define Collocation Map and explain the algorithm to compute the conditional probabilities from the Map.</Paragraph>
  </Section>
  <Section position="4" start_page="24" end_page="27" type="metho">
    <SectionTitle>
3 Collocation Map
</SectionTitle>
    <Paragraph position="0"> Collocation map is a kind of Belief Net or knowledge map that represents the dependencies among words(concepts). As it does not have decision variables and utility, it is different from influence diagram. One problem with knowledge map is that it does not allow cycles while words can be mutually dependent. Being DAG is a big advantage of the formalism in computing probabilistic decisions, so we cannot help but stick to it. A cyclic relation should be broken into linear form as shown in figure 1. Considering the size of collocation map and the connectivity of nodes in our context is huge it is not practical to maintain all the combination of conditional probabilities for each node. For instance if a node has n conditioning nodes there will be 2 n units of probability information to be stored in the node. We limit the scope to the direct dependencies denoted by arcs.</Paragraph>
    <Paragraph position="1"> What follows is about the dependency between two words. In figure 2,</Paragraph>
    <Paragraph position="3"> Pl denotes the probability that word b occurs provided word a occurred. Once a text is transformed into an ordered set of words, the list should be decomposed into binary relations of words to be expressed in collocation map. Here in fact we are making an implicit assumption that if a word physically occurs frequently around another word, the first word is likely to occur in the reminded structure of the second word. In other words, physical occurrence order may be a cause to the formation of associativeness among words.</Paragraph>
    <Paragraph position="4"> Di = (a,b,c,d,e,f,...,z).</Paragraph>
    <Paragraph position="5"> When Di is represented by a, b, c, * - -, z), the set of binary relations with window size 3(let us call this set/~3) format is as follows.</Paragraph>
    <Paragraph position="6">  s D i = (ab, ac, bc, ad, bd, cd, be,ce,de,cf,...,).</Paragraph>
    <Paragraph position="7"> For words di and ct, P(c tldi) can be computed at least in two ways. As mentioned earlier, we take the probability in the sense of frequency rather than belief. In the first</Paragraph>
    <Paragraph position="9"> where i &lt; j.</Paragraph>
    <Paragraph position="10"> Each node di in the map maintains two variables f(di) and f(diej), while each arc keeps the information of P(cjldi). From the probabilities in arcs the joint distribution over all variables in the map can be computed, then any conditional probability can be computed. Let S denote the state vector of the map.</Paragraph>
    <Paragraph position="12"> Computing exact conditional probability or marginal probability requires often exponential resources as the problem is know to be NP-hard. Gibb's sampling must be one of the best solutions for computing conditional or marginal probabilities in a network such as collocation map. It approximates the probabilities, and when optimal solutions are asked simulated annealing can be incorporated. Not only for computing probabilities, pattern completion and pattern classification can be done through the map using Gibb's sampling.</Paragraph>
    <Paragraph position="13"> In Gibb's sampling, the system begins at an arbitrary state or a given S, and a free variable is selected arbitrarily or by a selecting function, then the value of the variable will be alternated. Once the selection is done, we may want to compute P(S = g) or other fimction of S. As the step is repeated, the set of S's form a sample. In choosing the next variable, the following probability can be considered.</Paragraph>
    <Paragraph position="15"> The probability is acquired from samples by recording frequencies, and can be updated as the frequencies change. The second method is inspired by the model of (Neal 1992) which shares much similarity with Boltzmann Machine. The difference is that the collocation map has directed ares. The probability that a node takes a particular value is measured by the energy difference caused by the value of the node.</Paragraph>
    <Paragraph position="17"/>
    <Paragraph position="19"> Conditional and marginal probabilities can be approximated from Gibb's sampling. A selection of next node to change has the following probability distribution.</Paragraph>
    <Paragraph position="21"> The acquisition of probability for each arc in the second method is more complicated than the first one. In principle, the general patterns of variables cannot be captured without the assistance of hidden nodes. Since in our case the pattern classification is not an absolute requirement, we may omit the hidden nodes after careful testing. If we employ hidden units, the collocation map may look as in figure 5 for instance.</Paragraph>
    <Paragraph position="22"> Learning is done by changing the weights in ares. As in (Neal, 1992), we adopt gradient ascent algorithm that maximize log-likelihood of patterns.</Paragraph>
    <Paragraph position="24"> Batch learning over all the patterns is, however, unrealistic in our case considering the size of collocation map and the gradual nature of updating. It is hard to vision  that whole learning is readjusted every time a new document is to be learned. Gradual learning(non batch) may degrade the performance of pattern classification probably by a significant degree, but what we want to do with collocation map is not a clear cut pattern identification up to each learning instance, but is a much more brute categorization. One way to implement the learning is first to clamp the nodes corresponding to the input set of binary dependencies, then run Gibb's sampling for a while. Then, add the average of energy changes of each arc to the existing values.</Paragraph>
    <Paragraph position="25"> So far we have discussed about computing the conditional probability from Collocation Map. But the use of the algorithm is not limited to the acquisition of lexical concept. The areas of the application of the Collocation Map seems to reach virtually every corner of natural language processing and other text processing such as automatic indexing.</Paragraph>
    <Paragraph position="26"> An indexing problem is to order the words appearing in a document by their relative importance with respect to the document. Then the weight C/(wi) of each word is the probability of the word conditioned by the rest of the words.</Paragraph>
    <Paragraph position="27"> ek(wi) = P( wi l wj, j 5k i) . (14) The application of the Collocation Map in the automatic indexing is covered in detail in Han (1993).</Paragraph>
    <Paragraph position="28"> In the following we illustrate the function of Collocation Map by way of an example. The Collocation Map is built from the first 12500 nouns in the textbook collection in Penn Tree Bank. Weights are estimated using the mutual information measure. The topics of the textbook used includes the subjects on planting where measuring by weight and length is frequently mentioned. Consider the two probabilities as a result of the sampling on the Collocation Map.</Paragraph>
    <Paragraph position="29"> P(depthlinch ) = 0.51325, and P(weightlinch ) = 0.19969.</Paragraph>
    <Paragraph position="30"> When the sampling was loosened, the values were 0.3075 and 0 respectively. The first version took about two minutes, and the second one about a minute in Sun 4 workstation. The quality of sampling can be controlled by adjusting the constant factor, the cooling speed of temperature in simulated annealing, and the sampling density. The simple experiment agrees with our intuition, and this demonstrates the potentail of Collocation Map. It, however, should be noted that the coded information in the Map is at best local. When the Map is applied to other areas, the values will not be very meaningful. This may sound like a limitation of Collocation Map like approach, but can be an advantage. No system in practice will be completely general, nor is it desirable in many cases. Figure 4 shows a dumped content of node tree in the Collocation Map, which is one of 4888 nodes in the Map.</Paragraph>
  </Section>
class="xml-element"></Paper>