<?xml version="1.0" standalone="yes"?>
<Paper uid="A92-1014">
  <Title>Automatic Learning for Semantic Collocation</Title>
  <Section position="4" start_page="104" end_page="106" type="metho">
    <SectionTitle>
3 Algorithm
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="104" end_page="106" type="sub_section">
      <SectionTitle>
3.1 Relaxation Process - Informal Explanation of the Algorithm
</SectionTitle>
      <Paragraph position="0"> Though the algorithm simply counts frequencies of co-occurrences of word relations, there are some complications. In this section, we also use the prepositional phrase attachment problem as an example, though the algorithm can be applied to any kind of structural ambiguity. 1. We have to count frequencies of "meaningful" co-occurrences between verbs and nouns, i.e. co-occurrences where the nouns actually appear in the position of the head-noun of PPs which can be attached to verbs or other nouns. The frequency of "general" co-occurrences, where the two words occur, for example, in the same sentence, may be of little use.</Paragraph>
      <Paragraph position="1"> This means that we encounter a chicken-and-egg problem here: in order to obtain frequencies of "meaningful" co-occurrences in sample texts, we have to know the correct attachment positions of PPs, and determining the correct attachments of PPs in sample texts requires knowledge of the frequencies of "meaningful" co-occurrences. 2. We usually cannot expect to have a corpus of sample sentences large enough for "intrinsic" relations to appear significantly more often than "accidental" relations. It is desirable, or in a sense inevitable, to introduce methods of increasing the number of co-occurrences.</Paragraph>
      <Paragraph position="2"> One possible way of doing this (which we have adopted in our algorithm) is to introduce "semantic" similarity measures between words, and count the number of extended co-occurrences taking the similarity measures into account. That is, the frequency of [girl, WITH, necklace] in sample texts contributes not only to the plausibility value of the tuple (girl, WITH, necklace), but also to that of (girl, WITH, scarf), according to the similarity value (or semantic distance) of "scarf" and "necklace".</Paragraph>
      <Paragraph position="3"> Because we compute semantic distances among nouns based on (dis-)similarities of their patterns of co-occurrence with other words (in short, two nouns are judged to be close to each other if they often co-occur with the same words), we also encounter a chicken-and-egg problem here. The calculation of semantic distance requires frequencies of collocation, and in order to find semantic collocations, semantic distance could be helpful.</Paragraph>
      <Paragraph position="4"> The two chicken-and-egg problems above are treated differently in the algorithm. We focus on the first problem in this paper; readers interested in the second problem can refer to [Sekine et al., 1992]. In the following, we call the tuples generated from sample texts by a parser "instance-tuples" and the tuples to which plausibility values are assigned "hypothesis-tuples". Instance-tuples and hypothesis-tuples are written [A, R, B] and (A, R, B), respectively.</Paragraph>
      <Paragraph position="5"> Note that for the sake of explanation the following is not an accurate description of the algorithm. An accurate one is given in the next section.</Paragraph>
      <Paragraph position="6"> Input: I saw a girl with a telescope.</Paragraph>
      <Paragraph position="7"> (STEP-1) Generate instance-tuples All possible instance-tuples, such as [saw, SUBJ, I], [girl, WITH, telescope], [saw, WITH, telescope], etc., are generated by a simple parser.</Paragraph>
      <Paragraph position="8"> (STEP-2) Assign credits Assign credits to the instance-tuples by considering the plausibility values of the corresponding hypothesis-tuples. As we will explain later, we assign credits in such a way that (a) the sum of the credits assigned to competing instance-tuples is equal to 1. Competing tuples are tuples such as [girl, WITH, scarf] and [saw, WITH, scarf] which show different attachment positions of the same PP.</Paragraph>
      <Paragraph position="9"> (b) the credits assigned to instance-tuples are proportional to the plausibility values of the corresponding hypothesis-tuples.</Paragraph>
      <Paragraph position="10"> Because hypothesis-tuples have the same plausibility value at the initial stage, each instance-tuple is assigned the same credit, say, 1/(number of competing tuples). The credit of [saw, SUBJ, I] is one, while the credits of [girl, WITH, scarf] and [saw, WITH, scarf] are 1/2 each. (STEP-3) Compute plausibility values Compute the plausibility value of each hypothesis-tuple by accumulating the credits assigned to the corresponding instance-tuples.</Paragraph>
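The initial uniform credit assignment described above can be sketched as follows. The tuple inventory and the 1/(number of competing tuples) split come from the text; the grouping of competing instance-tuples into lists is an illustrative representation, not the paper's.

```python
# Instance-tuples for "I saw a girl with a telescope."  Tuples that
# attach the same PP to different heads compete with each other.
competing_groups = [
    [("saw", "SUBJ", "I")],                          # unambiguous
    [("girl", "WITH", "telescope"),
     ("saw", "WITH", "telescope")],                  # same PP, two attachment sites
]

def initial_credits(groups):
    """Give each instance-tuple credit 1 / (number of competing tuples)."""
    credits = {}
    for group in groups:
        for t in group:
            credits[t] = 1.0 / len(group)
    return credits

credits = initial_credits(competing_groups)
print(credits[("saw", "SUBJ", "I")])           # 1.0
print(credits[("girl", "WITH", "telescope")])  # 0.5
```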
      <Paragraph position="11"> All occurrences of instance-tuples generated from the sample corpus have their credits assigned in (STEP-2). We assume that tuples corresponding to "intrinsic" ontological relations occur more often in texts than "accidental" ones. That is, we expect that instance-tuples of [girl, WITH, scarf] occur more often than those of [saw, WITH, scarf], and that the sum of the credits of [girl, WITH, scarf] is greater than that of [saw, WITH, scarf]. This leads to a higher plausibility value for (girl, WITH, scarf) than for (saw, WITH, scarf).</Paragraph>
      <Paragraph position="12"> After (STEP-3), the algorithm goes back to (STEP-2) to compute new credits for the instance-tuples.</Paragraph>
      <Paragraph position="13"> Unlike the first cycle, because the hypothesis-tuple (girl, WITH, scarf) has been assigned a higher plausibility value than (saw, WITH, scarf), the credit assigned to [girl, WITH, scarf] will be higher than that of [saw, WITH, scarf].</Paragraph>
      <Paragraph position="14"> When we recompute the plausibility values in (STEP-3), the increased credit assigned to [girl, WITH, scarf] in (STEP-2) increases the plausibility value of (girl, WITH, scarf); on the other hand, the decreased credit of [saw, WITH, scarf] results in a lower plausibility value for (saw, WITH, scarf).</Paragraph>
      <Paragraph position="15"> By repeating (STEP-2) and (STEP-3), we expect an increase in the credits assigned to instance-tuples which correspond to correct attachment positions. Further, the plausibility values of hypothesis-tuples should approach values which represent the real "intrinsicality" of the denoted relationships. (STEP-3) will be further augmented by introducing semantic distances between words, i.e. a similar hypothesis helps to increase the plausibility value of a hypothesis. We expect this to resolve the second type of chicken-and-egg problem. See [Sekine et al., 1992].</Paragraph>
    </Section>
    <Section position="2" start_page="106" end_page="106" type="sub_section">
      <SectionTitle>
3.2 Terminology and notation
</SectionTitle>
      <Paragraph position="0"> instance-tuple \[h, r, a\] : a token of a dependency relation; part of the analysis of a sentence in a corpus.</Paragraph>
      <Paragraph position="1"> hypothesis-tuple (h,r,a): a dependency relation; an abstraction or type over identical instance-tuples.</Paragraph>
      <Paragraph position="2"> cycle : the iteration number of the relaxation process.</Paragraph>
      <Paragraph position="3"> CT,i : Credit of instance-tuple T with identification number i. [0, 1] VTg : Plausibility value of hypothesis-tuple T in cycle g. [0, 1] Dg(wa, wb) : Distance between the words wa and wb in cycle g. [0, 1]</Paragraph>
    </Section>
    <Section position="3" start_page="106" end_page="106" type="sub_section">
      <SectionTitle>
3.3 Algorithm
</SectionTitle>
      <Paragraph position="0"> 1. For each sentence we use a simple grammar to find all tuples possibly used in this sentence. Each instance-tuple is then given credit in inverse proportion to the number of competing tuples.</Paragraph>
      <Paragraph position="2"> This credit shows which rules are suitable for this sentence. On the first iteration the split of the credit between ambiguous analyses is uniform as shown above, but on subsequent iterations the plausibility values of the hypothesis-tuples from the previous iteration, VTg-1, are used to give preference to some analyses over others. The formula for this will be shown later.</Paragraph>
      <Paragraph position="3"> 2. Hypothesis-tuples have a plausibility value which indicates their reliability by a figure from 0 to 1.</Paragraph>
      <Paragraph position="4"> If an instance-tuple occurs frequently in the corpus, or if it occurs where there are no alternative tuples, the plausibility value for the corresponding hypothesis must be large. After analysing all the sentences of the corpus, we get a set of sentences with weighted instance-tuples. Each instance-tuple invokes a hypothesis-tuple. For each hypothesis-tuple, we define the plausibility value by the following formula. This formula is designed so that the value does not exceed 1.</Paragraph>
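The formula itself is missing from this version of the text. One formula with the stated properties (accumulating credits while never exceeding 1) is a "noisy-OR" over the instance credits; this is an assumption consistent with the description, and with the value 0.75 that Table 1 assigns to a tuple occurring twice with credit 0.5, not necessarily the paper's exact formula.

```python
def plausibility(credits):
    """Accumulate instance-tuple credits C_{T,i} into a value in [0, 1]."""
    miss = 1.0
    for c in credits:
        miss *= 1.0 - c          # probability that every instance "misses"
    return 1.0 - miss

print(plausibility([0.5]))       # 0.5
print(plausibility([0.5, 0.5]))  # 0.75 (matches Table 1's (saw, WITH, telescope))
print(plausibility([1.0, 0.5]))  # 1.0  (one unambiguous instance saturates it)
```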
      <Paragraph position="6"> 3. At this stage, the word-distances can be used to modify the plausibility values of the hypothesis-tuples. The word-distances are either defined externally by human intuition or calculated in the previous cycle with the formula shown later. Distance between words induces a distance between hypothesis-tuples. Then, for each hypothesis-tuple, the other hypothesis-tuple which gives the greatest effect can be used to increase its plausibility value. The new plausibility value, with the similar-hypothesis-tuple effect, is calculated by the following formula.</Paragraph>
      <Paragraph position="8"> Here, T' is the hypothesis-tuple which gives the greatest effect to the hypothesis-tuple T (the original one). Hypothesis-tuples T and T' share all elements except one.</Paragraph>
      <Paragraph position="9"> The distance between T and T' is the distance between their differing elements, wa and wb. Ordinarily the difference is in the head or argument element, but when the relation is a preposition, it is possible to consider the distance from another preposition.</Paragraph>
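Since the formula for this modification is also missing here, the following is only a hedged sketch of the idea: the nearest hypothesis-tuple T' raises V(T) by an amount that shrinks with the word distance D(wa, wb). The particular discounted noisy-OR combination is an illustrative assumption.

```python
def boosted(v_t, v_nearest, dist):
    """Raise V(T) using the most effective similar hypothesis-tuple T'.

    dist is D(wa, wb) in [0, 1] between the differing elements;
    a distant T' contributes little.
    """
    effect = v_nearest * (1.0 - dist)
    return 1.0 - (1.0 - v_t) * (1.0 - effect)

print(boosted(0.5, 1.0, 0.0))  # 1.0 (a twin hypothesis at distance 0)
print(boosted(0.5, 0.5, 1.0))  # 0.5 (maximally distant: no effect)
```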
      <Paragraph position="10"> Distances between words are calculated on the basis of similarity between hypothesis-tuples about them.</Paragraph>
      <Paragraph position="11"> The formula is as follows:</Paragraph>
      <Paragraph position="13"> The hypothesis-tuples compared are those whose arguments are wa and wb, respectively, and whose heads and relations are the same. β is a constant parameter.</Paragraph>
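The distance formula is likewise missing from this version, so the following is a minimal sketch of the stated idea: compare the plausibility values of the tuples that pair each word with the same head and relation. The mean absolute difference is an assumption, and the role of the constant β is not reproduced.

```python
def distance(V, contexts, wa, wb):
    """Mean |V(h, r, wa) - V(h, r, wb)| over shared (head, relation) pairs."""
    diffs = [abs(V.get((h, r, wa), 0.0) - V.get((h, r, wb), 0.0))
             for (h, r) in contexts]
    return sum(diffs) / len(diffs) if diffs else 1.0  # nothing shared: far apart

V = {("saw", "WITH", "telescope"): 0.9, ("saw", "WITH", "binoculars"): 0.7,
     ("girl", "WITH", "telescope"): 0.4, ("girl", "WITH", "binoculars"): 0.4}
contexts = [("saw", "WITH"), ("girl", "WITH")]
print(round(distance(V, contexts, "telescope", "binoculars"), 3))  # 0.1
```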
      <Paragraph position="14"> This procedure is repeated from the beginning, modifying the credits of instance-tuples between ambiguous analyses by using the plausibility values of hypothesis-tuples. Each cycle will hopefully be more accurate than the previous one. On the first iteration, we used just a constant figure for the credits of instance-tuples, but from now on we can use the plausibility value of the hypothesis-tuple deduced from the previous iteration. Hence with each iteration we expect more reliable figures. To calculate the new credit of instance-tuple T, we use:</Paragraph>
      <Paragraph position="16"> Here, VTg in the numerator is the plausibility value of the hypothesis-tuple which is the same tuple as the instance-tuple T. The VTg in the denominator are the plausibility values of the competing hypothesis-tuples in the sentence, together with the plausibility value of that hypothesis-tuple itself. α is a constant parameter.</Paragraph>
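The formula image is missing here, but the description (the tuple's own value in the numerator, the competing values plus its own in the denominator, a constant α) suggests a normalized power form. Reading α as the exponent is an assumption; α = 4.0 is the value reported for the experiments.

```python
ALPHA = 4.0  # the value used in the experiments

def credit(v_t, competing_vs):
    """New credit of instance-tuple T.

    competing_vs holds the plausibility values of the competing
    hypothesis-tuples in the sentence, including v_t itself.
    """
    return v_t ** ALPHA / sum(v ** ALPHA for v in competing_vs)

print(credit(0.5, [0.5, 0.5]))            # 0.5 (uniform when values tie)
print(round(credit(1.0, [1.0, 0.5]), 3))  # 0.941
```

Credits over one ambiguity group always sum to 1, matching condition (a) of Section 3.1.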
      <Paragraph position="17"> Iterate steps 1 to 5 several times, until the information saturates.</Paragraph>
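Putting the steps together, the following is a hedged sketch of the whole relaxation loop on a three-sentence toy corpus. The normalized-power credit step and the noisy-OR accumulation are assumptions consistent with the descriptions in this section (the word-distance step is omitted for brevity), not the paper's verbatim formulas.

```python
ALPHA = 4.0  # assumed role of the constant parameter alpha

# Each sentence is a list of competing groups; each group holds the
# instance-tuples that attach the same PP to different heads.
corpus = [
    [[("girl", "WITH", "telescope"), ("saw", "WITH", "telescope")]],
    [[("girl", "WITH", "scarf")]],                                  # unambiguous
    [[("girl", "WITH", "scarf"), ("saw", "WITH", "scarf")]],
]

def step2(corpus, V):
    """(STEP-2): credit each instance-tuple in proportion to V[T]**ALPHA."""
    credits = {}
    for sentence in corpus:
        for group in sentence:
            weights = [V.get(t, 0.5) ** ALPHA for t in group]
            total = sum(weights)
            for t, w in zip(group, weights):
                credits.setdefault(t, []).append(w / total)
    return credits

def step3(credits):
    """(STEP-3): accumulate credits; the noisy-OR keeps values in [0, 1]."""
    V = {}
    for t, cs in credits.items():
        miss = 1.0
        for c in cs:
            miss *= 1.0 - c
        V[t] = 1.0 - miss
    return V

V = {}
for _ in range(5):
    V = step3(step2(corpus, V))

# The unambiguous second sentence pulls (girl, WITH, scarf) toward 1 and
# (saw, WITH, scarf) toward 0, as in the informal explanation.
print(V[("girl", "WITH", "scarf")] > V[("saw", "WITH", "scarf")])  # True
```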
    </Section>
  </Section>
  <Section position="5" start_page="106" end_page="109" type="metho">
    <SectionTitle>
4 Experiment
</SectionTitle>
    <Paragraph position="0"> We conducted two experiments to show the effectiveness of our algorithm. The first uses a small, artificial corpus to show how the algorithm works. The second uses data from a real corpus (computer manuals).</Paragraph>
    <Section position="1" start_page="106" end_page="108" type="sub_section">
      <SectionTitle>
4.1 Artificial corpus
</SectionTitle>
      <Paragraph position="0"> We treat the prepositional attachment ambiguity in this experiment. Though the corpus consists of only 7 artificial sentences, this experiment shows the basic characteristics and the effectiveness of the algorithm.</Paragraph>
      <Paragraph position="1"> The corpus and the input data to the algorithm are as follows:</Paragraph>
      <Paragraph position="3"> Table 1 shows the result of the first cycle. Figures in this table show the plausibility values of hypothesis-tuples between the words in the corresponding columns. The plausibility value of the hypothesis-tuple (saw, WITH, telescope), for example, is 0.75.</Paragraph>
      <Paragraph position="4"> These plausibility values basically reflect the number of co-occurrences of the two words. However, the hypothesis-tuple (girl, WITH, scarf) has plausibility value 1.0, because in the sentence "A girl with a scarf saw me", there is no ambiguity in the attachment position of "with a scarf".</Paragraph>
      <Paragraph position="5"> Then we compute the effects of similar hypothesis-tuples by considering distances among words. The effects which the existence of similar hypotheses has on other hypotheses are clearly shown in Table 2.</Paragraph>
      <Paragraph position="6"> The plausibility values of the hypothesis-tuples have changed from the former ones. For example, we can find a sharp distinction between the plausibility values of the hypothesis-tuples (saw, WITH, necklace) and (girl, WITH, necklace). Though these two had the same plausibility value 0.50 before considering the effect of similar hypotheses, the plausibility value of (girl, WITH, necklace) becomes 0.82 while that of (saw, WITH, necklace) becomes only 0.66. The difference in the behavior of these two hypothesis-tuples is caused by the difference of the plausibility values assigned to the hypothesis-tuples (girl, WITH, scarf) and (saw, WITH, scarf), which are 1.00 and 0.50 respectively.</Paragraph>
      <Paragraph position="7"> Then we proceed to the second cycle, using the plausibility values produced in the previous cycle. Table 3 shows the plausibility values after the fifth cycle. By the fifth cycle, most of the figures have moved well towards the extremes, either 0 or 1.</Paragraph>
      <Paragraph position="8"> For example, the plausibility values of the hypothesis-tuples (saw, WITH, necklace) and (girl, WITH, necklace) are well apart, 0.30 and 0.99 respectively, although they had the same plausibility value after the first cycle. Also, the hypothesis-tuple (moon, WITH, telescope) has the plausibility value 0.00, though its initial plausibility value was 0.50. We can claim that the learning process has worked well in making these differences. On the other hand, if the movement towards extreme values were too strong, there might be a possibility that only the strongest plausibility value survived. When there are two hypotheses which have instances in the same sentence, how do the plausibility values move? This can be seen with the two hypothesis-tuples (saw, WITH, telescope) and (girl, WITH, telescope), which are contradictory hypothesis-tuples in the sentence "I saw a girl with a telescope". In the results, both of their plausibility values are high and a monopoly is avoided, because a number of instance-tuples and a similar hypothesis contribute to increasing the plausibility values of both hypothesis-tuples. Note that the relation (saw, WITHOUT, telescope), which does not appear in the sentences, has a rather high plausibility value, 0.64. This occurred because of the effect of the similar hypothesis-tuple (saw, WITH, telescope). But the relation (meet, WITH, telescope), which has a relatively high plausibility value 0.57, is normally unacceptable. This is caused by the close distance between the words 'meet' and 'see'.</Paragraph>
      <Paragraph position="9"> The distances between words in the 5th cycle are shown in Table 4.</Paragraph>
      <Paragraph position="10"> Similarly, we put the credit 1/2 for each instance-tuple in which N2 is an argument.</Paragraph>
      <Paragraph position="11"> We did not provide any word-distance information before the process.</Paragraph>
      <Paragraph position="12"> We classified the results obtained. 'Correct' means that the hypothesis-tuple which has the highest plausibility value is the correct tuple according to our human judgement. 'Incorrect' means it is judged wrong by a human. 'Indefinite' means that several hypothesis-tuples share the same highest plausibility value. 'Uncertain' means that it is impossible even for a human to judge which hypothesis-tuple is best without context. The results are shown in Table 6. These results, both the plausibility values of hypothesis-tuples and the word distances, seem to behave as we expected.</Paragraph>
    </Section>
    <Section position="2" start_page="108" end_page="109" type="sub_section">
      <SectionTitle>
4.2 The Japanese compound noun corpus
</SectionTitle>
      <Paragraph position="0"> We conducted an experiment using compound nouns extracted from a Japanese computer manual, because of its simplicity and feasibility. The corpus consists of 4152 sentences (about 90,000 words). This might be considered small for statistical analysis purposes, but as the corpus is a sublanguage corpus, the structures of sentences are rather homogeneous, and therefore the number of sentences might be considered sufficient.</Paragraph>
      <Paragraph position="1"> There are 616 compound nouns in the corpus, where 210 different words appear. We call an element word of a compound noun a 'word'.</Paragraph>
      <Paragraph position="2"> We assume that all words in each compound noun can be structurally related, if they satisfy the condition that a relation has a preceding argument and a following head. For example, from a compound noun with 4 elements, we can extract 6 tuples as follows.</Paragraph>
      <Paragraph position="3"> We know that each element can be the argument in exactly one relation. In the above example, N1 has 3 instance-tuples in which it can be the argument. We put the credit 1/3 as the initial credit for each such instance-tuple, and similarly for the other elements. The percentage of correct answers was about 70%. Though this result is not as impressive as that of the last experiment, it is not bad.</Paragraph>
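The enumeration and initial credit split above can be sketched as follows, using the MODIFY relation label from the 'file transfer operation' example; the list-of-pairs representation is illustrative.

```python
from itertools import combinations

def compound_tuples(words):
    """Every earlier element may modify any later one: [head, MODIFY, arg]."""
    return [(words[j], "MODIFY", words[i])
            for i, j in combinations(range(len(words)), 2)]

words = ["N1", "N2", "N3", "N4"]
tuples = compound_tuples(words)
print(len(tuples))   # 6, as stated for a 4-element compound noun

# Initial credit: an element that is the argument of k candidate tuples
# gives each of them credit 1/k (N1 -> 1/3, N2 -> 1/2, N3 -> 1).
credits = {t: 1.0 / sum(1 for u in tuples if u[2] == t[2]) for t in tuples}
print(round(credits[("N2", "MODIFY", "N1")], 3))  # 0.333
print(credits[("N3", "MODIFY", "N2")])            # 0.5
```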
      <Paragraph position="4"> From a perusal of the incorrect analyses, we can find typical reasons for making an incorrect analysis. When there are 2 competing tuples for a 3-element compound noun, both tuples are often individually acceptable. For example, let's take the compound noun 'file transfer operation'. If we consider the two instance-tuples [transfer, MODIFY, file] and [operation, MODIFY, file], both are acceptable in the absence of any context. In this case, the plausibility values of the two hypothesis-tuples become almost the same, but there might be a very small difference caused by the effect of a similar hypothesis-tuple. If the wrong hypothesis-tuple gains the higher plausibility value, the analysis becomes wrong.</Paragraph>
      <Paragraph position="5"> We think that the relation between the words of a compound noun can be defined not only by the semantic relations between the individual words but also by the structure of the compound noun itself. This feature of compound nouns makes it hard to get a higher percentage of correct answers in this experiment.</Paragraph>
      <Paragraph position="6"> The behavior of the algorithm changes according to the two parameters, α in formula 5 and β in formula 4. Though the parameters were set to α = 4.0 and β = 20.0 in the experiments, we have no established procedure for determining these parameters appropriately. We need to develop criteria or methods to determine these parameters, depending on characteristics of the sample texts, etc.</Paragraph>
      <Paragraph position="7"> (b) Word sense ambiguity The entity of a collocational relation is represented by a word, and the relation labels are either simple grammatical functions or surface prepositions. This means we ignored the word-sense ambiguity of a word or a preposition in this algorithm. A new method to treat this problem might be needed.</Paragraph>
      <Paragraph position="8"> (c) Combination with other clues for disambiguation It is already known that ontological knowledge is not the only clue for settling ambiguities. There are problems related to context, discourse, situation, etc. We want to weave these problems into our algorithm. It also has to be noted that rather local, structural preferential clues may help disambiguation [Wilks, 1985]. (d) Word distance Though we currently assume that the semantic distances of words are given in the form of single numbers, our research group is now planning an extension to cover multi-dimensional aspects of word meanings. This extension may introduce another complication into our algorithm.</Paragraph>
      <Paragraph position="9"> (e) Form of collocation In the current algorithm, semantic collocations are represented in the form of a triplet (tuple). However, each tuple expresses only a collocation between two words. This is not sufficient for treating relationships among several words, such as subcategorization frames of predicates, knowledge frames, etc. In order to treat such multi-word collocations, we may have to treat co-occurrences of triplets in a similar fashion to how we treat co-occurrences of words.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="109" end_page="109" type="metho">
    <SectionTitle>
6 Future Directions
</SectionTitle>
    <Paragraph position="0"> Besides planning to resolve the problems described above, there are some other ideas for extending our project.</Paragraph>
    <Paragraph position="1"> Some of them are really stimulating.</Paragraph>
    <Paragraph position="2"> (a) More experiments Though the results of the two preliminary experiments look promising, we have to conduct more experiments using other real corpora before claiming that the algorithm is effective.</Paragraph>
    <Paragraph position="3"> (b) Extension to Machine Translation Though the algorithm in its present form is designed to acquire monolingual knowledge, we are planning to develop it for acquiring "knowledge" for translation. If "semantic" collocations discovered by the algorithm reflect the domain ontology, the collocations in two languages (and the semantic classes of words produced based on the collocations) are expected to be similar, in the sense that their correspondence is rather straightforward.</Paragraph>
    <Paragraph position="4"> Experience in MT research, however, generally indicates the opposite, i.e. monolingual regularities and bilingual regularities are sometimes orthogonal and the correspondences of two languages are not so straightforward.</Paragraph>
    <Paragraph position="5"> These two rather contradicting predictions (and experiences) have to be consolidated through actual experiments.</Paragraph>
    <Paragraph position="6">  (c) Incremental learning system  We don't need to distinguish the knowledge acquisition phase from the phase of using it in actual application systems. It is possible to acquire knowledge and exploit it at the same time.</Paragraph>
  </Section>
</Paper>