<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1008">
  <Title>Acquiring Inference Rules with Temporal Constraints by Using Japanese Coordinated Sentences and Noun-Verb Co-occurrences</Title>
  <Section position="3" start_page="57" end_page="59" type="metho">
    <SectionTitle>
2 Algorithm with a Simplified Score
</SectionTitle>
    <Paragraph position="0"> In the following, we begin by providing an overview of our algorithm. We specify the basic steps in the algorithm and the form of the rules to be acquired. We also examine the direction of implications andtemporal ordering described by the rules. After that, we describe a simplied version of the scoring function that ouralgorithm uses and then discuss a problem related to it. Thebias mechanism, which wementioned in the introduction, is described in the section after that.</Paragraph>
    <Section position="1" start_page="57" end_page="58" type="sub_section">
      <SectionTitle>
2.1 Procedure and Generated Inference Rules
</SectionTitle>
      <Paragraph position="0"> Our algorithm is given a noun as its input and produces a set of inference rules. A produced rule expresses an implication relation between two descriptions including the noun. Our basic assumptions for the acquisition can be stated as follows.</Paragraph>
      <Paragraph position="1"> * If verbs v1 and v2 frequently co-occur in coordinatedsentences, theverbsrefer totwoevents that actually frequently co-occur in the real world, andasentence including v1 andanother sentence including v2 are good candidates to be descriptions that have an implication relation and a particular temporal order between them.</Paragraph>
      <Paragraph position="2"> * The above tendency becomes stronger when the verbs frequently co-occur with a given noun n; i.e., if v1 and v2 frequently co-occur in coordinated sentences andthe verbs also frequently co-occur with a nounn, a sentence including v1 and n and another sentence including v2 and n are good candidates to be descriptions that have an implication relation between them.</Paragraph>
      <Paragraph position="3"> Our procedure consists of the following steps.</Paragraph>
      <Paragraph position="4"> Step1 Select M verbs that take a given noun n as their argument most frequently.</Paragraph>
      <Paragraph position="5"> Step2 For each possible pair of the selected verbs, compute the value of a scoring function that embodies our assumptions, and select the N verb pairs that have the largest score values. Note thatweexcludethecombinationofthesameverb fromthe pairs to be considered.</Paragraph>
      <Paragraph position="6"> Step3 If the score value fora verb pair is higher than athreshold th andtheverbstake nastheir syntactic objects, generate an inference rule from the verb pair and the noun.</Paragraph>
      <Paragraph position="7"> Note that we used 500 as the value of M. N was set to 4 and th was set to various values during our experiments. Another important point is that, in Step 3, the argument positions at which the given noun can appear is restricted to syntactic objects. This was because we empirically found that the rules generated fromsuch verb-nounpairs were relatively accurate.</Paragraph>
      <Paragraph position="8"> Assume that a given noun is goods and the verb pair sell and manufacture is selected in Step 3.</Paragraph>
      <Paragraph position="9"> Then, the following rule is generated.</Paragraph>
      <Paragraph position="10"> * If someone sells goods, usually someone manufactures the goods at the same time as or before the event of the selling of the goods.</Paragraph>
      <Paragraph position="11"> Although the word someone occurs twice, we do not demand that it refers to the same person in both instances. It just works as a placeholder. Also note that the adverb usually1 was inserted to prevent the rule frombeingregarded as invalid byconsidering situationsthat arelogically possible butunlikely inpractice. null The above rule is produced when manufacture and sell frequently co-occur in coordinated sentences such as The company manufactured goods and it sold them. One might be puzzled because the order of the occurrences of the verbs in the coordinated sentences isreversed in therule. Theverbsell in the second(embedded) sentence/clause in the coordinated sentence appears as a verb in the precondition of the rule, while manufacture in the rst (embedded) sentence/clause is the verb in the consequence. A question then, is why we chose such an order, or such a direction of implication. There is another possibility, which might seem more straightforward.</Paragraph>
      <Paragraph position="12"> From the same coordinated sentences, we could produce the rule where the direction is reversed; i.e,., If someone manufactures goods, usually someone sells  the goodsat the same time as or after the manufacturing. The difference is that the rules generated by our procedure basically infer a past event from another event, while the rules with the opposite direction have to predict a future event. In experiments using ourdevelopment set, we observed that the rules predicting future events were often unacceptable because of the uncertainty that weusually encounter inpredictingthe future or achieving a future goal. For instance, people might do something (e.g., manufacturing) with an intention to achieve some other goal (e.g., selling) in thefuture. Butthey sometimes fail to achieve their future goal for some reason. Some manufactured goods are never sold because, forinstance, they are notgood enough. In our experiments, we found that the precision rates of the rules with the direction we adopted were much higher than those of the rules with the opposite direction.</Paragraph>
    </Section>
    <Section position="2" start_page="58" end_page="58" type="sub_section">
      <SectionTitle>
2.2 Simplified Scoring Function
</SectionTitle>
      <Paragraph position="0"> To be precise, a rule generated by our method has the following form, where vpre and vcon are verbs and n is a given noun.</Paragraph>
      <Paragraph position="1"> * If someonevpre n, usually someonevcon thenat the same time as or before the vpre-ing of the n. We assume that all three occurrences of nounn in the rule refer to the same entity.</Paragraph>
      <Paragraph position="2"> Now, we dene a simplied version of our scoring function as follows.</Paragraph>
      <Paragraph position="4"> Here, Pcoord(vcon,vpre) is the probability that vcon and vpre are observed in coordinated sentences in a way that the event described by vcon temporally precedes or occurs at the same time as the event described by vpre. (More precisely, vcon and vpre must be the main verbs of two conjuncts S1 and S2 in a Japanese coordinated sentence that is literally translated to the form S1 and S2.) This means that in the coordinated sentences, vcon appears rst and vpre second. Pargprime(n|vpre)andParg(n|vcon)aretheconditional probabilities that noccupies the argument positionsargprime ofvpre andarg ofvcon, respectively. At the beginning, as possible argument positions, we specied ve argument positions, including the syntactic object and the subject. Note that when vpre and vcon frequently co-occur in coordinated sentences and n often becomes arguments of vpre and vcon, the score has a large value. This means that the score embodies our assumptions for acquiring rules.</Paragraph>
      <Paragraph position="5"> Theterm Pcoord(vcon,vpre)Pargprime(n|vpre)Parg(n|vcon) in BasicS is actually an approximation of the probability P(vpre,argprime,n,vcon,arg,n) that we will observe the coordinated sentences such that the two sentences/clauses in the coordinated sentence are headed by vpre and vcon and n occupies the argument positions argprime of vpre and arg of vcon. Another important point is that the score is divided byP(n)2. This is because theprobabilities such asParg(n|vcon)tendtobe large for a frequently observed noun n. The division by P(n)2 is done to cancel such a tendency. This division does not affect the ranking for the same noun, but, since we give a uniform threshold for selecting the verb pairs for distinct nouns, such normalization isdesirable, asweconrmed inexperiments usingour development set.</Paragraph>
    </Section>
    <Section position="3" start_page="58" end_page="59" type="sub_section">
      <SectionTitle>
2.3 Paraphrases and Coordinated Sentences
</SectionTitle>
      <Paragraph position="0"> Thus, we have dened our algorithm and a simplied scoring function. Now let us discuss a problem that is caused by the scoring function.</Paragraph>
      <Paragraph position="1"> As mentioned in the introduction, a large portion of the acquired rules actually consists of paraphrases. Here, by a paraphrase, we mean a rule consisting of two descriptions referring to an identical event. The following example is an English translation of such paraphrases obtained by our method. We think this rule is acceptable. Note that we invented a new English verb clearly-write as a translation of a Japanese verbmeiki-suruwhile write is a translation of another Japanese verb kaku.</Paragraph>
      <Paragraph position="2"> * If someone clearly-writes a phone number, usually someone writes the phone number at the same time as or before the clearly-writing of the phonenumber.</Paragraph>
      <Paragraph position="3"> Note that clearly-write and write have almost the same meaning but the former is often used in texts related to legal matters. Evidently, in the above rule, clearly-write and write describe the same event, and it can be seen as a paraphrase. There are two types ofcoordinated sentence that ourmethodcan use as clues to generate the rule.</Paragraph>
      <Paragraph position="4"> * He clearly-wrote a phone number and wrote the phonenumber.</Paragraph>
      <Paragraph position="5"> * Heclearly-wrote aphonenumber, andalsowrote an address.</Paragraph>
      <Paragraph position="6"> The rst sentence is more similar to the inference rule than the second in the sense that the two verbs  share the same object. However, it is ridiculous because it describes the same event twice. Such a sentence is not observed frequently in corpora, and will not be used as clues to generate rules in practice. On the other hand, we frequently observe sentences of the second type in corpora, and our method generates the paraphrases from the verb-verb co-occurrences taken from such sentences. However, there is a mismatch between the sentence and the acquired rule in the sense that the rule describes two events related to the same object (i.e., a phone number), while the above sentence describes two events that are related to distinct objects (i.e., a phone number and an address). Regarding this mismatch, two questions need to be addressed.</Paragraph>
      <Paragraph position="7"> The rst question is why our method can acquire the rule despite the mismatch. The answer is that ourmethodobtains theverb-verb co-occurrence probabilities (Pcoord(vcon,vpre)) and the verb-noun co-occurrence probabilities (e.g.,Parg(n|vcon)) independently, and that the method does not check whether the two verbs share an argument.</Paragraph>
      <Paragraph position="8"> Then the next question is why our method can acquire accurate paraphrases from such coordinated sentences. Though we do not have a denite answer now, ourhypothesis is related to the strategy that people adoptin writing coordinated sentences. When two similar but distinct events, which can be described by the same verb,occur successively or at the same time, people avoid repeating the same verb to describe the two events in a single sentence. Instead they try to use distinct verbs that have similar meanings. Suppose that a person wrote his name and address. To report what she did, she may write I clearly-wrote my name and also wrote my address but will seldom writeIclearly-wrote mynameandalsoclearly-wrote my address. Thus, we can expect to be able to nd in coordinated sentences a large number of verb pairs consisting of two verbs with similar meanings. Note that our method tends to produce two verbs that frequently co-occur withagiven noun. Thisalso helpsto produce the inference rules consisting of two semantically similar verbs.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="59" end_page="60" type="metho">
    <SectionTitle>
3 Bias Mechanism
</SectionTitle>
    <Paragraph position="0"> We now describe a bias used in our full scoring function, which signicantly improves the precision. The full scoring function is dened as</Paragraph>
    <Paragraph position="2"> The bias is denoted as Parg(vcon), which is the probability that we can observe the verb vcon, which is the verb in the consequence of the rule, and its argument position arg is occupied by a noun, no matter which nounactually occupies the position.</Paragraph>
    <Paragraph position="3"> An intuitive explanation of the assumption behind this bias is that as the situation within which the descriptionoftheconsequence inaruleisvalid becomes wider, the rule becomes more likely to be a proper one. Consider the following rules.</Paragraph>
    <Paragraph position="4"> * If someone demands a compensation payment, someone orders the compensation payment.</Paragraph>
    <Paragraph position="5"> * If someone demands a compensation payment, someone requests the compensation payment.</Paragraph>
    <Paragraph position="6"> Weconsider therst rule to beunacceptable while the secondexpresses aproper implication. Thedifference is the situations in which the descriptions in the consequences hold. In our view, the situations described by order are more specic than those referred to by request. In other words, order holds in a smaller range of situations than request. Requesting something can happen in any situations where there exists someone who can demand something, but ordering can occur only ina situationswheresomeonein aparticular social positioncandemandsomething. Thebasic assumption behind our bias is that rules with consequences that can be valid in a wider range of situations, such as requesting a compensation payment, are more likely to be proper ones than the rules with consequences that hold in a smaller range of situations, such as ordering a compensation payment.</Paragraph>
    <Paragraph position="7"> Thebias Parg(vcon)wasintroduced tocapture variationsofthesituations in which event descriptions are valid. Weassume that frequently observed verbsform generic descriptions that can be valid within a wide range of events, while less frequent verbs tend to describe events that can occur in a narrower rangeof situations and form more specic descriptions than the frequently observed verbs. Regarding the requestorder example, (a Japanese translation of) request is observed more frequently than (a Japanese translationof)order incorporaandthis observation isconsistent with our assumption. A similar idea by Geffet and Dagan (Geffet and Dagan, 2005) was proposed forcapturing lexical entailment. Thedifference is that they relied on word co-occurrences rather than the frequency of words to measure the specicity of the semantic contents of lexical descriptions, and needed Web search to avoid data sparseness in co-occurrence  statistics. On the other hand, our method needs only simpleoccurrence probabilities ofsingleverbsandwe expect our method to be applicable to wider vocabulary than Geffet and Dagans method.</Paragraph>
    <Paragraph position="8"> The following is a more mathematical justication for the bias. According to the following discussion, Parg(vcon) can be seen as a metric indicating how easily we can establish an interpretation of the rule, which is formalized as a mapping between events. In our view, if we can establish the mapping easily, the rule tendstobeacceptable. Thediscussion starts from a formalization of an interpretation of an inference rule. Consider the rule If exp1 occurs, usually exp2 occurs at the same time or before the occurrence of exp1, where exp1 and exp2 are natural language expressions referring to events. In the following, we call such expressions event descriptions and distinguish them from an actual event referred to by the expressions. An actual event is called an event instance.A possible interpretation of the rule is that, for any event instance e1 that can be described by the event description exp1 in the precondition of the rule, there always exists an event instance e2 that can be described by the event description exp2 in the consequence and that occurs at the same time as or before e1 occurs. Let us write e : exp if event instance e can be described by event description exp. The above interpretation can then be represented by the formula</Paragraph>
    <Paragraph position="10"> Here, the mapping f represents a temporal relation betweenevents, andtheformulae2 = f(e1)expresses that e2 occurs at the same time as or beforee1.</Paragraph>
    <Paragraph position="11"> The bias Parg(vcon) can be considered (an approximation of) a parameter required for computing the probability that a mapping frandom satises the requirements for f in Ph when we randomly construct frandom. The probability is denoted as P{e2 : exp2 [?]</Paragraph>
    <Paragraph position="13"> the number of events describable by exp1. We assume that the larger this probability is, the more easily we can establish f. We can approximate P{e2 :</Paragraph>
    <Paragraph position="15"> observing that theprobabilistic variables e1 ande2 are independent since frandom associates them in a completely random manner and by 2) assuming that the occurrence probability of the event instances describable by exp2 can be approximated by the probability that exp2 is observed in text corpora. This means that P(exp2) is one of the metrics indicating how easily we can establish the mapping f in Ph.</Paragraph>
    <Paragraph position="16"> Then, the next question is what kind of expressions should be regarded as the event description exp2. A primary candidate will be the whole sentence appearingin theconsequence part oftherule to beproduced.</Paragraph>
    <Paragraph position="17"> Since we specify only a verb vcon and its argument n in the consequence in a rule, P(exp2) can be denoted by Parg(n,vcon), which is the probability that we observe the expression such that vcon is a head verb and noccupies an argument position arg ofvcon. Bymultiplying this probability to BasicS as a bias, we obtain the following scoring function.</Paragraph>
    <Paragraph position="19"> In our experiments, though, this score did not work well. Since Parg(n,vcon) often has a small value, the problem of data sparseness seems to arise. Then, we used Parg(vcon), which denotes the probability of observing sentences that contain vcon and its argument position arg, no matter which noun occupies arg, instead of Parg(n,vcon). We multiplied the probability to BasicS as a bias and obtained the following score, which is actually the scoring function we propose.</Paragraph>
    <Paragraph position="21"/>
  </Section>
class="xml-element"></Paper>