<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0742"> <Title>Inductive Logic Programming for Corpus-Based Acquisition of Semantic Lexicons</Title> <Section position="4" start_page="200" end_page="203" type="metho"> <SectionTitle> 3 The machine learning method </SectionTitle> <Paragraph position="0"> Trying to infer lexical semantic information from corpora is not new: much work has already been conducted on this subject, especially in the statistical learning domain (see, e.g., (Grefenstette, 1994b), or (Habert et al., 1997) and (Pichon and Sébillot, 1997) for surveys of this field). Following Harris's framework (Harris et al., 1989), such research tries to extract both syntagmatic and paradigmatic information, respectively studying the words that appear in the same window-based or syntactic contexts as a given lexical unit (first order word affinities (Grefenstette, 1994a)), or the words that generate the same contexts as the key word (second order word affinities). For example, (Briscoe and Carroll, 1997) and (Faure and Nédellec, 1999) try to automatically learn verbal argument structures and selectional restrictions; (Agarwal, 1995) and (Bouaud et al., 1997) build semantic classes; (Hearst, 1992) and (Morin, 1997) focus on particular lexical relations, like hyperonymy. Some of these works are concerned with automatically obtaining more complete lexical semantic representations (Grefenstette, 1994b; Pichon and Sébillot, 1999). Among these studies, (Pustejovsky et al., 1993) presents research whose aim is to acquire GL nominal qualia structures from a corpus; this work is however quite different from ours because it supposes that the qualia structure contents are initialized and are only refined with the help of the corpus, by using the type coercion mechanism (a semantic operation that converts an argument to the type expected by a function, where it would otherwise result in a type error).</Paragraph> <Paragraph position="1"> In order to automatically acquire N-V pairs whose elements are linked by one of the semantic relations defined in the qualia structure in GL, we have decided to use a machine learning method. This section explains this choice and describes the method that we have developed.</Paragraph> <Paragraph position="2"> Machine learning aims at automatically building programs from examples that are known to be positive or negative instances of their behavior. According to Mitchell (Mitchell, 1997), &quot;a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E&quot;.</Paragraph> <Paragraph position="3"> Among the different machine learning techniques, we have chosen the Inductive Logic Programming (ILP) framework (Muggleton and De Raedt, 1994) to learn from a textual corpus N-V pairs that are related in terms of one of the relations defined in the qualia structure in GL.</Paragraph> <Paragraph position="4"> The programs inferred from a set of facts and background knowledge are here logic programs, that is, sets of Horn clauses. In the ILP framework, the main idea is to obtain a set of generalized clauses that is sufficiently generic to cover the majority of the positive examples (E+), and sufficiently specific to rightly correspond to the concept we want to learn and to cover no (or few, if some noise is allowed) negative examples (E-).</Paragraph>
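Put more formally (this is the standard ILP formulation of the task, not the authors' own notation), given background knowledge B, the goal is to induce a set of clauses H such that

\[ B \wedge H \models E^{+} \qquad \text{(completeness: the positive examples are covered)} \]
\[ B \wedge H \wedge E^{-} \not\models \Box \qquad \text{(consistency: no negative example is covered)} \]

with both conditions relaxed to &quot;most of E+&quot; and &quot;few of E-&quot; when some noise is allowed.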
<Paragraph position="5"> For our experiment, we furnish a set of N-V pairs related by one of the qualia relations within a POS context (E+), and a set of N-V pairs that are not semantically linked (E-), and the method infers general rules (clauses) that explain these E+. This particular explanatory characteristic of ILP has motivated our choice: ILP does not just provide a predictor (this N-V pair is relevant, this one is not) but also a data-based theory. Contrary to some statistical methods, it does not just give raw results but explains the concept that is learnt (learning with ILP has already been successfully used in natural language processing, for example in corpus POS-tagging (Cussens, 1996) or semantic interpretation (Mooney, 1999)). We use Progol (Muggleton, 1995) for our project, Muggleton's ILP implementation, which has already proven well suited to dealing with large amounts of data in multiple domains, and to lead to results comparable to those of other ILP implementations (Roberts et al., 1998).</Paragraph> <Paragraph position="6"> In this section we briefly describe the corpus on which our experiment has been conducted. We then explain the elaboration of E+ and E- for Progol. We finally present the generalized clauses that we obtain. The validation of the method is detailed in section 4.</Paragraph> <Section position="1" start_page="201" end_page="201" type="sub_section"> <SectionTitle> 3.1 The corpus </SectionTitle> <Paragraph position="0"> The French corpus used in this project is a 700 kByte handbook of helicopter maintenance, given to us by MATRA CCR Aérospatiale, which contains more than 104000 word occurrences (104212 precisely). The MATRA CCR corpus has some special characteristics that are especially well suited to our task: it is coherent; it contains many concrete terms (screw, door, etc.) that are frequently used in sentences together with verbs indicating their telic (screws must be tightened, etc.) or agentive roles.</Paragraph> <Paragraph position="1"> This corpus has been POS-tagged with the help of annotation tools developed in the MULTEXT project (Armstrong, 1996): sentences and words are first segmented with MtSeg; words are analyzed and lemmatized with Mmorph (Petitpierre and Russell, 1998; Bouillon et al., 1998), and finally disambiguated by the Tatoo tool, a Hidden Markov Model tagger (Armstrong et al., 1995). Each word therefore receives only one POS-tag, with less than 2% of errors.</Paragraph> </Section> <Section position="2" start_page="201" end_page="202" type="sub_section"> <SectionTitle> 3.2 Example construction </SectionTitle> <Paragraph position="0"> The first task consists in building up E+ and E- for Progol, in order for it to infer generalized clauses that explain what, in the POS context of N-V pairs, distinguishes the relevant pairs from the irrelevant ones. Work has to be done to determine the most appropriate context for this task; we present here only the solution we have finally chosen. Section 4 describes the methods and measures used to evaluate the &quot;quality&quot; of the learning, which enable us to choose between the different contextual possibilities.
Here is our methodology for the construction of the examples.</Paragraph> <Paragraph position="1"> We first consider all the nouns of the MATRA CCR corpus. More precisely, we only deal with an 81314 word occurrence subcorpus of the MATRA CCR corpus, formed by all the sentences that contain at least one N and one V. This subcorpus contains 1489 different N (29633 noun occurrences) and 567 different V (9522 verb occurrences). For each N of this subcorpus, the 10 most strongly associated V, in terms of Chi-square, are selected. This first step produces both pairs that are really bound by one qualia relation ((écrou, serrer), i.e. (nut, to tighten)) and pairs that are fully irrelevant ((roue, prescrire), i.e. (wheel, to prescribe)).</Paragraph> <Paragraph position="2"> Each pair is manually annotated as relevant or irrelevant according to Pustejovsky's qualia structure principles. A Perl program is then used to find the occurrences of these N-V pairs in the sentences of the corpus.</Paragraph> <Paragraph position="3"> For each occurrence of each pair that is supposed to be used to build one E+, that is, for each of the previous pairs that has been globally annotated as relevant, a manual control has to be done to ensure that the N and the V really are in the expected relation within the studied sentence. After this control, a second Perl program automatically produces the E+. Here is the form of the positive examples: POSITIVE(category_before_N, category_after_N, category_before_V, V_type, distance, position).</Paragraph> <Paragraph position="4"> where V_type indicates if the V is an infinitive form, etc., distance corresponds to the number of verbs between the N and the V, and position is POS (for positive) if the V appears before the N in the sentence, NEG if the N appears before the V.</Paragraph> <Paragraph position="5"> For example, POSITIVE(VRBINF, P_DE, VID, VRBINF, 0, POS).</Paragraph> <Paragraph position="6"> means that an N-V pair, in which the N is surrounded by an infinitive verb on its left (VRBINF) and a preposition de, &quot;of&quot; (P_DE), on its right, in which the V is preceded by nothing (VID, from French vide, empty) and is an infinitive one (VRBINF), in which no verb exists between the N and the V (0), and in which the V appears before the N in the sentence (POS), is a relevant pair (for example, in ouvrir la porte de ..., &quot;open the door of ...&quot;).</Paragraph> <Paragraph position="7"> The E- are elaborated in the same way as the E+, with the same Perl program. E- and E+ forms are identical, except for the presence of the sign :- before the predicate POSITIVE to denote an E-: :-POSITIVE(category_before_N, category_after_N, category_before_V, V_type, distance, position).</Paragraph> <Paragraph position="8"> These E- are automatically built from the previously selected highly correlated N-V pairs that have been manually annotated as irrelevant.
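As an illustration (the file layout below is our assumption; the paper shows only individual examples, and actual Prolog/Progol syntax requires constants to be lowercase, whereas the paper typesets them in capitals), the training data furnished to Progol might look like this:

    % E+: an N-V pair in a qualia relation, with its POS context.
    positive(vrbinf, p_de, vid, vrbinf, 0, pos).
    % E-: in Progol, a headless clause marks a negative example.
    :- positive(vid, p_par, nc, vrbpp, 0, neg).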
For example, :-POSITIVE(VID, P_PAR, NC, VRBPP, 0, NEG).</Paragraph> <Paragraph position="9"> means that an N-V pair, in which the N has nothing on its left (VID) and a preposition par, &quot;by&quot; (P_PAR), on its right, in which the V is preceded by a noun (NC) and is a past participle (VRBPP), in which no verb exists between the N and the V (0), and in which the V appears after the N in the sentence (NEG), is an irrelevant pair (for example, in freinage par goupilles fendues, &quot;braking by split pins&quot;).</Paragraph> </Section> <Section position="3" start_page="202" end_page="203" type="sub_section"> <SectionTitle> 3.3 Learning with the help of Progol </SectionTitle> <Paragraph position="0"> These E+ and E- are then furnished to Progol in order for it to try to infer generalized clauses that explain the concept &quot;qualia pair&quot; versus &quot;not qualia pair&quot;. We discuss here neither the parameter setting that concerns the choice of the example POS context nor the evaluation criteria; this discussion is postponed to the next section. We simply present the learning method and the type of generalized clauses that we have obtained.</Paragraph> <Paragraph position="1"> Some information has to be given to Progol for it to know which categories can undergo a generalization. For example, if two E+ are identical but possess different locative prepositions as second arguments (e.g., sur, &quot;on&quot;, and sous, &quot;under&quot;), must Progol produce a generalization corresponding to the same clause except that the second argument is replaced by the more general category locative-preposition, or by a still more general one, preposition? The background knowledge used by Progol is knowledge on the domain. For example here, it contains the fact that a verb can be found in the corpus in an infinitive or a conjugated form,</Paragraph> <Paragraph position="2"> and that an infinitive form is denoted by the tag VERBINF, and a conjugated form by other tags. Once the examples and this background knowledge have been given to Progol, learning can begin. The output of Progol is of two kinds: some clauses that have not been generalized at all (that is, some of the E+), and some generalized clauses; we call the set of these generalized clauses G, and it is this set G that interests us here. Here is an example of one of the generalized clauses that we have obtained:</Paragraph> <Paragraph position="3"> POSITIVE(A, B, C, D, E, F) :- PREPOSITIONLIEU(A), VIDE(B), VIDE(C), VERBINF(D), PRES(E). (1)</Paragraph> <Paragraph position="4"> which means that N-V pairs (i) in which the category before the N is a locative preposition (PREPOSITIONLIEU(A)), (ii) in which there is nothing after the N and before the V (VIDE(B) and VIDE(C) for the second and third arguments), (iii) in which the V is an infinitive one (VERBINF(D)), and (iv) in which there is no verb between the N and the V (proximity denoted by PRES(E), from French près, &quot;close&quot;), are relevant. No constraint is set on N/V order in the sentences.</Paragraph> <Paragraph position="5"> This generalized clause covers, for example, the following E+: POSITIVE(P_SUR, VID, VID, VERBINF, 0, POS).</Paragraph> <Paragraph position="6"> which corresponds to the relevant pair (prise, brancher), i.e. (plug, to plug in), detected in the corpus in the sentence &quot;Brancher les connecteurs sur les prises électriques.&quot; (&quot;Plug the connectors into the electrical sockets.&quot;).</Paragraph> <Paragraph position="7"> Some of the generalized clauses in G cover many E+, others far fewer.
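To make subsection 3.3 concrete: the paper does not reproduce its background theory, but as a rough sketch under that assumption (predicate and constant names follow the text but are otherwise ours, lowercased as Prolog requires), it could contain category facts and a hierarchy licensing generalization, against which clause (1) can be stated:

    % Every locative preposition is a preposition; this hierarchy is
    % what lets Progol choose the level of generalization.
    preposition_lieu(p_sur).                 % sur, "on"
    preposition_lieu(p_sous).                % sous, "under"
    preposition(P) :- preposition_lieu(P).
    vide(vid).                               % empty context
    verbinf(vrbinf).                         % the tag denoting an infinitive form
    pres(0).                                 % no verb between the N and the V

    % Clause (1), as Progol could output it against such a theory:
    positive(A, B, C, D, E, _F) :-
        preposition_lieu(A), vide(B), vide(C), verbinf(D), pres(E).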
We now present a method to detect what the &quot;good&quot; clauses are, that is, the clauses that explain the concept that we want to learn, and a measure of the &quot;quality&quot; of the learning that has been conducted.</Paragraph> </Section> </Section> <Section position="5" start_page="203" end_page="205" type="metho"> <SectionTitle> 4 Learning validation and results </SectionTitle> <Paragraph position="0"> This section is dedicated to two aspects of the validation of our machine learning method.</Paragraph> <Paragraph position="1"> First we define the theoretical validation of the learning, that is, we focus on the determination of a means to detect the &quot;good&quot; generalized clauses, and of a measure of the quality of the concept learning; this parameter setting and evaluation criterion phase explains how we have chosen the precise POS context for N-V pairs in the E+ and E- (as described in subsection 3.2): the six contextual elements in the examples are the combination that leads to the best results in terms of the learning quality measure that we have chosen. The second step of the validation is the empirical one: we have applied the selected generalized clauses to the MATRA CCR corpus and have evaluated the quality of the results in terms of the pairs that they indicate as relevant or not. Here are these two phases.</Paragraph> <Section position="1" start_page="203" end_page="204" type="sub_section"> <SectionTitle> 4.1 Theoretical validation </SectionTitle> <Paragraph position="0"> As we have previously noticed, among the generalized clauses produced by Progol from our E+ and E- (the set G), some cover a lot of E+, others only a few. We want a way to automatically find which generalized clauses have to be kept in order to explain the concept we want to learn.</Paragraph> <Paragraph position="1"> We have first defined a measure of the theoretical generality of the clauses (we thank J. Nicolas, INRIA researcher at IRISA, for his help on this point). The theoretical generality of a generalized clause is the number of non-generalized clauses (E+) that this clause can cover. For example, both POSITIVE(P_AUTOURDE, VID, VID, VERBINF, 0, NEG).</Paragraph> <Paragraph position="2"> and POSITIVE(P_CHEZ, VID, VID, VERBINF, 0, POS).</Paragraph> <Paragraph position="3"> can be covered by clause (1) (cf. subsection 3.3). Studying, for example, the distribution of the number of clauses in G over these different theoretical generality values, our &quot;hope&quot; was to obtain a gaussian-like graph, in order to automatically select all the clauses present under the gaussian plot, or to calculate two thresholds that cover 95% of these clauses and reject the other 5%. This distribution is, however, not a gaussian one.</Paragraph> <Paragraph position="4"> Our second attempt has concerned not only the theoretical coverage of G clauses but also their empirical coverage. This second measure that we have defined is the number of E+ that are really covered by each clause of G. We then consider the distribution of the empirical coverage of G clauses over the theoretical coverages of these clauses, that is, we consider the graph in which, for each different theoretical measure value for G clauses, we draw a line whose length corresponds to the total number of E+ covered by the G clauses that have this theoretical coverage value. Here two gaussians clearly appear (cf. figure 1), one for rather specific clauses and the other for more general ones. We have therefore decided to keep all the generalized clauses produced by Progol.</Paragraph>
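As a sketch of how the empirical coverage of one clause of G might be computed (the helper names positive_example/1, covers_clause1/1 and empirical_coverage1/1 are our assumptions, not the authors' code; the sketch reuses the background facts from subsection 3.3):

    % Ground E+ stored as facts.
    positive_example(ex(p_sur, vid, vid, vrbinf, 0, pos)).
    positive_example(ex(p_de, vid, vid, vrbinf, 0, neg)).

    % Clause (1), recast as a test over one stored example.
    covers_clause1(ex(A, B, C, D, E, _F)) :-
        preposition_lieu(A), vide(B), vide(C), verbinf(D), pres(E).

    % Empirical coverage of clause (1): how many E+ it really covers.
    % Here only the first example counts (p_de is not a locative
    % preposition), so empirical_coverage1(N) yields N = 1.
    empirical_coverage1(N) :-
        findall(X, (positive_example(X), covers_clause1(X)), L),
        length(L, N).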
<Paragraph position="6"> The second point concerns the determination of a measure of the quality of the learning for the parameter setting. We are especially interested in the percentage of E+ that are covered by the generalized clauses and, if we permit some noise in the Progol parameter adjustment to allow more generalizations, in the percentage of E- that are rejected by these generalized clauses. The recall and precision rates of the learning method can be summarized in a Pearson coefficient:</Paragraph> <Paragraph position="7"> \[ P = \frac{TP \times TN - FP \times FN}{\sqrt{PrP \times PrN \times AP \times AN}} \]</Paragraph> <Paragraph position="8"> where A = actual, Pr = predicted, P = positive, N = negative, T = true, F = false; the closer this value is to 1, the better the learning.</Paragraph> <Paragraph position="9"> The results for our learning method with a Progol noise rate equal to 0 are the following: from the 4031 initial E+ and the 6922 initial E-, the 109 generalized clauses produced by Progol cover 2485 E+ and 0 E-; 1546 E+ and 6922 E- are therefore uncovered; the value of the Pearson coefficient is 0.71. (NB: Figure 1 illustrates these results.)</Paragraph> <Paragraph position="10"> We have developed a Perl program whose role is to find which Progol noise rate leads to the best results. This Progol noise rate is equal to 37. With this rate, the results are the following: from the 4031 initial E+ and the 6922 initial E-, the 66 generalized clauses produced by Progol cover 3547 E+ and 348 E-; 484 E+ and 6574 E- are therefore uncovered; the value of the Pearson coefficient is 0.84. The stability of the set of learnt generalized clauses has been tested.</Paragraph> </Section>
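As a quick arithmetic check (ours, not the paper's), the reported counts do reproduce both coefficients. With noise 0, TP = 2485, FP = 0, FN = 1546, TN = 6922, so AP = 4031, AN = 6922, PrP = 2485, PrN = 8468:

\[ P = \frac{2485 \times 6922 - 0 \times 1546}{\sqrt{2485 \times 8468 \times 4031 \times 6922}} \approx 0.71 \]

With noise 37, TP = 3547, FP = 348, FN = 484, TN = 6574, so PrP = 3895, PrN = 7058:

\[ P = \frac{3547 \times 6574 - 348 \times 484}{\sqrt{3895 \times 7058 \times 4031 \times 6922}} \approx 0.84 \]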
<Section position="2" start_page="204" end_page="205" type="sub_section"> <SectionTitle> 4.2 Empirical validation </SectionTitle> <Paragraph position="0"> In order to evaluate the empirical validity of our learning method, we have applied the 66 generalized clauses to the MATRA CCR corpus and have studied the appropriateness of the pairs that are stated relevant or irrelevant by them.</Paragraph> <Paragraph position="1"> Of course, it is impossible to test all the N-V combinations present in such a corpus. Our evaluation has focussed on some of the significant nouns of the domain.</Paragraph> <Paragraph position="2"> A Perl program presents to an expert all the N-V pairs that appear in one sentence in a part of the corpus and include one of the studied nouns. The expert manually tags each pair as relevant or not. This tagging is then compared to the results obtained for these N-V pairs of the same part of the corpus by the application of the generalized clauses learnt with Progol. The results for seven significant nouns (vis, écrou, porte, voyant, prise, capot, bouchon, i.e. screw, nut, door, indicator light, plug, cowl, cap) are presented in table 1. In the left column, an N-V pair is considered as tagged &quot;relevant&quot; by the generalized clauses if at least one of them covers this pair; in the right column, at least six different clauses of G must cover a pair for it to be considered correctly detected by the generalized clauses; the aim of this second test is to reduce noise in the results.</Paragraph> <Paragraph position="3"> The results are quite promising, especially if we compare them to those obtained by Chi-square correlation (cf. table 2). This comparison is interesting because Chi-square is the first step of our selection of N-V pairs in the corpus (cf. subsection 3.2).</Paragraph> </Section> </Section> </Paper>