<?xml version="1.0" standalone="yes"?>
<Paper uid="W93-0107">
  <Title>HIERARCHICAL CLUSTERING OF VERBS</Title>
  <Section position="3" start_page="70" end_page="74" type="metho">
    <SectionTitle>
2. CIAULA: An algorithm to acquire word clusters
</SectionTitle>
    <Paragraph position="0"> Incremental example-based learning algorithms, like COBWEB (Fisher, 1987), seem better suited than other Machine Learning and statistical methods to the task of acquiring word taxonomies from corpora. COBWEB has several desirable features: a) incrementality, since whenever new data are available the system updates its classification; b) a formal description of the acquired clusters; c) the notion of category utility, used to select among competing classifications. Features b) and c) are particularly relevant to our linguistic problem, as remarked in the Introduction. (CIAULA stands for Concept formation Algorithm Used for Language Acquisition; the name was inspired by the tale "Ciaula scopre la luna" by Luigi Pirandello, 1922.)</Paragraph>
    <Paragraph position="1"> On the other hand, applying COBWEB to verb classification is not straightforward. First, there is a knowledge representation problem that is common to most Machine Learning algorithms: input instances must be pre-coded (manually) using a feature-vector-like representation. This has limited the use of such algorithms in many real-world problems, and in the specific case we are analyzing, a manual codification of verb instances is not realistic on a large scale. Second, the algorithm does not distinguish multiple usages of the same verb, nor different verbs that are found with the same pattern of use, since different instances with the same feature vector are taken as identical. The underlying assumption of concept formation algorithms such as COBWEB is that the input information is stable, unambiguous, and complete. By contrast, our data do not exhibit a stable behaviour: they are ambiguous, incomplete, and possibly misleading, since errors in the codification of verb instances may well occur.</Paragraph>
    <Paragraph position="2"> In the following sections we will discuss the methods by which we attempted to overcome these obstacles.</Paragraph>
    <Section position="1" start_page="71" end_page="71" type="sub_section">
      <SectionTitle>
2.1 Representing verb instances
</SectionTitle>
      <Paragraph position="0"> This section describes the formal representation of verb instances and verb clusters in CIAULA.</Paragraph>
      <Paragraph position="1"> Verb usages input to the clustering algorithm are represented by their thematic roles, acquired semi-automatically from corpora by a process described in Basili et al. (1992a, 1992b, in press). In short, sentences including verbs are processed as follows. First, a general-purpose morphological analyzer and a partial syntactic analyzer (Basili et al., 1992b) extract from the sentences in the corpus all the elementary syntactic relations (esl) in which a word participates. Syntactic relations are word pairs and triples augmented with syntactic information, e.g., for the verb to carry: N_V(...;carry), V_N(carry;...), V_prep_N(carry;with;truck), etc.</Paragraph>
      <Paragraph position="4"> Each syntactic relation is stored with its frequency of occurrence in the corpus. Ambiguous relations are weighted by a factor 1/k, where k is the number of competing esls in a sentence, as sketched below.</Paragraph>
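      <Paragraph> As an illustration of this bookkeeping, the following minimal Python sketch (not the authors' code; the tuple format and the function name are our own) accumulates esl frequencies with the 1/k weighting for ambiguous readings.</Paragraph>
      <Code language="python">
from collections import defaultdict

# Each elementary syntactic relation (esl) is stored as a tuple, e.g.
# ("V_prep_N", "carry", "with", "truck").
esl_counts = defaultdict(float)

def add_competing_esls(esls):
    """Accumulate a group of competing esls extracted from one sentence.

    Following the text, each of the k competing relations contributes
    a fractional count of 1/k to its corpus frequency.
    """
    k = len(esls)
    for esl in esls:
        esl_counts[esl] += 1.0 / k

# Two competing attachments for "... carry goods with a truck":
add_competing_esls([
    ("V_prep_N", "carry", "with", "truck"),
    ("N_prep_N", "goods", "with", "truck"),
])
print(dict(esl_counts))  # each relation weighted 0.5
      </Code>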
      <Paragraph position="5"> Second, the verb arguments are tagged by hand using 10-12 "naive" conceptual types (semantic tags), such as: ACT, PLACE, HUMAN_ENTITY, GOOD, etc.</Paragraph>
      <Paragraph position="6"> Conceptual types are not the same for every domain, even though the commercial and legal domains have many common types.</Paragraph>
      <Paragraph position="7"> Syntactic relations between words are validated in terms of semantic relations between word classes, using a set of semi-automatically acquired selectional rules (Basili et al., 1992a). For example, V_prep_N(carry,with,truck) is accepted as an instance of the high-level selectional rule [ACT]-&gt;(INSTRUMENT)-&gt;[MACHINE]. The relation [carry]-&gt;(INSTRUMENT)-&gt;[truck] is then acquired as part of the argument structure of the verb to carry. In other published papers we demonstrated that the use of semantic tags greatly increases the statistical stability of the data and adds predictive power to the acquired information on word usages, at the price of a limited amount of manual work (the semantic tagging).</Paragraph>
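      <Paragraph> The validation step can be pictured with the following sketch. The rule and tag inventories shown are illustrative assumptions, not the paper's actual resources; only the carry/with/truck example comes from the text.</Paragraph>
      <Code language="python">
# A minimal sketch of the validation step: a syntactic triple whose words
# carry "naive" semantic tags is accepted if it matches a high-level
# selectional rule, yielding a semantic relation [verb]->(ROLE)->[noun].

SEMANTIC_TAG = {"carry": "ACT", "produce": "ACT", "truck": "MACHINE"}

# Selectional rules as (verb_type, conceptual_relation, argument_type).
SELECTIONAL_RULES = {
    ("ACT", "INSTRUMENT", "MACHINE"),
    ("ACT", "AGENT", "HUMAN_ENTITY"),
}

# Candidate conceptual relations signalled by each preposition (assumed).
PREP_TO_ROLE = {"with": ["INSTRUMENT"], "by": ["AGENT"]}

def validate(verb, prep, noun):
    """Return the semantic relations licensed by the selectional rules."""
    accepted = []
    for role in PREP_TO_ROLE.get(prep, []):
        if (SEMANTIC_TAG.get(verb), role, SEMANTIC_TAG.get(noun)) in SELECTIONAL_RULES:
            accepted.append((verb, role, noun))
    return accepted

# V_prep_N(carry, with, truck) -> [('carry', 'INSTRUMENT', 'truck')]
print(validate("carry", "with", "truck"))
      </Code>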
      <Paragraph position="8"> For the purpose of this paper, the interesting aspect is that single instances of verb usages (local meanings, i.e. meanings that are completely described within a single sentence of the corpus) are validated on the basis of a global analysis of the corpus. This considerably reduces (though does not eliminate) the presence of erroneous instances.</Paragraph>
      <Paragraph position="9"> The detected thematic roles of a verb v in a sentence are represented by the feature vector:</Paragraph>
      <Paragraph position="10"> (1) v / (Ri1:Catj1, ..., Rin:Catjn) </Paragraph>
      <Paragraph position="11"> where the Rik are thematic roles (AGENT, INSTRUMENT, etc.) and the Catjk are the conceptual types of the words to which v is semantically related. The roles used are an extension of Sowa's conceptual relations (Sowa, 1984); details on the set of conceptual relations used, and on a corpus-based method to select a domain-appropriate set, are provided in other papers.</Paragraph>
      <Paragraph position="12"> For example, the following sentence in the commercial domain: "... la ditta produce beni di consumo con macchinari elettromeccanici ..." ("... the company produces goods with electromechanical machines ...") originates the instance: produce / (AGENT:HUMAN_ENTITY, OBJECT:GOODS, INSTRUMENT:MACHINE). Configurations in which words of the same conceptual type play the same roles are a strong suggestion of semantic similarity between the related events. The categorisation process must capture this similarity among local meanings of verbs. The representation of verb clusters follows the scheme adopted in COBWEB: each target class is represented by the probability that its members (i.e. verbs) are seen with a set of typical roles. Given the set {Ri}, i in I, of thematic roles and the set {Catj}, j in J, of conceptual types, a target class C for our clustering system is given by: (2) C = &lt; c_C, [x]ij, V_C, S_C &gt; or equivalently by (2)' &lt; c, [x]ij, V, S &gt;. A class is represented, as in COBWEB, by the matrix [x]ij, showing the distribution of probability over thematic roles (Ri) and conceptual types (Catj). The additional parameters c_C and V_C are introduced to account for multiple instances of the same verb in a class: c_C is the cardinality of C (i.e. the number of distinct instances that are members of C), and V_C is the set of pairs &lt;v, v#&gt; such that there exists at least one instance v / (Ri:Catj) classified in C, where v# is the number of such instances.</Paragraph>
      <Paragraph position="13"> Finally, S_C is the set of subtypes of C. The definitions of the empty class (3.1) and of the top node of the taxonomy (3.2) follow:</Paragraph>
      <Paragraph position="14"> (3.1) &lt; 0, [x]ij, ∅, ∅ &gt; with xij=0 for each i,j; (3.2) &lt; Ntot, [x]ij, V, S &gt; where Ntot is the number of available instances in the corpus and V is the set of verbs with their absolute occurrences. An excerpt of a class acquired from the legal domain is shown in Fig. 1. The semantic types used in this domain are listed in the figure.</Paragraph>
      <Paragraph position="15"> A special type of class is one in which only one verb has been classified; we call these singleton classes. A singleton class is a class C=&lt;c,[x]ij,V,S&gt; for which card(V)=1. It will be denoted by {v}, where v is the only member of C (whatever its number of occurrences). For a singleton class it is clearly true that S=∅. Note that a singleton class is different from an instance, because any number of instances of the verb v can be classified in {v}.</Paragraph>
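      <Paragraph> The following sketch summarizes, in Python, one possible encoding of instances (1) and classes (2)', including the singleton test; all field and method names are our own, not the paper's.</Paragraph>
      <Code language="python">
from dataclasses import dataclass, field

@dataclass
class VerbInstance:
    """One verb usage, as in (1): v / (R1:Cat1, ..., Rn:Catn)."""
    verb: str
    roles: frozenset  # set of (role, conceptual_type) pairs

@dataclass
class VerbClass:
    """A target class C = <c, [x]ij, V, S>, as in (2)'.

    counts[(role, cat)] holds raw co-occurrence counts; dividing by the
    cardinality c gives the probability matrix [x]ij.
    """
    c: int = 0                                   # number of member instances
    counts: dict = field(default_factory=dict)   # (role, cat) -> count
    V: dict = field(default_factory=dict)        # verb -> v# (its instances here)
    S: list = field(default_factory=list)        # subclasses

    def x(self, role, cat):
        """Probability x_ij = prob((Ri:Catj) | C)."""
        return self.counts.get((role, cat), 0) / self.c if self.c else 0.0

    def add(self, inst: VerbInstance):
        """Classify one more instance into this class."""
        self.c += 1
        self.V[inst.verb] = self.V.get(inst.verb, 0) + 1
        for pair in inst.roles:
            self.counts[pair] = self.counts.get(pair, 0) + 1

    def is_singleton(self):
        """A singleton class {v} has card(V) = 1 (any number of instances)."""
        return len(self.V) == 1
      </Code>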
    </Section>
    <Section position="2" start_page="72" end_page="74" type="sub_section">
      <SectionTitle>
2.2 Measuring the utility of a classification
</SectionTitle>
      <Paragraph position="0"> As remarked in the Introduction, a useful property of concept formation algorithms, with respect to agglomerative statistical approaches, is the use of formal methods that guide the classification choices.</Paragraph>
      <Paragraph position="1"> Quantitative approaches to modelling human choices in categorisation have been adopted in psychological models of conceptual development. In her seminal work, Rosch (1976) introduced a metric of preference, the category cue validity, expressed by the sum of the expectations of observing some feature in the class members. This value is maximal for the so-called basic level categories. A later development, used in COBWEB, introduces the notion of category utility, derived from the application of Bayes' law to the expression of the predictive power of a given classification. Given a classification into K classes, the category utility is given by:</Paragraph>
      <Paragraph position="2"> (4) CU = (1/K) Σk P(Ck) [ Σi Σj P(Ri:Catj | Ck)^2 - Σi Σj P(Ri:Catj)^2 ] </Paragraph>
      <Paragraph position="3"> In COBWEB, a hill climbing algorithm is defined to maximize the category utility of the resulting classification. The following expression is used to discriminate among conflicting clusters:</Paragraph>
      <Paragraph position="4"> (5) cu(v,k) = the category utility (4) of the classification obtained by inserting the instance v into the class Ck </Paragraph>
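      <Paragraph> Under the reconstruction of (4) given above (the standard COBWEB form of category utility, adapted to role:type pairs), the measure could be computed as follows; the class encoding is the one sketched in Section 2.1, and this is an illustrative sketch rather than the authors' implementation.</Paragraph>
      <Code language="python">
def category_utility(classes, parent):
    """Category utility (4) of a partition {C_k} of the parent class:
    CU = (1/K) * sum_k P(C_k) * [sum_ij P(Ri:Catj|C_k)^2 - sum_ij P(Ri:Catj)^2]
    """
    n = sum(c.c for c in classes)
    if n == 0 or not classes:
        return 0.0
    # Baseline predictability: squared probabilities in the parent node.
    base = sum(parent.x(r, t) ** 2 for (r, t) in parent.counts)
    total = 0.0
    for c in classes:
        gain = sum(c.x(r, t) ** 2 for (r, t) in c.counts) - base
        total += (c.c / n) * gain
    return total / len(classes)
      </Code>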
      <Paragraph position="5"> The clusters that maximize the above quantity provide the system with the capability of deriving the best predictive taxonomy with respect to the set of observed instances. The notion of category utility adopted in COBWEB, however, does not fully cope with our linguistic problem. As remarked in the previous section, multiple instances of the same entity are not considered in COBWEB. In order to account for multiple instances of a verb, we introduced the notion of mnemonic inertia. The mnemonic inertia models an inertial trend attracting a new instance of an already classified verb to the class where it was previously classified.</Paragraph>
      <Paragraph position="6"> Given the incoming instance v / (Ri:Catj) and a current classification into the set of classes {Ck}, for each k the mnemonic inertia is modelled by:</Paragraph>
      <Paragraph position="7"> (6) in(v,k) = #v / ck </Paragraph>
      <Paragraph position="8"> where #v is the number of instances of the verb v already classified in Ck and ck is the cardinality of Ck.</Paragraph>
      <Paragraph position="9"> (6) expresses a fuzzy membership of v in the class Ck: the more instances of v are classified into Ck, the more future observations of v will be attracted by Ck. A suitable combination of the mnemonic inertia and the category utility provides our system with generalization capabilities, along with the "conservative" policy of keeping different verb instances separate. The desired effect is that slightly different usages of a verb are classified in the same cluster, while remarkable differences result in different classifications.</Paragraph>
      <Paragraph position="10"> The global measure of category utility, used by the CIAULA algorithm during classification, can now be defined. Let v / (Ri:Catj) be the incoming instance, let {Ck} be the set of classes, and let cu(v,k) be the category utility as defined in (5); the measure G, given by</Paragraph>
      <Paragraph position="11"> (7) G(v,k) = ν cu(v,k) + (1 - ν) in(v,k) </Paragraph>
      <Paragraph position="12"> expresses the global utility of the classification obtained by assigning the instance v to the class Ck. (7) is thus a distance metric between instances and classes.</Paragraph>
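      <Paragraph> A sketch of (6) and (7) follows, under the reconstructions above (a ratio for the inertia, and a ν-weighted linear combination for the global measure, which is our reading of the "suitable combination" mentioned in the text).</Paragraph>
      <Code language="python">
def inertia(verb, c_k):
    """Mnemonic inertia (6): #v / c_k, the share of class C_k already
    occupied by instances of the given verb (0 for an empty class)."""
    return c_k.V.get(verb, 0) / c_k.c if c_k.c else 0.0

def global_utility(cu_vk, in_vk, nu=0.9):
    """Global measure (7), assumed here to be a nu-weighted combination of
    category utility (5) and mnemonic inertia (6); the experiments reported
    in Section 2.3 used nu in the range 0.90-0.75."""
    return nu * cu_vk + (1.0 - nu) * in_vk
      </Code>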
    </Section>
    <Section position="3" start_page="74" end_page="74" type="sub_section">
      <SectionTitle>
2.3 The incremental clustering algorithm
</SectionTitle>
      <Paragraph position="1"> The algorithm for the incremental clustering of verb instances follows the approach used in COBWEB. Given a new incoming instance I and a current valid classification {Ck}, k in K, the system evaluates the utility of the new classification obtained by inserting I into each class. The maximum utility value corresponds to the best predictive configuration of classes. A further attempt is made to change the current configuration (introducing a new class, merging the two best candidates for the classification, or splitting the best class into the set of its sons) to improve the predictivity. The main differences with respect to COBWEB, due to the linguistic nature of the problem at hand, concern the procedure used to evaluate the utility of a temporary classification and the MERGE operator as it applies to singleton classes; a sketch of the control loop is given below. The full description of the algorithm is given in Appendix 1. Auxiliary procedures are omitted for brevity.</Paragraph>
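      <Paragraph> The following Python sketch is an interpretation of the control loop described above, not the Appendix 1 code: the MERGE and SPLIT attempts are omitted, and all helper names are our own. It builds on the VerbClass encoding and the utility functions sketched earlier.</Paragraph>
      <Code language="python">
import copy

def cu_if_inserted(instance, k, classes, parent):
    """cu(v,k) as in (5): category utility (4) of the classification
    obtained by inserting the instance into the k-th class (on copies)."""
    trial, trial_parent = copy.deepcopy(classes), copy.deepcopy(parent)
    trial[k].add(instance)
    trial_parent.add(instance)
    return category_utility(trial, trial_parent)

def classify(instance, classes, parent, nu=0.9):
    """One incremental step: place the instance in the configuration that
    maximizes the global utility (7); the competing operator of opening a
    new singleton class is also scored."""
    best_k, best_score = None, float("-inf")
    for k, c in enumerate(classes):
        score = global_utility(cu_if_inserted(instance, k, classes, parent),
                               inertia(instance.verb, c), nu)
        if score > best_score:
            best_k, best_score = k, score
    # Competing operator: open a new (singleton) class for the instance.
    new_class = VerbClass()
    new_class.add(instance)
    trial_parent = copy.deepcopy(parent)
    trial_parent.add(instance)
    new_score = global_utility(
        category_utility(classes + [new_class], trial_parent), 0.0, nu)
    if best_k is None or new_score > best_score:
        classes.append(new_class)
    else:
        classes[best_k].add(instance)
    parent.add(instance)
      </Code>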
      <Paragraph position="2"> According to (7), the procedure G_UTILITY evaluates the utility of the classification as a combination of the category utility and the inertial factor introduced in (6). The values of ν experimented with so far are 0.90-0.75.</Paragraph>
      <Paragraph position="3"> Figure 2 shows the difference between the standard MERGE operation, identical to the one used in COBWEB, and the elementary MERGE between two singleton classes, as defined in CIAULA; a sketch of both operators follows the figure caption.</Paragraph>
      <Paragraph position="4"> - Fig. 2: Merge (a) vs. Elementary Merge (b) -</Paragraph>
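      <Paragraph> Since Fig. 2 is not reproduced here, the following sketch shows one plausible reading of the two operators: both pool the statistics of the merged classes, but the elementary merge keeps the two singletons as sons of the new class, so the individual verbs remain distinguishable. This bookkeeping is our assumption, not the paper's definition.</Paragraph>
      <Code language="python">
def merge(c1, c2, elementary=False):
    """Combine two VerbClass objects, pooling counts and verb sets.

    Standard MERGE (Fig. 2a): the sons of c1 and c2 are inherited.
    Elementary MERGE (Fig. 2b): the two singleton classes themselves
    become sons of the new class.
    """
    merged = VerbClass()
    merged.c = c1.c + c2.c
    for src in (c1, c2):
        for pair, n in src.counts.items():
            merged.counts[pair] = merged.counts.get(pair, 0) + n
        for v, n in src.V.items():
            merged.V[v] = merged.V.get(v, 0) + n
    merged.S = [c1, c2] if elementary else c1.S + c2.S
    return merged
      </Code>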
    </Section>
  </Section>
  <Section position="5" start_page="74" end_page="76" type="metho">
    <SectionTitle>
3. Experimental Results.
</SectionTitle>
    <Paragraph position="0"> The algorithm has been tested on two corpora of about 500,000 words each, from a legal and a commercial domain, which exhibit very different linguistic styles and verb usages. Only verbs with at least 65 instances in each corpus have been considered, in order to further reduce the impact of parsing errors. Notice, however, that the use of semantic tags in corpus parsing considerably reduces the noise with respect to other corpus-based approaches.</Paragraph>
    <Paragraph position="1"> In the first experiment, CIAULA classified 3325 examples of 371 verbs from the legal corpus. In the second, it received 1296 examples of 41 verbs from the commercial corpus. Upon a careful analysis of the clusters obtained from each domain, the resulting classifications were judged quite expressive and semantically biased by the target linguistic domains, apart from some noise due to wrong semantic interpretations of elementary syntactic structures (Basili et al., 1992a). However, the granularity of the final taxonomy is too fine for it to be usefully imported into the type hierarchy of an NLP system. Furthermore, the order of presentation of the examples strongly influences the final result (an inherent problem with concept formation algorithms). In order to derive reliable results we must find some invariant with respect to the presentation order. An additional requirement is to define some objective measure of the quality of the acquired classification, other than the personal judgement of the authors.</Paragraph>
    <Paragraph position="2"> In this section we define a measure of the informative power of a class, able to capture the most relevant levels of the hierarchy. The idea is to extract from the hierarchy the basic level classes, i.e. the classes that are repositories of the most relevant lexical information about their members. We define as basic level classes of the classification those bringing the most predictive and stable information with respect to the presentation order.</Paragraph>
    <Paragraph position="3"> The notion of basic level classes was introduced in Rosch (1978). She experimentally demonstrated that some conceptual categories are more meaningful than others as to the quantity of information they convey about their members. Membership in such classes implies a greater number of attributes to be inherited by instances of the domain. These classes appear at the intermediate levels of a taxonomy: for example, within the vague notion of animal, classes such as dog or cat seem to concentrate the major part of the information about their members, with respect for example to the class of mammals (Lakoff, 1987).</Paragraph>
    <Paragraph position="4"> But what is a basic-level class for verbs? A formal definition of these more representative classes, able to guide the intuition of the linguist in the categorisation activity, has been attempted, and is discussed in the next section.</Paragraph>
    <Paragraph position="5"> 3.1. Basic level categories of verbs.</Paragraph>
    <Paragraph position="6"> The information conveyed by the derived clusters, C=&lt;c,[x]ij,V,S&gt;, is in the distribution of the matrix [x]ij and in the set V. Two examples may be helpful in distinguishing classes that are more selective from other, vaguer clusters. Let C1 be a singleton class, with C1=&lt;1,[x1],V1,∅&gt;. This clearly implies that [x1] is binary. This class is highly typical, as it is strongly characterized by its only instance, but it has no generalization power. Consider instead a class C2=&lt;10,[x2],V2,S&gt; for which the cardinality of V2 is 10, and let [x2] be such that for each pair &lt;i,j&gt; for which x2ij is non-zero, x2ij=1/10. This class is scarcely typical but has a strong generalization power, as it clusters verbs that show no overlaps between the thematic roles by which they are represented. We can say that typicality is signaled by high values of the role-type probabilities (i.e. xij=prob((Ri:Catj) | C)), while the generalization power of a class C=&lt;c,[x]ij,V,S&gt; is related to the following quantity:</Paragraph>
    <Paragraph position="7"> (8) ω = card(V)/c </Paragraph>
    <Paragraph position="8"> To quantify the typicality of a class C=&lt;c,[x]ij,V,S&gt;, the following definitions are useful. Given a threshold a in [0,1], the typicality of C is given by:</Paragraph>
    <Paragraph position="9"> (9) τ(C) = Σ xij / card(T(C)), summed over &lt;i,j&gt; in T(C), where T(C) is the typicality set of C, i.e. {&lt;i,j&gt; | xij &gt; a}.</Paragraph>
    <Paragraph position="10"> DEF (Basic-level verb category). Given two thresholds γ, δ in [0,1], C=&lt;c,[x]ij,V,S&gt; is a basic-level category for the related taxonomy iff: (10.1) ω &lt; γ (generalization power) and (10.2) τ(C) &gt; δ (typicality). Like all the classes derived by the algorithm of Section 2.3, each basic-level category C=&lt;c,[x]ij,V,S&gt; determines two fuzzy membership values for each verb v included in V, the local membership μl_C(v) and the global membership μg_C(v):</Paragraph>
    <Paragraph position="11"> (11) μl_C(v) = v# / c    (12) μg_C(v) = v# / nv </Paragraph>
    <Paragraph position="12"> where v# is the number of instances of v classified in C and nv is the number of different instances of v in the learning set. (11) depends on the contribution of v to the distribution of probabilities [x]ij, i.e. it measures the adherence of v to the prototype. (12) determines how typical the classification of v in C is, with respect to all the observations of v in the corpus. Low values of the global membership are useful for identifying instances of v that are likely to originate from parsing errors.</Paragraph>
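    <Paragraph> Definitions (8)-(12) can be implemented directly, as sketched below. The thresholds γ=0.6 and δ=0.75 are the values reported at the end of this section; the typicality threshold a is left as a parameter, since its value is not reported, and (11)-(12) follow the reconstructions given above.</Paragraph>
    <Code language="python">
def generalization_power(c):
    """omega = card(V)/c, as in (8)."""
    return len(c.V) / c.c if c.c else 0.0

def typicality(c, a=0.5):
    """tau(C), as in (9): the mean of the probabilities x_ij that exceed
    the threshold a (the typicality set T(C))."""
    typical = [c.x(r, t) for (r, t) in c.counts if c.x(r, t) > a]
    return sum(typical) / len(typical) if typical else 0.0

def is_basic_level(c, gamma=0.6, delta=0.75, a=0.5):
    """Definition (10): omega < gamma and tau(C) > delta."""
    return generalization_power(c) < gamma and typicality(c, a) > delta

def memberships(c, verb, n_v):
    """Local (11) and global (12) memberships of a verb in class C, under
    the reconstructions mu_l = v#/c and mu_g = v#/n_v; low mu_g flags
    instances likely to originate from parsing errors."""
    v_count = c.V.get(verb, 0)
    return (v_count / c.c if c.c else 0.0,
            v_count / n_v if n_v else 0.0)
    </Code>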
    <Paragraph position="13"> Given a classification of extended sets of linguistic instances, definition (10) identifies all the basic-level classes.</Paragraph>
    <Paragraph position="14"> Repeated experiments over the two corpora demonstrated that these classes are substantially invariant with respect to the presentation order of the instances.</Paragraph>
    <Paragraph position="15"> The values γ=0.6 and δ=0.75 have been empirically selected as producing the most stable results in both corpora.</Paragraph>
  </Section>
</Paper>