<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1303">
  <Title>Using Domain-Specific Verbs for Term Classification</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Term Classification Approaches
</SectionTitle>
    <Paragraph position="0"> Similarly to general classification algorithms, the existing term classification approaches typically rely on learning techniques. These techniques are most often statistically based (e.g. hidden Markov models, naive Bayesian learning, etc.). Other techniques include decision trees, inductive rule learning, support-vector machines (SVMs), etc. We, on the other hand, suggest the use of a genetic algorithm as a learning engine for the classification task. Let us now discuss some approaches to the automatic classification of biomedical terms.</Paragraph>
    <Paragraph position="1"> Nobata et al. (2000) implemented a statistical method for term classification. In their approach, each class was represented by a list of (single) words. The first step was to estimate the conditional probability P(c  |w) of each word w being assigned to a specific class c, based on the assumption that each word occurrence is independent of its context and position in the text. Further, yet another strong restriction was made by assuming that there was one-to-one correspondence between terms and their classes. In addition, this approach is not applicable to &amp;quot;unknown&amp;quot; terms, i.e. terms containing words for which no classification probabilities had been determined. A special class, referring to &amp;quot;other&amp;quot;, was introduced to cover such words. Bearing in mind the increasing number of new terms, such an approach is bound to produce skewed results, where many of the terms would simply be classified as &amp;quot;other&amp;quot;.</Paragraph>
    <Paragraph position="2"> While Nobata et al. (2000) statistically processed the information found inside the terms, Collier et al. (2001) applied statistical techniques to the information found outside the terms. A hidden Markov model based on n-grams (assuming that a term's class may be induced from the previous n-1 lexical items and their classes) was used as a theoretical basis for their classification method. The method relied on the orthographic features including numerals, capital and Greek letters, special characters (such as `-`, `/`, `+`, etc.), parenthesis, etc. In the biomedical domain, such features often provide hints regarding the class of a specific term.</Paragraph>
    <Paragraph position="3"> Each unclassified term was assigned a class of the most similar (with respect to the orthographic features) term from the training set. This approach encountered the minority class prediction problem.</Paragraph>
    <Paragraph position="4"> Namely, the best classification results in terms of recall and precision were achieved for the most frequent class of terms in their training corpus, while the worst results were those achieved for the least frequent class.</Paragraph>
    <Paragraph position="5"> Hatzivassiloglou et al. (2001) proposed a method for unsupervised learning of weights for context elements (including words as context constituents and the corresponding positional and morphological information) of known terms and using these weights for term classification. Three well-known learning techniques were used: naive Bayesian learning, decision trees, and inductive rule learning. Simplified classification experiments in which a classification algorithm was choosing between two or three options respectively were conducted. The precision of binary classification was around 76% for all three learning algorithms, and the precision dropped to approximately 67% when choosing between three options. If the proposed techniques were to be applied for general classification where the number of options is arbitrary, the precision is expected to decrease even further.</Paragraph>
    <Paragraph position="6"> Nenadic et al. (2003b) conducted a series of large-scale experiments with different types of features for a multi-class SVM. These features included document identifiers, single words, their lemmas and stems, and automatically recognised terms. The results indicated that the performance was approximately the same (around 60% in the best case) when using single words, lemmas or stems. On the other side, terms proved to be better (more than 90% precision) than single words at lower recall points (less than 10%), which means that terms as features can improve the precision for minority classes. The best results were achieved with document identifiers, but such features cannot be used on the fly in new documents.</Paragraph>
    <Paragraph position="7"> Spasic et al. (2002) used a genetic algorithm (GA) based on a specific crossover operator to explore the relationships between verbs and the terms complementing them. The GA performed reasoning about term classes allowed to be combined with specific verbs by using an existing ontology as a seed for learning. In this paper, we use the results of the proposed methodology as a platform for term classification. In the following section we briefly overview the method for the acquisition of verb complementation patterns.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="1" type="metho">
    <SectionTitle>
3 Verb Complementation Patterns
</SectionTitle>
    <Paragraph position="0"> By looking at the context of an isolated verb occurrence it is difficult to predict all term classes that can be combined with the given verb. On the other hand, the whole &amp;quot;population&amp;quot; of terms complementing a specific verb is likely to provide a certain conclusion about that verb with respect to its complementation patterns. This was a primary motivation for Spasic et al. (2002) to use a GA as it operates on a population of individuals as opposed to a single individual. This fact also makes the approach robust, since it does not rely on every specific instance of verb-term combination to be correctly recognised.</Paragraph>
    <Paragraph position="1"> As not all verbs are equally important for the term classification task, we are primarily interested in domain-specific verb complementation patterns.</Paragraph>
    <Paragraph position="2"> In our approach, a complementation pattern of a domain-specific verb is defined as a disjunction of terms and/or their classes that are used in combination with the given verb. The automatic acquisition of these patterns is performed in the following steps: term recognition, domain-specific verb extraction, and the learning of complementation patterns. Let us describe each of these steps in more detail.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Term Recognition
</SectionTitle>
      <Paragraph position="0"> First, a corpus is terminologically processed: both terms present in the ontology and the terms recognised automatically are tagged. Terms already classified in the ontology are used to learn the classes allowed by the domain-specific verbs, while the new terms are yet to be classified based on the learnt classes. New terms are recognized by the C/NC-value method (Frantzi et al., 2000), which extracts multi-word terms. This method recognises terms by combining linguistic knowledge and statistical analysis. Linguistic knowledge is used to propose term candidates through general term formation patterns. Each term candidate t is then quantified by its termhood C-value(t) calculated as a combination of its numerical characteristics: length |t |as the number of words, absolute frequency f(t) and two types of frequency relative to the set S(t) of candidate terms containing a nested candidate term t (frequency of occurrence nested inside other candidate terms and the number of different term candidates containing a nested candidate term):</Paragraph>
      <Paragraph position="2"> Obviously, the higher the frequency of a candidate term the greater its termhood. The same holds for its length. On the other side, the more frequently the candidate term is nested in other term candidates, the more its termhood is reduced.</Paragraph>
      <Paragraph position="3"> However, this reduction decreases with the increase in the number of different host candidate terms as it is hypothesised that the candidate term is more independent if the set of its host terms is more versatile.</Paragraph>
      <Paragraph position="4"> Term distribution in top-ranked candidate terms is further improved by taking into account their context. The relevant context words, including nouns, verbs and adjectives, are extracted and assigned weights based on how frequently they co-occur with top-ranked term candidates. Subsequently, context factors are assigned to candidate terms according to their co-occurrence with top-ranked context words. Finally, new termhood estimations (NC-values) are calculated as a linear combination of the C-values and context factors.</Paragraph>
      <Paragraph position="5"> Nenadic et al. (2003a) modified the C/NC-value to recognise acronyms as a special type of single-word terms, and, thus, enhanced the recall of the method. On the other hand, the modified version incorporates the unification of term variants into the linguistic part of the method, which also improved the precision, since the statistical analysis is more reliable when performed over classes of equivalent term variants instead of separate terms.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Domain-Specific Verb Recognition
</SectionTitle>
      <Paragraph position="0"> Verbs are extracted from the corpus and ranked based on the frequency of occurrence and the frequency of their co-occurrence with terms. A stop list of general verbs frequently mentioned in scientific papers independently of the domain (e.g. observe, explain, etc.) was used to filter out such verbs. The top ranked verbs are selected and considered to be domain-specific. Moreover, these verbs are also corpus-specific (e.g. activate, bind, etc.). Table 3 provides a list of such verbs, which were used in the experiments.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="1" type="sub_section">
      <SectionTitle>
3.3 Complementation Pattern Learning
</SectionTitle>
      <Paragraph position="0"> In order to learn a verb complementation pattern for each of the selected verbs separately, terms are collected from the corpus by using these verbs as anchors. A GA has been implemented as an iterative reasoning procedure based on a partial order relation induced by the domain-specific ontology.</Paragraph>
      <Paragraph position="1">  In each iteration pairs of verb complementation patterns represented as sets of terms and term classes are merged. This operation involves the substitution of less general terms/classes by their more general counterparts, if there is a path in the ontology connecting them. Otherwise, the disjunction of the terms is formed and passed to the next iteration. Figure 1 depicts the process of learning a verb complementation pattern.</Paragraph>
      <Paragraph position="2"> Since the partial order relation induced by the ontology is transitive, the order in which terms are processed is of no importance. The final verb complementation patterns are minimal in the sense that the number of terms in a verb complementation pattern and the depth of each individual term in the ontology are minimised.</Paragraph>
      <Paragraph position="3">  for the verb bind</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="1" end_page="4" type="metho">
    <SectionTitle>
4 Term Classification Method
</SectionTitle>
    <Paragraph position="0"> The verb complementation patterns have been obtained by running the GA on a set of terms some of which were present in an ontology, which is used  The partial order relation is based on the hierarchy of terms/classes: term/class t  during the learning process. The newly recognised terms (i.e. the ones not found in the ontology) will remain included in the final verb complementation patterns as non-classified terms, since at this point it is not known which classes could replace them. All elements of the final verb complementation patterns can be thus divided into two groups based on the criterion of their (non)existence in the ontology. The elements already present in the ontology are candidate classes for the newly recognised terms. Let us now describe the classification method in more detail.</Paragraph>
    <Paragraph position="1"> Let V = {v</Paragraph>
    <Paragraph position="3"> } be a set of automatically identified domain-specific verbs.</Paragraph>
    <Paragraph position="4"> During the phase of learning verb complementation patterns, each of these verbs is associated with a set of classes and terms it co-occurs with. Let C</Paragraph>
    <Paragraph position="6"> denote a set of classes assigned automatically to the verb v</Paragraph>
    <Paragraph position="8"> on the information found in the corpus and the training ontology. As indicated earlier, we define such set to be a verb complementation pattern for the given verb.</Paragraph>
    <Section position="1" start_page="1" end_page="2" type="sub_section">
      <SectionTitle>
4.1 Statistical Analysis
</SectionTitle>
      <Paragraph position="0"> As we planned to use verb complementation patterns for term classification, we modified the original learning algorithm (Spasic et al., 2002) by attaching the frequency information to terms and their classes. When substituting a less general class by its more general counterpart,  the frequency information is updated by summing the two respective frequencies of occurrence. In the final verb complementation pattern, each class c i,j has the frequency feature f i,j , which aggregates the frequency of co-occurrence with v</Paragraph>
      <Paragraph position="2"> ) for the given class and its subclasses. The frequency information is used to estimate the class probabilities given a verb, P(c</Paragraph>
      <Paragraph position="4"> The ontology used for learning allowed multiple inheritance only at the leaf level, that way incurring no ambiguities when substituting subclass by its superclass. The multiple inheritance at the leaf level was resolved by mapping each term to all its classes, which were then processed by a GA.</Paragraph>
      <Paragraph position="5"> Unclassified terms remain present in the final verb complementation patterns, and, like classes, they are also assigned the information on the frequency of co-occurrence with the given verb.</Paragraph>
      <Paragraph position="6"> When classifying a specific term, this information is used to select the verb based on whose pattern the term will be classified. Precisely, the verb the given term most frequently co-occurs with is chosen, as it is believed to be the most indicative one for the classification purpose.</Paragraph>
    </Section>
    <Section position="2" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.2 Term Similarity Measure
</SectionTitle>
      <Paragraph position="0"> A complementation pattern associated with the chosen verb typically contain several classes. In order to link the newly recognised terms to specific candidate classes, we used a hybrid term similarity measure, called the CLS similarity measure. It combines contextual, lexical and syntactic properties of terms in order to estimate their similarity (Nenadic et al., 2002).</Paragraph>
      <Paragraph position="1"> Lexical properties used in the CLS measure refer to constituents shared by the compared terms.</Paragraph>
      <Paragraph position="2"> The rationale behind the lexical term similarity involves the following hypotheses: (1) Terms sharing a head are likely to be hyponyms of the same term (e.g. progesterone receptor and oestrogen receptor). (2) A term derived by modifying another term is likely to be its hyponym (e.g. nuclear receptor and orphan nuclear receptor). Counting the number of common constituents is a simple and straightforward approach to measuring term similarity, but it falls short when it comes to single-word terms and those introduced in an ad-hoc manner. Thus, properties other than lexical need to be included.</Paragraph>
      <Paragraph position="3"> We use syntactic properties in the form of specific lexico-syntactical patterns indicating parallel usage of terms (e.g. both Term and Term). All terms used within a parallel structure have identical syntactic features and are used in combination with the same verb, preposition, etc., and, hence, can be regarded as similar with high precision.</Paragraph>
      <Paragraph position="4"> However, patterns used as syntactic properties of terms have relatively low frequency of occurrence compared to the total number of terms, and in order to have a good recall, a large-size corpus is needed. In order to remedy for small-size corpora, other contextual features are exploited.</Paragraph>
      <Paragraph position="5"> Context patterns (CPs) in which terms appear are used as additional features for term comparison. CPs consist of the syntactic categories and other grammatical and lexical information (e.g.</Paragraph>
      <Paragraph position="6"> PREP NP V:stimulate). They are ranked according to a measure called CP-value (analogue to C-value for ATR). The ones whose CP-value is above a chosen threshold are deemed significant and are used to compare terms. Each term is associated with a set of its CPs, and contextual similarity between terms is then measured by comparing the corresponding sets. Automatically collected CPs are indeed domain-specific, but the method for their extraction is domain independent.</Paragraph>
    </Section>
    <Section position="3" start_page="2" end_page="3" type="sub_section">
      <SectionTitle>
4.3 Term-Class Similarity
</SectionTitle>
      <Paragraph position="0"> The CLS similarity measure applies to pairs of terms. However, in case of multiple choices provided by the verb complementation patterns, we need to compare terms to classes. In order to do so, we use the similarity between the given term and the terms belonging to the classes. The selection of terms to be compared is another issue. One possibility is to use the full or random set of terms (belonging to the given class) that occur in the corpus. Alternatively, some ontologies provide a set of prototypical instances for each class, which can be used for comparison of terms and classes.</Paragraph>
      <Paragraph position="2"> are terms representing the class, and t is a term, then the similarity between the term t and the class c is calculated in the following way:</Paragraph>
      <Paragraph position="4"> This example-based similarity measure maximises the value of the CLS measure between the term and the instances representing the class. In addition, the values of the CLS measure are mapped into the interval (0,1) by performing vector normalisation in order to make them comparable to the class probability estimations.</Paragraph>
    </Section>
    <Section position="4" start_page="3" end_page="4" type="sub_section">
      <SectionTitle>
4.4 Term Classification
</SectionTitle>
      <Paragraph position="0"> Finally, given the term t and the verb v i it most frequently co-occurs with, a score is calculated for  For example, in the UMLS ontology each class is assigned a number of its prototypical examples represented by terms. each class c i,j from the set C i according to the following formula:</Paragraph>
      <Paragraph position="2"> where a (0 [?] a [?] 1) is a parameter, which balances the impact of the class probabilities and the similarity measure.</Paragraph>
      <Paragraph position="3">  A class with the highest C(t, c i,j ) score is used to classify the term t. Alternatively, multiple classes may be suggested by setting a threshold for C(t, c i,j ).</Paragraph>
      <Paragraph position="4"> At this point, let us reiterate that the final verb complementation patterns are minimal in the sense that the number of terms in a verb complementation pattern and the depth of each individual term in the ontology are minimised. The latter condition may cause the classification to be crude, that is new terms will be assigned to classes close to the root of the ontology. For more fine-grained classification results, the classes placed close to the root of the ontology should be either removed from the initial verb complementation patterns, thus being unable to override the classes found lower in the hierarchy or in other way prevented from substituting less general terms. The depth up to which the terms are to be blocked may be empirically determined. null</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>