<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2609">
  <Title>Learning to Identify Definitions using Syntactic Features</Title>
  <Section position="3" start_page="64" end_page="64" type="metho">
    <SectionTitle>
2 Previous work
</SectionTitle>
    <Paragraph position="0"> Work on identifying definitions from free text initially relied on manually crafted patterns without applying any machine learning technique. Klavans and Muresan (2000) set up a pattern extractor for their Definder system using a tagger and a finite state grammar. Joho and Sanderson (2000) retrieve descriptive phrases (dp) of query nouns (qn) from text to answer definition questions like Who is qn? Patterns such as 'dp especially qn', as utilized by Hearst (1992), are used to extract names and their descriptions.</Paragraph>
    <Paragraph position="1"> Similar patterns are also applied by Liu et al.</Paragraph>
    <Paragraph position="2"> (2003) to mine definitions of topic-specific concepts on the Web. As an additional assumption, specific documents dedicated to the concepts can be identified if they have particular HTMLand hyperlink structures.</Paragraph>
    <Paragraph position="3"> Hildebrandt et al. (2004) exploit surface patterns to extract as many relevant &amp;quot;nuggets&amp;quot; of information of a concept as possible. Similar to our work, a copular pattern NP1 be NP2 is used as one of the extraction patterns. Nuggets which do not begin with a determiner are discarded to filter out spurious nuggets (e.g., progressive tense). Nuggets extracted from every article in a corpus are then stored in a relational database. In the end, answering definition questions becomes as simple as looking up relevant terms from the database.</Paragraph>
    <Paragraph position="4"> Thisstrategy issimilar toourapproach foranswering definition questions.</Paragraph>
    <Paragraph position="5"> The use of machine learning techniques can be found in Miliaraki and Androutsopoulos (2004) and Androutsopoulos and Galanis (2005) They use similar patterns as (Joho and Sanderson, 2000) to construct training attributes. Sager and L'Homme (1994) note that the definition of a term should at least always contain genus (term's category) and species (term's properties). Blair-Goldensohn et al. (2004) uses machine learning and manually crafted lexico-syntactic patterns to match sentences containing both a genus and species phrase for a given term.</Paragraph>
    <Paragraph position="6"> There is an intuition that most of definition sentences are located at the beginning of documents. This lead to the use of sentence number as a good indicator of potential definition sentences. Joho and Sanderson (2000) use the position of the sentences as one of their ranking criteria, while Miliaraki and Androutsopoulos (2004), Androutsopoulos and Galanis (2005) and Blair-Goldensohn et al. (2004) apply it as one of their learning attributes.</Paragraph>
  </Section>
  <Section position="4" start_page="64" end_page="710" type="metho">
    <SectionTitle>
3 Syntactic properties of potential definition sentences
</SectionTitle>
    <Paragraph position="0"> definition sentences To answer medical definition sentences, we used the medical pages of Dutch Wikipedia2 as source. Medical pages were selected by selecting all pages mentioned on the Healthcare index page, and recursively including pages mentioned on retrieved pages as well.</Paragraph>
    <Paragraph position="1"> The corpus was parsed syntactically by Alpino, a robust wide-coverage parser for Dutch (Malouf and van Noord, 2004). The result of parsing (illustrated in Figure 1) is a dependency graph. The Alpino-parser comes withan integrated named entity classifier which assigns distinct part-of-speech tags to person, organization, and geographical named entities.</Paragraph>
    <Paragraph position="2"> Potentialdefinition sentences aresentences containing a form of the verb zijn3 (to be) with a subject and nominal predicative phrase as sisters. The syntactic pattern does not match sentences in which zijn is used as a possessive pronoun (his) and sentences where a form of zijn is used as an auxiliary. In the latter case, no predicative phrase complement will be found. On the other hand, we do include sentences in which the predicative phrase precedes the subject, as in Onderdeel van de testis is de Leydig-cel (the Leydig cel is part of the testis). As word order in Dutch is less strict than in English, it becomes relevant to include such non-canonical word orders as well.</Paragraph>
    <Paragraph position="3"> A number of non-definition sentences that will be extracted using this method can be filtered by simplelexical methods. Forinstance, ifthesubject isheaded by(theDutchequivalents of) cause, con- null number 7. Nodes are labelled with depedency relations and categories or part-of-speech tags, root forms, and string positions.</Paragraph>
    <Paragraph position="4"> sequence, example, problem, result, feature, possibility, symptom, sign, etc., or contains the determiner geen (no), the sentence will not be included in the list of potential definitions.</Paragraph>
    <Paragraph position="5"> However, even after applying the lexical filter, not all extracted sentences are definitions. In the next sections, we describe experiments aimed at increasing the accuracy of the extraction method.</Paragraph>
  </Section>
  <Section position="5" start_page="710" end_page="710" type="metho">
    <SectionTitle>
4 Annotating training examples
</SectionTitle>
    <Paragraph position="0"> To create evaluation and training data, 2500 extracted sentences were manually annotated as definition, non-definition, or undecided. One of the criteria for undecided sentences is that it mentions a characteristic of a definition but is not really a (complete) definition, for example, Benzeen is carcinogeen (Benzene is a carcinogen). The result of this annotation is given in Table 1. The annotated data was used both to evaluate the accuracy of the syntactic extraction method, and to training and evaluate material for the machine learning experiments as discussed in the next sections.</Paragraph>
    <Paragraph position="1"> After discarding the undecided sentences, we are left with 2299 sentences, 1366 of which are definitions. This means that the accuracy of the extraction method using only syntax was 59%.4 4This is considerably higher than the estimated accuracy of 18% reported in Tjong Kim Sang et al. (2005). This is probably partly due to the fact that the current corpus consists of encyclopedic material only, whereas the corpus used If we take sentence postion into account as well, and classify all first sentences as definitions and all other sentences as non-definitions, a baseline accuracy of 75,9% is obtained.</Paragraph>
    <Paragraph position="2"> It is obvious from Table 1 that the first sentences of Wikipedia lemmas that match the syntactic pattern are almost always definitions. It seems that e.g. Google's5 define query feature, when restricted to Dutch at least, relies heavily on this fact to answer definition queries. However it is also obvious that definition sentences can also be found in other positions. For documents from other sources, which are not as structured as Wikipedia, the first position sentence is likely to be an even weaker predictor of definition vs. non-definition sentences.</Paragraph>
  </Section>
  <Section position="6" start_page="710" end_page="710" type="metho">
    <SectionTitle>
5 Attributes of definition sentences
</SectionTitle>
    <Paragraph position="0"> Weaimatfinding the bestattributes for classifying definition sentences. We experimented with combinations of the following attributes: Text properties: bag-of-words, bigrams, and root forms. Punctuation is included as Klavans and Muresan (2000) observe that it can be used to recognize definitions (i.e. definitions tend to conin Tjong Kim Sang et al. (2005) contained web material from various sources, such as patient discussion groups, as well. The latter tends to contain more subjective and context- null other position of documents annotated as definition, non-definition, and undecided.</Paragraph>
    <Paragraph position="1"> tain parentheses more often than non-definitions).</Paragraph>
    <Paragraph position="2"> No stopword filtering is applied as in our experiments it consistently decreased accuracy. Note that we include all bigrams in a sentence as feature. A different use ofn-grams has been explored by Androutsopoulos and Galanis (2005) who add only n-grams (n [?] {1,2,3}) occurring frequently either directly before or after a target term.</Paragraph>
    <Paragraph position="3"> Document property: the position of each sentence in the document. This attribute has been frequently used in previous work and is motivated by the observation that definitions are likely to be located in the beginning of a document.</Paragraph>
    <Paragraph position="4"> Syntactic properties: position of each sub-ject in the sentence (initial, e.g. X is Y; or noninitial, e.g. Y is X), and of each subject and predicative complement: type of determiner (definite, indefinite, other). These attributes have not been investigated in previous work. In our experiments, sentence-initial subjects appear in 92% of the definition sentences and and 76% of the non-definition sentences. These values show that a definition sentence with a copular pattern tends to put its subject in the beginning. Two other attributes are used to encode the type of determiner of the subject and predicative compelement. As shown in Table 2, the majority of subjects in definition sentences have no determiner (62%), e.g. Paracetamol is een pijnstillend en koortsverlagend middel (Paracetamol is an pain alleviating and a fever reducing medicine), while in non-definition sentences subject determiners tend to be definite (50%), e.g. De werkzame stof is acetylsalicylzuur (The operative substance is acetylsalicylacid). Predicative complements, as shown in Table 3, tend to contain indefinite determiners in definition sentences (64%), e.g. een pijnstillend ...medicijn (a pain alleviating...medicine), while in non-definition the determiner tends to be definite (33%), e.g. Een fenomeen is de Landsgemeinde (A phenomenon is the Landsgemeinde).</Paragraph>
    <Paragraph position="5">  of subjects, e.g. location, person, organization, or no-class. A significant difference in the distribution of this feature between definition and non-definition sentences can be observed in Table 4. More definition sentences have named entity  classes contained in their subjects (40.63%) compared to non-definition sentences (11.58%). We also experimented with named entity classes contained in predicative complements but it turned out that very few predicates contained named entities, and thus no significant differences in distribution between definition and non-definition sentences could be observed.</Paragraph>
    <Paragraph position="6"> Features for lexical patterns, as used in (Androutsopoulos and Galanis, 2005), e.g. qn which (is|was|are|were) dp, are not added because in this experiment we investigate only a copular pattern.</Paragraph>
    <Paragraph position="7"> WordNet-based attributes are also excluded, given that coverage for Dutch (using EuroWordNet) tends tobeless good than forEnglish, andeven for English their contribution is sometimes insignifi- null tems using word bigrams only and word bigrams in combination with syntactic and sentence position features (word features have been translated into English).</Paragraph>
    <Paragraph position="8"> We use the text classification tool Rainbow6 (McCallum, 2000) to perform most of our experiments. Each sentence is represented as a string of words, possibly followed by bigrams, root forms, (combinations of) syntactic features, etc.</Paragraph>
    <Paragraph position="9"> All experiments were performed by selecting only the 2000 highest ranked features according to information gain. In the experiments which include syntactic features, the most informative features tend tocontain afairnumber ofsyntactic features. Thisisillustrated fortheconfiguration using bigrams, sentence position, and syntax in table 5.</Paragraph>
    <Paragraph position="10"> It supports our intuition that the position of subjects and the type of determiner of subjects and predicative complements are clues to recognizing definition sentences.</Paragraph>
    <Paragraph position="11"> To investigate the effect of each attribute, we set up several configurations of training examples as described in Table 6. We start with using only bag-of-words or bigrams, and then combine them with other attribute sets.</Paragraph>
  </Section>
  <Section position="7" start_page="710" end_page="710" type="metho">
    <SectionTitle>
6 Learning-based methods
</SectionTitle>
    <Paragraph position="0"> We apply three supervised learning methods to each of the attribute configurations in Table 6, namely naive Bayes, maximum entropy, and support vector machines (SVMs). Naive Bayes is a fast and easy to use classifier based on the probabilistic model of text and has often been used in text classification tasks as a baseline. Maximum entropy is a general estimation technique that has been used in many fields such as information retrieval and machine learning. Some experiments in text classification show that maximum entropy often outperforms naive Bayes, e.g. on two of three data sets in Nigam et al. (1999). SVMs are a new learning method but have been reported by Joachims (1998) to be well suited for learning in text classification.</Paragraph>
    <Paragraph position="1"> We experiment with three kernel types of SVMs: linear, polynomial, and radial base function (RBF). Rainbow (McCallum, 2000) is used to examine these learning methods, except the RBF kernel for which libsvm (Chang and Lin, 2001) is used. Miliaraki and Androutsopoulos (2004) use a SVM with simple inner product (polynomial of first degree) kernel because higher degree polynomial kernels were reported as giving no improvement. However we want to experiment with  entropy (ME), and three SVM settings at the different attribute configurations. the RBF (gaussian) kernel by selecting model parameters C (penalty for misclassification) and g (function of the deviation of the Gaussian Kernel) so that the classifier can accurately predict testing data. This experiment is based on the argument that if a complete model selection using the gaussian kernel has been conducted, there is no need to consider linear SVM, because the RBF kernel with certain parameters (C, g) has the same performance as the linear kernel with a penalty parameter ~C (Keerthi and Lin, 2003).</Paragraph>
    <Paragraph position="2"> Given the finite dataset, we use k-fold cross-validation (k = 20) to estimate the future performance of each classifier induced by its learning method and dataset. This estimation method introduces lower bias compared to a bootstrap method which has extremely large bias on some problems (Kohavi, 1995).</Paragraph>
  </Section>
class="xml-element"></Paper>