<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0717">
  <Title>Incorporating Knowledge in Natural Language Learning: A Case Study</Title>
  <Section position="3" start_page="0" end_page="122" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> A variety of inductive learning techniques have been used in recent years in natural language processing. Given a large training corpus as input and relying on statistical properties of language usage, statistics-based and machine learning algorithms are used to induce a classifier which can be used to resolve a disambiguation task. Applications of this line of research include ambiguity resolution at different levels of sentence analysis: part-of speech tagging, word-sense disambiguation, word selection in machine translation, context-sensitive spelling correction, word selection in speech recognition, and identification of discourse markers.</Paragraph>
    <Paragraph position="1"> Many natural language inferences, however, seem to rely heavily on semantic and pragmatic knowledge about the world and the language, that is not explicit in the training data. The ability to incorporate knowledge from other sources of information, be it knowledge that is acquired across modalities: prepared by a teacher or by an expert, is crucial for going beyond low level natural language inferences.</Paragraph>
    <Paragraph position="2"> Within Machine Learning, the use of knowledge is often limited to that of constraining the hypothesis space (either before learning or by probabilistically biasing the search for the hypothesis) or to techniques such as EBL (DeJong, 1981; Mitchell et al., 1986; DeJong and Mooney, 1986) which rely on explicit domain knowledge that can be used to explain (usually, prove deductively) the observed examples.</Paragraph>
    <Paragraph position="3"> The knowledge needed to perform language-understanding related tasks, however, does not exist in any explicit form that is amenable to techniques of this sort, and many believe that it will never be available in such explicit forms. An enormous amount of useful &amp;quot;knowledge&amp;quot; may be available, though. Pieces of information that may be found valuable in language-understanding related tasks may include: the root form of a verb; a list of nouns that are in some relation (e.g., are all countries) and can thus appear in similar contexts; a list of verbs that can be followed by a food item; a list of items you can see through, things that are furniture, a list of dangerous things, etc.</Paragraph>
    <Paragraph position="4"> This rich collection of information pieces does not form any domain theory to speak of and cannot be acquired from a single source of information.</Paragraph>
    <Paragraph position="5"> This knowledge is noisy, incomplete and ambiguous.</Paragraph>
    <Paragraph position="6"> While some of it may be acquired from text, a lot if it may only be acquired from other modalities, as those used by humans. We believe that integration of such knowledge is essential for NLP to attain high-level natural-language inference.</Paragraph>
    <Paragraph position="7"> Contrary to this intuition, experiments in text retrieval and natural language have not shown much improvement when incorporating information of the kind humans seem to use (Krovetz and Croft, 1992; Kosmynin and Davidson, 1996; Kar0v and Edelman,  1996; Junker, 1997). The lack of significant improvement in the presence of more &amp;quot;knowledge&amp;quot; may be explained by the type of-knowledge used, the way it is incorporated, and the learning algorithms employed. null In the present paper we study an effective way of incorporating incomplete and ambiguous information sources of the abovementioned type within a specific learning approach, and focus on the knowledge sources that can be effective in doing so. The long-term goal of our work is understanding (1) what types of knowledge sources can be used for performance improvement, and at what granularity level and (2) which computational mechanisms can make the best use of these sources.</Paragraph>
    <Paragraph position="8"> In particular, the effect of noun-class information on learning Prepositional Phrase Attachment (PPA, cf. Sec. 2) is studied. This problem is studied within SNO IF, a sparse architecture utilizing an on-line learning algorithm based on Winnow (Littlestone, 1988). That algorithm has been applied for natural language disambiguation tasks and related problems and perform remarkably well (Golding and Roth, 1996; Dagan et al., 1997; Roth and Zelenko, 1998).</Paragraph>
    <Paragraph position="9"> The noun-class data was derived from the Word-Net database (Miller, 1990) which was compiled for general linguistic purposes, irrespective of the PPA problem. We derived the classes at different granularities. At the highest level, nouns are classified according to their synsets. The lower levels are obtained by successively using the hypernym relation defined in WordNet. In addition, we use the Corelex database (Buitelaar, 1998). Consisting of 126 coarse-grained semantic types covering around 40,000 nouns, Corelex defines a large number of systematic polysemous classes that are derived from an analysis of sense distributions in WordNet.</Paragraph>
    <Paragraph position="10"> The results indicate that a statistically significant improvement in performance is achieved when the noun-class information is incorporated into the data.</Paragraph>
    <Paragraph position="11"> The absolute performance achieved on the task is slightly better than other systems, although it is still significantly worse than the performance of a human subject tested on this task. The granularity of the class information appears to be crucial for improving performance. The addition of too many overlapping classes does not help performance, but with fewer classes - the improvement is significant.</Paragraph>
    <Paragraph position="12"> In addition to semantic information, using classes carries with it some structural information. A class feature may be viewed as a disjunction of other features, thereby increasing the expressivity of the hypothesis used for prediction. In order to control for the possibility that the performance improvements seen are due mainly to the structural information, we generated random classes. Some of these had  exactly the same distribution over the original features as do the semantic classes. Surprisingly, we find that a non-negligible part of the improvement is due merely to the structural information, although most of it can be attributed to the semantic content of the classes.</Paragraph>
    <Paragraph position="13"> Along with promoting work on the incorporation of problem-independent incomplete knowledge into the learning process, the encouraging results with incorporating noun-class data provide a motivation for carrying out more work on generating better linguistic knowledge sources.</Paragraph>
    <Paragraph position="14"> The paper is organized as follows: we start by presenting the task, PPA and the SNOW architecture and algorithm. In section 4 we describe the classes and present the main experiments with the semantic and random classes. Section 5 concludes.</Paragraph>
    <Paragraph position="15"> 2 Prepositional phrase attachment The PPA problem is to decide whether the prepositional phrase (PP) attaches to the direct object NP as in Buy the car with the steering wheel (nattachment) or to the verb phrase buy, as in Buy the car with his money (v-attachment). PPA is * a common cause of structural ambiguity in natural language.</Paragraph>
    <Paragraph position="16"> Earlier works on this problem (Ratnaparkhi et al., 1994; Brill and Resnik, 1994; Collins and Brooks, 1995; Zavrel et al., 1997) represented an example by the 4-tuple &lt;v, nl, p, n2&gt; containing the VP head, the direct object NP head, the preposition, and the indirect object NP head respectively. The first example in the previous paragraph is thus represented by &lt;buy, car, with, wheel&gt;.</Paragraph>
    <Paragraph position="17"> The experiments reported here were done using data extracted by Ratnaparkhi et al. (1994) from the Penn Treebank (Marcus et al., 1993) WSJ corpus. It consists of 20801 training examples and 3097 separate test examples.</Paragraph>
    <Paragraph position="18"> The preposition of turns out to be a very strong indicator for noun attachment. Among the 3097 test examples, 925 contain the preposition of; in all but 9 of these examples, of has an n attachment.</Paragraph>
    <Paragraph position="19"> Since almost all (99.1%) of these test cases are classified correctly regardless of the SNOW architecture or parameter choice, we omit the examples which include of from the test set, as they obscure the real performance. Only the last table will include those examples, so results may be compared with other systems evaluated on this data set.</Paragraph>
    <Paragraph position="20"> In summary, our data set consists of 15224 training examples, (5338 tagged n, 9886 tagged v) and 2172 test examples (910 and 1262, resp.). This leads to a baseline performance of 58.1% if we simply predict according to the most common attachment in the training corpus: v. (Simply breaking this down to different prepositions does not yield better re! null ! suits.) For reference, assuming a binomial distribu- tion, the standard deviation on the test set is 0.85%. That figure is a crude estimator of the standard deviation of the results.</Paragraph>
    <Paragraph position="21"> A study of the possible features which may be extracted from the data, shows that the best feature set is that composed of all the possible conjunctions of words in the input 4-tuple. In addition, lemmatizing all the nouns and verbs yielded a further performance improvement. In the following section we will use the lemmatized data &amp;quot;lemma&amp;quot; as a basic set.</Paragraph>
  </Section>
class="xml-element"></Paper>