Unsupervised Lexical Learning with Categorial Grammars

1 Introduction

In this paper we discuss a potential solution to two problems in Natural Language Processing (NLP), using a combination of statistical and symbolic machine learning techniques. The first problem is learning the syntactic roles, or categories, of the words of a language, i.e. learning a lexicon. The second is annotating a corpus with parses.

The aim is to learn Categorial Grammar (CG) lexicons, starting from a set of lexical categories, the functional application rules of CG and an unannotated corpus of positive examples. The CG formalism (discussed in Section 2) is chosen because it assigns distinct categories to words of different types, and the categories describe the exact syntactic role each word can play in a sentence.

This problem is similar to the unsupervised part-of-speech tagging work of, for example, Brill (Brill, 1997) and Kupiec (Kupiec, 1992). In Brill's work a lexicon containing the parts of speech available to each word is provided, and a simple tagger attaches a complex tag to each word in the corpus, representing all the possible tags that word can have. Transformation rules are then learned which use the context of a word to determine which simple tag it should be assigned. The results are good, generally achieving around 95% accuracy on large corpora such as the Penn Treebank.

Kupiec (Kupiec, 1992) uses an unsupervised version of the Baum-Welch algorithm, which iteratively estimates from examples the probabilities of a Hidden Markov Model for part-of-speech tagging. Instead of supplying a lexicon, he places the words in equivalence classes; words in the same equivalence class must take one of a specific set of parts of speech. This raises the accuracy of the algorithm to about the same level as Brill's approach.

In both cases, the learner is provided with a large amount of background knowledge - either a complete lexicon or a set of equivalence classes. In the approach presented here, the most that is provided is a small partial lexicon; in fact, the system learns the lexicon.

The second problem - annotating the corpus - is solved by the approach we use to learn the lexicon. The system uses parsing to determine which are the correct lexical entries for a word, thus annotating the corpus with the parse derivations (also providing less probable parses if desired), as the sketch below illustrates. An example of another approach to this task is the Fidditch parser of Hindle (Hindle, 1983), based on the deterministic parser of Marcus (Marcus, 1980), which was used to annotate the Penn Treebank (Marcus et al., 1993). However, instead of learning the lexicon, a complete grammar and lexicon must be supplied to the Fidditch parser.
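To make this use of parsing concrete, the following minimal sketch shows how the two functional application rules of CG can filter candidate lexical entries. The tuple encoding of categories, the toy lexicon and the brute-force search over assignments are illustrative assumptions, not the implementation described in this paper; S\NP here denotes a category seeking an NP to its left, following the result-leftmost notation.

from itertools import product

def forward_apply(left, right):
    # Forward application: X/Y followed by Y yields X.
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        return left[1]
    return None

def backward_apply(left, right):
    # Backward application: Y followed by X\Y yields X.
    if isinstance(right, tuple) and right[0] == "\\" and right[2] == left:
        return right[1]
    return None

def parses_to(cats, goal="S"):
    # Exhaustively try to reduce the category sequence to the goal
    # by combining adjacent pairs with either application rule.
    if len(cats) == 1:
        return cats[0] == goal
    for i in range(1, len(cats)):
        for rule in (forward_apply, backward_apply):
            combined = rule(cats[i - 1], cats[i])
            if combined is not None and parses_to(
                    cats[:i - 1] + [combined] + cats[i + 1:], goal):
                return True
    return False

# Hypothetical candidate lexicon: "likes" has the correct transitive-verb
# category (S\NP)/NP, encoded as a tuple, plus a deliberately wrong guess.
lexicon = {
    "John": ["NP"],
    "Mary": ["NP"],
    "likes": [("/", ("\\", "S", "NP"), "NP"), "NP"],
}

sentence = ["John", "likes", "Mary"]
for assignment in product(*(lexicon[w] for w in sentence)):
    if parses_to(list(assignment)):
        print("kept:", assignment)
# Only the assignment using (S\NP)/NP reduces to S, so that entry is
# retained for "likes" and its derivation annotates the sentence.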
Our work also relates to CG induction, which has been attempted by a number of people. Osborne (Osborne, 1997) has an algorithm that learns a grammar for sequences of part-of-speech tags from a tagged corpus, using the Minimum Description Length (MDL) principle - a well-defined form of compression. While this is a supervised setting of the problem, the use of this more formal approach to compression is of interest for future work. Also, results of 97% coverage are impressive, even though the problem is rather simpler. Kanazawa (Kanazawa, 1994) and Buszkowski (Buszkowski, 1987) use a unification-based approach with a corpus annotated with semantic structure, which in CG is a strong indicator of the syntactic structure. Unfortunately, they do not present results of experiments on natural language corpora, and again the approach is essentially supervised.

Two unsupervised approaches to learning CGs are presented by Adriaans (Adriaans, 1992) and Solomon (Solomon, 1991). Adriaans describes a purely symbolic method that uses the context of words to define their category. An oracle is required for the learner to test its hypotheses, thus providing negative evidence. This would seem to be awkward from an engineering viewpoint, i.e. it is unclear how one could provide such an oracle, and implausible from a psychological point of view, as humans do not seem to receive such evidence (Pinker, 1990). Unfortunately, again no results on natural language corpora seem to be available.

Solomon's approach (Solomon, 1991) uses unannotated corpora to build lexicons for a simple CG. He uses a simple corpus of sentences from children's books, with a somewhat ad hoc, non-incremental heuristic approach to developing categories for words. The results show that a wide range of categories can be learned, but the current algorithm, as the author admits, is probably too naive to scale up to full corpora. No results on the coverage of the CGs learned are provided.

In Section 3 we discuss our learner. In Section 4 we describe experiments on three corpora containing examples of a subset of English, and Section 5 contains the results, which are encouraging with respect to both problems. Finally, in Section 6, we compare the results with the systems mentioned above and discuss ways the system can be expanded and larger scale experiments carried out. Next, however, we describe Categorial Grammar.