Part of Speech Tagging Using a Network of Linear Separators

1 Introduction

Learning problems in the natural language domain often map the text to a space whose dimensions are the measured features of the text, e.g., its words. Two characteristic properties of this domain are that its dimensionality is very high and that both the learned concepts and the instances reside very sparsely in the feature space. In this paper we present a learning algorithm and an architecture with properties suitable for this domain.

The SNOW algorithm presented here builds on recently introduced theories of multiplicative weight-updating learning algorithms for linear functions. Multiplicative weight-updating algorithms such as Winnow (Littlestone, 1988) and Weighted Majority (Littlestone and Warmuth, 1994) have been studied extensively in the COLT literature. Theoretical analysis has shown that they behave exceptionally well in the presence of irrelevant attributes, noise, and even a target function that changes over time (Littlestone, 1988; Littlestone and Warmuth, 1994; Herbster and Warmuth, 1995).

Only recently have these claimed abilities begun to be tested in applications. We address these claims empirically by applying SNOW to one of the fundamental disambiguation problems in natural language: part-of-speech tagging.

Part-of-speech (POS) tagging is the problem of assigning to each word in a sentence the part of speech that it assumes in that sentence. The importance of the problem stems from the fact that POS tagging is one of the first stages in many natural language processing tasks, such as speech processing and information extraction.

The architecture presented here, SNOW, is a Sparse Network Of Linear separators that utilizes the Winnow learning algorithm. A target node in the network corresponds to a candidate in the disambiguation task; all subnetworks learn autonomously from the same data in an online fashion, and at run time they compete to assign the correct meaning. A similar architecture that includes an additional layer is described in (Golding and Roth, 1998).

The POS problem poses a special challenge to this approach. First, it is a multi-class prediction problem. Second, determining the POS of a word in a sentence may depend on the POS tags of its neighbors in the sentence, but these are not known with any certainty. In the SNOW architecture, we address these problems by learning, at the same time and from the same input, a network of many classifiers. Each subnetwork is devoted to a single POS tag and learns to separate its POS tag from all others. At run time, all classifiers are applied simultaneously and compete to decide the POS of each word.
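To make this architecture concrete, the following is a minimal sketch in Python of one plausible reading of such a network: one Winnow-style separator per POS tag over a shared sparse feature space, all trained on the same example stream and combined by winner-take-all at prediction time. The class names, the promotion/demotion parameters (alpha, beta, theta), and the default feature weight of 1.0 are our illustrative assumptions, not the paper's actual configuration.

    class WinnowNode:
        """One linear separator with multiplicative (Winnow-style) updates.

        Weights are stored only for features seen so far; unseen active
        features implicitly start at 1.0, which suits sparse examples.
        """
        def __init__(self, alpha=1.5, beta=0.5, theta=1.0):
            self.alpha, self.beta, self.theta = alpha, beta, theta
            self.w = {}  # feature -> weight

        def score(self, active_features):
            return sum(self.w.get(f, 1.0) for f in active_features)

        def update(self, active_features, is_positive):
            fires = self.score(active_features) >= self.theta
            if is_positive and not fires:        # promotion on a missed positive
                for f in active_features:
                    self.w[f] = self.w.get(f, 1.0) * self.alpha
            elif not is_positive and fires:      # demotion on a false positive
                for f in active_features:
                    self.w[f] = self.w.get(f, 1.0) * self.beta

    class SnowLikeNetwork:
        """One target node per POS tag; all nodes learn from the same
        stream, and at run time they compete (winner-take-all)."""
        def __init__(self, tags):
            self.nodes = {t: WinnowNode() for t in tags}

        def train_example(self, active_features, true_tag):
            # Every subnetwork sees the same example; only the node for
            # the true tag treats it as a positive example.
            for tag, node in self.nodes.items():
                node.update(active_features, is_positive=(tag == true_tag))

        def predict(self, active_features):
            # Winner-take-all: the tag whose separator scores highest.
            return max(self.nodes,
                       key=lambda t: self.nodes[t].score(active_features))

In the actual system the active features of an example would encode things like the surrounding words and their candidate tags; in this sketch they can be any iterable of hashable feature identifiers.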
We present an extensive set of experiments in which we study some of the properties that SNOW exhibits on this problem, and compare it to other algorithms. In our first experiment, for example, we study the quality of the learned classifiers by artificially supplying each classifier with the correct POS tags of its neighbors. We show that under these conditions our classifier is almost perfect. This observation motivates an improvement to the algorithm that aims to gradually improve the input supplied to the classifier.

We then perform a preliminary study of learning the POS tagger in an unsupervised fashion. We show that we can relax the requirements on the training corpus to some degree, but so far do not obtain good results when the tagger is trained in a completely unsupervised fashion.

Unlike most of the algorithms tried on this and other disambiguation tasks, SNOW is an online learning algorithm: during training, every example is used once to update the learned hypothesis and is then discarded. While online learning algorithms may be at a disadvantage because they see each example only once, they are able to adapt to test examples by receiving feedback after each prediction. We evaluate this claim for the POS task and find that allowing feedback during testing indeed significantly improves the performance of SNOW on this task.

Finally, we compare our approach to a state-of-the-art tagger based on Brill's transformation-based approach; we show that SNOW-based taggers already achieve results comparable to it, and outperform it when we allow online updates.

Our work also raises a few methodological questions about the way the performance of algorithms for this problem is measured, and about improvements that can be made by defining the goals of the tagger more precisely.

The paper is organized as follows. We start by presenting the SNOW approach. We then describe our test task, POS tagging, and the way we model it, and in Section 5 we describe our experimental studies. We conclude by discussing the significance of the approach to future research on natural language inferences.

In the discussion below, s is an input example, the z_i's denote the features of the example, and c, t refer to parts of speech from a set C of possible POS tags.
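As a reading aid for the online-with-feedback protocol discussed above, here is a short sketch in the notation just introduced, continuing the earlier network sketch: for each test example s with active features z_i, the tagger first commits to a tag from C, and only then receives the true tag t and performs the same one-pass update used during training. The function name and the stream interface are hypothetical stand-ins, not the paper's code.

    def evaluate_with_feedback(network, test_stream):
        """Online evaluation: for each example s, predict a tag from C,
        then update on the revealed true tag t (feedback after prediction)."""
        correct = total = 0
        for active_features, true_tag in test_stream:
            prediction = network.predict(active_features)  # commit before seeing t
            correct += int(prediction == true_tag)
            total += 1
            network.train_example(active_features, true_tag)  # one-pass update
        return correct / total if total else 0.0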