<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1082">
  <Title>Tagging with Hidden Markov Models Using Ambiguous Tags</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Taggers are commonly used as pre-processors for more sophisticated treatments like full syntactic parsing or chunking. Although taggers achieve high accuracy, they still make some mistakes that quite often impede the following stages. There are at least two solutions to this problem. The flrst consists in devising more sophisticated taggers either by providing the tagger with more linguistic knowldge or by reflning the tagging process, through better probability estimation, for example. The second strategy consists in allowing some ambiguity in the output of the tagger. It is the second solution that was chosen in this paper. We believe that this is an instance of a more general problem in sequential natural language processing chains, in which a module takes as input the output of the preceding module. Since we cannot, in most cases, expect a module to produce only correct solutions, modules should be able to deal with ambiguous input and ambiguous output. In our case, the input is non ambiguous while the output is ambiguous. From this perspective, the quality of the tagger is evaluated by the trade-ofi it achieves between accuracy and ambiguity. The introduction of ambiguous tags in the tagger output raises the question of the processing of these ambiguous tags in the post-tagging stages of the application. Leaving some ambiguity in the output of the tagger only makes sense if these other processes can handle it. In the case of a chunker, ambiguous tags can be taken into account through the use of weighted flnite state machines, as proposed in (Nasr and Volanschi, 2004). In the case of a syntactic parser, such a device can usually deal with some ambiguity and discard the incorrect elements of an ambiguous tag when they do not lead to a complete analysis of the sentence. The parser itself acts, in a sense, as a tagger since, while parsing the sentence, it chooses the right tag among a set of possible tags for each word. The reason why we still need a tagger and don't let the parser do the job is time and space complexity.</Paragraph>
    <Paragraph position="1"> Parsers are usually more time and space consuming than taggers and highly ambiguous tags assignments can lead to prohibitive processing time and memory requirements.</Paragraph>
    <Paragraph position="2"> The tagger described in this paper is based on the standard Hidden Markov Model architecture (Charniak et al., 1993; Brants, 2000).</Paragraph>
    <Paragraph position="3"> Such taggers assign to a sequence of words</Paragraph>
    <Paragraph position="5"> ability P(T;W) where T ranges over all possible tag sequences of length n. The probability P(T;W) is itself decomposed into a product of 2n probabilities, n lexical probabilities P(wijti) (emission probabilities of the HMM) and n syntactic probabilites (transition probabilities of the HMM). Syntactic probabilities model the probability of the occurrence of tag ti given a history which is the knowledge of the h preceding tags (ti!1 ...ti!h). Increasing the length of the history increases the predictive power of the tagger but also the number of parameters to estimate and therefore the amount of training data needed. Histories of length 2 constitute a common trade-ofi for part of speech tagging.</Paragraph>
    <Paragraph position="6"> We deflne an ambiguous tag as a tag that denotes a subset of the original tagset. In the remainder of the paper, tags will be represented as subscripted capitals T : T1;T2 :::. Ambiguous tags will be noted with multiple subscripts. T1;3;5 for example, denotes the set fT1;T3;T5g.</Paragraph>
    <Paragraph position="7"> We deflne the ambiguity of an ambiguous tag as the cardinality of the set it denotes. This notion is extended to non ambiguous tags, which can be seen as singletons, their ambiguity is therefore equal to 1.</Paragraph>
    <Paragraph position="8"> Ambiguous tags are actually new tags whose lexical and syntactic probability distributions are computed on the basis of lexical and syntactic distributions of their constituents. The lexical and syntactic probability distributions of Ti1;:::;in should be computed in such a way that, when a word in certain context can be tagged as Ti1;:::;Tin with probabilities that are close enough, the tagger should choose the ambiguous tag Ti1;:::;in.</Paragraph>
    <Paragraph position="9"> The idea of changing the tagset in order to improve tagging accuracy has already been tested by several researchers. (Tufl&gt;&gt;s et al., 2000) reports experiments of POS tagging of Hungarian with a large tagset (about one thousand difierent tags). In order to reduce data sparseness problems, they devise a reduced tagset which is used for tagging. The same kind of idea is developed in (Brants, 1995). The major difierence between these approaches and ours, is that they devise the reduced tagset in such a way that, after tagging, a unique tag of the extended tagset can be recovered for each word. Our perspective is signiflcantly difierent since we allow unrecoverable ambiguity in the output of the tagger and leave to the other processing stages the task of reducing it. In the HMM based taggers framework, our work bears a certain resemblance with (Brants, 2000) who distinguishes between reliable and unreliable tag assignments using probabilities computed by the tagger. Unreliable tag assignments are those for which the probability is below a given threshold. He shows that taking into account only reliable assignments can signiflcantly improve the accuracy, from 96:6% to 99:4%. In the latter case, only 64:5% of the words are reliably tagged. For the remaining 35:5%, the accuracy is 91:6%. These flgures show that taking into account probabilities computed by the tagger discriminates well these two situations. The main difierence between his work and ours is that he does not propose a way to deal with unreliable assignments, which we treat using ambiguous tags.</Paragraph>
    <Paragraph position="10"> The paper is structured as follows: section 2 describes how the probability distributions of the ambiguous tags are estimated. Section 3 presents an iterative method to automatically discover good ambiguous tags as well as an experiment on the Brown corpus. Section 4 concludes the paper.</Paragraph>
  </Section>
class="xml-element"></Paper>