<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0127">
  <Title>I : I I I I I</Title>
  <Section position="2" start_page="0" end_page="298" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Word classification plays an important role in computational linguistics. Many tasks in computational linguistics, whether they use statistical or symbolic methods, reduce the complexity of the problem by dealing with classes of words rather than with individual words.</Paragraph>
    <Paragraph position="1"> We know that some words share similar linguistic properties and thus should belong to the same class, while some words serve several functions and thus could belong to more than one class. The questions are: What attributes distinguish one word from another? How should we group similar words together so that the partition of the word space is most likely to reflect the linguistic properties of the language? What meaningful label or name should be given to each word group? These questions constitute the problem of finding a word classification. At present, no method can find the optimal word classification. However, researchers have been trying hard to find sub-optimal strategies that lead to useful classifications.</Paragraph>
    <Paragraph position="2"> From a practical point of view, word classification addresses the problems of data sparseness and generalization in statistical language models. In particular, it can be used as an alternative to grammatical part-of-speech tagging (Brill, 1993; Cutting, Kupiec, Pederson and Sibun, 1992; Chang and Chen, 1993a; Chang and Chen, 1993b; Lee and Chang Chien, 1992; Kupiec, 1992; Lee, 1993; Merialdo, 1994; Pop, 1996; Peng, 1993; Zhou, 1995; Schutze, 1995) in statistical language modeling (Huang, Alleva, Hwang, Lee and Rosenfeld, 1993; Rosenfeld, 1994), because Chinese language models using part-of-speech information have had only very limited success (e.g. Chang, 1992; Lee, Dung, Lai, and Chang Chien, 1993). The reasons for the many difficulties in Chinese part-of-speech tagging are described by Chang and Chen (1995) and Zhao (1995).</Paragraph>
    <Paragraph position="3"> Much related work on word classification has been done, based on various similarity metrics (Bahl, Brown, DeSouza and Mercer, 1989; Brown, Pietra, deSouza and Mercer, 1992; Chang, 1995; DeRose, 1988; Garside, 1987; Hughes, 1994; Jardino, 1993; Jelinek, Mercer, and Roukos, 1990b; Wu, Wang, Yu and Wang, 1995; Magerman, 1994; McMahon, 1994; McMahon, 1995; Pereira, 1992; Resnik, 1992; Zhao, 1995). Brill (1993) and Pop (1996) present transformation-based tagging.</Paragraph>
    <Paragraph position="4"> Before a part-of-speech tagger can be built, word classification is performed to help choose a set of parts of speech. They use the sum of two relative entropies obtained from neighboring words as the similarity metric for comparing two words.</Paragraph>
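The metric described above can be sketched as follows: estimate the distribution of left and right neighbors of each word from a corpus, and sum the two relative entropies. This is an illustrative reconstruction, not the original implementation; the smoothing constant and window of one neighbor on each side are assumptions.

```python
import math
from collections import Counter

def neighbor_dist(corpus, word, offset):
    """Smoothed empirical distribution of the words found at `offset`
    (-1 for left neighbor, +1 for right) from each occurrence of `word`."""
    counts = Counter()
    for i, w in enumerate(corpus):
        j = i + offset
        if w == word and j >= 0 and len(corpus) > j:
            counts[corpus[j]] += 1
    total = sum(counts.values())
    vocab = sorted(set(corpus))
    eps = 1e-6  # additive smoothing (an assumption) so D(p || q) stays finite
    return {v: (counts[v] + eps) / (total + eps * len(vocab)) for v in vocab}

def rel_entropy(p, q):
    """Relative entropy D(p || q) over a shared vocabulary."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p)

def distance(corpus, w1, w2):
    """Sum of left- and right-context relative entropies, as described."""
    return (rel_entropy(neighbor_dist(corpus, w1, -1), neighbor_dist(corpus, w2, -1))
            + rel_entropy(neighbor_dist(corpus, w1, +1), neighbor_dist(corpus, w2, +1)))
```

Words that occur in identical contexts, such as two nouns appearing only in "the _ sat", get a distance near zero, while words with disjoint neighbor distributions get a large positive distance.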
    <Paragraph position="5"> Schutze (1995) represents the long-distance left and right contexts of a word as a left vector and a right vector, each of dimension 50, and uses the cosine as the metric to measure the similarity between two words. To cope with data sparseness, he applies a singular value decomposition. Compared with Brill's method, Schutze takes 50 neighbors into account for each word.</Paragraph>
    <Paragraph position="7"> Chang and Chen (1995) proposed a simulated annealing method, the same as Jardino and Adda's (1993). The perplexity, which is the inverse of the probability over the whole text, is measured. The new value of the perplexity and a control parameter Cp (Metropolis algorithm) decide whether a new classification (obtained by moving a single word, with both the word and the target class chosen at random, from its class to another) will replace the previous one. Compared with the two methods described above, this method attempts to optimize the clustering using perplexity as a global measure.</Paragraph>
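One step of the simulated-annealing loop just described can be sketched as below. The `perplexity` callable scoring a whole assignment is an assumption of this sketch; the Metropolis criterion itself follows the description: always keep a move that lowers perplexity, keep one that raises it only with probability exp(-delta/Cp).

```python
import math
import random

def accept_move(delta_perplexity, cp, u=None):
    """Metropolis criterion: a perplexity decrease is always kept; an
    increase survives with probability exp(-delta/cp)."""
    if 0 >= delta_perplexity:
        return True
    if u is None:
        u = random.random()
    return math.exp(-delta_perplexity / cp) > u

def anneal_step(assignment, classes, perplexity, cp, rng=random):
    """Move one randomly chosen word to a randomly chosen class and
    accept or reject the resulting classification."""
    word = rng.choice(sorted(assignment))
    old_class, new_class = assignment[word], rng.choice(classes)
    old_pp = perplexity(assignment)
    assignment[word] = new_class
    if not accept_move(perplexity(assignment) - old_pp, cp):
        assignment[word] = old_class  # undo the rejected move
        return False
    return True
```

Lowering the control parameter Cp over many such steps makes uphill moves increasingly rare, so the clustering settles toward a low-perplexity classification.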
    <Paragraph position="8"> Pereira, Tishby and Lee (1993) investigate how to factor word association tendencies into associations of words with certain hidden sense classes and associations between the classes themselves. More specifically, they model senses as probabilistic concepts or clusters C with corresponding cluster membership probabilities P(C|w) for each word w. That is, while most other class-based modeling techniques for natural language rely on "hard" Boolean classes, Pereira et al. (1993) propose a method for "soft" class clustering and suggest a deterministic annealing procedure for clustering. But, as stated in their paper, they only consider the special case of classifying nouns according to their distribution as direct objects of verbs.</Paragraph>
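The contrast between hard and soft classes can be made concrete with a small sketch: instead of assigning each word w to exactly one class, every class C receives a membership probability P(C|w). The softmax-over-affinities form and the inverse temperature beta below are illustrative assumptions, standing in for the deterministic annealing procedure, in which raising beta gradually hardens the assignment.

```python
import math

def soft_memberships(affinities, beta=1.0):
    """Turn word-to-cluster affinity scores into membership
    probabilities P(C|w) that sum to one for each word; higher beta
    concentrates the mass on the best cluster (hypothetical scores)."""
    out = {}
    for w, scores in affinities.items():
        weights = {c: math.exp(beta * s) for c, s in scores.items()}
        z = sum(weights.values())
        out[w] = {c: v / z for c, v in weights.items()}
    return out
```

An ambiguous word thus keeps nonzero membership in several clusters at low beta, whereas a hard Boolean classification would force it into exactly one.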
    <Paragraph position="9"> To address the problems and exploit the advantages of the methods presented above, we put forward a new algorithm to classify words automatically.</Paragraph>
  </Section>
</Paper>