File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/97/w97-1006_intro.xml
Size: 910 bytes
Last Modified: 2025-10-06 14:06:30
<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1006"> <Title>A METHOD FOR IMPROVING AUTOMATIC WORD CATEGORIZATION</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> This paper presents a new approach to automatic word categorization that improves both the efficiency of the algorithm and the quality of the resulting clusters. Unigram and bigram statistics from a corpus of about two million words are used with an efficient distance function to measure the similarity of words, and a greedy algorithm assigns the words to clusters.</Paragraph> <Paragraph position="1"> Notions from fuzzy clustering, such as cluster prototypes and degrees of membership, are used to form the clusters. The algorithm is unsupervised, and the number of clusters is determined at run time.</Paragraph> </Section> </Paper>