<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1006"> <Title>A METHOD FOR IMPROVING AUTOMATIC WORD CATEGORIZATION</Title> <Section position="7" start_page="44" end_page="44" type="concl"> <SectionTitle> 5 Discussion And Conclusion </SectionTitle> <Paragraph position="0"> The results obtained in this research are encouraging. Although the corpus used for the clustering is quite small compared to those used in other studies, the clusters formed seem to represent linguistic categories. The incorrect clusters are believed to result from the sparse information conveyed by the corpus. With a larger training set, the frequencies are expected to converge better, and the quality of the clusters is expected to improve accordingly.</Paragraph> <Paragraph position="1"> Since the distance function depends only on the difference of the bigram statistics, the running time of the algorithm is quite low compared to algorithms using mutual information. Although the complexity of the two algorithms is the same, efficiency improves because the algorithm avoids the time-consuming mathematical operations, such as division and multiplication, needed to calculate the mutual information of the whole corpus.</Paragraph> <Paragraph position="2"> This research has focussed on adding fuzziness to the categorization process. Therefore, different similarity metrics have not been tested for the algorithm. In further research, the algorithm could be tested with different distance metrics. The metrics from statistical theory given in (de Marcken, 1996) could be used to improve the algorithm. The algorithm could also be used to infer the phrase structure of a natural language. Finch (Finch, 1993) again uses mutual information to find such structures. Using fuzzy membership degrees could be another way to carry out the same process. To find the phrases, the most frequent sentence segments of some length could be collected from a corpus. 
In addition to the frequencies and bigrams of words, the statistics for these frequent segments could be gathered and passed to the clustering inference mechanism. The resulting clusters would then be expected to hold such phrases together with the words.</Paragraph> <Paragraph position="3"> To conclude, it can be claimed that automatic word categorization is the initial step in the acquisition of the structure of a natural language. The same method could be used, with modifications and improvements, to find more abstract structures in the language. Moving this abstraction up to the sentence level successfully might make it possible for a computer to acquire the whole grammar of any natural language automatically.</Paragraph> </Section> </Paper>