<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2103">
<Title>Linguistic Preprocessing for Distributional Classification of Words</Title>
<Section position="8" start_page="0" end_page="0" type="concl">
<SectionTitle>
6 Conclusion
</SectionTitle>
<Paragraph position="0"> In this study we examined the impact that linguistic preprocessing of distributional data has on the effectiveness and efficiency of semantic classification of nouns.</Paragraph>
<Paragraph position="1"> Our study extends previous work along the following lines. First, we compared different types of syntactic dependencies of the target noun in terms of the informativeness of the distributional features constructed from them. We find that the most useful dependencies are the adjectives and nouns used as attributes of the target noun, and the verbs of which the target noun is a direct or prepositional object. The most effective representation overall is obtained by using all the syntactic dependencies of the noun; it is clearly more advantageous than the windowing technique in terms of both effectiveness and efficiency. The combination of the attribute and object dependencies also yields very good classification accuracy, only marginally worse than that of the combination of all dependency types, while using a feature space that is several times more compact.</Paragraph>
<Paragraph position="2"> We further looked at the influence of stemming and lemmatization of context words on performance. The study did not reveal any considerable differences in effectiveness between stemming or lemmatization of context words and the use of their original forms. Lemmatization, however, achieves the greatest reduction of the feature space. Similarly, the removal of rare word co-occurrences from the training data could not be shown to consistently improve effectiveness, but it was very beneficial in terms of dimensionality reduction, notably for features corresponding to word collocations.</Paragraph>
<Paragraph position="3"> Finally, we examined whether morphological decomposition of context words helps to obtain more informative features, but found that indiscriminately decomposing all context words into morphemes and using the morphemes as separate features more often decreases performance than increases it. These results suggest that morphological analysis of context words should be accompanied by a feature selection procedure that identifies the affixes which are too general and can be safely stripped off, as well as those which are sufficiently specific that keeping them attached to the root best captures the relevant context information.</Paragraph>
</Section>
</Paper>
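As a rough illustration of the feature construction compared in the first finding above, the following Python sketch builds co-occurrence features for target nouns from dependency triples and, as a baseline, from a flat context window. It assumes parsing has already been done and that triples come in head-relation-dependent order; the relation labels ("amod", "dobj", "pobj") and the window size are illustrative choices, not the paper's actual setup.

```python
# Minimal sketch: dependency-based vs. window-based distributional features.
# Input triples are assumed to be (head, relation, dependent); the relation
# labels below are illustrative, not the tag set used in the paper.
from collections import Counter, defaultdict

def dependency_features(triples, target_nouns, relations=("amod", "dobj", "pobj")):
    """Count words linked to each target noun by the selected dependency relations."""
    features = defaultdict(Counter)
    for head, rel, dep in triples:
        if rel not in relations:
            continue
        if head in target_nouns:   # noun is the head, e.g. modified by an adjective
            features[head][f"{rel}:{dep}"] += 1
        if dep in target_nouns:    # noun is the dependent, e.g. object of a verb
            features[dep][f"{rel}:{head}"] += 1
    return features

def window_features(tokens, target_nouns, size=2):
    """Baseline: count every token within +/- size positions of a target noun."""
    features = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        if tok in target_nouns:
            for j in range(max(0, i - size), min(len(tokens), i + size + 1)):
                if j != i:
                    features[tok][tokens[j]] += 1
    return features

if __name__ == "__main__":
    triples = [("car", "amod", "fast"), ("drive", "dobj", "car")]
    tokens = "he can drive a fast car".split()
    print(dependency_features(triples, {"car"}))
    print(window_features(tokens, {"car"}))
```

The dependency variant records only words in selected grammatical relations to the noun, which is what keeps its feature space smaller than the window baseline's while remaining informative.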
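The normalization and pruning steps discussed in the second finding can be sketched as follows, using NLTK's Porter stemmer and WordNet lemmatizer as stand-ins for whatever tools the original experiments employed; the frequency threshold for rare co-occurrences is likewise an arbitrary illustrative value.

```python
# Minimal sketch of context-word normalization and rare-feature pruning.
# Requires the nltk package and its 'wordnet' data (nltk.download('wordnet')).
from collections import Counter
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def normalize(context_words, mode="lemma"):
    """Map context words to stems, lemmas, or leave the original forms."""
    if mode == "stem":
        return [stemmer.stem(w) for w in context_words]
    if mode == "lemma":
        return [lemmatizer.lemmatize(w) for w in context_words]
    return list(context_words)

def prune_rare(feature_counts, min_count=2):
    """Drop co-occurrence features observed fewer than min_count times."""
    return Counter({f: c for f, c in feature_counts.items() if c >= min_count})

words = ["driving", "drives", "cars", "wheel", "wheels"]
print(normalize(words, "stem"))
print(normalize(words, "lemma"))
print(prune_rare(Counter(normalize(words, "lemma")), min_count=2))
```

Collapsing inflected forms and dropping rare co-occurrences both shrink the feature space; as the conclusion notes, the accuracy gain is small, so the main benefit is dimensionality reduction.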
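The morpheme-based representation that the last finding reports as problematic can be pictured with a deliberately crude sketch like the one below; the suffix list and the splitting heuristic are hypothetical and stand in for a real morphological analyzer, and indiscriminately adding all such morphemes as features is exactly the setup the paper found to hurt performance.

```python
# Illustrative sketch only: splitting each context word into a crude root
# plus suffix feature. The suffix inventory is hypothetical.
SUFFIXES = ("ation", "ness", "ing", "ed", "er", "ly", "s")

def morpheme_features(word, min_root_len=3):
    """Return the word's root and any stripped suffix as separate features."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= min_root_len:
            return [word[: -len(suf)], "+" + suf]
    return [word]

def decompose_context(context_words):
    feats = []
    for w in context_words:
        feats.extend(morpheme_features(w))
    return feats

print(decompose_context(["quickly", "drives", "darkness"]))
# ['quick', '+ly', 'drive', '+s', 'dark', '+ness']
```

A feature selection step of the kind the conclusion calls for would decide, per affix, whether to keep it stripped off as a separate feature or to leave it attached to the root.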