<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1018">
  <Title>Word Sense Induction: Triplet-Based Clustering and Automatic Evaluation</Title>
  <Section position="6" start_page="142" end_page="143" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> The approach presented in this work has been shown to enable automatic and knowledge-free word sense induction on a given corpus with high precision and sufficient recall. The induced senses are inherently specific to the domain of the corpus used. Furthermore, only the most apparent senses are induced, while the type of ambiguity matters less than expected. There is, however, a clear preference for topical distinctions over syntactic ambiguities. This preference is due to the underlying bag-of-words model; hence alternative contextual representations might yield different (as opposed to better or worse) results. The bag-of-words limitation also means that some senses are found that would be considered spurious in other circumstances. For example, the word challenger induces 5 senses, three of them describing the opponent in a game. The differences found are strong, however: the senses distinguished are a chess challenger, a Grand Prix challenger and a challenger in boxing, each of which has a large set of specific words distinguishing it.</Paragraph>
    <Paragraph position="1">  There are several questions that remain open.</Paragraph>
    <Paragraph position="2"> As the frequency of a word has a great impact on the ability to disambiguate it correctly using the presented methods, one question is to what extent corpus size plays a role, as compared to the balancedness of the corpus and therefore of the senses to be found. Another question concerns a limitation of the presented algorithm, which requires that any sense to be induced be representable by a rather large number of words. The question then is whether this (or any similar) algorithm can be improved to discern 'small' senses from random noise. A combination with algorithms finding collocational usages of words probably offers a feasible solution. The evaluation method employed can also be used for automatic optimization of the algorithm's own parameters using genetic algorithms. Moreover, it would be interesting to employ genetic programming in order to let an optimal word sense induction algorithm design itself.</Paragraph>
  </Section>
</Paper>