File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/00/w00-1326_concl.xml

Size: 2,382 bytes

Last Modified: 2025-10-06 13:52:55

<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1326">
  <Title>One Sense per Collocation and Genre/Topic Variations</Title>
  <Section position="10" start_page="213" end_page="213" type="concl">
    <SectionTitle>
10 Conclusions
</SectionTitle>
    <Paragraph position="0"> This paper shows that the one sense per collocation hypothesis is weaker for fine-grained word sense distinctions (e.g. those in WordNet): from the 99% precision reported for 2-way ambiguities in (Yarowsky, 1993), precision drops to figures around 70%. These figures could perhaps be improved with more data.</Paragraph>
    <Paragraph position="1"> We also show that one sense per collocation does hold across corpora, but that collocations vary from one corpus to another, following genre and topic variations. This explains the low results when performing word sense disambiguation across corpora. In fact, we demonstrate that when two independent corpora share a related genre/topic, the word sense disambiguation results are better.</Paragraph>
    <Paragraph position="2"> This has considerable impact on future work on word sense disambiguation, as genre and topic are shown to be crucial parameters. A system trained on a specific genre/topic would have difficulty adapting to new genres/topics.</Paragraph>
    <Paragraph position="3"> Moreover, methods that try to automatically increase the number of training examples also need to account for genre and topic variations.</Paragraph>
    <Paragraph position="4"> As a side effect, we have shown that the results on usual WSD exercises, which mix training and test data drawn from the same documents, are higher than those from a more realistic setting.</Paragraph>
    <Paragraph position="5"> We also discovered several hand-tagging errors, which distorted extracted collocations. We did not evaluate the extent of these errors, but they certainly affected the performance on cross-corpora tagging.</Paragraph>
    <Paragraph position="6"> Further work will focus on evaluating the separate weight of genre and topic in word sense disambiguation performance, and on studying the behavior of each particular word and feature across genre and topic variations. We plan to devise ways to integrate genre/topic parameters into the word sense disambiguation models, and to apply them to a system that acquires training examples automatically.</Paragraph>
  </Section>
</Paper>