<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-2018">
  <Title>Exploring the Sense Distributions of Homographs</Title>
  <Section position="5" start_page="157" end_page="157" type="concl">
    <SectionTitle>
4 Conclusions and future work
</SectionTitle>
    <Paragraph position="0"> Our experiments showed that associations belonging to the same sense of a homograph have far higher co-occurrence counts than associations belonging to different senses. This is especially true when we look at the concordances of the homographs, but, to a somewhat lesser extent, also when we look at the full corpus. The discrepancy between the two approaches can probably be enlarged by increasing the size of the corpus. However, further investigation is necessary to verify this claim.</Paragraph>
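The same-sense versus different-sense comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: co-occurrence counts are assumed to be available already in a `Counter` keyed by alphabetically ordered word pairs, each association is assumed to carry exactly one sense label, and the function name `within_vs_between` and the toy counts are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def within_vs_between(senses: dict[str, str], counts: Counter) -> tuple[float, float]:
    """Mean co-occurrence count of same-sense pairs and of different-sense pairs.

    `senses` maps each association to a sense label (hypothetical input format);
    `counts` is keyed by alphabetically ordered word pairs.
    """
    same, different = [], []
    # sorted() puts a before b alphabetically, matching the key order of `counts`
    for a, b in combinations(sorted(senses), 2):
        bucket = same if senses[a] == senses[b] else different
        bucket.append(counts[(a, b)])  # missing pairs count as 0
    return mean(same), mean(different)


# Invented toy data for the homograph "palm", for illustration only.
SENSES = {"finger": "hand", "arm": "hand", "coconut": "tree", "oil": "tree"}
COUNTS = Counter({("arm", "finger"): 6, ("coconut", "oil"): 4, ("arm", "coconut"): 1})

print(within_vs_between(SENSES, COUNTS))
```

With these invented counts the same-sense mean is 5 and the different-sense mean is 0.25. Feeding in counts taken from concordances or from the full corpus would let both settings be compared with the same function.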
    <Paragraph position="1"> With the approach based on the concordances of the homographs, the best results were achieved with concordance widths about an order of magnitude larger than the average sentence length. However, human performance shows that the context within a sentence usually suffices to disambiguate a word. A much larger corpus could possibly solve this problem, as it should allow the concordance width to be reduced without losing accuracy. However, since human language acquisition seems to be based on exposure to only on the order of 100 million words (Landauer &amp; Dumais, 1997, p. 222), and because the BNC is already of that size, there must also be another solution to this problem.</Paragraph>
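Concordances of a given width can be extracted and turned into pair counts as sketched below. This is a hedged sketch, not the authors' code. It assumes that width means a number of tokens on each side of every occurrence of the homograph, and that a pair is counted at most once per concordance line. `concordances` and `pair_counts` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def concordances(tokens: list[str], target: str, width: int) -> list[list[str]]:
    """One window per occurrence of `target`: up to `width` tokens on each
    side, excluding the centre occurrence itself (assumed definition of width)."""
    windows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            windows.append(left + right)
    return windows


def pair_counts(windows: list[list[str]]) -> Counter:
    """Number of windows in which each unordered word pair occurs together.
    Keys are alphabetically ordered tuples; repeats inside a window count once."""
    counts = Counter()
    for window in windows:
        for a, b in combinations(sorted(set(window)), 2):
            counts[(a, b)] += 1
    return counts


# Toy text, invented for illustration.
TEXT = ("the palm of his hand touched his finger and arm "
        "a coconut fell from the palm into the oil").split()

for width in (2, 8):
    counts = pair_counts(concordances(TEXT, "palm", width))
    print(width, counts[("arm", "finger")], counts[("coconut", "finger")])
```

In this toy text, a width of 2 finds no co-occurrence of finger with either arm or coconut. A width of 8 finds both, including the cross-sense pair finger/coconut. Wider windows reduce sparsity but also admit context belonging to the other sense.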
    <Paragraph position="2"> We suggest looking not at the co-occurrence frequencies of single word pairs, but at the average co-occurrence frequencies of several pairs derived from larger groups of words.</Paragraph>
    <Paragraph position="3"> Let us illustrate this by returning to our example from the introduction, where we stated that context words such as finger and arm are typical of the hand meaning of palm, whereas coconut and oil are typical of its tree meaning. The sparse-data problem may prevent our expectation from coming true, namely that finger and arm co-occur more often than finger and coconut. But if we add other words that are typical of the hand meaning, e.g. hold or wrist, then an incidental lack of observed co-occurrences between a particular pair can be compensated for by co-occurrences between other pairs. Since the number of possible pairs increases quadratically with the number of words considered (a group of n words yields n(n-1)/2 distinct pairs), this should have a significant positive effect on the sparse-data problem, which is to be examined in future work.</Paragraph>
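The group-based averaging suggested above can be sketched with the palm example. The counts are invented for illustration. The names `pair_key`, `average_cooccurrence` and `n_pairs` are hypothetical. Averaging over the cross product of two word groups is one possible reading of the proposal, not a method the paper specifies.

```python
from collections import Counter
from itertools import product


def pair_key(a: str, b: str) -> tuple[str, str]:
    """Order-independent key for an unordered word pair."""
    return (min(a, b), max(a, b))


def average_cooccurrence(group_a: list[str], group_b: list[str], counts: Counter) -> float:
    """Mean co-occurrence count over all pairs (a, b), a from group_a and b from
    group_b, ignoring pairs of a word with itself. Returns 0.0 if there are no pairs."""
    pairs = [pair_key(a, b) for a, b in product(group_a, group_b) if a != b]
    if not pairs:
        return 0.0
    return sum(counts[p] for p in pairs) / len(pairs)


def n_pairs(n: int) -> int:
    """Distinct pairs among n words: n(n-1)/2, i.e. quadratic growth."""
    return n * (n - 1) // 2


# Invented toy counts: finger/arm happens to be unobserved (the sparse-data case).
COUNTS = Counter({
    pair_key("finger", "wrist"): 4,
    pair_key("finger", "hold"): 3,
    pair_key("finger", "coconut"): 1,
})
HAND = ["arm", "wrist", "hold"]   # typical of the hand meaning of "palm"
TREE = ["coconut", "oil"]         # typical of the tree meaning

print(COUNTS[pair_key("finger", "arm")], COUNTS[pair_key("finger", "coconut")])  # 0 1
print(average_cooccurrence(["finger"], HAND, COUNTS))  # about 2.33
print(average_cooccurrence(["finger"], TREE, COUNTS))  # 0.5
print([n_pairs(n) for n in (2, 4, 8)])  # [1, 6, 28]
```

Taken alone, the pairs finger/arm (0) and finger/coconut (1) would point to the wrong sense. Averaging over the larger hand group (7/3) instead outweighs the tree group (0.5). `n_pairs` illustrates the quadratic growth: going from 4 to 8 words raises the number of distinct pairs from 6 to 28.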
  </Section>
</Paper>