<?xml version="1.0" standalone="yes"?>
<Paper uid="J90-1003">
  <Title>Word Association Norms, Mutual Information, and Lexicography</Title>
  <Section position="10" start_page="0" end_page="0" type="concl">
    <SectionTitle>
9 CONCLUSIONS
</SectionTitle>
    <Paragraph position="0"> We began this paper with the psycholinguistic notion of word association norm, and extended that concept toward the information theoretic definition of mutual information.</Paragraph>
    <Paragraph position="1"> This provided a precise statistical calculation that could be applied to a very large corpus of text to produce a table of associations for tens of thousands of words. We were then able to show that the table encoded a number of very interesting patterns ranging from doctor ... nurse to save ... from. We finally concluded by showing how the patterns in the association ratio table might help a lexicographer organize a concordance.</Paragraph>
    <Paragraph position="2"> In fact, we developed these results in essentially the reverse order. Concordance analysis is still extremely labor-intensive and prone to errors of omission.</Paragraph>
    <Paragraph position="3"> The ways that concordances are sorted don't adequately support current lexicographic practice. Despite the fact that a concordance is indexed by a single word, often lexicographers actually use a second word such as from or an equally common semantic concept such as a time adverbial to decide how to categorize concordance lines. In other words, they use two words to triangulate in on a word sense.</Paragraph>
    <Paragraph position="4"> This triangulation approach clusters concordance lines together into word senses based primarily on usage (distributional evidence), as opposed to intuitive notions of meaning.</Paragraph>
    <!-- Running header: 28 Computational Linguistics Volume 16, Number 1, March 1990; Kenneth Church and Patrick Hanks; Word Association Norms, Mutual Information, and Lexicography -->
    <Paragraph position="5"> Thus, the question of what is a word sense can be addressed with syntactic methods (symbol pushing), and need not address semantics (interpretation), even though the inventory of tags may appear to have semantic values.</Paragraph>
    <Paragraph position="6"> The triangulation approach requires &quot;art.&quot; How does the lexicographer decide which potential cut points are &quot;interesting&quot; and which are merely due to chance? The proposed association ratio score provides a practical and objective measure that is often a fairly good approximation to the &quot;art.&quot; Since the proposed measure is objective, it can be applied in a systematic way over a large body of material, steadily improving consistency and productivity.</Paragraph>
    <Paragraph position="7"> But on the other hand, the objective score can be misleading. The score takes only distributional evidence into account. For example, the measure favors set ... for over set ... down; it doesn't know that the former is less interesting because its semantics are compositional. In addition, the measure is extremely superficial; it cannot cluster words into appropriate syntactic classes without an explicit preprocess such as Church's parts program or Hindle's parser. Neither of these preprocesses, though, can help highlight the &quot;natural&quot; similarity between nouns such as picture and photograph. Although one might imagine a preprocess that would help in this particular case, there will probably always be a class of generalizations that are obvious to an intelligent lexicographer, but lie hopelessly beyond the objectivity of a computer.</Paragraph>
    <Paragraph position="8"> Despite these problems, the association ratio could be an important tool to aid the lexicographer, rather like an index to the concordances. It can help us decide what to look for; it provides a quick summary of what company our words do keep.</Paragraph>
  </Section>
</Paper>