<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2124">
  <Title>Word Clustering and Disambiguation Based on Co-occurrence Data</Title>
  <Section position="3" start_page="749" end_page="749" type="intro">
    <SectionTitle>
2 Probability Model
</SectionTitle>
    <Paragraph position="0"> Suppose we have co-occurrence data over two sets of words, such as the sample of verbs and the head words of their direct objects given in Fig. 1.</Paragraph>
    <Paragraph position="1"> Our goal is to (hierarchically) cluster the two sets of words so that words having similar co-occurrence patterns are classified in the same class, and to output a thesaurus for each set of words.</Paragraph>
    <Paragraph position="2"> We can view this problem as that of estimating the best probability model from among a class of models (probability distributions) which can give rise to the co-occurrence data.</Paragraph>
    <Paragraph position="3"> In this paper, we consider the following type of probability model. Assume without loss of generality that the two sets of words are a set of nouns $\mathcal{N}$ and a set of verbs $\mathcal{V}$. A partition $T_n$ of $\mathcal{N}$ is a set of noun-classes satisfying $\bigcup_{C_n \in T_n} C_n = \mathcal{N}$ and $\forall C_i, C_j \in T_n\ (i \neq j),\ C_i \cap C_j = \emptyset$. A partition $T_v$ of $\mathcal{V}$ can be defined analogously. We then define a probability model of noun-verb co-occurrence by defining the joint probability of a noun $n$ and a verb $v$ as the product of the joint probability of the noun and verb classes that $n$ and $v$ belong to, and the conditional probabilities of $n$ and $v$ given their classes, that is, $$P(n, v) = P(C_n, C_v) \cdot P(n \mid C_n) \cdot P(v \mid C_v),$$</Paragraph>
    <Paragraph position="5"> where $C_n$ and $C_v$ denote the (unique) classes to which $n$ and $v$ belong. In this paper, we refer to this model as the 'hard clustering model,' since it is based on a type of clustering in which each word can belong to only one class. Fig. 2 shows an example of the hard clustering model that can give rise to the co-occurrence data in Fig. 1.</Paragraph>
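    <Paragraph position="6"> The hard clustering model described above can be illustrated with a minimal Python sketch. It checks the two partition conditions (the classes cover the word set and are pairwise disjoint) and evaluates the joint probability of a noun and a verb as the product of the class-pair probability and the two conditional membership probabilities. The function names, the toy words, and all probability values are assumptions made for illustration; they are not the data of Fig. 1 or the model of Fig. 2.

```python
from itertools import combinations


def is_partition(classes, words):
    """Return True if `classes` cover `words` and are pairwise disjoint."""
    classes = [set(c) for c in classes]
    covers = set().union(*classes) == set(words)
    disjoint = all(a.isdisjoint(b) for a, b in combinations(classes, 2))
    return covers and disjoint


def joint_probability(n, v, noun_class, verb_class, p_classes, p_noun, p_verb):
    """P(n, v) = P(C_n, C_v) * P(n | C_n) * P(v | C_v)."""
    c_n = noun_class[n]  # the unique class of noun n
    c_v = verb_class[v]  # the unique class of verb v
    return p_classes[(c_n, c_v)] * p_noun[n] * p_verb[v]


# Toy partitions (hypothetical, for illustration only).
noun_classes = {"N1": {"wine", "beer"}, "N2": {"bread", "rice"}}
verb_classes = {"V1": {"drink"}, "V2": {"eat"}}

# Invert the partitions: each word maps to exactly one class.
noun_class = {n: c for c, members in noun_classes.items() for n in members}
verb_class = {v: c for c, members in verb_classes.items() for v in members}

# Hypothetical parameters: P(C_n, C_v), P(n | C_n), P(v | C_v).
p_classes = {
    ("N1", "V1"): 0.375, ("N1", "V2"): 0.125,
    ("N2", "V1"): 0.125, ("N2", "V2"): 0.375,
}
p_noun = {"wine": 0.5, "beer": 0.5, "bread": 0.5, "rice": 0.5}
p_verb = {"drink": 1.0, "eat": 1.0}

p_wine_drink = joint_probability(
    "wine", "drink", noun_class, verb_class, p_classes, p_noun, p_verb
)

# Summing over every noun-verb pair should give a proper distribution.
total = sum(
    joint_probability(n, v, noun_class, verb_class, p_classes, p_noun, p_verb)
    for n in noun_class
    for v in verb_class
)
```

    Because each word belongs to exactly one class, a plain dictionary from word to class is enough to represent a partition, and the joint probabilities sum to one whenever the class-pair probabilities and the within-class conditional probabilities are each normalized.</Paragraph>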
  </Section>
</Paper>