<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2124"> <Title>Word Clustering and Disambiguation Based on Co-occurrence Data</Title> <Section position="3" start_page="749" end_page="749" type="intro"> <SectionTitle> 2 Probability Model </SectionTitle> <Paragraph position="0"> Suppose that we are given co-occurrence data over two sets of words, such as the sample of verbs and the head words of their direct objects given in Fig. 1.</Paragraph> <Paragraph position="1"> Our goal is to (hierarchically) cluster the two sets of words so that words having similar co-occurrence patterns are classified in the same class, and to output a thesaurus for each set of words.</Paragraph> <Paragraph position="2"> We can view this problem as that of estimating the best probability model from among a class of models (probability distributions) which can give rise to the co-occurrence data.</Paragraph> <Paragraph position="3"> In this paper, we consider the following type of probability models. Assume without loss of generality that the two sets of words are a set of nouns $\mathcal{N}$ and a set of verbs $\mathcal{V}$. A partition $T_n$ of $\mathcal{N}$ is a set of noun-classes satisfying $\bigcup_{C_n \in T_n} C_n = \mathcal{N}$ and $\forall C_i, C_j \in T_n\ (C_i \neq C_j),\ C_i \cap C_j = \emptyset$. A partition $T_v$ of $\mathcal{V}$ can be defined analogously. We then define a probability model of noun-verb co-occurrence by defining the joint probability of a noun $n$ and a verb $v$ as the product of the joint probability of the noun and verb classes that $n$ and $v$ belong to, and the conditional probabilities of $n$ and $v$ given their classes, that is,</Paragraph> <Paragraph position="4"> $P(n, v) = P(C_n, C_v) \cdot P(n \mid C_n) \cdot P(v \mid C_v)$ </Paragraph> <Paragraph position="5"> where $C_n$ and $C_v$ denote the (unique) classes to which $n$ and $v$ belong. In this paper, we refer to this model as the 'hard clustering model,' since it is based on a type of clustering in which each word can belong to only one class. Fig. 2 shows an example of the hard clustering model that can give rise to the co-occurrence data in Fig. 1.</Paragraph> </Section> </Paper>
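<!--
The hard clustering model described in Section 2 can be illustrated with a minimal sketch. It assumes that the two partitions are already fixed and that the three factors of the joint probability are estimated by relative frequencies; this section does not state an estimation method, so that choice is an assumption made here. The function name fit_hard_clustering, the class labels, and the toy co-occurrence pairs are all hypothetical and are not the data of Fig. 1.

```python
from collections import Counter


def fit_hard_clustering(pairs, noun_class, verb_class):
    """Return a function joint(n, v) for a hard clustering model.

    pairs      : list of observed (noun, verb) co-occurrences
    noun_class : dict mapping each noun to its single class (partition of nouns)
    verb_class : dict mapping each verb to its single class (partition of verbs)

    Assumption: every factor is a relative-frequency estimate.
    """
    total = len(pairs)
    # Counts for the joint class probability P(C_n, C_v).
    class_pair = Counter((noun_class[n], verb_class[v]) for n, v in pairs)
    # Counts for the conditionals P(n | C_n) and P(v | C_v).
    noun_count = Counter(n for n, _ in pairs)
    verb_count = Counter(v for _, v in pairs)
    noun_class_count = Counter(noun_class[n] for n, _ in pairs)
    verb_class_count = Counter(verb_class[v] for _, v in pairs)

    def joint(n, v):
        cn, cv = noun_class[n], verb_class[v]
        p_classes = class_pair[(cn, cv)] / total
        p_n_given_class = noun_count[n] / noun_class_count[cn]
        p_v_given_class = verb_count[v] / verb_class_count[cv]
        # P(n, v) = P(C_n, C_v) * P(n | C_n) * P(v | C_v)
        return p_classes * p_n_given_class * p_v_given_class

    return joint


# Hypothetical toy data: each word belongs to exactly one class.
noun_class = {"wine": "BEVERAGE", "beer": "BEVERAGE",
              "bread": "FOOD", "rice": "FOOD"}
verb_class = {"drink": "V1", "sip": "V1", "eat": "V2"}
pairs = [("wine", "drink"), ("wine", "drink"), ("beer", "drink"),
         ("wine", "sip"), ("beer", "sip"),
         ("bread", "eat"), ("bread", "eat"), ("rice", "eat")]

if __name__ == "__main__":
    joint = fit_hard_clustering(pairs, noun_class, verb_class)
    for n in noun_class:
        for v in verb_class:
            print(n, v, joint(n, v))
```

Representing each partition as a dictionary from word to class label enforces the hard clustering constraint by construction, since a key can map to only one class. Because the conditionals sum to one within each class, the joint probabilities over all noun-verb pairs sum to one whenever every class has at least one observation.
-->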