<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1016">
<Title>Automatic construction of a hypernym-labeled noun hierarchy from text</Title>
<Section position="3" start_page="0" end_page="120" type="intro">
<SectionTitle>2 Building the noun hierarchy</SectionTitle>
<Paragraph position="0">The first stage in constructing our hierarchy is to build an unlabeled hierarchy of nouns using bottom-up clustering methods (see, e.g., Brown et al. (1992)). Nouns are clustered based on conjunction and appositive data collected from the Wall Street Journal corpus. Some of the data comes from parsed files 2-21 of the Wall Street Journal Penn Treebank corpus (Marcus et al., 1993); additional parsed text was obtained by parsing the 1987 Wall Street Journal text with the parser described in Charniak et al. (1998).</Paragraph>
<Paragraph position="1">From this parsed text, we identified all conjunctions of noun phrases (e.g., "executive vice-president and treasurer" or "scientific equipment, apparatus and disposables") and all appositives (e.g., "James H. Rosenfield, a former CBS Inc. executive" or "Boeing, a defense contractor"). The idea here is that nouns in conjunctions or appositives tend to be semantically related, as discussed in Riloff and Shepherd (1997) and Roark and Charniak (1998). Taking the head word of each NP and stemming it yields data for about 50,000 distinct nouns.</Paragraph>
<Paragraph position="2">A vector is created for each noun, containing counts of how many times each other noun appears in a conjunction or appositive with it. We can then measure the similarity of two nouns by computing the cosine of the angle between their vectors: $\cos(\vec{v}, \vec{w}) = \frac{\vec{v} \cdot \vec{w}}{|\vec{v}|\,|\vec{w}|}$.</Paragraph>
<Paragraph position="3">To compare two groups of nouns, we define the similarity of groups $A$ and $B$ as the average of the cosines between each pair of nouns made up of one noun from each of the two groups: $$\mathrm{sim}(A, B) = \frac{\sum_{v \in A} \sum_{w \in B} \cos(\vec{v}, \vec{w})}{\mathrm{size}(A) \times \mathrm{size}(B)},$$ where $v$ ranges over the vectors for nouns in group $A$, $w$ ranges over the vectors for group $B$, and $\mathrm{size}(x)$ is the number of nouns that are descendants of node $x$.</Paragraph>
<Paragraph position="4">We create a tree of all the nouns in this data using standard bottom-up clustering techniques, as follows. Put each noun into its own node. Compute the similarity between each pair of nodes using the cosine measure. Find the two most similar nodes and combine them by giving them a common parent (removing the two children from future consideration). The new node's similarity to each remaining node can then be computed as a weighted average of the similarities between each of its children and that node.</Paragraph>
<Paragraph position="5">In other words, assuming nodes $A$ and $B$ have been combined under a new parent $C$, the similarity between $C$ and any other node $D$ is $$\mathrm{sim}(C, D) = \frac{\mathrm{size}(A)\,\mathrm{sim}(A, D) + \mathrm{size}(B)\,\mathrm{sim}(B, D)}{\mathrm{size}(A) + \mathrm{size}(B)}.$$ Once again, we combine the two most similar nodes under a common parent, and repeat until all nouns have been placed under a common ancestor. Nouns that have a cosine of 0 with every other noun are not included in the final tree.</Paragraph>
<Paragraph position="6">In practice, we cannot follow this algorithm exactly, because maintaining a list of the cosines between every pair of nodes requires a tremendous amount of memory: with 50,000 nouns, we would initially need a 50,000 x 50,000 array of values (or a triangular array of about half that size). With our current hardware, the largest array we can comfortably handle is about 100 times smaller; that is, we can build a tree starting from approximately 5,000 nouns.</Paragraph>
<Paragraph position="7">We handle this limitation by processing the nouns in batches. Initially, 5,000 nouns are read in and clustered until 2,500 nodes remain. Then 2,500 more nouns are read in, bringing the total back to 5,000, and we again cluster until 2,500 nodes remain. This process is repeated until all nouns have been processed.</Paragraph>
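<Paragraph position="8">To make the similarity measures concrete, the following Python sketch shows one way to compute them over sparse co-occurrence vectors. It is an illustration only, not the system's actual code; the names build_vectors, cosine, and group_similarity, and the dict-of-counts representation, are assumptions of this sketch.</Paragraph>
```python
import math
from collections import defaultdict

def build_vectors(noun_pairs):
    """Build a sparse co-occurrence vector for each noun from the
    (noun, noun) pairs found in conjunctions and appositives.
    The dict-of-dicts representation is this sketch's choice."""
    vectors = defaultdict(lambda: defaultdict(int))
    for a, b in noun_pairs:
        vectors[a][b] += 1
        vectors[b][a] += 1
    return vectors

def cosine(v, w):
    """cos(v, w) = (v . w) / (|v| |w|) over sparse count vectors."""
    dot = sum(count * w.get(noun, 0) for noun, count in v.items())
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    norm_w = math.sqrt(sum(c * c for c in w.values()))
    if norm_v == 0.0 or norm_w == 0.0:
        return 0.0
    return dot / (norm_v * norm_w)

def group_similarity(group_a, group_b, vectors):
    """sim(A, B): the average cosine over all pairs made up of one
    noun from each of the two groups."""
    total = sum(cosine(vectors[a], vectors[b])
                for a in group_a for b in group_b)
    return total / (len(group_a) * len(group_b))
```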
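<Paragraph position="9">The clustering procedure itself can be sketched in the same spirit. The routine below maintains a table of pairwise similarities between live nodes and applies the size-weighted update when two nodes are merged; that update is exactly equivalent to recomputing the averaged cosine over all leaf pairs, which is why the sizes appear as weights. The stop_at parameter is this sketch's device for supporting the batched processing described above. As written the loop is O(n^3), which is why limiting n matters.</Paragraph>
```python
import itertools

def cluster(nodes, vectors, stop_at=1):
    """Bottom-up clustering. `nodes` is a list of (tree, leaves)
    pairs, where a tree is a noun string or a nested pair of trees
    and `leaves` is the set of nouns under it; a fresh noun enters
    as (noun, {noun}). Merging stops when `stop_at` nodes remain."""
    live = dict(enumerate(nodes))
    next_id = itertools.count(len(live))
    # Initial similarity table over all pairs of live nodes.
    sim = {(i, j): group_similarity(live[i][1], live[j][1], vectors)
           for i, j in itertools.combinations(live, 2)}
    while len(live) > stop_at:
        # Merge the two most similar nodes under a new parent c.
        (a, b), _ = max(sim.items(), key=lambda kv: kv[1])
        del sim[(a, b)]
        tree_a, leaves_a = live.pop(a)
        tree_b, leaves_b = live.pop(b)
        c = next(next_id)
        for d in live:
            # Weighted-average update:
            # sim(C, D) = (|A| sim(A,D) + |B| sim(B,D)) / (|A| + |B|)
            s_ad = sim.pop((min(a, d), max(a, d)))
            s_bd = sim.pop((min(b, d), max(b, d)))
            sim[(d, c)] = ((len(leaves_a) * s_ad + len(leaves_b) * s_bd)
                           / (len(leaves_a) + len(leaves_b)))
        live[c] = ((tree_a, tree_b), leaves_a | leaves_b)
    return list(live.values())
```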
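<Paragraph position="10">A short driver, again a sketch rather than the original code, then realizes the batch scheme; it assumes the nouns arrive in whatever order they are to be read. Because the refilled pool mixes existing clusters with new singleton nouns, the initial similarity table of each pass falls back on group_similarity over the underlying vectors, which is consistent with the averaged-cosine definition.</Paragraph>
```python
def cluster_in_batches(nouns, vectors, batch=5000, keep=2500):
    """Batched clustering as described above: cluster the first
    `batch` nouns down to `keep` nodes, top the pool back up with
    unread nouns, and repeat; a final pass merges everything under
    one root. Returns the root tree (nested pairs of noun strings)."""
    pool = [(noun, {noun}) for noun in nouns[:batch]]
    rest = list(nouns[batch:])
    while rest:
        pool = cluster(pool, vectors, stop_at=keep)
        refill = batch - len(pool)
        pool += [(noun, {noun}) for noun in rest[:refill]]
        rest = rest[refill:]
    return cluster(pool, vectors, stop_at=1)[0][0]
```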
<Paragraph position="11">Since the lowest-frequency nouns are clustered on the basis of very little data and therefore tend to be clustered badly, we filter some of them out: we now consider only nouns with a vector of length at least 2. Reducing the number of nouns to be read in this way produces a much cleaner structure.</Paragraph>
<Paragraph position="12">This leaves approximately 20,000 nouns as the leaves of our final binary tree structure. Our next step is to try to label each of the internal nodes with a hypernym describing its descendant nouns.</Paragraph>
</Section>
</Paper>