<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-3020">
  <Title>Automatic Part-of-Speech Induction from Text</Title>
  <Section position="6" start_page="78" end_page="79" type="evalu">
    <SectionTitle>
4 Results and Evaluation
</SectionTitle>
    <Paragraph position="0"> Our results are presented as dendrograms which in contrast to 2-dimensional dot-plots have the advantage of being able to correctly show the true distances between clusters. The two dendrograms in figure 2 where both computed by applying the procedure as described in the previous section, with  the only difference that in generating the upper dendrogram the SVD-step has been omitted, whereas in generating the lower dendrogram it has been conducted. Without SVD the expected clusters of verbs, nouns and adjectives are not clearly separated, and the adjectives widely and rural are placed outside the adjective cluster. With SVD, all 50 words are in their appropriate clusters and the three discovered clusters are much more salient.</Paragraph>
    <Paragraph position="1"> Also, widely and rural are well within the adjective cluster. The comparison of the two dendrograms indicates that the SVD was capable of making appropriate generalizations. Also, when we look inside each cluster we can see that ambiguous words like suit, drop or brief are somewhat closer to their secondary class than unambiguous words.</Paragraph>
    <Paragraph position="2"> Having obtained the three expected clusters, the next investigation concerns the assignment of the ambiguous words to additional clusters. As described previously, this is done by computing differential vectors, and by assigning these to the most similar other cluster. Hereby for the cosine similarity we set a threshold of 0.8. That is, only if the similarity between the differential vector and its closest centroid was higher than 0.8 we assigned the word to this cluster and continued to compute differential vectors. Otherwise we assumed that the differential vector was caused by sampling errors and aborted the process of searching for additional class assignments.</Paragraph>
    <Paragraph position="3"> The results from this procedure are shown in table 2 where for each of the 50 words all computed classes are given in the order as they were obtained by the algorithm, i.e. the dominant assignments are listed first. Although our algorithm does not name the classes, for simplicity we interpret them in the obvious way, i.e. as nouns, verbs and adjectives. A comparison with WordNet 2.0 choices is given in brackets. For example, +N means that WordNet lists the additional assignment noun, and -A indicates that the assignment adjective found by the algorithm is not listed in WordNet.</Paragraph>
    <Paragraph position="4"> According to this comparison, for all 50 words the first reading is correct. For 16 words an additional second reading was computed which is correct in 11 cases. 16 of the WordNet assignments are missing, among them the verb readings for reform, suit, and rain and the noun reading for serve. However, as many of the WordNet assignments seem rare, it is not clear in how far the omissions can be attributed to shortcomings of the algorithm.</Paragraph>
    <Paragraph position="5">  accident N expensive A reform N (+V) belief N familiar A (+N) rural A birth N (+V) finance N V screen N (+V) breath N grow V N (-N) seek V (+N)  brief A N imagine V serve V (+N) broad A (+N) introduction N slow A V busy A V link N V spring N A V (-A) catch V N lovely A (+N) strike N V critical A lunch N (+V) suit N (+V) cup N (+V) maintain V surprise N V dangerous A occur V N (-N) tape N V discuss V option N thank V A (-A) drop V N pleasure N thin A (+V) drug N (+V) protect V tiny A empty A V (+N) prove V widely A N (-N) encourage V quick A (+N) wild A (+N) establish V rain N (+V)</Paragraph>
  </Section>
class="xml-element"></Paper>