<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1161">
<Title>Acquisition of Semantic Classes for Adjectives from Distributional Evidence</Title>
<Section position="5" start_page="0" end_page="35" type="evalu">
<SectionTitle> 4 Results </SectionTitle>
<Paragraph position="0"> The experiments were performed using CLUTO (footnote 4), a free clustering toolkit. We tested the clustering approaches available in the tool: two hierarchical algorithms and one flat algorithm, one of them agglomerative and the other two partitional, with several criterion functions, always using the cosine distance measure. Two different combinations of features and feature normalisations were tested for each parameter. The best result was obtained with the k-means algorithm and the parameters listed in Table 4. However, the results were quite robust across all parametrisations (footnote 5).
Table 4:
parameter | un, bin | bas, ev, obj
number of clusters | 2 | 3
number of features | 10 | 32
feature normalisation | none | [formula garbled in extraction; see footnote 5]</Paragraph>
<Section position="1" start_page="0" end_page="35" type="sub_section">
<SectionTitle> 4.1 Unary vs. binary </SectionTitle>
<Paragraph position="0"> Figure 1 depicts the clustering solution for the unary/binary parameter.</Paragraph>
<Paragraph position="1"> The agreement between the GS and this clustering solution was 97% and κ = 0.87 (κ ranging from 0.67 to 0.89 among human judges), and is thus fully comparable to the interjudge agreement.</Paragraph>
<Paragraph position="2"> As can be seen in Figure 1, all binary adjectives are together in cluster 1, while most unary ones are in cluster 0 (only 2 unary adjectives were misclassified as binary).
Footnote 5: The raw percentage was divided by the prior probability of the feature, so that the distance from the expected percentage, rather than the percentage value as such, was obtained.
Figure 1: ... vs. unary (gray) and binary (white) adjectives.</Paragraph>
<Paragraph position="3"> The clustering clearly recognizes a majority of objects bearing no complement and a minority having a regular complement. This parameter, then, is quite easy and reliable to obtain. Indeed, the most relevant features for each cluster closely matched the hypotheses discussed in Section 2. They are listed in Table 5.</Paragraph>
<Paragraph position="4"> Table 5 (columns): cl | high values | low values</Paragraph>
<Paragraph position="6"> ... represented as in examples 1 and 2).</Paragraph>
<Paragraph position="7"> Objects in cluster 1, corresponding to binary adjectives, have high values for most of the features containing a preposition after the adjective (observe +1pe, 'preposition to the right'). Objects in cluster 0 (unary adjectives), symmetrically, have low values for these features, and high values for the default adjective position in Catalan (directly postnominal: -1cn). The behaviour of the objects in cluster 0 (the biggest cluster by far) is more cohesive than that of the objects in cluster 1, which have a medium mean value for most features. That is, binary adjectives do not have low values for the features that characterize unary ones, but their values are still significantly lower than those of unary adjectives.</Paragraph>
<Paragraph position="8"> 4.2 Basic vs. event vs. object
Figure 2 depicts the clustering solution for the basic/event/object parameter.</Paragraph>
<Paragraph position="9"> The agreement between the GS and the clustering solution was much lower than for the unary/binary parameter: 73% and κ = 0.56 (±0.14 at 95% c.i.; κ ranging from 0.51 to 0.57 among human judges).</Paragraph>
<Paragraph position="10"> Figure 2: ... vs. basic (white), event (light gray) and object (dark gray) adjectives.</Paragraph>
<Paragraph position="11"> Our diagnosis is that this is due to the lack of syntactic homogeneity of the event-adjective class, which might in turn be due to a wrong characterisation of the class. As can be seen in Figure 2, while object adjectives are all in cluster 0 and basic adjectives are concentrated in cluster 2, event adjectives are scattered across clusters 1 and 2. In fact, cluster 1 contains seven out of the eight binary adjectives in the GS, and only four unary ones. It seems, then, that what is being identified in cluster 1 is again binary, rather than event, adjectives. If we look at the morphological type, it turns out that six out of seven event adjectives in cluster 1 (against three out of seven in cluster 2) are participles. A tentative conclusion we can draw is that participles and other kinds of deverbal adjectives do not behave alike; moreover, it seems that other kinds of deverbal adjectives behave quite similarly to basic adjectives.</Paragraph>
<Paragraph position="12"> It should be remarked, however, that although event adjectives do not form a homogeneous class with respect to the features used, basic and object adjectives are quite clearly distinguished from each other in the clustering solution.</Paragraph>
<Paragraph position="13"> As for the features that were most relevant for each cluster, listed in Table 6, they confirm the analysis just made and again match the hypotheses discussed in Section 2.</Paragraph>
<Paragraph position="14"> Lemmata in cluster 0 (object adjectives) have high values for the expected &quot;rigid&quot; position, right after the noun (-1cn) and before any other adjective (+1aj).
Table 6 (columns): cl | high values | low values</Paragraph>
<Paragraph position="16"> ... (represented as in examples 1 and 2 above).</Paragraph>
<Paragraph position="17"> They are further characterised by not occurring as predicates (low value for -1ve). As for objects in cluster 1, their features are very similar to those of the binary cluster 1 above. Finally, cluster 2 (basic adjectives) presents the predicted flexibility: its adjectives occur in coordinating constructions (-1co, +1co) and appear further from the head noun than other adjectives (low value for -1cn+1aj).</Paragraph>
</Section>
<Section position="2" start_page="35" end_page="35" type="sub_section">
<SectionTitle> 4.3 What about morphology? </SectionTitle>
<Paragraph position="0"> One of the hypotheses we wanted to test, as stated in Section 2.4, is that syntactic information is more reliable than morphological information for establishing semantic classes for adjectives. We therefore expect the agreement between the clustering solution and the GS to be higher than the agreement between the GS and a classification based on morphological class.</Paragraph>
<Paragraph position="1"> From the manual annotation in Sanromà (2003), we mapped the classes as in Table 7, following the discussion in Section 2 (footnote 6). The agreement between this classification and the GS was 65% and κ = 0.49, much lower than the agreement between clustering and GS reported above (73% and κ = 0.56).</Paragraph>
<Paragraph position="2"> Actually, 13 out of 35 denominal adjectives, 7 out of 13 deverbal adjectives and 5 out of 15 participles were considered to be basic in the GS. Most of these mismatches are caused by changes in meaning (e.g. mecànic, 'mechanical', does not only mean 'related to mechanics', but also 'monotone'). The morphological mapping works best for nonderived adjectives: 14 out of 16 were basic in denotation (the remaining two were classified as object).
Thus, our hypothesis seems to be supported by the available data.</Paragraph>
<Paragraph position="3"> Footnote 6: Note that this test cannot be performed for the unary/binary parameter, since there is no clear hypothesis with respect to the morphology-semantics mapping.</Paragraph>
</Section>
</Section>
</Paper>