<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1044"> <Title>Syntagmatic and Paradigmatic Representations of Term Variation</Title> <Section position="6" start_page="344" end_page="346" type="evalu"> <SectionTitle> 6 Evaluation </SectionTitle> <Paragraph position="0"> We provide two evaluations of term variant conflation. First, we calculate precision rates by manually inspecting the variants. Second, we report the number of variants extracted in each of the four experiments.</Paragraph> <Section position="1" start_page="344" end_page="346" type="sub_section"> <SectionTitle> Precision </SectionTitle> <Paragraph position="0"> Because of the large volume of data, only the experiments on the French corpus are evaluated. [AGRIC] + AGROVOC produces 2,739 variations, of which 2,485 are judged correct. Because Word97 proposes more synonym links, [AGRIC] + Word97 produces more variants (3,860), of which 3,110 are accepted after human inspection.</Paragraph> <Paragraph position="1"> The two experiments produce the same set of non-semantic variants (syntactic and morpho-syntactic variants), whose precision values are reported in Tables 4 and 5. The semantic variations are divided into two subsets: &quot;pure&quot; semantic variations, and semantic variations involving a syntactic transformation and/or a morphological link. Their precision values are given in Tables 6 and 7.</Paragraph> <Paragraph position="2"> In terms of precision, these tables show that the variations fall into two levels of quality. On the one hand, syntactic, morpho-syntactic and pure semantic variations are extracted with high precision (above 78%; see the &quot;Total&quot; values in Tables 4 to 6). 
On the other hand, the combination of semantic links with syntax and/or morphology results in poor precision (55% on average with the AGROVOC semantic links and 29.4% with the Word97 links; see line &quot;Total&quot; in Table 7).</Paragraph> <Paragraph position="6"> The lower precision of hybrid variations is due to a cumulative effect of semantic shift through combined variations. For instance, former un réseau continu (build a continuous network) is incorrectly extracted as a variant of formation permanente (continuing education) through a Noun-to-Verb variation with a semantic link between argument words. The verb former and the associated deverbal noun formation are both polysemous. In formation permanente, the meaning relates to a human activity (to train), whereas in former un réseau continu, it relates to a physical construction (to build).</Paragraph> <Paragraph position="7"> Despite the relatively poor precision of hybrid variations, the average precision of term conflation is high because hybrid variations represent only a small fraction of term variations (5.4% and 0.9%; see lines &quot;% sem&quot; in Table 8 below). The average precision is 79.8% on [AGRIC] + Word97 and 91.1% on [AGRIC] + AGROVOC.</Paragraph> <Paragraph position="8"> The use of semantic links extracted from WordNet in term variant extraction does not suffer from the ambiguity problem pointed out for query expansion (Voorhees, 1998). This robustness to polysemy comes from the fact that we are dealing with multiword terms, which build restricted linguistic contexts in which words are disambiguated.</Paragraph> <Paragraph position="4"> Numbers of Variants. Table 8 shows the numbers of term variants extracted by the four experiments. For each experiment and each type of variation, three values are reported: the number $v$ of variants of this type and two percentages giving the proportion of these variants. The first percentage is $\frac{v}{V}$, in which $V$ is the total number of variants produced by this experiment. The second percentage is $\frac{v}{T}$, in which $T$ is the number of (non-variant) term occurrences extracted by this experiment. The last line of Table 8 shows that variants represent a significant proportion of term occurrences (from 27.3% to 37.3%). The distribution of the different types of variants depends on the semantic database and on the language under study. WordNet 1.6 is a productive source of knowledge for the extraction of semantic variants: in the [MEDIC] + WordNet experiment, semantic variants represent 58.6% of the variants, whereas they represent only 4.9% of the variants in the [AGRIC] + AGROVOC experiment. These values are reported in the line &quot;Tot. Sem&quot; of Table 8. Such results confirm the relevance of non-specialized semantic links in the extraction of specialized semantic variants (Hamon et al., 1998).</Paragraph> </Section> </Section></Paper>
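Both the precision rates and the two percentages of Table 8 are simple ratios. Precision is the number of variants accepted after human inspection divided by the number produced. For a variation type with v variants, the table reports v/V (V: all variants of the experiment) and v/T (T: non-variant term occurrences). The minimal Python sketch below makes these definitions concrete. The names `precision`, `ExperimentCounts` and `ratios` are hypothetical, and the counts in the usage lines are invented rather than taken from Table 8.

```python
from __future__ import annotations

from dataclasses import dataclass


def precision(accepted: int, produced: int) -> float:
    """Percentage of produced variants accepted after human inspection."""
    if produced == 0:
        raise ValueError("no variants were produced")
    return 100.0 * accepted / produced


@dataclass
class ExperimentCounts:
    """Hypothetical per-experiment counts (structure assumed, not from the paper)."""

    variants_by_type: dict[str, int]  # v for each type of variation
    term_occurrences: int             # T: (non-variant) term occurrences

    @property
    def total_variants(self) -> int:
        # V: total number of variants produced by the experiment
        return sum(self.variants_by_type.values())

    def ratios(self, variation_type: str) -> tuple[float, float]:
        """Return (v/V, v/T) as percentages for one type of variation."""
        v = self.variants_by_type[variation_type]
        return (100.0 * v / self.total_variants,
                100.0 * v / self.term_occurrences)


if __name__ == "__main__":
    # Invented counts for illustration only (not Table 8 data)
    exp = ExperimentCounts(
        {"syntactic": 60, "morpho-syntactic": 30, "semantic": 10},
        term_occurrences=400,
    )
    print(exp.ratios("semantic"))  # (10.0, 2.5)
    print(precision(3, 4))         # 75.0
```

Deriving V from the per-type counts, instead of storing it separately, keeps the v/V percentages of all variation types consistent with each other, so they always sum to 100%.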