<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1087">
  <Title>Enhancing automatic term recognition through recognition of variation</Title>
  <Section position="7" start_page="4" end_page="4" type="evalu">
    <SectionTitle>
4 Evaluation and discussion
</SectionTitle>
    <Paragraph position="0"> In order to assess the effectiveness of incorporating specific types of term variation into ATR, we compared the performance of the baseline C-value method (without considering variations) with the approach including recognition and conflation of term variants. Here we are not interested in an absolute measure of the ATR performance, but rather in the comparison of results obtained through handling different variation types.</Paragraph>
    <Paragraph position="1"> We conducted two sets of experiments: in the first experiment, we analysed the incorporation of term candidates resulting from considering term variations individually, while, in the second, we experimented with the integration of combined variations in the ATR process.</Paragraph>
    <Paragraph position="2"> The evaluation was carried out using the GENIA corpus (GENIA, 2004), which contains 2,000 abstracts in the biomedical domain with 76,592 manually marked occurrences of terms. These occurrences (which include different term variants) correspond to 29,781 different, unique terms. Each occurrence of a term in the corpus (except occurrences of acronyms) is linked to the corresponding &quot;normalised&quot; term (typically a singular form), while coordinated terms are identified, marked and normalised within term coordinations. A third of occurrences of GENIA terms are affected by inflectional variations, and almost half of GENIA terms have inflectional variants appearing in the corpus. On the other hand, only 0.5% of terms contain a preposition, while 2% of all term occurrences are coordinated, involving 9% of distinct GENIA terms (for a detailed analysis of GENIA terms see (Nenadic et al., 2004)).</Paragraph>
    <Paragraph position="3"> We used the list of GENIA terms as a gold standard for the evaluation. Since our ATR method produces a ranked list of suggested synterms, we considered precision at fixed rank cut-offs (intervals): precision was calculated as the ratio between the number of correctly recognised terms and the total number of entities recognised in a given interval (where an interval included all terms from the top ranked synterms).++</Paragraph>
    <Paragraph position="4"> The baseline method (original C-value) was treated in the same way, as term candidates suggested by the original C-value could be seen as singleton synterms. In order to estimate the influence on recall, we also used all variants from suggested synterms.</Paragraph>
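The interval precision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name interval_precision, the representation of a synterm as a set of variant strings, and the toy data are all hypothetical. Every variant in a synterm is counted as a separate hit, and a baseline candidate is simply a synterm with one member.

```python
from typing import List, Set


def interval_precision(ranked_synterms: List[Set[str]],
                       gold_terms: Set[str],
                       cutoff: int) -> float:
    """Precision over all variants found in the top `cutoff` synterms.

    Assumption: each variant counts as one recognised entity, and it is
    correct if it appears in the gold standard.
    """
    top = ranked_synterms[:cutoff]
    suggested = [variant for synterm in top for variant in synterm]
    if not suggested:
        return 0.0
    correct = sum(1 for variant in suggested if variant in gold_terms)
    return correct / len(suggested)


# Hypothetical toy data (not taken from GENIA annotations).
gold_terms = {"T cell", "T cells", "NF-kappa B"}
ranked_synterms = [
    {"T cell", "T cells"},                    # inflectional variants
    {"nuclear factor kappa B", "NF-kappa B"},  # expanded form and acronym
    {"present study"},                         # false candidate
]

for cutoff in (1, 2, 3):
    print(cutoff, interval_precision(ranked_synterms, gold_terms, cutoff))
```

Passing a list of one-element sets to the same function gives the corresponding figure for a baseline without variant conflation.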
    <Paragraph position="5"> The incorporation of individual variations affecting term constituents into ATR had considerable positive effects, especially on the most frequently occurring terms (see Figures 2a and 2b): for some intervals, inflectional variants, for example, improved precision by almost 50%.</Paragraph>
    <Paragraph position="6"> Similarly, the integration of acronyms improved precision, in particular for frequent terms (up to 70%), as acronyms are typically introduced for such terms. As one would expect, the combined constituent-level variations further improved interval precisions compared both to the baseline method and to individual variations (see Figure 2c). However, the incorporation of structural variants (in particular for prepositional terms) negatively influenced precision compared to the baseline method, as many false candidates were introduced. In order to assess the quality of extracted prepositional term candidates, we evaluated a set of the 117 most frequently occurring candidates with prepositions: 80% of suggested expressions were deemed relevant by domain experts, although they were not included in the GENIA gold standard (such as expression of genes or binding of NF kappa B). Still, the recognition of prepositional term candidates is difficult, as they are infrequent and there are no clear morphosyntactic cues that can differentiate between terminologically relevant and irrelevant prepositional phrases.</Paragraph>
    <Paragraph position="7"> ++ It was an open question whether to count the recognition of each term form (e.g. singular and plural forms, an acronym and its EF, prepositional and non-prepositional forms) separately (i.e. as two positive &quot;hits&quot;) or as one positive &quot;hit&quot; (see also (Church, 1995)). Since the evaluation of the baseline method (original C-value) typically counts such hits separately, we decided to follow this approach, and consequently count all positive hits from synterms.</Paragraph>
    <Paragraph position="8"> The incorporation of coordinated term candidates had only marginal influence on precision, mainly because they were not frequent in the GENIA corpus. Furthermore, simple term conjunctions were far more frequent than term coordinations, which made their extraction highly ambiguous.</Paragraph>
    <Paragraph position="9"> Still, using only the patterns from Table 3, we have correctly extracted 35.76% of all GENIA coordinated terms, with more than half of all suggested candidates being found among those that appeared exclusively in coordinations. However, these patterns also generated a number of false coordination expressions, and consequently a number of false term candidates.</Paragraph>
    <Paragraph position="10"> The integration of term variants was also useful for re-ranking of true positive term candidates: the combined rank was typically higher than the separate ranks of term variants. Furthermore, some terms, not suggested by the baseline method at all, were ranked highly when variants were conflated (for example, the term T-lymphocyte was recognised only as a coordinated term candidate, while replication of HIV-1 was extracted only by considering prepositional term candidates). In order to estimate the overall influence on recall of ATR, we used all terms from the respective synterms (see Table 4 for the detailed results). In general, the incorporation of inflectional variants increased recall by 1/4, while acronyms improved recall by almost 2/3 when only the most frequent terms were considered. It is interesting that acronym acquisition can further improve recall by extracting variants that have more complex internal structures (such as EFs containing prepositions (REA = repressor of estrogen activity) and/or coordinations (SMRT = silencing mediator of retinoic and thyroid receptor)). Prepositional and coordination candidate terms had some influence on recall, in particular as they increased the likelihood of some candidates being suggested as terms. The low recall for term coordinations could be increased by adding more patterns, although this would probably reduce precision.</Paragraph>
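The recall comparison can be illustrated in the same way. The sketch below makes two assumptions that the text does not confirm: that recall is the share of gold-standard terms appearing among all variants of the suggested synterms, and that the reported improvement is relative to the baseline recall. The function names and the toy data are hypothetical.

```python
from typing import Iterable, Set


def recall(suggested_synterms: Iterable[Set[str]], gold_terms: Set[str]) -> float:
    """Share of gold terms found among all variants of the suggested synterms."""
    if not gold_terms:
        return 0.0
    found = {variant
             for synterm in suggested_synterms
             for variant in synterm
             if variant in gold_terms}
    return len(found) / len(gold_terms)


def relative_gain(baseline: float, improved: float) -> float:
    """Relative change with respect to the baseline value (assumed measure)."""
    if baseline == 0:
        raise ValueError("baseline recall is zero; relative gain is undefined")
    return (improved - baseline) / baseline


# Hypothetical toy data (not taken from GENIA annotations).
gold = {"T cell", "T cells", "NF-kappa B", "replication of HIV-1"}
baseline_candidates = [{"T cell"}, {"NF-kappa B"}]          # singleton synterms
variant_candidates = [{"T cell", "T cells"}, {"NF-kappa B"}]  # with inflectional variants

base = recall(baseline_candidates, gold)
with_variants = recall(variant_candidates, gold)
print(base, with_variants, relative_gain(base, with_variants))
```

If the improvement were instead meant as an absolute difference in recall, only relative_gain would need to change.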
    <Paragraph position="11"> In summary, experiments performed on the GENIA corpus have shown that the incorporation of term variations into the ATR process resulted in significantly better precision and recall. In general, acronyms and inflectional unification are the most important variation types (at least in the domain of biomedicine). Individually, they increased precision by 20-70% for the top ranked synterm intervals, while recall generally improved, in some cases by up to 25%. Other term variations had only marginal influence on the performance, mainly because they were infrequent in the test corpus (compared to the total number of term occurrences, and not only with regard to specific individual candidates, but also in general). Larger-scale corpora may reveal a stronger influence for these variations.</Paragraph>
    <Paragraph position="12"> [Figure 2a: ... of the baseline method (=1) to ATR precisions with integrated recognition of individual term variants]
[Figure 2b: ... of the baseline method (=1) to ATR precisions with integrated recognition of individual term variants]
[Figure 2c: ... of the baseline method (=1) to ATR precisions with integrated recognition of combined term variants (terms with frequency &gt; 0)]</Paragraph>
    <Paragraph position="13">
term sets      prep.     coord.    infl.     acro.
freq. [?] 5    +5.30%    +12.42%   +17.52%   +60.49%
freq. &gt; 0   +2.36%    +2.53%    +25.25%   +8.52%
Table 4: Improvement in recall when variations are considered as an integral part of ATR</Paragraph>
  </Section>
</Paper>