<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-2009">
  <Title>Learning Meronyms from Biomedical Text</Title>
  <Section position="7" start_page="51" end_page="53" type="evalu">
    <SectionTitle>
5 Experimental Results
</SectionTitle>
    <Paragraph position="0"> Table 3 shows the results of running PartEx in various configurations, and evaluating over the same ten folds. The first configuration, labelled BASE, used PartEx as described in Section 3.2, to give a recall of 0.80 and precision of 0.25. A failure analysis for this configuration is given in Table 2. It shows that the largest contribution to spurious relations (i.e. to lack of precision), was due to relations discovered by some pattern that is ambiguous for meronymy (category PATTERN). For example, the pattern &amp;quot;[noun] and [noun]&amp;quot; finds the incorrect meronym &amp;quot;median partOf lateral&amp;quot; from the text &amp;quot;median and lateral glossoepiglottic folds&amp;quot;. The algorithm learned the pattern from a correct meronym, and applying it in the next iteration, learned spurious relations, compounding the error.</Paragraph>
    <Paragraph position="1">  half the spurious relations across ten folds) and configuration FILT (all spurious relations in ten folds). In each case, a small number of relations are in two categories.  lations, mean precision (P) and mean recall (R) for various configurations, as discussed in the text.</Paragraph>
    <Paragraph position="2"> The bulk of the spurious results of this type were learnt from patterns using the tokens and, is, and or.</Paragraph>
    <Paragraph position="3"> This problem needs a principled solution, perhaps based on pruning patterns against a held-out portion of training data, or by learning ambiguous patterns from a large general corpus. Such a solution is being developed. In order to mimic it for the purpose of these experiments, a filter was built to remove patterns derived from problematic contexts. Table 3 shows the results of this change, as configuration FILT: precision rose to 0.43, and recall dropped. All other experiments reported used this filter.</Paragraph>
    <Paragraph position="4"> A failure analysis of missing relations from configuration FILT is shown in Table 1. The drop in recall is explained by PartEx filtering ambiguous patterns. The biggest contribution to lack of recall was over-specific patterns (for example, the pattern &amp;quot;[term] is part of [term]&amp;quot; would not identify the meronym in &amp;quot;finger is a part of the hand&amp;quot;. Generalisation of patterns is essential to improve recall. Improvements could also be made with more sophisticated context, and by examining compounds.</Paragraph>
    <Paragraph position="5"> A failure analysis of spurious relations for configuration FILT is shown in Table 2. The biggest impact on precision was made by relations that could be considered correct, as discussed in Section 4.1.</Paragraph>
    <Paragraph position="6"> A corrected precision of 0.58 was calculated, shown as configuration CORR in Table 3. Two other factors affecting precision can be deduced from Table 2. First, some relations were encoded in deeper linguistic structures than those considered (category DEEP). Improvements could be made to precision by considering these deeper structures. Second, some spurious relations were found between fragments of terms, due to failure of term recognition.</Paragraph>
    <Paragraph position="7"> The algorithm used by PartEx is iterative, the implementation completing in two iterations. Configurations ITR1 and ITR2 in Table 3 show that both recall and precision increase as learning progresses.</Paragraph>
    <Paragraph position="8"> Four other experiments were run, to assess the impact of term recognition. Results are shown in Table 3. Configuration TERM continued to label terms in the training phase, but did not label new terms found during iteration (as discussed in Section 3.1).</Paragraph>
    <Paragraph position="9">  TOK and NP used no term recognition, instead finding relations between tokens and noun phrases respectively (the gold standard being amended to reflect the new task). POS omitted part-of-speech tags from patterns. In all cases, there was a large increase in spurious results, impacting precision. Term recognition seemed to provide a constraint in relation discovery, although the nature of this is unclear.</Paragraph>
  </Section>
class="xml-element"></Paper>