<?xml version="1.0" standalone="yes"?>
<Paper uid="E95-1003">
  <Title>Criteria for Measuring Term Recognition</Title>
  <Section position="3" start_page="18" end_page="19" type="metho">
    <SectionTitle>
3 How Can Recognition be Measured?
</SectionTitle>
    <Paragraph position="0"> Once a consensus has been reached about what is to be recognized, there must be some agreement concerning the way in which performance is to be measured. Fortunately, established performance measurements used in information retrieval - recall and precision - can be adapted quite readily for measuring the term-recognition task. These measures have, in fact, been used previously in measuring term recognition (Smadja, 1993; Bourigault, 1994; Lauriston, 1994). No study, however, adequately discusses how these measurements are applied to term recognition.</Paragraph>
    <Section position="1" start_page="18" end_page="19" type="sub_section">
      <SectionTitle>
3.1 Recall and Precision
</SectionTitle>
      <Paragraph position="0"> Traditionally, performance in document retrieval is measured by means of a few simple ratios (Salton, 1989). These are based on the premise that any given document in a collection is either pertinent or non-pertinent to a particular user's needs. There is no scale of relative pertinence. For a given user query, retrieving a pertinent document constitutes a hit, failing to retrieve a pertinent document constitutes a miss, and retrieving a non-pertinent document constitutes a false hit. Recall, the ratio of the number of hits to the number of pertinent documents in the collection, measures the effectiveness of retrieval. Precision, the ratio of the number of hits to the number of retrieved documents, measures the e~iciency of retrieval. The complement of recall is omission (misses/total pertinent). The complement of precision is noise (false hits/total retrieved).  Ideally, recall and precision would equal 1.0, omission and noise 0.0. Practical document retrieval involves a trade-off between recall and precision.</Paragraph>
      <Paragraph position="1"> The performance measurements in document retrieval are quite apparently applicable to term recognition. The basic premise of a pertinent/nonpertinent dichotomy, which prevails in document retrieval, is probably even better justified for terms than for documents. Unlike an evaluation of the pertinence of the content of a document, the term/nonterm distinction is based on a relatively simple and cohesive semantic contentS.User judgements of document pertinence would appear to be much more subjective and difficult to quantify.</Paragraph>
      <Paragraph position="2"> If all termforms were simple, i.e. single words, and only simple termforms were recognized, then using document retrieval measurements would be perfectly .straightforward. A manually bracketed term would give rise to a hit or a miss and an automatically recognized word would be a hit or a false hit. Since complex termforms are prevalent in sublanguage texts, however, further clarification is necessary. In particular, &amp;quot;hit&amp;quot; has to be defined more precisely. Consider the following sentence: The latest committee draft reports progress toward constitutional reform.</Paragraph>
      <Paragraph position="3"> A terminologist would probably recognize two terms in this sentence: commiLtee draft and constitutional reform. The termform of each is complex. Regardless of whether symbolic or statistical techniques are used, &amp;quot;hits&amp;quot; of debatable usefulness are apt to be produced by automatic term-recognition systems. A syntactically based system might have particular difficulty with the three consecutive cases of noun-verb ambiguity draft, reports, progress. A statistically based system might detect draft reports, since this cooccurrence might well be frequent as a termform elsewhere in the text. Consequently, the definition of &amp;quot;hit&amp;quot; needs further qualification.</Paragraph>
    </Section>
    <Section position="2" start_page="19" end_page="19" type="sub_section">
      <SectionTitle>
3.2 Perfect and Imperfect Recognition
</SectionTitle>
      <Paragraph position="0"> Two types of hits must be distinguished. A perfect hit occurs when the boundaries assigned by the term-recognition system coincide with those of a term's maximal termform (\[committee draft\] and \[constitutional reform\] above). An imperfect hit occurs when the boundaries assigned do not coincide with those of a term's maximal termform but contain at least one wordform belonging to a term's maximal termform. A hit is imperfect if bracketing either indudes spurious wordforms (\[latest committee draft\] Sln practice, terminologists have some difficulty agreeing on the exact delimitation of complex termforms.</Paragraph>
      <Paragraph position="1"> Still five experienced terminologists scanning a 2,861 word text were found to agree on the identity and boundsties of complex termforms three-quarters of the time (Lauriston, 1993).</Paragraph>
      <Paragraph position="2">  or \[committee draft reports\]), fails to bracket a term constituent (committee \[draft\])or both (committee \[draft reports\]). Bracketing a segment containing no wordform that is part of a term's maximal termform is, of course, a false hit (\[reports progress\]).</Paragraph>
      <Paragraph position="3"> The problematic case is clearly that of an imperfect hit. In calculating recall and precision, should imperfect hits be grouped with perfect hits, counted as misses, or somehow accounted for separately (Figure 2)? How do the perfect recall and precision ratios compare with imperfect recall and precision (including imperfect hits in the numerator) when these performance measurements are applied to real texts? Counting imperfectly recognized termforms as hits will obviously lead to higher ratios for recall and precision, but how much higher? To answer these questions, a complex-termform recognition algorithm based on weighted syntactic term-formation rules, the details of which are given in Lauriston (1993), was applied to a tagged 2,861 word text. The weightings were based on the analysis of a 117,000 word corpus containing 11,614 complex termforms as determined by manual bracketing.</Paragraph>
      <Paragraph position="4"> The recognition algorithm includes the possibility of weighting of the terminological strength of particular adjectives. This was carried out to produce the results shown in Figure 3.</Paragraph>
      <Paragraph position="5"> Recall and precision, both perfect and imperfect, were plotted as the algorithm's term-recognition threshold was varied. By choosing a higher threshold, only syntactically stronger links between adjacent words are considered &amp;quot;terminological links&amp;quot;. Thus the higher the threshold, the shorter the average complex termform, as weaker modifiers are</Paragraph>
      <Paragraph position="7"/>
    </Section>
    <Section position="3" start_page="19" end_page="19" type="sub_section">
      <SectionTitle>
Ratios
</SectionTitle>
      <Paragraph position="0"> stripped from the nucleus. Lower recall and higher precision can be expected as the threshold rises since only constituents that are surer bets are included in the maximal termform.</Paragraph>
      <Paragraph position="1"> This Figure 3 shows that both recall and precision scores are considerably higher when imperfect hits are included in calculating the ratios. As expected, raising the threshold results in lower recall regardless of whether the ratios are calculated for perfect or imperfect recognition. There is a marked reduction in perfect recall, however, and only a marginal reduction in imperfect recall. The precision ratios provide the most interesting point of comparison. As the threshold is raised, imperfect precision increases just as the principle of recall-precision tradeoff in document retrieval would lead one to expect. Perfect precision, on the other hand, actually declines slightly. The difference between perfect and imperfect precision (between the P-bar and p-bar in each group) increases appreciably as the threshold is raised. This difference is due to the greater number of recognized complex termforms either containing spurious words or only part of the maximal termform.</Paragraph>
      <Paragraph position="2"> Two conclusions can be drawn from Figure 3.</Paragraph>
      <Paragraph position="3"> Firstly, the recognition algorithm implemented is poor at perfect recognition (perfect recall ~, 0.70; perfect precision ~, 0.40) and only becomes poorer as more stringent rule-weighting is applied. Secondly, and more importantly for the purpose of this paper, Figure 3 shows that allowing for imperfect bracketing in term recognition makes it possible to obtain artificially high performance ratios for both recall and precision. Output that recognizes almost all terms but includes spurious words in complex termforms or fails short of recognizing the entire termform leaves a burdensome filtering task for the human user and is next to useless if the &amp;quot;user&amp;quot; is another level of automatic text processing. Only the exact bracketing of the maximal termform provides a useful standard for measuring and comparing the performance of term-recognition systems.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>