<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1025"> <Title>Methods for the Qualitative Evaluation of Lexical Association Measures</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 The Qualitative Evaluation of Association Measures </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 State-of-the-art </SectionTitle> <Paragraph position="0"> A standard procedure for the evaluation of association measures (AMs) is manual judgment of the n-best candidates identified in a particular corpus by the measure in question. Typically, the number of true positives (TPs) among the 50 or 100 (or slightly more) highest-ranked word combinations is identified manually by a human evaluator, in most cases the author of the paper in which the evaluation is presented.</Paragraph> <Paragraph position="1"> This method leads to a very superficial judgment of AMs for the following reasons: (1) The identification results are based on small subsets of the candidates extracted from the corpus. Consequently, results achieved by individual measures may very well be due to chance (cf. sections 4.1 and 4.2), and evaluation with respect to frequency strata is not possible (cf. section 4.3). (2) For the same reason, it is impossible to determine recall values, which are important for many practical applications. (3) Introducing new measures or changing the calculation methods requires additional manual evaluation, since new n-best lists are generated.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Requirements </SectionTitle> <Paragraph position="0"> To improve the reliability of the evaluation results, a number of properties need to be controlled. 
We distinguish between two classes: (1) Characteristics of the set of candidate data employed for collocation identification: (i) the syntactic homogeneity of the base data, i.e., whether the set of candidate data consists only of pairs of a single type (adjective-noun, noun-verb, etc.) or whether different types of word combinations are mixed; (ii) the grammatical status of the individual word combinations in the base set, i.e., whether they are part of or constitute a phrase or simply co-occur within a given text window; (iii) the percentage of TPs in the base set, which is typically higher among high-frequency data than among low-frequency data.</Paragraph> <Paragraph position="1"> (2) The evaluation strategies applied: Instead of examining only a small sample of n-best candidates for each measure, as is common practice, we make use of recall and precision values for n-best samples of arbitrary size, which allows us to plot recall and precision curves for the whole set of candidate data. In addition, we compare precision curves for different frequency strata.</Paragraph> </Section> </Section> </Paper>
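The evaluation strategy in (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that every candidate carries its association score, its co-occurrence frequency and a manual TP annotation, and the names `Candidate`, `pr_curve` and `stratum` are hypothetical. For each n, precision is the proportion of TPs among the n highest-ranked candidates, and recall is the proportion of all TPs in the candidate set that the n-best list contains; restricting the input to one frequency stratum yields a per-stratum curve.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """One word combination from the candidate set (hypothetical record type)."""
    pair: Tuple[str, str]  # e.g. an adjective-noun pair
    score: float           # association score assigned by the AM under evaluation
    freq: int              # co-occurrence frequency in the corpus
    is_tp: bool            # manual annotation: True for a true positive (TP)


def pr_curve(candidates: List[Candidate]) -> List[Tuple[int, float, float]]:
    """Return (n, precision, recall) for every n-best list, n = 1 .. len(candidates).

    Candidates are ranked by decreasing score. Python's sort is stable, so
    tied candidates keep their input order (an assumed tie-breaking rule).
    """
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    total_tps = sum(c.is_tp for c in ranked)  # TPs in the whole candidate set
    curve = []
    tps = 0
    for n, cand in enumerate(ranked, start=1):
        tps += cand.is_tp                     # TPs among the n highest-ranked
        precision = tps / n
        recall = tps / total_tps if total_tps else 0.0
        curve.append((n, precision, recall))
    return curve


def stratum(candidates: List[Candidate], min_freq: int, max_freq: int) -> List[Candidate]:
    """Keep only candidates whose frequency lies in [min_freq, max_freq]."""
    return [c for c in candidates if min_freq <= c.freq <= max_freq]


# Invented toy data for illustration only (not taken from the paper).
data = [
    Candidate(("strong", "tea"), 9.1, 30, True),
    Candidate(("red", "car"), 7.4, 55, False),
    Candidate(("heavy", "rain"), 6.8, 12, True),
    Candidate(("big", "house"), 3.2, 80, False),
    Candidate(("high", "time"), 2.5, 5, True),
]

for n, p, r in pr_curve(data):
    print(f"{n}-best: precision={p:.2f} recall={r:.2f}")

# The same curve restricted to one (arbitrarily chosen) frequency stratum.
for n, p, r in pr_curve(stratum(data, 10, 60)):
    print(f"stratum {n}-best: precision={p:.2f} recall={r:.2f}")
```

In this sketch the manual annotation is done once for the whole candidate set, so evaluating a further measure only means re-ranking the same annotated candidates. Within a stratum, recall is computed relative to the TPs of that stratum only.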