<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-2021">
<Title>Automatic Acronym Recognition</Title>
<Section position="5" start_page="168" end_page="169" type="evalu">
<SectionTitle>4 Evaluation and Results</SectionTitle>
<Paragraph position="0"/>
<Section position="1" start_page="168" end_page="168" type="sub_section">
<SectionTitle>4.1 Evaluation Corpus</SectionTitle>
<Paragraph position="0">The data set used in this experiment consists of 861 acronym-definition pairs. The set was extracted from Swedish medical texts, the MEDLEX corpus (Kokkinakis, 2006), and was manually annotated using XML tags. In the majority of cases there is one acronym-definition pair per sentence, but there are cases where two or more pairs occur.</Paragraph>
</Section>
<Section position="2" start_page="168" end_page="168" type="sub_section">
<SectionTitle>4.2 Experiment and Results</SectionTitle>
<Paragraph position="0">The rule-based algorithm was evaluated on the untagged MEDLEX corpus samples. Recall, precision and F-score were used to evaluate the acronym-expansion matching. The algorithm recognized 671 acronym-definition pairs, of which 47 were incorrectly identified. The results obtained were 93% precision and 72.5% recall, yielding an F-score of 81.5%.</Paragraph>
<Paragraph position="1">A closer look at the 47 incorrect acronym pairs showed that the algorithm failed to make a correct match when: (1) words that appear in the definition string do not have a corresponding letter in the acronym string, (2) letters in the acronym string do not have a corresponding word in the definition string, such as &quot;PGA&quot; from &quot;glycol alginate lösning&quot;, (3) letters in the definition string do not match the letters in the acronym string.</Paragraph>
<Paragraph position="2">The error analysis showed that the reasons for missing 190 acronym-definition pairs are: (1) letters in the definition string do not appear in the acronym string, due to a mixture of a Swedish definition with an acronym written in English, (2) a mixture of Arabic and Roman numerals, such as &quot;USH3&quot; from &quot;Usher typ III&quot;, (3) the position of numbers/letters, (4) acronyms of three characters that appear in lower-case letters.</Paragraph>
</Section>
<Section position="3" start_page="168" end_page="169" type="sub_section">
<SectionTitle>4.3 Machine Learning Experiment</SectionTitle>
<Paragraph position="0">The acronym-definition pairs recognized by the rule-based algorithm were used as the training material in this experiment. The 671 pairs were represented as feature vectors according to the features described in Section 3.3. The material was divided into two data files: (1) 80% training data; (2) 20% test data. Four different algorithms were used to create models: IB1, IGTREE, TRIBL and TRIBL2. The results obtained are given in Table 1.</Paragraph>
</Section>
</Section>
</Paper>
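
The scores reported in Section 4.2 follow directly from the counts given there. The sketch below (not part of the paper; the function name is invented, and treating the 47 incorrectly identified pairs as false positives against the 861 gold-standard pairs is an assumption) recomputes precision, recall and F-score from those counts and reproduces the reported 93% / 72.5% / 81.5%.

# Minimal sketch, assuming the 47 incorrect pairs count as false positives
# and the 861 annotated pairs of Section 4.1 form the gold standard.

def precision_recall_f1(recognized: int, incorrect: int, gold_total: int):
    """Return (precision, recall, F1) for an extraction task."""
    true_positives = recognized - incorrect              # 671 - 47 = 624 correct pairs
    precision = true_positives / recognized              # 624 / 671 ~ 0.930
    recall = true_positives / gold_total                 # 624 / 861 ~ 0.725
    f1 = 2 * precision * recall / (precision + recall)   # ~ 0.815
    return precision, recall, f1

if __name__ == "__main__":
    p, r, f = precision_recall_f1(recognized=671, incorrect=47, gold_total=861)
    print(f"precision={p:.1%}  recall={r:.1%}  F-score={f:.1%}")
    # precision=93.0%  recall=72.5%  F-score=81.5%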
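
Section 4.3 describes an 80/20 split of the 671 recognized pairs, encoded as feature vectors, and training with the memory-based learners IB1, IGTREE, TRIBL and TRIBL2 (the TiMBL family). The sketch below only illustrates that setup under stated assumptions: the feature matrix X, label vector y and their encoding are hypothetical, and scikit-learn's 1-nearest-neighbour classifier is used as a rough IB1-like stand-in rather than the paper's actual learners.

# Rough sketch of the Section 4.3 setup, not the paper's implementation.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

def run_experiment(X, y):
    # 80% training data, 20% test data, as described in Section 4.3.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20, random_state=0)
    # IB1 is essentially 1-nearest-neighbour retrieval over the feature
    # vectors; IGTREE, TRIBL and TRIBL2 are TiMBL variants that trade exact
    # retrieval for speed and are not reproduced here.
    model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))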