<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1087"> <Title>Enhancing automatic term recognition through recognition of variation</Title> <Section position="8" start_page="4" end_page="4" type="concl"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> In this paper we discussed possibilities for the extraction and conflation of different types of variation of term candidates. We demonstrated that incorporating the treatment of term variation enhanced the performance of an ATR system, and that tackling term variation phenomena is an essential step for ATR. In our case, precision was boosted by considering the joint frequencies of occurrence and termhoods of all candidate terms within a candidate synterm, while recall benefited from the introduction of new candidates through consideration of different variation types. Although we experimented with a biomedical corpus, our techniques are general and can be applied to other domains.</Paragraph> <Paragraph position="1"> Variations affecting single constituents of a term candidate are the most frequent phenomena, and they are also straightforward to implement as part of an ATR process. The conflation of such term candidate variants can be further tuned for a specific domain by using lists of combining forms and affixes. The incorporation of acronyms had a significant, strongly positive effect, in particular on more frequent terms (since acronyms are introduced for more frequently used terms).</Paragraph> <Paragraph position="2"> However, more complex structural phenomena had a moderate positive influence on recall but, in general, a negative effect on precision. The main reason for this performance is the structural and terminological ambiguity of these expressions, in addition to their low frequency of occurrence (compared to the total number of term occurrences). 
Handling such complex variants requires a knowledge-intensive, domain-specific approach: coordinated term candidates and candidates containing prepositions need additional semantic analysis in order to suggest more reliable term candidates and to introduce fewer false ones.</Paragraph> <Paragraph position="3"> Apart from boosting precision and recall, the integration of term variation into ATR is particularly important for smaller corpora (where linking related occurrences is vital for successful terminology management), as well as for many text-mining tasks, such as information retrieval (IR), information extraction (IE), and term or document clustering and classification.</Paragraph> <Paragraph position="4"> Finally, as future work, we plan to investigate a more knowledge-intensive, domain-specific treatment of prepositional and coordinated terms, as well as of pronominal term references.</Paragraph> </Section> </Paper>