<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1802"> <Title>Metalinguistic Information Extraction for Terminology</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Mining terminological information from free or semi-structured text in large-scale technical corpora is slowly becoming a reasonably mature NLP technology, with term extraction systems leading the way. Automatically obtaining information about terms from free text has been a less explored field, but recent experience has shown that compiling the extensive resources that modern scientific and technical disciplines need to manage the explosive growth of their knowledge is both feasible and practical. A good example of this need for NLP-based processing is the National Library of Medicine's MedLine abstract database, which incorporates around 40,000 new Life Sciences papers each month. To maintain and update UMLS knowledge resources, the NLM staff needs to manually review 400,000 highly technical papers each year (Powell et al. 2002).</Paragraph> <Paragraph position="1"> Most of these terminological knowledge sources have been compiled from existing glossaries and vocabularies that might become</Paragraph> </Section></Paper>