<?xml version="1.0" standalone="yes"?> <Paper uid="E99-1045"> <Title>Encoding a Parallel Corpus for Automatic Terminology Extraction</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Text corpora are valuable resources in every area that deals with natural language processing in one form or another. Terminology is one such field: researchers explore domain-specific language material to investigate terminological issues. Manually acquiring terminological data from text is labor-intensive and error-prone. Recent advances in automatic corpus analysis have favored a modern form of terminology acquisition in which (1) a corpus provides a collection of language material in machine-readable form, and (2) computer programs scan the corpus for terminologically relevant information and generate lists of term candidates, which humans must then post-edit. The CATEx project presented here adopts this approach.</Paragraph> </Section> </Paper>