<?xml version="1.0" standalone="yes"?>
<Paper uid="E99-1045">
  <Title>Encoding a Parallel Corpus for Automatic Terminology Extraction</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Text corpora are valuable resources in all areas that deal with natural language processing in one form or another. Terminology is one of these fields: researchers explore domain-specific language material to investigate terminological issues. The manual acquisition of terminological data from text material is a very work-intensive and error-prone task. Recent advances in automatic corpus analysis have favored a modern form of terminology acquisition in which (1) a corpus is a collection of language material in machine-readable form and (2) computer programs scan the corpus for terminologically relevant information and generate lists of term candidates, which have to be post-edited by humans. The CATEx project described below adopts this approach.</Paragraph>
  </Section>
</Paper>