File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/02/w02-1802_intro.xml

Size: 3,047 bytes

Last Modified: 2025-10-06 14:01:36

<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1802">
  <Title>Some Considerations on Guidelines for Bilingual Alignment and Terminology Extraction</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Multilingual terminology is an important language resource for a range of natural language processing tasks such as machine translation and cross-lingual information retrieval. Compiling multilingual terminology is often time-consuming, and much manual labour is needed before the result is of practical use. Aligning texts of typologically different languages such as Chinese and English is even more challenging because of the significant differences in lexicon, syntax, semantics and style. The discussion in this paper is based on issues arising from the extraction of bilingual legal terms from an aligned Chinese-English legal corpus during the implementation of a bilingual text retrieval system for the Judiciary of the Hong Kong Special Administrative Region (HKSAR) Government.</Paragraph>
    <Paragraph position="1"> Much attention in computational terminology has been directed to the development of algorithms for extraction from parallel texts, for example for Chinese-English (Wu and Xia 1995), Swedish-English-Polish (Borin 2000), and Chinese-Korean (Huang and Choi 2000). Despite considerable progress, bilingual terminology so generated is often not ready for immediate practical use. Machine extraction is often only the first step of terminology extraction and must be combined with rigorous, well-managed manual effort, which is critical for producing consistent and usable multilingual terminology. However, there has been relatively little discussion of the significance of human intervention. The process is far from straightforward because of the different purposes of alignment, the requirements of target users and the corpus type. Indeed, many problematic issues remain that are unlikely to be resolved satisfactorily by computational means in the near future, especially when typologically different languages are involved, and that require considerable manual intervention. Unfortunately, such critical manual input has often been treated as an obscure process. As with other human cognitive processes (T'sou et al. 1998), manual terminology markup is not a straightforward task, and many issues deserve closer investigation.</Paragraph>
    <Paragraph position="2"> In this paper, we present some significant issues in Chinese-English alignment and term extraction for the construction of a bilingual legal glossary. Section 2 describes the background of the associated bilingual alignment project. Section 3 discusses the necessity of manual input in bilingual alignment and some principles adopted in the project to address these issues. Section 4 provides an outline of further work to improve terminology management, followed by a conclusion in Section 5.</Paragraph>
  </Section>
</Paper>