<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2213">
  <Title>Building parallel corpora for eContent professionals</Title>
  <Section position="5" start_page="0" end_page="0" type="concl">
    <SectionTitle>
4 Conclusions
</SectionTitle>
    <Paragraph position="0"> In this paper, we described the methodology followed in the construction of a multilingual parallel corpus; this task served as a test application in the process of defining a business model for LR production. The aim was to identify gaps and shortcomings in the process usually employed by LR producers (or by users who might wish to create their own LRs) and to suggest ways of remedying them. Our findings include:</Paragraph>
    <Paragraph position="1"> Problems faced during the acquisition phase: although the supply of raw data (e.g. over the Internet) and of tools capable of exploiting this data (e.g. web crawlers that can identify and download texts in a given language) is increasing, these tools still need to be enhanced with more intelligent techniques (e.g. incorporation of alignment techniques during the acquisition process in order to spot potential parallel texts, or identification and mark-up of large foreign-language excerpts). Problems faced during the processing phase: in order to enhance the LR production effort, the re-use of existing tools is considered crucial. It is true that an increasing number of tools are available for text processing; however, these are oriented mainly towards the major languages. Moreover, information concerning the existence, availability and operation of existing tools is not easy to locate, a gap that the other pillar of INTERA tries to remedy through the building of an integrated European Language Resources area.</Paragraph>
    <Paragraph position="2"> Additionally, tools must be enhanced in two directions: improvement of the tools themselves (e.g. more robust alignment techniques) and interoperability of all relevant tools currently used at different phases of processing. The issue of interoperability is closely related to the issue of standards. The promotion and deployment of existing standards, as well as the creation of new standards where these are lacking, are important to ensure the viability and re-use of LRs, given the cost of their production.</Paragraph>
  </Section>
</Paper>