<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2215">
  <Title>PolyphraZ : a tool for the management of parallel corpora</Title>
  <Section position="4" start_page="0" end_page="2" type="concl">
    <SectionTitle>
3 Conclusion
</SectionTitle>
    <Paragraph position="0"> The CXM and CPXM levels of PolyphraZ are already in use. They have allowed us to import the BTEC multilingual corpus of parallel sentences (into the common CPM format), to transform it (163,000 sentences in 5 languages) into files in CPXM formats, and to visualize it on the web.</Paragraph>
    <Paragraph position="1"> The Tanaka corpus should be available by the time this paper is presented. The &quot;inner&quot; level of the MPM (Multilingual Polyphrase Memory) is almost complete. It will also support versioning. In the future, we plan to use MPMs not only to handle multilingual corpora of parallel sentences, but also as &quot;pivots&quot; to establish sentence-level correspondences between parallel monolingual structured documents. If no high-quality TWS (such as Trados, TM2, Deja Vu, Transit, etc.) is available, PolyphraZ could be used as a &quot;bare-bones&quot; TWS, directly through the web, in the Montaigne spirit.</Paragraph>
    <Paragraph position="2"> We are also studying how to integrate into an MPM structure &quot;generators&quot; that specify classes of sentences (automata for messages with variables and variants, regular expressions for CSTAR IF expressions, etc.), and how to use them to extend an MPM not only &quot;in width&quot; (by adding new languages) but also &quot;in height&quot;, by automatically creating new &quot;statements&quot;, natural and/or formal.</Paragraph>
  </Section>
</Paper>