<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2173">
  <Title>XMLTrans: a Java-based XML Transformation Language for Structured Data</Title>
  <Section position="2" start_page="0" end_page="1136" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> The recently completed MLIS DicoPro project addressed the need for a uniform, platform-independent interface for accessing multiple dictionaries and other lexical resources via the Internet/intranets. Lexical data supplied by dictionary publishers for the project was in a variety of SGML formats. In order to transform this data into a convenient standard format (HTML), a high-level transformation language was developed. This language is simple to use, yet powerful enough to perform complex transformations not possible with similar transformation tools.</Paragraph>
    <Paragraph position="1"> XMLTrans provides rooted, recursive transductions, similar to the transducers used for natural language translation. The tool is written in standard Java and is available to the general public.</Paragraph>
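The idea of a rooted, recursive transduction can be illustrated with a minimal Java sketch. It is not the XMLTrans rule language or its implementation: the class RecursiveTransducer, its helper transduceXml, and the rule table mapping source element names to HTML tags are all hypothetical. Transduction starts at a root element and applies the same rule table to every descendant, nesting each child's output inside its parent's output.
<![CDATA[
```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Sketch of a rooted, recursive transduction: rules are looked up by element
// name, starting at a root element and recursing into its descendants.
// The rule table (source element name -> HTML tag) is hypothetical.
public class RecursiveTransducer {
    private final Map<String, String> rules;

    public RecursiveTransducer(Map<String, String> rules) {
        this.rules = rules;
    }

    public String transduce(Node node) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            return node.getNodeValue(); // text is copied unchanged (no HTML escaping in this sketch)
        }
        // Recursive step: transduce every child with the same rule table.
        StringBuilder inner = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            inner.append(transduce(children.item(i)));
        }
        String tag = rules.get(node.getNodeName());
        if (tag == null) {
            return inner.toString(); // no rule: drop this element's markup, keep its transduced content
        }
        return "<" + tag + ">" + inner + "</" + tag + ">";
    }

    // Convenience wrapper: parse an XML string and transduce from its root element.
    public static String transduceXml(String xml, Map<String, String> rules) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return new RecursiveTransducer(rules).transduce(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not transduce input", e);
        }
    }

    public static void main(String[] args) {
        String entry = "<entry><hw>bank</hw><pos>n</pos><sense>edge of a river</sense></entry>";
        System.out.println(transduceXml(entry, Map.of("entry", "p", "hw", "b", "pos", "i")));
    }
}
```
]]>
With the hypothetical rules entry to p, hw to b and pos to i, the sample entry becomes one HTML paragraph with a bold headword; sense has no rule, so only its text survives. A production tool would also escape text and attribute values when emitting HTML.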
    <Paragraph position="2"> 1 Introduction
The MLIS DicoPro project¹, which ran from April 1998 to September 1999, addressed the need for a uniform, platform-independent interface for accessing multiple dictionaries and other lexical resources via the Internet/intranets. One project deliverable was a client-server tool enabling translators and other language professionals connected to an intranet to consult dictionaries and related lexical data from multiple sources.</Paragraph>
    <Paragraph position="3"> Dictionary data was supplied by participating dictionary publishers in a variety of proprietary formats². One important DicoPro module was a transformation language capable of standardizing the variety of lexical data. The language needed to be straightforward enough for a non-programmer to master, yet powerful enough to perform all the transformations necessary to achieve the desired output. The result of our efforts, XMLTrans, takes as input a well-formed XML file and a file containing a set of transformation rules, and outputs the result of applying those rules to the input file.</Paragraph>
    <Paragraph position="4"> ¹ DicoPro was a project funded within the Multilingual Information Society programme (MLIS), an EU initiative launched by the European Commission's DG XIII and the Swiss Federal Office of Education and Science. ² Project participants were: HarperCollins, Hachette Livre, Oxford University Press.</Paragraph>
    <Paragraph position="5"> The transducer was designed for processing large XML files, keeping only the minimum necessary part of the document in memory at any time. This tool should be of use to anyone wishing to transform large amounts of (particularly lexical) data from one XML representation to another.</Paragraph>
    <Paragraph position="6"> At the time XMLTrans was being developed (mid-1998), XML was only an emerging standard. As a consequence, we first looked to more established SGML resources to find a suitable transformation tool. Initial experimentation began with DSSSL (Bingham, 1996) as a possible solution. Some time was invested in developing a user-friendly "front-end" to the DSSSL engine Jade, developed by James Clark (Clark, 1998). This turned out to be extremely cumbersome to implement, and was abandoned. There were a number of commercial products, such as Omnimark Light (Omnimark Corp, 1998), TXL (Legasys Corp, 1998) and PatML (IBM Corp, 1998), which looked promising but could not be used, since we wanted our transducer to be in the public domain.</Paragraph>
    <Paragraph position="7"> We subsequently began to examine available XML transduction resources. XSL (Clark and Deach, 1998) was not yet mature enough to rely on as a core for the language. In addition, XSL did not (at the time) provide the rooted, recursive transductions needed to convert the complex data structures found in DicoPro's lexical data.</Paragraph>
    <Paragraph position="8"> Edinburgh's Language Technology Group had produced a number of useful SGML/XML manipulation tools (LTG, 1999). Unfortunately, none of these matched our specific needs. For instance, sgmltrans does not permit matching of complex expressions involving elements, text, and attributes. Another LTG tool, sgrpg, is more powerful, but its control files have (in our opinion) a non-intuitive and complex syntax³.</Paragraph>
    <Paragraph position="9"> Since a large number of standardized XML APIs had been developed for the Java programming language, this appeared to be a promising direction. In addition, Java's portability was a strong drawing point. The API model which best suited our needs was the "Document Object Model" (DOM) with an underlying "Simple API for XML" (SAX) parser.</Paragraph>
    <Paragraph position="10"> The event-based SAX parser reads into memory only the elements in the input document that are relevant to the transformation. In effect, XMLTrans is intended to process lexical entries which are independent of each other and which share a few basic formats. Since only one entry is ever in memory at any given point in time, extremely large files can be processed with low memory overhead.</Paragraph>
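The combination described above, a SAX parser underneath and DOM access to one entry at a time, might look roughly like the following sketch. The per-entry element name (entry here), the class EntryStreamer and its callback are assumptions for illustration, not the XMLTrans code. SAX events are used to build a detached DOM subtree for the entry currently being read; when the entry closes it is handed to the callback and the reference is dropped, so memory use tracks the size of one entry rather than the whole file.
<![CDATA[
```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: SAX streams the input; only the entry currently being read is
// materialized as a DOM subtree, handed to a callback, then dropped.
public class EntryStreamer extends DefaultHandler {
    private final String entryName;          // hypothetical per-entry element name, e.g. "entry"
    private final Consumer<Element> onEntry; // receives each complete entry
    private final Document factory;          // scratch document used only to create nodes
    private Node current;                    // insertion point inside the open entry, or null

    public EntryStreamer(String entryName, Consumer<Element> onEntry)
            throws ParserConfigurationException {
        this.entryName = entryName;
        this.onEntry = onEntry;
        this.factory = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (current == null && !qName.equals(entryName)) {
            return; // outside any entry: nothing is stored
        }
        Element e = factory.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (current != null) {
            current.appendChild(e);
        }
        current = e;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.appendChild(factory.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return;
        }
        Node parent = current.getParentNode();
        if (parent == null) {
            // The entry element itself has closed: hand it over, then drop our reference.
            onEntry.accept((Element) current);
            current = null;
        } else {
            current = parent;
        }
    }

    // Convenience wrapper for in-memory input.
    public static void stream(String xml, String entryName, Consumer<Element> onEntry) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new EntryStreamer(entryName, onEntry));
        } catch (Exception ex) {
            throw new IllegalStateException("could not stream input", ex);
        }
    }

    public static void main(String[] args) {
        String dict = "<dict><entry id=\"1\"><hw>bank</hw></entry>"
                + "<entry id=\"2\"><hw>bark</hw></entry></dict>";
        stream(dict, "entry", e -> System.out.println(e.getAttribute("id") + " " + e.getTextContent()));
    }
}
```
]]>
Anything outside an entry element, such as front matter, is skipped without being stored. For a file on disk, the same handler can be passed to SAXParser.parse together with a java.io.File instead of an input stream.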
    <Paragraph position="11"> The DOM API is used in the transformation process to access the element which is currently in memory. The element is transformed according to rules specified in a rule file. These rules are interpreted by XMLTrans as operations to perform on the data through the DOM API.</Paragraph>
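How rules might be turned into DOM operations on the element in memory can be sketched with a deliberately tiny rule language. The two rule forms, rename OLD NEW and delete NAME, as well as the class RuleInterpreter and its helper applyToXml, are hypothetical and are not the actual XMLTrans rule syntax. The sketch only shows that each rule can be carried out with standard org.w3c.dom calls such as createElement, setAttribute, appendChild, replaceChild and removeChild.
<![CDATA[
```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Interprets a tiny, hypothetical rule language (NOT the real XMLTrans syntax)
// as DOM operations on the element currently in memory.
// One rule per line: "rename OLD NEW" or "delete NAME".
public class RuleInterpreter {
    private final Map<String, String> renames = new HashMap<>();
    private final Set<String> deletions = new HashSet<>();

    public RuleInterpreter(String ruleText) {
        for (String line : ruleText.split("\n")) {
            String[] f = line.trim().split("\\s+");
            if (f.length == 3 && f[0].equals("rename")) {
                renames.put(f[1], f[2]);
            } else if (f.length == 2 && f[0].equals("delete")) {
                deletions.add(f[1]);
            } else if (!line.isBlank()) {
                throw new IllegalArgumentException("unrecognized rule: " + line);
            }
        }
    }

    // Applies the rules to e and its descendants. Returns the element that should
    // take e's place (e itself or a renamed copy), or null if e is deleted.
    public Element apply(Element e) {
        if (deletions.contains(e.getTagName())) {
            return null;
        }
        // Copy the live child list first, because the loop modifies the tree.
        NodeList live = e.getChildNodes();
        Node[] kids = new Node[live.getLength()];
        for (int i = 0; i < kids.length; i++) {
            kids[i] = live.item(i);
        }
        for (Node kid : kids) {
            if (kid instanceof Element) {
                Element out = apply((Element) kid);
                if (out == null) {
                    e.removeChild(kid);
                } else if (out != kid) {
                    e.replaceChild(out, kid);
                }
            }
        }
        String newName = renames.get(e.getTagName());
        if (newName == null) {
            return e;
        }
        // To stay with basic DOM calls, "rename" builds a new element,
        // copies the attributes and moves the children across.
        Element renamed = e.getOwnerDocument().createElement(newName);
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            renamed.setAttribute(a.getNodeName(), a.getNodeValue());
        }
        while (e.getFirstChild() != null) {
            renamed.appendChild(e.getFirstChild()); // appendChild moves the node out of e
        }
        return renamed;
    }

    // Convenience wrapper: parse one entry from a string and apply the rules to it.
    public static Element applyToXml(String rules, String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return new RuleInterpreter(rules).apply(root);
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not apply rules", ex);
        }
    }

    public static void main(String[] args) {
        Element out = applyToXml("rename entry p\nrename hw b\ndelete pos",
                "<entry id=\"7\"><hw>bank</hw><pos>n</pos><sense>edge of a river</sense></entry>");
        System.out.println(out.getTagName() + " id=" + out.getAttribute("id") + " text=" + out.getTextContent());
    }
}
```
]]>
Rules are applied bottom-up: children are transformed before their parent, so a renamed parent receives already-transformed children. DOM Level 3 also provides Document.renameNode, which could replace the copy-and-move approach used here.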
    <Paragraph position="12"> We begin with a simple example to illustrate the kinds of transformations performed by XMLTrans. Then we introduce the language concepts and the structure of XMLTrans rules and rule files. A comparison of XMLTrans with XSLT will help situate our work with respect to the state of the art in XML data processing.</Paragraph>
  </Section>
</Paper>