<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2173"> <Title>XMLTrans: a Java-based XML Transformation Language for Structured Data</Title> <Section position="3" start_page="1136" end_page="1136" type="metho"> <SectionTitle> 2 An example transformation </SectionTitle> <Paragraph position="0"> A typical dictionary entry might have a surprisingly complex structure. The various components of the entry (headword, part-of-speech, pronunciation, definitions, translations) may themselves contain complex substructures. For DicoPro, these structures were interpreted in order to construct HTML output for typographical rendition and also to extract indexing information. 3 The LTG have since developed another interesting transformation tool called XMLPerl.</Paragraph> <Paragraph position="1"> A fictitious source entry might be of the form: We would like to convert this entry to HTML, extracting the headword for indexing purposes. Applying the rules which are shown in section 4, XMLTrans generates the following output: If this were an actual dictionary, the XMLTrans transducer would iterate over all the entries in the dictionary, converting each in turn to the output format above.</Paragraph> </Section> <Section position="4" start_page="1136" end_page="1139" type="metho"> <SectionTitle> 3 Aspects of the XMLTrans language </SectionTitle> <Paragraph position="0"> Each XMLTrans rule file contains a number of rule sets as described in the next sections. The transducer attempts to match each rule in the set sequentially until either a rule matches or there are no more rules.</Paragraph> <Paragraph position="1"> The document DTD is not used to check the validity of the input document. 
Consequently, input documents need not be valid XML, but must still be well-formed to be accepted by the parser.</Paragraph> <Paragraph position="2"> The rule syntax borrows heavily from that of regular expressions and in so doing it allows for very concise and compact rule specification. As will be seen shortly, many simple rules can be expressed in a single short line.</Paragraph> <Section position="1" start_page="1137" end_page="1137" type="sub_section"> <SectionTitle> 3.1 Rule Sets </SectionTitle> <Paragraph position="0"> At the top of an XMLTrans rule file at least one &quot;trigger&quot; is required to associate an XML element (e.g. an element containing a dictionary entry) with a collection of rules, called a &quot;rule set&quot;. The syntax for a &quot;trigger&quot; is as follows: element_name : @ rule_set_name Multiple triggers can be used to allow different kinds of rules to process different kinds of elements. For example: ; the rules for this set follow ; the declaration...</Paragraph> <Paragraph position="1"> The rule set is terminated either by the end of the file or with the declaration of another rule set.</Paragraph> </Section> <Section position="2" start_page="1137" end_page="1137" type="sub_section"> <SectionTitle> 3.2 Variables </SectionTitle> <Paragraph position="0"> In XMLTrans rule syntax, variables (prefaced with &quot;$&quot;) are implicitly declared with their first use. There are two types of variables: * Element variables: created by an assignment of a pattern of elements to a variable. For example: $a = LI, where <LI> is an element. Element variables can contain one or more elements. If a given variable $a contains a list of elements { A, B, C, ...}, transforming $a will apply the transformation in sequence to <A>, <B>, <C> and so on.</Paragraph> <Paragraph position="1"> * Attribute variables: created by an assignment of a pattern of attributes to a variable. 
For example: LI [ $a=TYPE ], where TYPE is a standard XML attribute.</Paragraph> <Paragraph position="2"> While variables are not strongly typed (i.e. a list of elements is not distinguished from an individual element), attribute variables cannot be used in the place of element variables and vice versa.</Paragraph> <Paragraph position="3"> 4 XMLTrans comments are preceded by a semicolon.</Paragraph> </Section> <Section position="3" start_page="1137" end_page="1139" type="sub_section"> <SectionTitle> 3.3 Rules </SectionTitle> <Paragraph position="0"> The basic control structure of XMLTrans is the rule, consisting of a left-hand side (LHS) and a right-hand side (RHS) separated by an arrow (&quot;->&quot;). The LHS is a pattern of XML element(s) to match while the RHS is a specification for a transformation on those elements.</Paragraph> <Paragraph position="1"> 3.3.1 The Left-hand Side The basic building block of the LHS is the element pattern involving a single element, its attributes and children.</Paragraph> <Paragraph position="2"> XMLTrans allows for complex regular expressions of elements on the LHS to match over the children of the element being examined. The following rule will match an element <Z> which has exactly two children, <X> and <Y> (in the examples that follow &quot;...&quot; indicates any completion of the rule):</Paragraph> <Paragraph position="4"> XMLTrans supports the notion of a logical NOT over an element expression. This is represented by the standard &quot;!&quot; symbol. Support for general regular expressions is built into the language grammar: &quot;Y*&quot; will match 0 or more occurrences of the element <Y>, &quot;Y+&quot; one or more occurrences, and &quot;Y?&quot; 0 or 1 occurrences. In order to create rules of greater generality, elements and attributes in the LHS of a rule can be assigned to variables. 
For instance, we might want to transform a given element <X> in a certain way without specifying its children.</Paragraph> <Paragraph position="5"> The following rule would be used in such a case: ; Match X with zero or more unspecified ; children.</Paragraph> <Paragraph position="7"> In the rule above, the variable $a will be either empty (if <X> has no children), a single element (if <X> has one child), or a list of elements (if <X> has a series of children). Similarly, the pattern X{$a} matches an element <X> with exactly one child.</Paragraph> <Paragraph position="8"> If an expression contains complex patterns, it is often useful to assign specific parts to different variables. This allows child nodes to be processed in groups on the RHS, perhaps being re-used several times or reordered. Consider the following rule:</Paragraph> <Paragraph position="10"> In this case $a contains a (possibly empty) list of {<X>, <Y>} element pairs. The variable $b will contain exactly one <Q>. If this pattern cannot be matched the rule will fail.</Paragraph> <Paragraph position="11"> Attributes may also be assigned to variables.</Paragraph> <Paragraph position="12"> The following three rules demonstrate some possibilities: ; Match any X which has an attribute ATT X[ $att = ATT ] -> ...; ; Match any X which has an attribute ; ATT with the value &quot;VALUE&quot;.</Paragraph> <Paragraph position="13"> X[ $att = ATT == &quot;VALUE&quot;] -> ...; ; Match any X with an attribute ; which is NOT equal to &quot;VALUE&quot; X[ $att = ATT != &quot;VALUE&quot;] -> ...; The last type of expression used on the LHS is the string expression. Strings are considered to be elements in their own right, but they are enclosed in quotes and cannot have attribute patterns as regular elements can. A special syntax, /.*/, is used to mean any element which is a string. 
The following are some sample string matching rules: The RHS supplies a construction pattern for the transformed tree node.</Paragraph> <Paragraph position="14"> A simple rule might be used to replace an element and its contents with some text: X -> &quot;Hello world&quot; For the input <X>Text</X>, this rule yields the output string Hello world. A more useful rule might strip off the enclosing element using a variable reference on the RHS: $X{$a*} -> $a For the input <X>Text</X>, this rule generates the output Text. Elements may also be renamed while their contents remain unmodified. The following rule demonstrates this facility: $X{$a*} -> Y{$a} For the input <X>Text</X>, the rule yields the output <Y>Text</Y>. Note that any children of <X> will be reproduced, regardless of whether they are text elements or not.</Paragraph> <Paragraph position="15"> Attribute variables may also be used in XMLTrans rules. The rule below shows how this is accomplished: X [$a=ATT] {$b*} -> Y [OLDATT=$a] {$b} Given the input <X ATT=&quot;VAL&quot;>Text</X>, the rule yields the output <Y OLDATT=&quot;VAL&quot;>Text</Y>.</Paragraph> <Paragraph position="16"> Recursion is a fundamental concept used in writing XMLTrans rules. The expression @set_name(variable_name) tells the XMLTrans transformer to continue processing on the elements contained in the indicated variable. For instance, @set1($a) indicates that the elements contained in the variable $a should be processed by the rules in the set set1. A special notation @(variable_name) is used to tell the transformer to continue processing with the current rule set. Thus, if the current rule set is set2, the expression @($a) indicates that processing should continue on the elements in $a using the rule set set2. 
The following rule demonstrates how transformations can be applied recursively to an element: Initially, set1 is invoked to process the element <X>, but then the rule set set2 is invoked to process its children. Consequently, for the input <X>Text</X>, the output is <Y>Nothing</Y>.</Paragraph> </Section> </Section> <Section position="5" start_page="1139" end_page="1139" type="metho"> <SectionTitle> 4 Rules for the example transformation </SectionTitle> <Paragraph position="0"> The transformation of the example in section 2 can be achieved with a few XMLTrans rules. The main rule treats the <entry> element, creating an HTML document from it, and copying the headword to several places. The subsequent rules generate the HTML output from section 2: The advent of stable versions of XSLT (Clark, 2000) has dramatically changed the landscape of XML transformations, so it is interesting to compare XMLTrans with recent developments with XSLT.</Paragraph> <Paragraph position="1"> It is evident that the set of transformations described by the XMLTrans transformation language is a subset of those described by XSLT. In addition, XSLT is integrated with XSL, allowing the style sheet author access to the rendering aspects of XSL such as formatting objects.</Paragraph> <Paragraph position="2"> Unfortunately, it takes some time to learn the syntax of XSL and the various aspects of XSLT, such as XPath specifications. This task may be particularly difficult for those with no prior experience with SGML/XML documents.</Paragraph> <Paragraph position="3"> In contrast, one need only have a knowledge of regular expressions to begin writing rules with XMLTrans.</Paragraph> </Section> </Paper>