<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2024">
  <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics Towards A Modular Data Model For Multi-Layer Annotated Corpora</Title>
  <Section position="7" start_page="188" end_page="189" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> Corpus-based research projects often choose to implement custom tools and encoding formats.</Paragraph>
    <Paragraph position="1"> Small projects do not want to lose valuable time learning complex frameworks and adapting them to their needs. They often employ a custom XML format to be able to use existing XML processing tools like XQuery or XSLT processors.</Paragraph>
    <Paragraph position="2"> ATLAS and NXT are very powerful, yet they suffer from a lack of accessibility for programmers who have to adapt them to project-specific needs. Most specialized annotation editors do not build upon these frameworks, nor do they offer conversion tools between their data formats.</Paragraph>
    <Paragraph position="3"> Projects such as DDD do not make use of these frameworks because they are not easily extensible, e.g. with a SQL backend instead of XML storage. Instead, a high-level query language is developed yet again, and a completely new framework is created that works with a SQL backend.</Paragraph>
    <Paragraph position="4"> In the previous sections, objects from selected approaches with different foci in their work with annotated corpora have been collected and forged into a comprehensive data model. The potential for modularization of corpus annotation frameworks has been shown with respect to data models and query languages. As a next step, an existing framework should be taken and refactored into an extensible modular architecture. From a practical point of view, reusing existing technology as much as possible is a desirable goal. This means reusing existing facilities provided for XML data, such as XPath, XQuery and XML Schema, and extending them where necessary instead of creating a new data model from scratch. For the annotational tiers, as LPath has shown, a good starting point is to extend existing languages like XPath.</Paragraph>
    <Paragraph position="5"> Locational and medial operators seem to be best implemented as XQuery functions. The possibility to map between SQL and XML provides access to additional efficient resources for storing and querying annotation data. Support for various kinds of base data or locational information can be encapsulated in modules. Which modules exactly should be created and what they should cover in detail has to be further examined.</Paragraph>
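    <Paragraph position="6"> The mapping between XML and SQL mentioned above can be illustrated with a minimal sketch. It is not part of the proposed framework: it uses Python with the standard sqlite3 and xml.etree modules as a stand-in for an XQuery and SQL setup, and the span element, its tier, start, end and label attributes, the span table and the functions load_spans and overlapping are all hypothetical names chosen for illustration. The sketch copies stand-off annotations from XML into a relational table and then evaluates one locational operator (overlap of two ranges) as a SQL join.</Paragraph>
    <Paragraph position="7"><![CDATA[
```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stand-off annotation: each <span> marks a half-open
# character range [start, end) of the base text on a named tier.
# Element and attribute names are assumptions made for this sketch.
XML_DATA = """
<annotations>
  <span tier="token" start="0" end="5" label="Hello"/>
  <span tier="token" start="6" end="11" label="world"/>
  <span tier="phrase" start="0" end="11" label="NP"/>
  <span tier="gesture" start="4" end="8" label="nod"/>
</annotations>
"""


def load_spans(conn, xml_text):
    """Copy every <span> element into a relational table."""
    # The end offset is stored in a column named "stop" because END is
    # a reserved word in SQL.
    conn.execute(
        "CREATE TABLE span (id INTEGER PRIMARY KEY, tier TEXT, "
        "start INTEGER, stop INTEGER, label TEXT)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (s.get("tier"), int(s.get("start")), int(s.get("end")), s.get("label"))
        for s in root.iter("span")
    ]
    conn.executemany(
        "INSERT INTO span (tier, start, stop, label) VALUES (?, ?, ?, ?)", rows
    )


def overlapping(conn, tier_a, tier_b):
    """Locational operator: label pairs whose ranges overlap."""
    query = (
        "SELECT a.label, b.label FROM span a JOIN span b "
        "ON a.start < b.stop AND b.start < a.stop "
        "WHERE a.tier = ? AND b.tier = ? ORDER BY a.start, b.start"
    )
    return conn.execute(query, (tier_a, tier_b)).fetchall()


conn = sqlite3.connect(":memory:")
load_spans(conn, XML_DATA)
pairs = overlapping(conn, "gesture", "token")
print(pairs)
```
]]></Paragraph>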
  </Section>
</Paper>