<?xml version="1.0" standalone="yes"?> <Paper uid="P05-3028"> <Title>A Flexible Stand-Off Data Model with Query Language for Multi-Level Annotation</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Growing interest in richly annotated corpora is a driving force for the development of annotation tools that can handle multiple levels of annotation. To make full use of the potential of multi-level annotation, we find it crucial that individual annotation levels be treated as self-contained modules that are independent of other annotation levels.</Paragraph> <Paragraph position="1"> This independence should also extend to storing each level in a separate file. If these principles are observed, annotation data management (including the addition, removal, and replacement of levels, as well as conversion to and from other formats) is greatly facilitated. Individual annotation levels can be kept independent of each other by defining each with direct reference to the underlying basedata, i.e. the text or transcribed speech. Both sequential and hierarchical (i.e. embedding or dominance) relations between markables on different levels are thus expressed only implicitly, viz. by means of the relations between their basedata elements.</Paragraph> <Paragraph position="2"> While it has become common practice to use the stand-off mechanism to relate several annotation levels to one basedata file, it is also not uncommon to find this mechanism applied to relate markables directly to other markables (on the same or a different level), expressing the relation between them explicitly. We argue that this is unfavourable not only with respect to annotation data management (cf. above), but also with respect to querying: users should not be required to formulate queries in terms of structural properties of the data representation that are irrelevant to their query.
Instead, users should be allowed to relate markables from all levels in a fairly unrestricted and ad-hoc way. Since querying is thus considerably simplified, exploratory data analysis of annotated corpora is facilitated for all users, including non-experts.</Paragraph> <Paragraph position="3"> Our multi-level annotation tool MMAX2 (Müller &amp; Strube, 2003) uses implicit relations only. Its query language MMAXQL is rather complicated and not suitable for naive users. We present an alternative query method consisting of a simpler and more intuitive query language and a method for generating MMAXQL queries from it. The new, simplified MMAXQL can express a wide range of queries in a concise way, including queries for associative relations representing e.g. coreference.</Paragraph> </Section> </Paper>
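<!--
The idea that relations between markables are expressed only implicitly, through
their basedata elements, can be illustrated with a minimal Python sketch. It is
not MMAX2 code and not MMAXQL: the Markable class, the level names, the example
sentence, and the helpers embeds and precedes are all hypothetical and chosen
only for illustration. The sketch assumes that basedata elements are numbered
words and that each markable simply lists the word positions it covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Markable:
    """A stand-off markable: it refers to basedata only, never to other markables."""
    level: str        # hypothetical annotation level name, e.g. "syntax"
    label: str        # illustrative label, e.g. "NP"
    words: frozenset  # positions of the basedata elements this markable covers


def markable(level, label, start, end):
    """Build a markable covering basedata positions start..end (inclusive)."""
    return Markable(level, label, frozenset(range(start, end + 1)))


def embeds(outer, inner):
    """Hierarchical relation, derived implicitly: inner's basedata elements
    form a proper subset of outer's."""
    return inner.words < outer.words


def precedes(first, second):
    """Sequential relation, derived implicitly: all basedata elements of first
    come before all basedata elements of second."""
    return max(first.words) < min(second.words)


# Basedata: the tokenised text, shared by every annotation level.
basedata = ["the", "old", "man", "saw", "her"]

# Three markables on three independent (hypothetical) levels.
noun_phrase = markable("syntax", "NP", 0, 2)   # "the old man"
noun = markable("pos", "N", 2, 2)              # "man"
pronoun = markable("coref", "mention", 4, 4)   # "her"

print(embeds(noun_phrase, noun))       # relation across the syntax and pos levels
print(precedes(noun_phrase, pronoun))  # relation across the syntax and coref levels
```

Because no markable stores a pointer to another markable, any level in this
sketch could be dropped or replaced without touching the others, and a query
can relate markables from arbitrary levels using only their basedata positions.
-->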