<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2709">
  <Title>ANNIS: Complex Multilevel Annotations in a Linguistic Database</Title>
  <Section position="3" start_page="0" end_page="61" type="intro">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Research Scenario
</SectionTitle>
      <Paragraph position="0"> The ANNIS database is being developed in the Collaborative Research Center SFB 632 on Information Structure, which consists of 13 individual research projects from disciplines such as theoretical linguistics, psycholinguistics, first and second language acquisition, typology, and historical linguistics. In the research center, data from various languages is collected and annotated at the levels of phonology, morphology, syntax, semantics, and pragmatics. These levels contribute, in ways yet to be determined, to the information-structural partitioning of discourse and utterances.</Paragraph>
      <Paragraph position="1"> For annotation, task-specific tools are used, e.g. EXMARaLDA, annotate, RSTTool, and MMAX. The data is then converted into a standoff data interchange format, which is fed into the linguistic database ANNIS. ANNIS aims to provide functionality for exploring and querying the data, with suitable means for both visualization and export.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="61" type="sub_section">
      <SectionTitle>
User Requirements
</SectionTitle>
      <Paragraph position="0"> The central requirements that arise from the scenario sketched above and, we believe, from multilevel annotation in general are data heterogeneity, data reuse, and accessibility (cf. Dipper and Götze, 2005).</Paragraph>
      <Paragraph position="1"> Data heterogeneity is a result of: (i) the language data to be annotated, varying with respect to size (single sentences vs. narrations), modality (monologue vs. dialogue, text vs. speech) and language; (ii) the annotations, which use different data structures (attribute-value pairs, trees, pointers, etc.); and (iii) data formats that stem from different task-specific annotation tools.</Paragraph>
      <Paragraph position="2"> Data reuse must be supported, e.g. for further annotation or re-annotation, statistical analyses, or use of the data in other tools.</Paragraph>
      <Paragraph position="3"> Accessibility of both tools and data is an obvious prerequisite for data reuse.</Paragraph>
      <Paragraph position="4"> In the following section, we will address those aspects that are particularly relevant for these requirements and discuss their treatment in ANNIS.</Paragraph>
    </Section>
  </Section>
</Paper>