<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1003"> <Title>The MATE Markup Framework</Title> <Section position="2" start_page="0" end_page="19" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Spoken language engineering products are proliferating in the market, with commercial and research applications constantly increasing in variety and sophistication. These developments create a growing need for tools and standards that can help improve the quality and efficiency of product development and evaluation. In the case of spoken language dialogue systems (SLDSs), for instance, there is an obvious need for standards and standards-based tools for spoken dialogue corpus annotation and automatic information extraction. Information extraction from annotated corpora is used in SLDS engineering for many different purposes.</Paragraph> <Paragraph position="1"> For several years, annotated speech corpora have been used to train and test speech recognisers.</Paragraph> <Paragraph position="2"> More recently, corpus-based approaches have been applied regularly to other levels of processing, such as syntax and dialogue. For instance, annotated corpora can be used to construct lexicons and grammars, or to train a grammar to acquire preferences for frequently used rules. Similarly, programs for dialogue act recognition and prediction tend to be based on annotated corpus data. The evaluation of user-system interaction and dialogue success likewise relies on annotated corpus data. As SLDSs and other language products become more sophisticated, demand will grow for corpora with multilevel and cross-level annotations, i.e. annotations that capture information in the raw data at several different conceptual levels, or that mark up phenomena referring to more than one level.
These developments will inevitably increase the demand for standard tools in support of the annotation process.</Paragraph> <Paragraph position="3"> The production (recording, transcription, annotation, evaluation) of corpus data for spoken language applications continues to be time-consuming and costly. The same holds for the construction of tools that facilitate annotation and information extraction. It is therefore desirable that already available annotated corpora and tools be reused whenever possible. Reuse of annotated data and tools, however, confronts system developers with numerous problems, which derive mainly from the lack of common standards. So far, language engineering projects have usually either developed the needed resources from scratch using homegrown formalisms and tools, or painstakingly adapted resources from previous projects to novel purposes.</Paragraph> <Paragraph position="4"> In recent years, several projects have addressed annotation formats and tools in support of annotation and information extraction (for an overview, see http://www.ldc.upenn.edu/annotation/). Some projects have addressed the issue of markup standardisation from different perspectives. Examples are the Text Encoding Initiative (TEI) and the Expert Advisory Group on Language Engineering Standards (EAGLES) (ilc.pi.cnr.it/EAGLES96/home.html). Whilst these initiatives have made good progress on written language and current coding practice, none of them has focused on the creation of standards and tools for cross-level spoken language corpus annotation. Only recently has there been a major effort in this domain. The project Multi-level Annotation Tools Engineering (MATE) (http://mate.nis.sdu.dk) was launched in March 1998 in response to the need for standards and tools in support of creating, annotating, evaluating and exploiting spoken language resources.
The central idea of MATE has been to work on both annotation theory and practice, connecting the two through a flexible framework that can ensure a common and user-friendly approach across annotation levels. On the tools side, this means that users can work with level-independent tools and with an interface representation that is independent of the internal coding file representation.</Paragraph> <Paragraph position="5"> This paper presents the MATE markup framework and its use in the MATE Workbench.</Paragraph> <Paragraph position="6"> In the following, Section 2 briefly reviews the MATE approach to annotation and tools standardisation. Section 3 presents the MATE markup framework. Section 4 concludes the paper by reporting on early experiences with the practical use of the markup framework and discussing future work.</Paragraph> </Section> </Paper>