The Automatic Generation of Formal Annotations in a Multimedia
Indexing and Searching Environment
Thierry Declerck
DFKI GmbH
Stuhlsatzenhausweg 3
D-66123 Saarbruecken
Germany
declerck@dfki.de
Peter Wittenburg
MPI for Psycholinguistics
Wundtlaan 1, PB 310
NL-6500 AH Nijmegen
The Netherlands
Peter.Wittenburg@mpi.nl
Hamish Cunningham
Dept. of Computer Science
University of Sheffield
Regent Court, 211 Portobello
GB-Sheffield S1 4DP
Great Britain
hamish@dcs.shef.ac.uk
Abstract
We describe in this paper the MU-
MIS Project (Multimedia Indexing and
Searching Environment)1, which is
concerned with the development and in-
tegration of base technologies, demon-
strated within a laboratory prototype, to
support automated multimedia index-
ing and to facilitate search and retrieval
from multimedia databases. We stress
the role linguistically motivated annota-
tions, coupled with domain-specific in-
formation, can play within this environ-
ment. The project will demonstrate that
innovative technology components can
operate on multilingual, multisource,
and multimedia information and create
a meaningful and queryable database.
1 Introduction
MUMIS develops and integrates basic technolo-
gies, which will be demonstrated within a labora-
tory prototype, for the automatic indexing of mul-
timedia programme material. Various technology
components operating offline will generate for-
mal annotations of events in the data material pro-
cessed. These formal annotations will form the
basis for the integral online part of the MUMIS
project, consisting of a user interface allowing the
querying of videos. The indexing of the video ma-
terial with relevant events will be done along the
1MUMIS is an on-going EU-funded project within the
Information Society Program (IST) of the European Union,
section Human Language Technology (HLT). See for more
information http://parlevink.cs.utwente.nl/projects/mumis/.
line of time codes extracted from the various doc-
uments.
For this purpose the project makes use of data
from different media sources (textual documents,
radio and television broadcasts) in different lan-
guages (Dutch, English and German) to build a
specialized set of lexicons and an ontology for
the selected domain (soccer). It also digitizes
non-text data and applies speech recognition tech-
niques to extract text for the purpose of annota-
tion.
The core linguistic processing for the anno-
tation of the multimedia material consists of
advanced information extraction techniques for
identifying, collecting and normalizing signifi-
cant text elements (such as the names of players
in a team, goals scored, time points or sequences
etc.) which are critical for the appropriate anno-
tation of the multimedia material in the case of
soccer.
Due to the fact that the project is accessing and
processing distinct media in distinct languages,
there is a need for a novel type of merging tool in
order to combine the semantically related annota-
tions generated from those different data sources,
and to detect inconsistencies and/or redundancies
within the combined annotations. The merged an-
notations will be stored in a database, where they
will be combined with relevant metadata.2
Finally the project will develop a user interface
to enable professional users to query the database,
by selecting from menus based on structured an-
2We see in this process of merging extracted informa-
tions and their combination with metadata a fruitful base for
the identification and classification of content or knowledge
from distinct types of documents.
notations and metadata, and to view video frag-
ments retrieved to satisfy the query, offering thus
a tool to formulate queries about multimedia pro-
grammes and directly get interactive access to the
multimedia contents. This tool constitutes the on-
line component of the MUMIS environment.
2 State of the Art
MUMIS differs in many significant ways from ex-
isting technologies and already achieved or ad-
vanced projects3. Most closely related to the the-
matic focus of MUMIS are the HLT projects Pop-
Eye [POP] and OLIVE [OLI]. Pop-Eye used sub-
titles to index video streams and offered time-
stamped texts to satisfy a user query, on request
displaying a storyboard or video fragment corre-
sponding to the text hit. OLIVE used automatic
speech recognition to generate transcriptions of
the sound tracks of news reports, which were then
indexed and used in ways similar to the Pop-Eye
project; both projects used fuzzy matching IR al-
gorithms to search and retrieve text, offering lim-
ited multilingual access to texts. Instead of using
IR methods to index and search the transcriptions,
MUMIS will create formal annotations to the in-
formation, and will fuse information annotations
from different media sources. The fusion result
is then used to direct retrieval, through interface
techniques such as pop-up menus, keyword lists,
and so on. Search takes the user direct to the sto-
ryboard and video clippings.
The Informedia project at Carnegie-Mellon-
University [INF] has a similar conceptual base-
line to MUMIS. The innovative contribution of
MUMIS is that it uses a variety of multilingual
information sources and fuses them on the ba-
sis of formal domain-specific annotations. Where
Informedia primarily focuses on special applica-
tions, MUMIS aims at the advancement and in-
tegratibility of HLT-enhanced modules to enable
information filtering beyond the textual domain.
Therefore, MUMIS can be seen as complemen-
tary to Informedia with extensions typical for Eu-
rope.
The THISL project [THI] is about spoken doc-
ument retrieval, i.e., automatic speech recognition
3We are aware of more related on-going projects, at least
within the IST program, but we can not compare those to
MUMIS now, since we still lack first reports.
is used to auto-transcribe news reports and then
information retrieval is carried out on this infor-
mation. One main focus of THISL is to improve
speech recognition. Compared to MUMIS it lacks
the strong language processing aspects, the fusion
of multilingual sources, and the multimedia deliv-
ery.
Columbia university is running a project [COL]
to use textual annotations of video streams to in-
dicate moments of interest, in order to limit the
scope of the video processing task which requires
extreme CPU capacities. So the focus is on find-
ing strategies to limit video processing. The Uni-
versity of Massachusetts (Amherst) is also run-
ning projects about video indexing [UMA], but
these focus on the combination of text and im-
ages. Associated text is used to facilitate indexing
of video content. Both projects are funded under
the NSF Stimulate programme [NSF].
Much work has been done on video and im-
age processing (Virage [VIR], the EUROMEDIA
project [EUR], Surfimage [SUR], the ISIS project
[ISI], IBM's Media Miner, projects funded under
the NSF Stimulate program [NSF], and many oth-
ers). Although this technology in general is in its
infancy, there is reliable technology to indicate,
for example, scene changes using very low-level
cues and to extract key frames at those instances
to form a storyboard for easy video access. Some
institutions are running projects to detect subtitles
in the video scene and create a textual annotation.
This task is very difficult, given a sequence of real
scenes with moving backgrounds and so on. Even
more ambitious tasks such as finding real patterns
in real movies (tracing the course of the ball in a
soccer match, for example) are still far from being
achieved.4
3 Formal Annotations for the Soccer
Domain
Soccer has been chosen as the domain to test and
apply the algorithms to be developed. There are a
number of reasons for this choice: availability of
people willing to help in analyzing user require-
ments, existence of many information sources in
4The URLs of the projects mentionned above are given
in the bibliography at the end of this paper.
several languages5, and great economic and pub-
lic interest. The prototype will also be tested by
TV professionals and sport journalists, who will
report on its practicability for the creation and
management of their programme and information
material.
The principles and methods derived from this
domain can be applied to other as well. This has
been shown already in the context of text-based
Information Extraction (IE), for which method-
ologies for a fast adaptation to new domains
have been developed (see the MUC conferences
and (Neumann et al., 2000)). And generally
speaking the use of IE for automatic annotation
of multimedia document has the advantage of
providing, besides the results of the (shallow)
syntactic processing, accurate semantic (or con-
tent/conceptual) information (and thus potential
annotation) for specific predefined domains, since
a mapping from the linguistically analyzed rele-
vant text parts can be mapped onto an unambigu-
ous conceptual description6. Thus in a sense it
can be assumed that IE is supporting the word
sense disambiguation task.
It is also commonly assumed (see among oth-
ers (Cunningham, 1999)) that IE occupies an in-
termediate place between Information Retrieval
(with few linguistic knowledge involved) and
Text Understanding (involving the full deep lin-
guistic analysis and being still not realized for the
time being.). IE being robust but offering only a
partial (but mostly accurate) syntactic and content
analysis, it can be said that this language technol-
ogy is actually filling the gap between available
low-level annotated/indexed documents and cor-
pora and the desirable full content annotation of
those documents and corpora. This is the reason
why MUMIS has chosen this technology for pro-
viding automatic annotation (at distinct linguistic
and domain-specific levels) of multimedia mate-
rial, allowing thus to add queryable “content in-
formation” to this material.7
5We would like to thank at this place the various institu-
tions making available various textual, audio and video data.
6This topic has already been object of a workshop dis-
cussing the relations between IE and Corpus Linguistics
(McNaught, 2000).
7MUMIS was not explicitly designed for supporting
knowledge management tasks, but we assume that the mean-
ingful organization of domain-specific multimedia material
proposed by the project can be adapted to the organization of
4 The Multimedia Material in MUMIS
The MUMIS project is about automatic index-
ing of videos of soccer matches with formal an-
notations and querying that information to get
immediate access to interesting video fragments.
For this purpose the project chose the European
Football Championships 2000 in Belgium and the
Netherlands as its main database. A major project
goal is to merge the formal annotations extracted
from textual and audio material (including the au-
dio part of videos) on the EURO 2000 in three
languages: English, German, Dutch. The mate-
rial MUMIS has to process can be classified in
the following way:
1. Reports from Newspapers (reports about
specific games, general reports) which is
classified as free texts (FrT)
2. Tickers, close captions, Action-Databases
which are classified as semi-formal texts
(SFT)
3. Formal descriptions about specific games
which are classified as formal texts (FoT)
4. Audio material recorded from radio and TV
broadcasts
5. Video material recorded from TV broadcasts
1-4 will be used for automatically generating
formal annotations in order to index 5. MUMIS
is investigating the precise contribution of each
source of information for the overall goal of the
project.
Since the information contained in formal texts
can be considered as a database of true facts, they
play an important role within MUMIS. But never-
theless they contain only few information about a
game: the goals, the substitutions and some other
few events (penalties, yellow and red cards). So
there are only few time points available for in-
dexing videos. Semi-formal texts (SFT), like live
tickers on the web, are offering much more time
points sequences, related with a higher diversity
the distributed information of an enterprise and thus support
the sharing and access to companies expertise and know-
how.
of events (goals scenes, fouls etc,) and seem to of-
fer the best textual source for our purposes. Nev-
ertheless the quality of the texts of online tick-
ers is often quite poor. Free texts, like newspa-
pers articles, have a high quality but the extrac-
tion of time points and their associated events in
text is more difficult. Those texts also offer more
background information which might be interest-
ing for the users (age of the players, the clubs they
are normally playing for, etc.). Figures 1 and 2 in
section 8 show examples of (German) formal and
semi-formal texts on one and the same game.
5 Processing Steps in MUMIS
5.1 Media Pre-Processing
Media material has been delivered in various
formats (AudioDAT, AudioCassettes, Hi-8 video
cassettes, DV video cassettes etc) and qualities.
All audio signals (also those which are part of
the video recordings) are digitized and stored in
an audio archive. Audio digitization is done with
20 kHz sample frequency, the format generated is
according to the de-facto wav standard. For dig-
itization any available tool can be used such as
SoundForge.
Video information (including the audio compo-
nent) of selected games have been digitized into
MPEG1 streams first. Later it will be encoded in
MPEG2 streams. While the quality of MPEG1 is
certainly not satisfying to the end-user, its band-
width and CPU requirements are moderate for
current computer and network technology. The
mean bit rate for MPEG1 streams is about 1.5
Mbps. Current state-of-the-art computers can ren-
der MPEG1 streams in real time and many net-
work connections (Intranet and even Internet) can
support MPEG1. MPEG2 is specified for about 3
to 5 Mbps. Currently the top-end personal com-
puters can render MPEG2, but MPEG2 is not yet
supported for the most relevant player APIs such
as JavaMediaFramework or Quicktime. When
this support is given the MUMIS project will also
offer MPEG2 quality.
For all separate audio recordings as for ex-
ample from radio stations it has to be checked
whether the time base is synchronous to that one
of the corresponding video recordings. In case of
larger deviations a time base correction factor has
to be estimated and stored for later use. Given that
the annotations cannot be created with too high
accuracy a certain time base deviation will be ac-
cepted. For part of the audio signals manual tran-
scriptions have to be generated to train the speech
recognizers. These transcripts will be delivered in
XML-structured files.
Since keyframes will be needed in the user in-
terface, the MUMIS project will develop software
that easily can generate such keyframes around a
set of pre-defined time marks. Time marks will
be the result of information extraction processes,
since the corresponding formal annotations is re-
ferring to to specific moments in time. The soft-
ware to be written has to extract the set of time
marks from the XML-structured formal annota-
tion file and extract a set of keyframes from the
MPEG streams around those time marks. A set of
keyframes will be extracted around the indicated
moments in time, since the estimated times will
not be exact and since the video scenes at such
decisive moments are changing rapidly. There
is a chance to miss the interesting scene by us-
ing keyframes and just see for example specta-
tors. Taking a number of keyframes increases the
chance to grab meaningful frames.
5.2 Multilingual Automatic Speech
Recognition
Domain specific language models will be trained.
The training can be bootstrapped from written re-
ports of soccer matches, but substantial amounts
of transcribed recordings of commentaries on
matches are also required. Novel techniques
will be developed to interpolate the base-line lan-
guage models of the Automatic Speech Recogni-
tion (ASR) systems and the domain specific mod-
els. Moreover, techniques must be developed to
adapt the vocabularies and the language models
to reflect the specific conditions of a match (e.g.,
the names players have to be added to the vocabu-
lary, with the proper bias in the language model).
In addition, the acoustic models must be adapted
to cope with the background noise present in most
recordings.
Automatic speech recognition of the sound
tracks of television and (especially) radio pro-
grammes will make use of closed caption subtitle
texts and information extracted from formal texts
to help in finding interesting sequences and auto-
matically transcribing them. Further, the domain
lexicons will help with keyword and topic spot-
ting. Around such text islands ASR will be used
to transcribe the spoken soundtrack. The ASR
system will then be enriched with lexica contain-
ing more keywords, to increase the number of se-
quence types that can be identified and automati-
cally transcribed.
5.3 Multilingual Domain Lexicon Building
All the collected textual data for the soccer do-
main are used for building the multilingual do-
main lexicons. This data can be in XML, HTML,
plain text format, etc. A number of automatic
processes are used for the lexicon building, first
on a monolingual and secondly on a multilin-
gual level. Manual browsing and editing is tak-
ing place, mainly in order to provide the semantic
links to the terms, but also for the fine-tuning of
the lexicon according to the domain knowledge.
Domain lexicons are built for four lan-
guages, namely English, German, Dutch and
Swedish. The lexicons will be delivered in a
fully structured, XML-compliant, TMX-format
(Translation Memory eXchange format). For
more information about the TMX format see
http://www.lisa.org/tmx/tmx.htm.
We will also investigate how far
EUROWORDNET resources (see
http://www.hum.uva.nl/ ewn/) can be of use
for the organization of the domain-specific
terminology.
5.4 Building of Domain Ontology and Event
Table
The project is currently building an ontology for
the soccer domain, taking into consideration the
requirements of the information extraction and
merging components, as well as users require-
ments. The ontology will be delivered in an XML
format8.
8There are still on-going discussions within the
project consortium wrt the best possible encoding for-
mat for the domain ontology, the alternative being
reduced probably to RDFS, OIL and IFF, see respec-
tively, and among others, http://www.w3.org/TR/rdf-
schema/, http://www.oasis-open.org/cover/oil.html and
http://www.ontologos.org/IFF/The%20IFF%20Language.
html
In parallel to building the ontology an event ta-
ble is being described. It contains the major event
types that can occur in soccer games and their
attributes. This content of the table is matching
with the content of the ontology. The event ta-
ble is a flat structure and guides the information
extraction processes to generate the formal event
annotations. The formal event annotations build
the basis for answering user queries. The event
table is specified as an XML schema to constrain
the possibilities of annotation to what has been
agreed within the project consortium.
5.5 Generation of Formal Annotations
The formal annotations are generated by the IE
technology and are reflecting the typical output of
IE systems, i.e.instantiated domain-specific tem-
plates or event tables. The slots to be filled by
the systems are basically entities (player, teams
etc.), relations (player of, opponents etc.) and
events (goal, substitution etc.), which are all de-
rived from the current version of the domain on-
tology and can be queried for in the online com-
ponent of the MUMIS prototype. All the tem-
plates associated with an event are including a
time slot to be filled if the corresponding informa-
tion is available in a least one of the sources con-
sulted during the IE procedure. This time infor-
mation is necessary for the indexing of the video
material.
The IE systems are applying to distinct sources
(FoT, FrT etc.) but they are not concerned with
achieving consistency in the IE result on distinct
sources about the same event (game): this is the
task of the merging tools, described below.
Since the distinct textual sources are differ-
ently structured, from “formal” to “free” texts, the
IE systems involved have adopted a modular ap-
proach: regular expressions for the detection of
Named Entities in the case of formal texts, full
shallow parsing for the free texts. On the base of
the factual information extracted from the formal
texts, the IE systems are also building dynamic
databases on certain entities (like name and age
of the players, the clubs they are normally playing
for, etc.) or certain metadata (final score), which
can be used at the next level of processing.
5.6 The Merging Tool
The distinct formal annotations generated are
passed to a merging component, which is respon-
sible for avoiding both inconsistencies and redun-
dancies in the annotations generated on one event
(in our case a soccer game).
In a sense one can consider this merging
component as an extension of the so-called co-
reference task of IE systems to a cross-document
(and cross-lingual) reference resolution task. The
database generated during the IE process will help
here for operating reference resolution for more
“verbose” types of texts, which in the context
of soccer are quite “poetic” with respect to the
naming of agents (the “Kaiser” for Beckenbauer,
the “Bomber” for Mueller etc...), which would
be quite difficult to achieve within the sole refer-
ential information available within the boundary
of one document. The project will also investi-
gate here the use of inferential mechanisms for
supporting reference resolution. So for example,
“knowing” from the formal texts the final score
of a game and the names of the scorers, follow-
ing formulation can be resolved form this kind
of formulation in a free text (in any language):
“With his decisive goal, the “Bomber” gave the
victory to his team.”, whereas the special nam-
ing “Bomber” can be further added to the entry
“Mueller”
The merging tools used in MUMIS will also
take into consideration some general representa-
tion of the domain-knowledge in order to filter out
some annotations generated in the former phases.
The use of general representations9 (like domain
frames), combined with inference mechanisms,
might also support a better sequential organiza-
tion of some event templates in larger scenarios.
It will also allow to induce some events which
are not explicitly mentioned in the sources under
consideration (or which the IE systems might not
have detected).
5.7 User Interface Building
The user first will interact with a web-portal to
start a MUMIS query session. An applet will be
9Like for example the Type Description Language
(TDL), a formalism supporting all kind of operations on
(typed) features as well as multiple inheritance, see (Krieger
and Schaefer, 1994).
down-line loaded in case of showing the MUMIS
demonstration. This applet mainly offers a query
interface. The user then will enter a query that
either refers to metadata, formal annotations, or
both. The MUMIS on-line system will search
for all formal annotations that meet the criteria
of the query. In doing so it will find the appro-
priate meta-information and/or moments in some
media recording. In case of meta-information it
will simply offer the information in scrollable text
widgets. This will be done in a structured way
such that different type of information can eas-
ily be detected by the user. In case that scenes of
games are the result of queries about formal anno-
tations the user interface will first present selected
video keyframes as thumbnails with a direct indi-
cation of the corresponding metadata.
The user can then ask for more metadata
about the corresponding game or for more media
data. It has still to be decided within the project
whether several layers of media data zooming in
and out are useful to satisfy the user or whether
the step directly to the corresponding video frag-
ment is offered. All can be invoked by simple
user interactions such as clicking on the presented
screen object. Playing the media means playing
the video and corresponding audio fragment in
streaming mode requested from a media server.
6 Standards for Multimedia Content
MUMIS is looking for a compliance with exist-
ing standards in the context of the processing of
multimedia content on the computer and so will
adhere to emerging standards such as MPEG4,
which defines how different media objects will be
decoded and integrated at the receiving station,
and MPEG7, which is about defining standards
for annotations which can be seen as multime-
dia objects. Further, MUMIS will also maintain
awareness of international discussions and devel-
opments in the aerea of multimedia streaming
(RTP, RTSP, JMF...), and will follow the discus-
sions within the W3C consortium and the EBU
which are also about standardizing descriptions of
media content.
7 Role of MUMIS for the Annotation of
Multimedia Content
To conclude, we would like to list the points
where we think MUMIS will, directly or indi-
rectly, contribute to extract and access multimedia
content:
a0 uses multimedia (MM) and multilingual in-
formation sources;
a0 carries out multimedia indexing by applying
information extraction to a well-delineated
domain and using already existing informa-
tion as constraints;
a0 uses and extends advanced language tech-
nology to automatically create formal anno-
tations for MM content;
a0 merges information from many sources
to improve the quality of the annotation
database;
a0 application of IE to the output of ASR and
the combination of this with already existing
knowledge;
a0 definition of a complex information annota-
tion structure, which is stored in a standard
document type definition (DTD);
a0 integration of new methods into a query in-
terface which is guided by domain knowl-
edge (ontology and multilingual lexica).
So in a sense MUMIS is contributing in defin-
ing semantic structures of multimedia contents,
at the level proposed by domain-specific IE anal-
ysis. The full machinery of IE, combined with
ASR (and in the future with Image Analysis)
can be used for multimedia contents development
and so efficiently support cross-media (and cross-
lingual) information retrieval and effective navi-
gation within multimedia information interfaces.
There seems thus that this technolgy can play a
highly relevant role for the purposes of knowl-
edge detection and management. This is prob-
ably specially valid for the merging component,
which is eliminating redundancies in the annota-
tions generated from sets of documents and estab-
lishing complex reference resolutions, thus sim-
plyfying the access to content (and knowledge)
distributed over multiple documents and media.

References
Doug E. Appelt. 1999. An introduction to information
extraction. AI Communications, 12.
Steven Bird and Mark Liberman. 2001. A formal
framework for linguistic annotation. Speech Com-
munication.
K. Bontcheva, H. Brugman, A. Russel, P. Wittenburg,
and H. Cunningham. 2000. An Experiment in
Unifying Audio-Visual and Textual Infrastructures
for Language Processing R&D. In Proceedings of
the Workshop on Using Toolsets and Architectures
To Build NLP Systems at COLING-2000, Luxem-
bourg. http://gate.ac.uk/.
Daan Broeder, Hamish Cunningham, Nancy Ide,
David Roy, Henry Thompson, and Peter Witten-
burg, editors. 2000. Meta-Descriptions and An-
notation Schemes for Multimodal/Multimedia Lan-
gauge Resources LREC-2000.
H. Brugman, K. Bontcheva, P. Wittenburg, and
H. Cunningham. 1999. Integrating Multimedia and
Textual Software Architectures for Language Tech-
nology. Technical report mpi-tg-99-1, Max-Planck
Institute for Psycholinguistics, Nijmegen, Nethed-
lands.
Hamish Cunningham. 1999. An introduction to infor-
mation extraction. Research memo CS - 99 - 07.
Thierry Declerck and G. Neumann. 2000. Using a pa-
rameterisable and domain-adaptive information ex-
traction system for annotating large-scale corpora?
In Proceedings of the Workshop Information Ex-
traction meets Corpus Linguistics, LREC-2000.
Kevin Humphreys, R. Gaizauskas, S. Azzam,
C. Huyck, B. Mitchell, H. Cunningham, and
Y. Wilks. 1998. University of sheffield:
Description of the lasie-ii system as used for
muc-7. In SAIC, editor, Proceedings of the
7th Message Understanding Conference, MUC-7,
http://www.muc.saic.com/. SAIC Information Ex-
traction.
Christopher Kennedy and B. Boguraev. 1996.
Anaphora for everyone: Pronominal anaphora res-
olution without a parser. In Proceedings of the
16th International Conference on Computational
Linguistics, COLING-96, pages 113–118, Copen-
hagen.
Hans-Ulrich Krieger and U. Schaefer. 1994. a1a3a2a5a4 —
a type description language for constraint-based
grammars. In Proceedings of the 15th Interna-
tional Conference on Computational Linguistics,
COLING-94, pages 893–899.
Shalom Lappin and H-H. Shih. 1996. A generalized
algorithm for ellipsis resolution. In Proceedings
of the 16th International Conference on Compu-
tational Linguistics, COLING-96, pages 687–692,
Copenhagen.
John McNaught, editor. 2000. Information Extraction
meets Corpus Linguistics, LREC-2000.
Ruslan Mitkov. 1998. Robust pronoun resolution with
limited knowledge. In Proceedings of the 17th In-
ternational Conference on Computational Linguis-
tics, COLING-98, pages 869–875, Montreal.
MUC, editor. 1995. Sixth Message Understanding
Conference (MUC-6). Morgan Kaufmann.
MUC, editor. 1998. Seventh Message Understanding
Conference (MUC-7), http://www.muc.saic.com/.
SAIC Information Extraction.
Guenter Neumann, R. Backofen, J. Baur, M. Becker,
and C. Braun. 1997. An information extrac-
tion core system for real world german text pro-
cessing. In Proceedings of the 5th Conference on
Applied Natural Language Processing, ANLP-97,
pages 209–216.
Guenter Neumann, C. Braun, and J. Piskorski. 2000.
A divide-and-conquer strategy for shallow parsing
of german free texts. In Proceedings of the 6th Con-
ference on Applied Natural Language Processing,
ANLP-00.
Jakub Piskorski and G. Neumann. 2000. An intel-
ligent text extraction and navigation system. In
Proceedings of the 6th Conference on Recherche
d'Information Assist´ee par Ordinateur, RIAO-2000.
