File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/97/a97-1035_intro.xml
Size: 2,908 bytes
Last Modified: 2025-10-06 14:06:16
<?xml version="1.0" standalone="yes"?> <Paper uid="A97-1035"> <Title>Software Infrastructure for Natural Language Processing</Title> <Section position="3" start_page="0" end_page="237" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> This paper reviews the currently available design strategies for software infrastructure for NLP and presents an implementation of a system called GATE - a General Architecture for Text Engineering. By software infrastructure we mean what has been variously referred to in the literature as: software architecture; software support tools; language engineering platforms; development environments.</Paragraph> <Paragraph position="1"> Our gloss on these terms is: common models for the representation, storage and exchange of data in and between processing modules in NLP systems, along with graphical interface tools for the management of data and processing and the visualisation of data.</Paragraph> <Paragraph position="2"> NLP systems produce information about texts, and existing systems that aim to provide software infrastructure for NLP can be classified as belonging to one of three types according to the way in which they treat this information: additive, or markup-based: information produced is added to the text in the form of markup, e.g. in SGML (Thompson and McKelvie, 1996); referential, or annotation-based: information is stored separately with references back to the original text, e.g. in the TIPSTER architecture (Grishman, 1996); abstraction-based: the original text is preserved in processing only as parts of an integrated data structure that represents information about the text in a uniform theoretically-motivated model, e.g. attribute-value structures in the ALEP system (Simkins, 1994).</Paragraph> <Paragraph position="3"> A fourth category might be added to cater for those systems that provide communication and control infrastructure without addressing the text-specific needs of NLP (e.g.
Verbmobil's ICE architecture (Amtrup, 1995)).</Paragraph> <Paragraph position="4"> We begin by reviewing examples of the three approaches sketched above (and a system that falls into the fourth category). Next we discuss current trends in the field and motivate a set of requirements that have formed the design brief for GATE, which is then described. The initial distribution of the system includes a MUC-6 (Message Understanding Conference 6 (Grishman and Sundheim, 1996)) style information extraction (IE) system, and an overview of its modules is given. GATE is now available for research purposes - see http://www.dcs.shef.ac.uk/research/groups/nlp/gate/ for details of how to obtain the system. It is written in C++ and Tcl/Tk and currently runs on UNIX (SunOS, Solaris, Irix, Linux and AIX are known to work); a Windows NT version is in preparation.</Paragraph> </Section></Paper>