File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/96/c96-1020_intro.xml
Size: 2,467 bytes
Last Modified: 2025-10-06 14:05:59
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-1020"> <Title>Beyond Skeleton Parsing: Producing a Comprehensive Large-Scale General-English Treebank With Full Grammatical Analysis</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> A treebank is a body of natural language text that has been grammatically annotated by hand according to some previously established scheme of grammatical analysis. Within natural language processing, treebanks have been used as a source of training data for statistical part-of-speech taggers (Black et al., 1992; Brill, 1994; Merialdo, 1994; Weischedel et al., 1993) and for statistical parsers (Black et al., 1993; Brill, 1993; Jelinek et al., 1994; Magerman, 1995; Magerman and Marcus, 1991).</Paragraph> <Paragraph position="1"> In this article, we present the ATR/Lancaster Treebank of American English, a new resource for natural language processing research. It has been prepared by the Unit for Computer Research on the English Language at Lancaster University (UK), according to specifications provided by the Statistical Parsing Group of ATR (Japan). First, we provide a &quot;static&quot; description, with (a) a discussion of how text is selected and initially processed for inclusion in the treebank, and (b) an explanation of the scheme of grammatical annotation we then apply to the text. Second, we supply a &quot;process&quot; description of the treebank, in which we detail the physical and computational mechanisms by which we have created it. Finally, we lay out plans for the further development of this new treebank.</Paragraph> <Paragraph position="2"> All of the features of the ATR/Lancaster Treebank described below represent a radical departure from extant large-scale treebanks (Eyes and Leech, 1993; Garside and McEnery, 1993; Marcus et al., 1993). 
We have chosen in this article to present our treebank in some detail rather than to compare and contrast it with other treebanks. Nevertheless, the major differences between this treebank and earlier ones can easily be grasped by comparing the descriptions below with those in the sources just cited. *Current affiliation: Renaissance Technologies Corp., 25 East Loop Road, Suite 211, Stony Brook, NY 11776 USA; Consultant, ATR Interpreting Telecommunications Laboratories, 3-12/94</Paragraph> </Section> </Paper>