File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/96/c96-2120_metho.xml
Size: 16,482 bytes
Last Modified: 2025-10-06 14:14:13
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2120"> <Title>NP_Modification Diathesis Tense Aspect Modality Sentence Types Coordination Negation</Title> <Section position="4" start_page="0" end_page="329163" type="metho"> <SectionTitle> 3 TSNLP Annotation Schema </SectionTitle> <Paragraph position="0"> A detailed annotation schema was designed for the test data which does not presuppose a specific linguistic theory, a particular evaluation situation or application type.</Paragraph> <Paragraph position="1"> Test data and annotations in TSNLP test suites are organized at four distinct representational levels: * Core Data The core of the test data consists of the individual test items together with all general, categorial and structural information that is independent of a token phenomenon or application. Besides the actual input string, annotations at this level include (i) bookkeeping and documentation information (author, date, id number), (ii) the item format, its length, category and well-formedness code, (iii) the (morpho)syntactic categories and string positions of the lexical and phrasal elements constituting the test item, and (iv) an (underspecified) representation of its functional structure. Encoding a dependency or functor-argument graph rather than a phrase structure tree allows generalizations over potentially controversial phrase structure configurations and, thus, avoids imposing a specific constituent structure but still can be mapped onto one.</Paragraph> <Section position="1" start_page="712" end_page="329163" type="sub_section"> <SectionTitle> * Phenomenon-Related Data </SectionTitle> <Paragraph position="0"> Based on a hierarchical classification of linguistic (and extra-linguistic) phenomena (e.g. verb valency as a subtype of general complementation), each phenomenon is identified by a phenomenon id and by its supertype(s). 
Interaction with other phenomena as well as the phenomena which must be presupposed are also given. In addition, the (syntactic) parameters which are relevant for the phenomenon (e.g. the number and type of complements in the case of verb valency) are described. Individual test items can be assigned to one or several phenomena and annotated according to the corresponding parameters.</Paragraph> <Paragraph position="1"> * Test Sets Test items can optionally be grouped into test sets. A test set is a group of test items containing typically one positive example and one or more negative examples. The relation between positive and negative test items has been one of the most challenging questions in designing test data and, as has been mentioned, is based on the systematic variation of phenomenon-specific parameters.</Paragraph> <Paragraph position="2"> * User and Application Parameters Information that typically correlates with the use of a test suite for different types of evaluation and for different applications (e.g. ratings of frequency or relevance for a particular domain) is factored from the remainder of the data into user &amp; application profiles. As part of the customization process users of the TSNLP test suites are encouraged to extend this part of the test suite database and add whatever (formal or informal) information is necessary for their specific requirements.</Paragraph> <Paragraph position="3"> In addition to the parts of the annotation schema that follow a formal specification, there is room for textual comments at the various levels to accommodate information that cannot or need not be formalized.</Paragraph> <Paragraph position="4"> [Test Item: item id: 2/~0~20101; author: issco; date: jan-95; register: formal; format: none; origin: invented; difficulty: 1; wellformedness: 1; category: S; input: L'ingénieur vient.; length: 3] (Estival et al. 
(1994)) and using the annotation schema sketched above, the construction of test data was based on a classification of the (syntactic) phenomena to be covered. From judgements on the linguistic relevance and frequency for the individual languages, the following list of core phenomena for TSNLP was compiled:</Paragraph> <Paragraph position="6"> * modality, tense, and aspect; * sentence and clause types; * word order; * coordination; * negation; and * extragrammatical (e.g. parentheticals and temporal expressions).</Paragraph> <Paragraph position="7"> A further sub-classification of phenomena is made according to the relevant syntactic domains in which a phenomenon occurs (e.g. sentences (S), clauses (C), noun phrases (NP) et al.). Figure 2 gives an overview of the test material available. For each of the three languages some 5000 test items are provided. Therefore, TSNLP has already achieved a substantially broader and deeper coverage than previous general-purpose test suites (the still very popular Hewlett-Packard test suite, for instance, has a coverage of 3000 test items for English only).</Paragraph> <Paragraph position="8"> In order to enforce consistency of annotations across the three languages, canonical lists of the categories and functions used in the description of categorial and dependency structure were established (see Lehmann et al. (1996)). The dimensions chosen in the classification attempt to avoid the presupposition of very specific assumptions of a particular theory of grammar (or of a language), and rather try to capture those distinctions that seem to be relevant across the set of TSNLP core phenomena.</Paragraph> <Paragraph position="9"> [Figure caption: ... 1995): relevance and breadth of individual phenomena present language-specific variation (the numbers given are for grammatical vs. ungrammatical items). Individual phenomena are often further sub-classified according to phenomenon-internal dimensions.]</Paragraph> </Section> </Section> <Section position="5" start_page="329163" end_page="329163" type="metho"> <SectionTitle> 5 Test Suite Technology </SectionTitle> <Paragraph position="0"> Because the test data construction proper as well as the customization and application of a general-purpose test suite to a specific NLP system or domain are laborious, cost-intensive and error-prone tasks, TSNLP put strong emphasis on supplying suitable special-purpose tools to facilitate both the development as well as usage of the TSNLP test data (Oepen et al. (1996a) give an overview).</Paragraph> <Section position="1" start_page="329163" end_page="329163" type="sub_section"> <SectionTitle> 5.1 Test Data Construction </SectionTitle> <Paragraph position="0"> To ease the time-consuming test data construction and to reduce erratic variations in filling in the TSNLP annotation schema, a graphical test suite construction tool (tsct) was implemented. The tool instantiates the annotation schema (see section 3) as a form-based input mask and provides for (limited) consistency checking of the field values. Additionally, tsct allows reusing previously constructed and annotated data, as quite often when constructing a series of test items it can be easier to duplicate and adapt a similar item rather than produce annotations from scratch. For some of the test data a DCG-based test suite generation tool (Arnold et al. 
(1994)) was deployed to automatically produce systematically varied (i.e.</Paragraph> <Paragraph position="1"> both grammatical and ungrammatical) test items together with some part of the annotations.</Paragraph> </Section> <Section position="2" start_page="329163" end_page="329163" type="sub_section"> <SectionTitle> 5.2 Test Data Maintenance and Retrieval </SectionTitle> <Paragraph position="0"> To implement the TSNLP virtual test suite approach (see section 1), the test data is mounted on a relational database to satisfy the following key requirements: [Figure caption fragment: ... database kernel is separated from client programs through a layer of interface functions.]</Paragraph> <Paragraph position="1"> * usability: to facilitate the application of the methodology, technology, and test data developed in TSNLP to a wide variety of diagnosis and evaluation purposes for different applications by developers or users with varied backgrounds; * suitability: to meet the specific necessities of storing and maintaining natural language test data (e.g. 
in string processing) and to provide maximally flexible interfaces; * adaptability and extensibility: to enable and encourage users of the database to add test data and annotations according to their needs without changes to the underlying data model; and * portability and simplicity: to make the results of TSNLP easy to use and available on several different hardware and software platforms.</Paragraph> <Paragraph position="2"> To account for the potentially different requirements of NLP developers and users, and in order to provide suitable interfaces to human test suite users as well as to external application programs, a dual database implementation was carried out: (i) a proprietary implementation (called tsdb1) allowed the fine-tuning of both the query language and interfaces, while (ii) a second version (tsdb2) builds on a commercial database product and, thus, is compliant with common industry standards, allowing (industrial) users of the TSNLP test suite to acquire on-site technical support where necessary. The tsdb1 implementation is a small and efficient relational database engine in ANSI C. It was designed with an open and documented interface layer (see figure 3) that enables test suite users to bidirectionally link an application being tested to the database and run automated retrieve, process, and compare cycles. Diagnostic results obtained can be stored in the database as part of the user &amp; application profile for use in continuous progress evaluation (section 6 gives an example).</Paragraph> <Paragraph position="3"> An ASCII-based command shell interprets a simplified SQL-style query language and provides editing, completion, and command and query result history. A network database server gives remote (though read-only) access to the test data.</Paragraph> <Paragraph position="4"> For the alternative implementation tsdb2 the competitively priced database package Microsoft 
[Figure 4: Screen dump of the tsdb1 test item window; the underlying relational database allows parallel browsing and editing of multiple relations.]</Paragraph> <Paragraph position="5"> FoxPro was deployed because it is available for both Apple Macintosh and personal computers running MS Windows [2] and has a very wide distribution. The database provides convenient graphical browsing and editing of the data (using pull-down menus for finite domain fields; see figure 4) as well as standard import and export facilities to exchange data with external applications.</Paragraph> <Paragraph position="6"> 5.3 Query and Retrieval: An Example To illustrate the capacity and flexibility of the TSNLP annotation schema in conjunction with a relational database retrieval engine, a query example in the simplified SQL-like query language interpreted by tsdb1, together with an informal English paraphrase, is given: [...] test data, and tools, the project results have been tested against three different application types, viz. a commercial grammar checker for French, a controlled language checker (SECC) for English and a parser (the PAGE system developed at DFKI) [Footnote 2: Building on the popular database package MS Access, another implementation of the test suite database is currently being developed. This version will provide a similar functionality to tsdb2 and be available for the MS Windows world.]</Paragraph> <Paragraph position="7"> [Footnote 3: Additional sample queries and more details on the database schema (including relation and attribute names) can be found in Oepen et al. (1996b) and on the TSNLP World-Wide Web home page http://tsnlp.dfki.uni-sb.de/tsnlp/.]</Paragraph> <Paragraph position="8"> for German. As in this setup the evaluation situations ranged from user-level black box evaluation of a commercial product to glass box diagnosis of a research prototype under development (the DFKI system), a number of interesting results were obtained on both the adequacy of the TSNLP approach</Paragraph> <Paragraph position="9"> as well as the quality of the systems being tested.</Paragraph> <Paragraph position="10"> French Grammar Checker The real life evaluation scenario (i.e. the diagnostic evaluation of a commercial NLP product) enabled Aerospatiale to give a precise account of the type of information obtainable from the use of TSNLP.</Paragraph> <Paragraph position="11"> The following major performance characteristics were revealed: [...] (19% of the TSNLP test items were not fully analysed).</Paragraph> <Paragraph position="12"> The interpretation of the results produced by the system and the comparison with the linguistic information provided in the TSNLP annotations led to an identification of the major shortcomings of the system in terms of systematicity, lexical and morpho-syntactic deficiencies, and interference with other system components.</Paragraph> <Paragraph position="13"> English Controlled Language Checker Essex tested the controlled language checker SECC (Adriaens (1994)). Like Aerospatiale, Essex was mostly in a black box situation with respect to the system, except that they had access to the controlled grammar language descriptions (but not to the system rules). The testing involved the writing of a large number of customised test 
items, due to the fact that many CL rules are lexically based, whereas the core test suite concentrates on syntactic phenomena. The testing proved very valuable in highlighting deficiencies in the system performance, as well as in the rule descriptions and gave pointers to the possible source of those errors. [Footnote 4: The DFKI PAGE (Platform for Advanced Grammar Engineering) system is a state-of-the-art NL core engine and grammar engineering platform; it is in active use at several international research institutions primarily for HPSG-style grammar development for German, English, Japanese, and Italian.]</Paragraph> <Paragraph position="14"> German Parser In connecting the German TSNLP test suite to the DFKI PAGE parser [4] both the test data as well as the TSNLP technology were validated. Building on the C version of the TSNLP database (tsdb1), a bidirectional interface to the application was established allowing the instantiation of a DFKI user &amp; application profile for the storage of application-specific data (including performance measures and a semantic specification of the expected output).</Paragraph> <Paragraph position="15"> The seamless coupling between the test suite and the NL system allows running fully automated retrieve, process, and compare cycles in the continuous progress evaluation of the grammar and software such that after making changes to the system the impact on coverage and performance can be determined in an overnight batch job. The TSNLP test data and database technology proved to be a highly adequate tool for glass-box diagnostic evaluation; besides, the testing experience provided valuable feedback for both the test suite and the application tested (Dauphin et al. 
(1995b)).</Paragraph> </Section> </Section> </Paper>