<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2002">
  <Title>A Framework for Incorporating Alignment Information in Parsing</Title>
  <Section position="4" start_page="24" end_page="24" type="intro">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> Our parsing domain is based on a &amp;quot;lean&amp;quot; phrase correspondence representation formultitextsfrom parallel corpora (i.e., tuples of sentences that are translations of each other). We defined an annotation scheme that focuses on translational correspondence of phrasal units that have a distinct, language-independent semantic status. It is a hypothesis ofourlonger-term projectthatsuchasemanticallymotivated,relativelycoarsephrasecor- null respondence relation is most suitable for weakly supervisedapproachestoparsingoflargeamounts of parallel corpus data. Based on this lean phrase structure format, we intend to explore an alternative to the annotation projection approach to cross-linguistic bootstrapping of parsers by (Hwa etal.,2005). Theydepartfromastandardtreebank parserforEnglish,&amp;quot;projecting&amp;quot; itsanalysestoanotherlanguage using word alignments overaparallel corpus. Ourplanned bootstrapping approach willnotstartoutwithagivenparserforEnglish(or any other language), but use a small set of manuallyannotated seeddatafollowingtheleanphrase correspondence scheme, and then bootstrap consensus representations on large amounts of unannotated multitext data. At the present stage, we only present experiments for training an initial systemonasetofseeddata.</Paragraph>
    <Paragraph position="1"> The annotation scheme underlying in the gold standard annotation consists of (A) a bracketing for each language and (B) a correspondence relation of the constituents across languages. Neither the constituents nor the embedding or correspondentrelationswerelabelled. null Theguidingprincipleforbracketing(A)isvery simple: all and only the units that clearly play the role of a semantic argument or modifier in a largerunitarebracketed. Thismeansthatfunction words, light verbs, &amp;quot;bleeched&amp;quot; PPs like in spite of etc. are included with the content-bearing elements. This leads to a relatively flat bracketing structure. Referring orquantified expressions that mayincludeadjectivesandpossessiveNPsorPPs arealsobracketedassingleconstituents(e.g.,[the president of France ]), unless the semantic relations reflected by the internal embedding are part of the predication of the sentence. A few more specific annotation rules were specified for cases likecoordination anddiscontinuous constituents.</Paragraph>
    <Paragraph position="2"> The correspondence relation (B) is guided by semantic correspondence of the bracketed units; the mapping need not preserve the tree structure.</Paragraph>
    <Paragraph position="3"> Neither does a constituent need to have a correspondent in all (or any) of the other languages (since the content of this constituent may be implicitinotherlanguages, orsubsumedbythecontentofanotherconstituent). &amp;quot;Semanticcorrespondence&amp;quot;isnotrestrictedtotruth-conditional equivalence, but is generalized to situations where two unitsjustservethesamerhetorical function inthe originaltextandthetranslation.</Paragraph>
    <Paragraph position="4"> Figure 5 is an annotation example. Note that index 4 (the audience addressed by the speaker) is realized overtly only in German (Sie 'you'); in Spanish, index 3 is realized only in the verbal inflection(whichisnotannotated). Amoredetailed discussion of the annotation scheme is presented in(KuhnandJellinghaus, toappear).</Paragraph>
    <Paragraph position="5"> For the current parsing experiments, only the bracketing within each of three languages (English, French, German) is used; the cross-linguistic phrasecorrespondences areignored (although we intend to include them in future experiments). We automatically tagged the training and test data in English, French, and German withSchmid'sdecision-tree part-of-speech tagger (Schmid,1994).</Paragraph>
    <Paragraph position="6">  ThetrainingdataweretakenfromthesentencealignedEuroparlcorpusandconsistedof188sen- null tences for each of the three languages, with max- null 95%confidenceintervals.</Paragraph>
    <Paragraph position="7"> imal length of 21 words in English (French: 38; German: 24)andanaveragelength of14.0words in English (French 16.8; German 13.6). The test data were 50 sentences for each language, picked arbitrarily with the same length restrictions. The training and test data were manually aligned followingtheguidelines. null  For the word alignments used as learning features,weusedGIZA++,relyingonthedefaultpa- null rameters. We trained the alignments on the full Europarl corpus for both directions of each languagepair. null As a baseline system we trained Bikel's reimplementation (Bikel, 2004) of Collins' parser (Collins, 1999) on the gold standard (En null A subset of 39 sentences was annotated by two people independently,leadingtoanF-Scoreinbracketingagreement between84and90forthethreelanguages. Sincefindingan annotationschemethatworkswellinthebootstrapping set-up isanissue on ourresearch agenda, wepostpone amore detailed analysis of the annotation process until it becomes clearthataparticularschemeisindeeduseful.</Paragraph>
    <Paragraph position="8"> glish) training data, applying a simple additional  smoothingprocedureforthemodifiereventsinordertocounteractsomeobviousdatasparsenessis- null sues.</Paragraph>
    <Paragraph position="9">  Since we were attempting to learn unlabeled trees, in this experiment we only needed to learn the probabilistic model of Section 3 with no labeling schemes. Hence weneed only to learn the  In other words, we need to learn the probability that a given span is a tree constituent, given someset offeatures ofthe words andpreterminal tagsofthesentences, aswellastheprevious span decisions we have made. The main decision that   Forthenonterminallabels,wedefinedtheleft-mostlexical daughter in each local subtree of depth 1 to project its part-of-speech category to the phrase level and introduced aspecial nonterminal label fortherarecase ofnonterminal nodesdominatingnopreterminalnode.</Paragraph>
    <Paragraph position="10">  remains,then,iswhichfeaturesettouse. Thefeaturesweemployareverysimple. Namely,forspan (i, j) we consider the preterminal tags of words i [?] 1, i, j,andj +1, as well as the French and German preterminal tags of the words to which these English words align. Finally, we also use the length of the span as a feature. The features considered aresummarizedinFigure6.</Paragraph>
    <Paragraph position="11"> To learn the conditional probability distribututions, we choose to use maximum entropy models because of their popularity and the availability of software packages. Specifically, we use the MEGAM package (Daum'e III, 2004) from USC/ISI.</Paragraph>
    <Paragraph position="12"> We did experiments for a number of different feature sets, with and without alignment features.  Theresults(precision,recall,F-score,andthepercentageofsentenceswithnocross-bracketing) are summarized in Figure 7. Note that with a very simplesetoffeatures (theprevious, first,last, and next preterminal tags of the sequence), ourparser performs on par with the Bikel baseline. Adding the length of the sequence as a feature increases the quality of the parser to a statistically significant difference over the baseline. The crosslingual information provided (which is admittedly naive) does not provide a statistically significant improvement overthe vanilla set offeatures. The conclusion tobedrawnisnotthatcrosslingual informationdoesnothelp(suchaconclusionshould null not be drawn from the meager set of crosslingual  featureswehaveusedherefordemonstrationpurposes). Rather, the take-away point is that such information can be easily incorporated using this framework.</Paragraph>
  </Section>
class="xml-element"></Paper>