<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2305">
  <Title>Robust Parsing: More with Less</Title>
  <Section position="3" start_page="26" end_page="30" type="metho">
    <SectionTitle>
3 Weighted Constraint Dependency Grammar
</SectionTitle>
    <Paragraph position="0"> In WCDG (Schr&amp;quot;oder, 2002), natural language is modelled as labelled dependency trees, in which each word is assigned exactly one other word as its regent (only the root of the syntax tree remains unsubordinated) and a label that describes the nature of their relation.</Paragraph>
    <Paragraph position="1"> The set of acceptable trees is defined not by way of generative rules, but only through constraints on well-formed structures. Every possible dependency tree is considered correct unless one of its edges or edge pairs violates a constraint. This permissiveness extends to many properties that other grammar formalisms consider non-negotiable; for instance, a WCDG can allow non-projective(or, indeed, cyclical) dependenciessimply by not forbidding them. Since the constraints can be arbitrary logical formulas, a grammar rule can also allow some types of non-projective relations and forbid others, and in fact the grammar in question does precisely that.</Paragraph>
    <Paragraph position="2"> Weighted constraints can be written to express the fact that a construction is considered acceptable but not fully so. This mechanismis used extensivelyto achieve robustness against proper errors such as wrong inflection, ellipsis or mis-ordering;all of these are in fact expressed through defeasible constraints. But it can also express more subtle dispreferences against a specific  phenomenonbywritingonlyaweakconstraintthatforbids it; most of the phenomena listed in Table 1 are associated with such constraints to ensure that the parser assumes a rare construction only when this is necessary. null We employ a previously existing wide-coverage WCDG of modern German (Foth et al., 2005) that covers all of the presented rare phenomena. It comprises about 1,000 constraints, 370 of which are hard constraints. The entire parser and the grammar of German are publicly available at http://nats-www.informatik.</Paragraph>
    <Paragraph position="3"> uni-hamburg.de/Papa/PapaDownloads.</Paragraph>
    <Paragraph position="4"> The optimal structure could be defined as the tree that violates the least importantconstraint (as in Optimality Theory),or the tree that violates the fewest constraints; in fact a multiplicative measure is used that combines bothaspectsbyminimizingthecollectivedispreference for all phenomena in a sentence. Unfortunately, the resultingcombinatorialproblemisNP-completeandad- null mits of no efficient exact solution algorithm. However, variants of a heuristic local search can be used, which try to find the optimal tree by constructing a complete treeandthenchangingitin thoseplacesthatviolateimportant constraints. This involves a trade-off between parsing accuracy and processing time, because the correct structure is more likely to be foundif there is more time to try out more alternatives. Given enough time, the method works well enough that the overall system exhibits a competitive accuracy even though the theoreticalaccuracyofthe languagemodelmaybecompromised by search errors.</Paragraph>
    <Paragraph position="5"> As an example of the process, consider the following analysis of the German proverb &amp;quot;Wer anderen eine Grube gr&amp;quot;abt, f&amp;quot;allt selbst hinein.&amp;quot; (He who digs a hole for others, will fall into it himself.) The transformation starts with the following initial assumption  wer anderen eine Grube grabt , fallt selbst hinein .</Paragraph>
    <Paragraph position="6"> global score: 0.000001892 which, besides producing two isolated fragments instead of a spanning tree, also lacks a subject for the second clause.</Paragraph>
    <Paragraph position="7">  wer anderen eine Grube grabt , fallt selbst hinein .</Paragraph>
    <Paragraph position="8"> global score: 0.0001888 To mend this problem the relative pronoun from the first clause has been taken as a subject for the second one, with the result that the conflict has simply been moved to the first part of the sentence. Nevertheless, the global score improved considerably, since the verb-second condition for German main clauses is violated less often.</Paragraph>
    <Paragraph position="9">  wer anderen eine Grube grabt , fallt selbst hinein .</Paragraph>
    <Paragraph position="10"> global score: 0.0004871 Here, the indefinite plural pronoun 'anderen' is taken as the subject for the second clause, creating, however, an agreement error with the finite verb, which is singular. Both subclauses have still not been integrated into a single spanning tree.</Paragraph>
    <Paragraph position="11">  wer anderen eine Grube grabt , fallt selbst hinein .</Paragraph>
    <Paragraph position="12"> global score: 0.002566 The integration is then achieved, but unfortunately as a coordination without an appropriate conjunction being available. Moreover there is a problem with the hypothesized main clause, since it again does not obey  wer anderen eine Grube grabt , fallt selbst hinein .</Paragraph>
    <Paragraph position="13"> global score: 0.1026 Therefore the interpretation is changed to a relative clause, which however cannot appear in isolation.</Paragraph>
    <Paragraph position="14"> The valency requirements of the verb 'gr&amp;quot;abt' are satisfied by taking the indefinite pronoun 'anderen' as  wer anderen eine Grube grabt , fallt selbst hinein .</Paragraph>
    <Paragraph position="15"> global score: 0.5502 Finally, the analysis switches to an interpretation which accepts the second part of the sentence as the main clause and subordinates the first part as a subject clause. The problem with the apposition reading persists.</Paragraph>
    <Paragraph position="16">  wer anderen eine Grube grabt , fallt selbst hinein . global score: 0.7249 By interpreting the indefinite pronoun as an ethical dative, the direct object valence is freed for the NP 'eine Grube'. Although this structure still violates some constraints (e.g. the ethical dative is slightly penalized for being somewhat unusual) a better one cannot be found. Note that the algorithm does not take the shortest possible transformationsequence; in fact, the first analysis could have been transformed directly into the last by only one exchange. Because the algorithm is greedy, it chooses a different repair at that point, but it still finds the solutionin aboutthreesecondsona 3 GHz Pentium machine.</Paragraph>
    <Paragraph position="17"> In contrast to stochastic parsing approaches, a WCDG can be modified in a specifically targeted manner. It thereforeprovidesus with a grammarformalism which is particularlywell suited to precisely measure the contributions of different linguistic knowledge sources to the overall parsing quality. In particular it allows us to  1. switchoffconstraints,i.e.increasethespaceofacceptable constructions and/or syntactic structures, 2. weaken constraints, by changing the weight in a way that it makes the violation of the constraint condition more easily acceptable, 3. introduce additional dependency labels into the model, 4. remove existing dependency labels from the model 5. reinforce constraints, by removing guards for exceptional cases from them, 6. reinforce constraints, by strengthening their  weights or making the constraint non-defeasible in the extreme case, and 7. introducing new constraints, to prohibit certain constructions and/or syntactic structures.</Paragraph>
    <Paragraph position="18"> Since for the purpose of our experiments, we start with a fairly broad-coveragegrammarofGerman,from  Ingeneral,itis noteasy to predictthe possible outcome of a parsing run when using a grammar with a reduced coverage. Whether a sentence can be analysed at all solely depends on the available alternatives for structuring it. Which structural description it can receive, however, is influenced by the scores resulting from rule applications or constraint violations. Moreover, the transformation-based solution method used for the WCDG-experiments introduces yet another condition: since it is based on a limited heuristics for candidate generation, the grammar must license not only the final parsing result for a sentence, but also all the intermediate transformation steps with a sufficiently high score. This might exclude some structural interpretations from being considered at all if the grammar is not tolerant enough to accommodate highly deviant structures. null  Thus, the ability to deal with extragrammatical input in a robust manner is a crucial property if we are going to use a grammar with coverage limitations. Unfortunately,robust behaviouris usually achieved by extending instead of reducing the coverage of the model and compensating the resulting increase in ambiguity by an appropriately designed scoring scheme together with an optimization procedure.</Paragraph>
    <Paragraph position="19"> To deal with these opposing tendencies, it is obviously important to determine which parts of the model need to be relaxed to achieve a sufficient degree of robustness, and which ones can be reinforced to limit the space of alternatives in a sensible way. Excluding phenomena from the grammar which never occur in a corpusshouldalwaysgivean advantage,sincethisreduces thenumberofalternativestoconsiderateachstep without forbidding any of the correct ones.</Paragraph>
    <Paragraph position="20"> On the other hand, removing support for a construction that is actually needed forces the parser to choose an incorrect solution for at least some part of a sentence, so that a deterioration might occur instead. But evenif coverageis reducedbelow the strictly necessary amount, a net gain in accuracy could occur for two reasons: null  1. Leaking: The grammar overgenerates the constructionin question,so thatforbiddingit prevents errors occurring on 'normal' sentences.</Paragraph>
    <Paragraph position="21"> 2. Focussing: Due to a more restricted search space,  theparserisnotledastraybyrarehypotheses,thus saving processingtime which can be used to come closer to the optimum.</Paragraph>
    <Section position="1" start_page="28" end_page="28" type="sub_section">
      <SectionTitle>
4.1 Experiment 1: More with less
</SectionTitle>
      <Paragraph position="0"> In our first experiment, we analysed 10,000 sentences of online newscast texts both with the normal grammar andwiththe21rarephenomenaexplicitlyexcluded.As usual for dependency parsers, we measure the parsing quality by computing the structural accuracy (the ratio of correct subordinations to all subordinations) and labelled accuracy (the ratio of all correct subordinations that also bear the correct label to all subordinations).</Paragraph>
      <Paragraph position="1"> Note that the WCDG parser always establishes exactly one subordination for each word of a sentence, so that nodistinctionbetweenprecisionandrecallarises.Also, the grammar is written in such a way that even if a necessary phenomenon is removed, the parser will at least find some analysis, so that the coverage is always 100%.</Paragraph>
      <Paragraph position="2"> As expected, those 'rare' sentences in which at least one of these constructions does actually occur are analyzed less accurately than before: structural and labelled accuracy drop by about 2 percent points (see Table 3). However, the other sentences receive slightly better analyses, and since they are in the greatmajority, the overall effect is an increase in parsing quality. Note also that the 'rare'sentencesappearto be more difficult to analyze in the first place.</Paragraph>
      <Paragraph position="3">  the same text with reduced coverage.</Paragraph>
      <Paragraph position="4"> The net gain in accuracy might be due to plugged leaks (misleadingstructuresthatusedtobefoundarerejected infavorofcorrectstructures)ortofocussing(structures thatwerepreferredbutmissedthroughsearcherrorsare now found). A point in case of the latter explanation is the fact that the average runtime decreases by 10% with the reduced grammar. Also, if we consider only those sentences on which the local search originally exceeded the time limit of 500 s and therefore had to be interrupted,the accuracy rises from 85.2%/83.0%to 86.5%/84.4%, i.e. even more pronounced than overall.</Paragraph>
    </Section>
    <Section position="2" start_page="28" end_page="29" type="sub_section">
      <SectionTitle>
4.2 Experiment 2: Stepwise refinement
</SectionTitle>
      <Paragraph position="0"> For comparison with previous work and to investigate corpus-specific effects, we repeated the experiment with the test set of the NEGRA corpus as defined by (Dubey and Keller, 2003). For that purpose the NEGRA annotations were automatically transformed to dependencytrees with the freely available tool DEPSY (Daum et al., 2004). Some manual corrections were made to its output to conform to the annotation guidelines of the WCDG of German; altogether, 1% of all words had their regents changed for this purpose.</Paragraph>
      <Paragraph position="1"> Table3showsthattheproportionofsentenceswith rare phenomena is somewhat higher in the NEGRA sentences, and consequently the net gain in parsing accuracy is smaller; apparently the advantage of reducing the problem size is almost cancelled by the disadvantage of losing necessary coverage.</Paragraph>
      <Paragraph position="2"> To test this theory, we then reduced the coverage of the grammar in smaller steps. Since constraints allow us to switch off each of the 21 rare phenomena individually, we can test whether the effects of reducing coverage are merely due to the smaller number of alternatives to consider or whether some constructions affect the parser more than others, if allowed.</Paragraph>
      <Paragraph position="3"> We first took the first 3,000 sentences of the NEGRA corpus as a training set and counted how often each construction actually occurs there and in the test set.</Paragraph>
      <Paragraph position="4"> Table 4 shows that the two parts of the corpus, while different, seem similar enough that statistics obtained  on the one could be useful for processing the other.</Paragraph>
      <Paragraph position="5"> The test set was then parsed again with the coverage  successivelyreducedinseveralsteps:first,allconstructions were removed that never occur in the training set, then those which occur less than 10 times or 100 times respectively were also removed. We also performed the opposite experiment, first removing support for the least rare phenomena and only then for the really rare ones.</Paragraph>
      <Paragraph position="6">  Table 5 shows the results of parsing the test set in this way (the first and last lines are repetitions from Table 3). The resulting effects are very small, but they do suggest that removing coverage for the very rare constructions is somewhat more profitable: the first three new experiments tend to yield better accuracy than the original grammar, while in the last three it tends to drop.</Paragraph>
    </Section>
    <Section position="3" start_page="29" end_page="30" type="sub_section">
      <SectionTitle>
4.3 Experiment 3: Plugging known leaks
</SectionTitle>
      <Paragraph position="0"> The previous experiment used only counts from the treebank annotations to determine how rare a phenomenon is supposed to be, but it might also be important how rare the parser actually assumes it to be.</Paragraph>
      <Paragraph position="1"> The fact that a particular construction never occurs in a corpus does not prevent the parser from using it in its analyses, perhapsmore often than another construction that is much more common in the annotations. In other words, we should measure how much each construction actually leaks. To this end, we parsed the training set with the original grammar and grouped all 21 phenomena into three classes: A: Phenomena that are predicted much more often than they are annotated B: Phenomena that are predicted roughly the right number of times C: Phenomena that are predicted less often than annotated (or in fact not at all).</Paragraph>
      <Paragraph position="2"> 'Much more often' here means 'by a factor of two or more';constructionswhichwereneverpredictedor annotated at all were grouped into class C.</Paragraph>
      <Paragraph position="3"> There are different reasons why a phenomenon might leak more or less. Some constructions depend on particular combinations of word forms in the input; for instance, an auxiliary flip can only be predicted when the finite verb does in fact precede the full verb (phenomenon 12 in Table 1), so that covering it should not change the behaviour of the system much. But most sentencescontainmorethanonenounphrasewhichthe parser might possibly misrepresent as a non-projective extraposition (phenomenon 1). Also, some rare phenomena are dispreferred more than others even when they are allowed. We did not investigate these reasons in detail.</Paragraph>
      <Paragraph position="4">  structions, 14 regularly leak into sentences where they have no place, while 4 work more or less as designed. Only3arepredictedtooseldom.Thisisconsistentwith our earlier interpretationthat most added coverageis in fact unhelpful when judging a parser solely by its empirical accuracy on a corpus.</Paragraph>
      <Paragraph position="5">  Accordingly, it is in fact more helpful to judge constructions by their observed tendency to leak than just by their annotated frequency: the first experiment (A) yields the highest accuracy for the newspaper text.</Paragraph>
      <Paragraph position="6"> Conversely, removing those constructions which actuallyworklargelyasintended(B)reduceseventheover- null all accuracy, and not just the accuracy on 'rare' sentences. The third class contains only three very rare phenomena, and removing them from the grammar does not influence parsing very much at all.</Paragraph>
      <Paragraph position="7"> Note that thisresult was obtainedalthoughthe distribution of the phenomena differs between parser predictions on the training set and the test set; had we classified them according to their behaviour on the test set itself,theclassAwouldhavecontainedonly9items(of which 7 overlap with the classification actually used).</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="30" end_page="30" type="metho">
    <SectionTitle>
5 Related work
</SectionTitle>
    <Paragraph position="0"> The fact that leaking is an ubiquitous property of natural language grammars has been noted as early as 80 years ago by (Sapir, 1921). Since no precise definition was given, the notion offers room for interpretation. In general linguistics, leaking is usually understood as the underlying reason for the apparent impossibility to write a grammar which is complete, in the sense that it covers all sentences of a language, while maintaininga precise distinction betweencorrect an incorrect word form sequences (see e.g. (Sampson, forthcoming)). In Computational Linguistics, attention was first drawn to the resulting consequences for obtaining parse trees when it became obvious that all attempts to build wide-coverage grammars led to an increase in output ambiguity, and that even more fine-grained feature-based descriptions were not able solve the problem. Stochastic approachesare usually considered to provide a powerful countermeasure (Manning and Sch&amp;quot;utze, 1999). However, as (Steedman, 2004) alreadynoted,stochastic modelsdo notaddressthe problem of overgenerationdirectly.</Paragraph>
    <Paragraph position="1"> Disregarding rare phenomena is something that can be achieved in a stochastic frameworkby putting a threshold on the minimum number of occurrences to be considered. Such an approach is mainly used to either exclude rare phenomena in grammar induction (c.f. (Solsona et al., 2002)) or to prune the search space by adjusting a beam width during parsing itself (Goodman, 1997). The direct use of thresholding techniques at the level of the stochastic model, however,has not been investigated extensively so far. Stochastic models of syntax suffer to such a degree from data sparseness that in effect strong efforts in the opposite direction become necessary: instead of ignoring rare events in the training data,evenunseeneventsare includedbysmoothing techniques. The only experimental investigation of the impact of rare events we are aware of is (Bod, 2003), where heuristics are explored to constrain the model in the DOP framework by ignoring certain tree fragments. Contrary to the results of our experiments, very few constraints have been found that do not decrease the parse accuracy. In particular, no improvement by disregarding selected observations was possible.</Paragraph>
    <Paragraph position="2"> The tradeoff between processing time and output quality which our transformation-based problem solving strategy exhibits, is also a fundamental property of all beam-search procedures. While a limited beam width might cause search errors, widening the beam in orderto improvethequalityrequiresinvestingmorecomputational resources (see e.g. (Collins, 1999)). In contrast to our transformation-based procedure, however, the commonly used Viterbi search is not interruptible and therefore not in a position to really profit from the tradeoff. Thus, focussing as a possibility to increase output quality to our knowledge has never been investigated elsewhere.</Paragraph>
  </Section>
class="xml-element"></Paper>