<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-3008">
  <Title>Interactive grammar development with WCDG</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The WCDG parsing system
</SectionTitle>
    <Paragraph position="0"> The WCDG formalism (Schrcurrency1oder, 2002) describes natural language exclusively as dependency structure, i.e. ordered, labelled pairs of words in the input text. It performs natural language analysis under the paradigm of constraint optimization, where the analysis that best conforms to all rules of the grammar is returned. The rules are explicit descriptions of well-formed tree structures, allowing a modular and ne-grained description of grammatical knowledge. For instance, rules in a grammar of English would state that subjects normally precede the nite verb and objects follow it, while temporal NP can either precede or follow it.</Paragraph>
    <Paragraph position="1"> In general, these constraints are defeasible, since many rules about language are not absolute, but can be preempted by more important rules. The strength of constraining information is controlled by the grammar writer: fundamental rules that must always hold, principles of different import that have to be weighed against each other, and general preferences that only take effect when no other disambiguating knowledge is available can all be formulated in a uniform way. In some cases preferences can also be used for disambiguation by approximating information that is currently not available to the system (e.g. knowledge on attachment preferences).</Paragraph>
    <Paragraph position="2"> Even the very weak preferences have an in uence on the parsing process; apart from serving as tiebreakers for structures where little context is available (e.g. with fragmentary input), they provide an  initial direction for the constraint optimization process even if they are eventually overruled. As a consequence, even the best structure found usually incurs some minor constraint violations; as long as the combined evidence of these default expectation failures is small, the structure can be regarded as perfectly grammatical.</Paragraph>
    <Paragraph position="3"> The mechanism of constraint optimization simultaneously achieves robustness against extra-grammatical and ungrammatical input. Therefore WCDG allows for broad-coverage parsing with high accuracy; it is possible to write a grammar that is guaranteed to allow at least one structure for any kind of input, while still preferring compliant over deviant input wherever possible. This graceful degradation under reduced input quality makes the formalism suitable for applications where deviant input is to be expected, e.g. second language learning. In this case the potential for error diagnosis  is also very valuable: if the best analysis that can be found still violates an important constraint, this directly indicates not only where an error occurred, but also what might be wrong about the input.</Paragraph>
    <Paragraph position="4"> 3 XCDG: A Tool for Parsing and</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Modelling
</SectionTitle>
      <Paragraph position="0"> An implementation of constraint dependency grammar exists that has the character of middleware to allow embedding the parsing functionality into other natural language applications. The program XCDG uses this functionality for a graphical tool for grammar development.</Paragraph>
      <Paragraph position="1"> In addition to providing an interface to a range of different parsing algorithms, graphical display of grammar elements and parsing results is possible; for instance, the hierarchical relations between possible attributes of lexicon items can be shown.</Paragraph>
      <Paragraph position="2"> See Figure 1 for an excerpt of the hierarchy of German syntactical categories used; the terminals correspond to those used the Stuttgart-Tcurrency1ubingen Tagset of German (Schiller et al., 1999).</Paragraph>
      <Paragraph position="3"> More importantly, mean and end results of parsing runs can be displayed graphically. Dependency structures are represented as trees, while additional relations outside the syntax structure are shown as arcs below the tree (see the referential relationship REF in Figure 2). As well as end results, intermediate structures found during parsing can be displayed. This is often helpful in understanding the behaviour of the heuristic solution methods employed. null Together with the structural analysis, instances of broken rules are displayed below the dependency graph (ordered by decreasing weights), and the dependencies that trigger the violation are highlighted on demand (in our case the PP-modi cation between the preposition in and the in nite form verkaufen). This allows the grammar writer to easily check whether or not a rule does in fact make the distinction it is supposed to make. A unique identi er attached to each rule provides a link into the grammar source le containing all constraint de nitions. The unary constraint 'mod-Distanz' in the example of Figure 2 is a fairly weak constraint which penalizes attachments the stronger the more distant a dependent is placed from its head. Attaching the preposition to the preceding noun Bund would be preferred by this constraint, since the distance is shorter. However, it would lead to a more serious constraint violation because noun attachments are generally dispreferred.</Paragraph>
      <Paragraph position="4"> To facilitate such experimentation, the parse window doubles as a tree editor that allows structural, lexical and label changes to be made to an analysis by drag and drop. One important application of the integrated parsing and editing tool is the creation of large-scale dependency treebanks. With the ability to save and load parsing results from disk, automatically computed analyses can be checked and hand-corrected where necessary and then saved as annotations. With a parser that achieves a high performance on unseen input, a throughput of over 100 annotations per hour has been achieved.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Grammar development with XCDG
</SectionTitle>
    <Paragraph position="0"> The development of a parsing grammar based on declarative constraints differs fundamentally from that of a derivational grammar, because its rules forbid structures instead of licensing them: while a context-free grammar without productions licenses nothing, a constraint grammar without constraints would allow everything. A new constraint must therefore be written whenever two analyses of the same string are possible under the existing constraints, but human judgement clearly prefers one over the other.</Paragraph>
    <Paragraph position="1">  Most often, new constraints are prompted by inspection of parsing results under the existing grammar: if an analysis is computed to be grammatical that clearly contradicts intuition, a rule must be missing from the grammar. Conversely, if an error is signalled where human judgement disagrees, the relevant grammar rule must be wrong (or in need of clarifying exceptions). In this way, continuous improvement of an existing grammar is possible.</Paragraph>
    <Paragraph position="2"> XCDG supports this development style through the feature of hypothetical evaluation. The tree display window does not only show the result returned by the parser; the structure, labels and lexical selections can be changed manually, forcing the parser to pretend that it returned a different analysis. Recall that syntactic structures do not have to be specifically allowed by grammar rules; therefore, every conceivable combination of subordinations, labels and lexical selections is admissible in principle, and can be processed by XCDG, although its score will be low if it contradicts many constraints.</Paragraph>
    <Paragraph position="3"> After each such change to a parse tree, all constraints are automatically re-evaluated and the updated grammar judgement is displayed. In this way it can quickly be checked which of two alternative structures is preferred by the grammar. This is useful in several ways. First, when analysing parsing errors it allows the grammar author to distinguish search errors from modelling errors: if the intended structure is assigned a better score than the one actually returned by the parser, a search error occurred (usually due to limited processing time); but if the computed structure does carry the higher score, this indicates an error of judgement on the part of the grammar writer, and the grammar needs to be changed in some way if the phenomenon is to be modelled adequately.</Paragraph>
    <Paragraph position="4"> If a modelling error does occur, it must be because a constraint that rules against the intended analysis has overruled those that should have selected it. Since the display of broken constraints is ordered by severity, it is immediately obvious which of the grammar rules this is. The developer can then decide whether to weaken that rule or extend it so that it makes an exception for the current phenomenon. It is also possible that the intended analysis really does con ict with a particular linguistic principle, but in doing so follows a more important one; in this case, this other rule must be found and strengthened so that it will overrule the rst one.</Paragraph>
    <Paragraph position="5"> The other rule can likewise be found by re-creating the original automatic analysis and see which of its constraint violations needs to be given more weight, or, alternatively, which entirely new rule must be added to the grammar.</Paragraph>
    <Paragraph position="6"> In the decision whether to add a new rule to a constraint grammar, it must be discovered under what conditions a particular phenomenon occurs, so that a generally relevant rule can be written. The possession of a large amount of analysed text is often useful here to verify decisions based on mere introspection. Working together with an external program to search for speci c structures in large treebanks, XCDG can display multiple sentences in stacked widgets and highlight all instances of the same phenomenon to help the grammar writer decide what the relevant conditions are.</Paragraph>
    <Paragraph position="7"> Using this tool, a comprehensive grammar of modern German has been constructed (Foth, 2004) that employs 750 handwritten well-formedness rules, and has been used to annotate around 25,000 sentences with dependency structure. It achieves a structural recall of 87.7% on sentences from the NE-GRA corpus (Foth et al., submitted), but can be applied to texts of many other types, where structural recall varies between 80 90%. To our knowledge, no other system has been published that achieves a comparable correctness for open-domain German text. Parsing time is rather high due to the computational effort of multidimensional optimization; processing time is usually measured in seconds rather than milliseconds for each sentence.</Paragraph>
  </Section>
class="xml-element"></Paper>