<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1039">
  <Title>What to do when lexicalization fails: parsing German with suffix analysis and smoothing</Title>
  <Section position="4" start_page="0" end_page="316" type="intro">
    <SectionTitle>
2 Data
</SectionTitle>
    <Paragraph position="0"> The parsing models we present are trained and tested on the NEGRA corpus (Skut et al., 1997), a hand-parsed corpus of German newspaper text containing approximately 20,000 sentences. It is available in several formats, and in this paper, we use the Penn Treebank (Marcus et al., 1993) format of NEGRA.</Paragraph>
    <Paragraph position="1"> The annotation used in NEGRA is similar to that used in the English Penn Treebank, with some differences which make it easier to annotate German syntax. German's flexible word order would have required an explosion in long-distance dependencies (LDDs) had annotation of NEGRA more closely resembled that of the Penn Treebank. The NEGRA designers therefore chose to use relatively flat trees, encoding elements of flexible word order us- null ing grammatical functions (GFs) rather than LDDs wherever possible.</Paragraph>
    <Paragraph position="2"> To illustrate flexible word order, consider the sentences Der Mann sieht den Jungen ('The man sees the boy') and Den Jungen sieht der Mann. Despite the fact the subject and object are swapped in the second sentence, the meaning of both are essentially the same.1 The two possible word orders are disambiguated by the use of the nominative case for the subject (marked by the article der) and the accusative case for the object (marked by den) rather than their position in the sentence.</Paragraph>
    <Paragraph position="3"> Whenever the subject appears after the verb, the non-standard position may be annotated using a long-distance dependency (LDD). However, as mentioned above, this information can also be retrieved from the grammatical function of the respective noun phrases: the GFs of the two NPs above would be 'subject' and 'accusative object' regardless of their position in the sentence. These labels may therefore be used to recover the underlying dependencies without having to resort to LDDs. This is the approach used in NEGRA. It does have limitations: it is only possible to use GF labels instead of LDDs when all the nodes of interest are dominated by the same parent. To maximize cases where all necessary nodes are dominated by the same parent, NEGRA uses flat 'dependency-style' rules. For example, there is no VP node when there is no overt auxiliary verb. category. Under the NEGRA annotation scheme, the first sentence above would have</Paragraph>
    <Section position="1" start_page="314" end_page="315" type="sub_section">
      <SectionTitle>
3.1 Model
</SectionTitle>
      <Paragraph position="0"> As explained above, this paper focuses on unlexicalized grammars. In particular, we make use of probabilistic context-free grammars (PCFGs; Booth (1969)) for our experiments. A PCFG assigns each context-free rule LHS a0 RHS a conditional probability Pra1 RHSa2LHSa3 . If a parser were to be given POS tags as input, this would be the only distribution 1Pragmatically speaking, the second sentence has a slightly different meaning. A better translation might be: 'It is the boy the man sees.' required. However, in this paper we are concerned with the more realistic problem of accepting text as input. Therefore, the parser also needs a probability distribution Pwa1 wa2LHSa3 to generate words. The probability of a tree is calculated by multiplying the probabilities all the rules and words generated in the derivation of the tree.</Paragraph>
      <Paragraph position="1"> The rules are simply read out from the treebank, and the probabilities are estimated from the frequency of rules in the treebank. More formally:</Paragraph>
      <Paragraph position="3"> The probabilities of words given tags are similarly estimated from the frequency of word-tag cooccurrences: null</Paragraph>
      <Paragraph position="5"> To handle unseen or infrequent words, all words whose frequency falls below a threshold Ohm are grouped together in an 'unknown word' token, which is then treated like an additional word. For our experiments, we use Ohm a4 10.</Paragraph>
      <Paragraph position="6"> We consider several variations of this simple model by changing both Pr and Pw. In addition to the standard formulation in Equation (1), we consider two alternative variants of Pr. The first is a Markov context-free rule (Magerman, 1995; Charniak, 2000). A rule may be turned into a Markov rule by first binarizing it, then making independence assumptions on the new binarized rules. Binarizing</Paragraph>
      <Paragraph position="8"> Making the 2nd order Markov assumption 'forgets' everything earlier then 2 previous sisters. A rule would now be in the form ABi</Paragraph>
      <Paragraph position="10"> Bi, and the probability would be:</Paragraph>
      <Paragraph position="12"> The other rule type we consider are linear precedence/immediate dominance (LP/ID) rules (Gazdar et al., 1985). If a context-free rule can be thought of as a LHS token with an ordered list of tokens on the RHS, then an LP/ID rule can be thought of as a LHS token with a multiset of tokens on the RHS together with some constraints on the possible orders of tokens on the RHS. Uszkoreit (1987) argues that LP/ID rules with violatable 'soft' constraints are suitable for modelling some aspects of German word order. This makes a probabilistic formulation of LP/ID rules ideal: probabilities act as soft constraints. null Our treatment of probabilistic LP/ID rules generate children one constituent at a time, conditioning upon the parent and a multiset of previously generated children. Formally, the the probability of the rule is approximated as:</Paragraph>
      <Paragraph position="14"> In addition to the two additional formulations of the Pr distribution, we also consider one variant of the Pw distribution, which includes the suffix analysis. It is important to clarify that we only change the handling of uncommon and unknown words; those which occur often are handled as normal. suggested different choices for Pw in the face of unknown words: Schiehlen (2004) suggests using a different unknown word token for capitalized versus uncapitalized unknown words (German orthography dictates that all common nouns are capitalized) and Levy and Manning (2004) consider inspecting the last letter the unknown word to guess the part-of-speech (POS) tags. Both of these models are relatively impoverished when compared to the approaches of handling unknown words which have been proposed in the POS tagging literature. Brants (2000) describes a POS tagger with a highly tuned suffix analyzer which considers both capitalization and suffixes as long as 10 letters long. This tagger was developed with German in mind, but neither it nor any other advanced POS tagger morphology analyzer has ever been tested with a full parser. Therefore, we take the novel step of integrating this suffix analyzer into the parser for the second Pw distribution. null</Paragraph>
    </Section>
    <Section position="2" start_page="315" end_page="316" type="sub_section">
      <SectionTitle>
3.2 Treebank Re-annotation
</SectionTitle>
      <Paragraph position="0"> Automatic treebank transformations are an important step in developing an accurate unlexicalized parser (Johnson, 1998; Klein and Manning, 2003).</Paragraph>
      <Paragraph position="1"> Most of our transformations focus upon one part of the NEGRA treebank in particular: the GF labels.</Paragraph>
      <Paragraph position="2"> Below is a list of GF re-annotations we utilise: Coord GF In NEGRA, a co-ordinated accusative NP rule might look like NP-OA a0 NP-CJ KON NP-CJ. KON is the POS tag for a conjunct, and CJ denotes the function of the NP is a coordinate sister. Such a rule hides an important fact: the two co-ordinate sisters are also accusative objects. The Coord GF re-annotation would therefore replace the above rule with NP-OA a0 NP-OA KON NP-OA.</Paragraph>
      <Paragraph position="3"> NP case German articles and pronouns are strongly marked for case. However, the grammatical function of all articles is usually NK, meaning noun kernel. To allow case markings in articles and pronouns to 'communicate' with the case labels on the GFs of NPs, we copy these GFs down into the POS tags of articles and pronouns. For example, a rule like NP-OA a0 ART-NK NN-NK would be replaced by NP-OA a0 ART-OA NN-NK. A similar improvement has been independently noted by Schiehlen (2004).</Paragraph>
      <Paragraph position="4"> PP case Prepositions determine the case of the NP they govern. While the case is often unambiguous (i.e. f&amp;quot;ur 'for' always takes an accusative NP), at times the case may be ambiguous. For instance, in 'in' may take either an accusative or dative NP.</Paragraph>
      <Paragraph position="5"> We use the labels -OA, -OD, etc. for unambiguous prepositions, and introduce new categories AD (accusative/dative ambiguous) and DG (dative/genitive ambiguous) for the ambiguous categories. For example, a rule such as PP a0 P ART-NK NN-NK is replaced with PP a0 P-AD ART-AD NN-NK if it is headed by the preposition in.</Paragraph>
      <Paragraph position="6"> SBAR marking German subordinate clauses have a different word order than main clauses. While subordinate clauses can usually be distinguished from main clauses by their GF, there are some GFs which are used in both cases. This transformation adds an SBAR category to explicitly disambiguate these  cases. The transformation does not add any extra nonterminals, rather it replaces rules such as S a0 KOUS NP V NP (where KOUS is a complementizer POS tag) with SBAR a0 KOUS NP V NP.</Paragraph>
      <Paragraph position="7"> S GF One may argue that, as far as syntactic disambiguation is concerned, GFs on S categories primarily serve to distinguish main clauses from subordinate clauses. As we have explicitly done this in the previous transformation, it stands to reason that the GF tags on S nodes may therefore be removed without penalty. If the tags are necessary for semantic interpretation, presumably they could be re-inserted using a strategy such as that of Blaheta and Charniak (2000) The last transformation therefore removes the GF of S nodes.</Paragraph>
    </Section>
    <Section position="3" start_page="316" end_page="316" type="sub_section">
      <SectionTitle>
3.3 Method
</SectionTitle>
      <Paragraph position="0"> To allow comparisons with earlier work on NEGRA parsing, we use the same split of training, development and testing data as used in Dubey and Keller (2003). The first 18,602 sentences are used as training data, the following 1,000 form the development set, and the last 1,000 are used as the test set. We remove long-distance dependencies from all sets, and only consider sentences of length 40 or less for efficiency and memory concerns. The parser is given untagged words as input to simulate a realistic parsing task. A probabilistic CYK parsing algorithm is used to compute the Viterbi parse.</Paragraph>
      <Paragraph position="1"> We perform two sets of experiments. In the first set, we vary the rule type, and in the second, we report the additive results of the treebank re-annotations described in Section 3.2. The three rule types used in the first set of experiments are standard CFG rules, our version of LP/ID rules, and 2nd order Markov CFG rules. The second battery of experiments was performed on the model with Markov rules.</Paragraph>
      <Paragraph position="2"> In both cases, we report PARSEVAL labeled  with Markov rules.</Paragraph>
      <Paragraph position="3"> bracket scores (Magerman, 1995), with the brackets labeled by syntactic categories but not grammatical functions. Rather than reporting precision and recall of labelled brackets, we report only the F-score, i.e. the harmonic mean of precision and recall.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>