<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0401">
  <Title>A model of syntactic disambiguation based on lexicalized grammars</Title>
  <Section position="2" start_page="0" end_page="2" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Recent studies on the automatic extraction of lexicalized grammars (Xia, 1999; Chen and Vijay-Shanker, 2000; Hockenmaier and Steedman, 2002a) allow the modeling of syntactic disambiguation based on linguistically motivated grammar theories including LTAG (Chiang, 2000) and CCG (Clark et al., 2002; Hockenmaier and Steedman, 2002b). However, existing models of disambiguation with lexicalized grammars are a mere extension of lexicalized probabilistic context-free grammars (LPCFG) (Collins, 1996; Collins, 1997; Charniak, 1997), which are based on the decomposition of parsing results into the syntactic/semantic dependencies of two words in a sentence under the assumption of independence of the dependencies. While LPCFG models have proved that the incorporation of lexical associations (i.e., dependencies of words) significantly improves the accuracy of parsing, this idea has been naively inherited in the recent studies on disambiguation models of lexicalized grammars.</Paragraph>
    <Paragraph position="1"> However, the disambiguation models of lexicalized grammars should be totally different from that of LPCFG, because the grammars define the relation of syntax and semantics, and can restrict the possible structure of parsing results. Parsing results cannot simply be decomposed into primitive dependencies, because the complete structure is determined by solving the syntactic constraints of a complete sentence. For example, when we apply a unification-based grammar, LPCFG-like modeling results in an inconsistent probability model because the model assigns probabilities to parsing results not allowed by the grammar (Abney, 1997). We have only two ways of adhering to LPCFG models: preserve the consistency of probability models by abandoning improvements to the lexicalized grammars using complex constraints (Chiang, 2000), or ignore the inconsistency in probability models (Clark et al., 2002).</Paragraph>
    <Paragraph position="2"> This paper provides a new model of syntactic disambiguation in which lexicalized grammars can restrict the possible structures of parsing results. Our modeling aims at providing grounds for i) producing a consistent probabilistic model of lexicalized grammars, as well as ii) evaluating the contributions of syntactic and semantic preferences to syntactic disambiguation. The model is composed of the syntax and semantics probabilities, which represent syntactic and semantic preferences respectively.</Paragraph>
    <Paragraph position="3"> The syntax probability is responsible for determining the syntactic categories chosen by words in a sentence, and the semantics probability selects the most plausible dependencies of words from candidates allowed by the syntactic categories yielded by the syntax probability. Since the sequence of syntactic categories restricts the possible structure of parsing results, the semantics probability is a conditional probability without decomposition into the primitive dependencies of words. Recently used machine learning methods including maximum entropy models (Berger et al., 1996) and support vector machines (Vapnik, 1995) provide grounds for this type of modeling, because it allows various dependent features to be incorporated into the model without the independence assumption. null The above approach, however, has a serious deficiency: a lexicalized grammar assigns exponentially many parsing results because of local ambiguities in a sentence, which is problematic in estimating the parameters of a probability model. To cope with this, we adopted an algorithm of maximum entropy estimation for feature forests (Miyao and Tsujii, 2002; Geman and Johnson, 2002), which allows parameters to be efficiently estimated. The algorithm enables probabilistic modeling of complete structures, such as transition sequences in Markov models and parse trees, without dividing them into independent sub-events. The algorithm avoids exponential explosion by representing a probabilistic event by a packed representation of a feature space. If a complete structure is represented with a feature forest of a tractable size, the parameters can be efficiently estimated by dynamic programming.</Paragraph>
    <Paragraph position="4"> A series of studies on parsing with wide-coverage LFG (Johnson et al., 1999; Riezler et al., 2000; Riezler et al., 2002) have had a similar motivation to ours. Their models have also been based on a discriminative model to select a parsing result from all candidates given by the grammar. A significant difference is that we apply maximum entropy estimation for feature forests to avoid the inherent problem with estimation: the exponential explosion of parsing results given by the grammar. They assumed that parsing results would be suppressed to a reasonable number through using heuristic rules, or by carefully implementing a fully restrictive and wide-coverage grammar, which requires a considerable amount of effort to develop. Our contention is that this problem can be solved in a more sophisticated way as is discussed in this paper. Another difference is that our model is separated into syntax and semantics probabilities, which will benefit computational/linguistic investigations into the relation between syntax and semantics, and allow separate improvements to both models.</Paragraph>
    <Paragraph position="5"> Overall, the approach taken in this paper is different from existing models in the following respects.</Paragraph>
    <Paragraph position="6"> * Since it does not require the assumption of independence, the probability model is consistent with lexicalized grammars with complex constraints including unification-based grammar formalism. Our model can assign consistent probabilities to parsing results of lexicalized grammars, while the traditional models assign probabilities to parsing results not allowed by the grammar.</Paragraph>
    <Paragraph position="7"> * Since the syntax and semantics probabilities are separate, we can improve them individually. For example, the syntax model can be improved by smoothing using the syntactic classes of words, while the semantics model should be able to be improved by using semantic classes. In addition, the model can be a starting point that allows the theory of syntax and semantics to be evaluated through consulting an extensive corpus.</Paragraph>
    <Paragraph position="8"> We evaluated the validity of our model through experiments on a disambiguation task of parsing the Penn Tree-bank (Marcus et al., 1994) with an automatically acquired LTAG grammar. To assess the contribution of the syntax and semantics probabilities to the accuracy of parsing and to evaluate the validity of applying maximum entropy estimation for feature forests, we compared three models trained with the same training set and the same set of features. Following the experimental results, we concluded that i) a parser with the syntax probability only achieved high accuracy with the lexicalized grammar, ii) the incorporation of preferences for lexical association through the semantics probability resulted in significant improvements, and iii) our model recorded an accuracy that was quite close to the traditional model, which indicated the validity of applying maximum entropy estimation for feature forests.</Paragraph>
    <Paragraph position="9"> In what follows, we first describe the existing models for syntactic disambiguation, and discuss problems with them in Section 2. We then define the general form for parsing results of lexicalized grammars, and introduce our model in Section 3. We prove the validity of our approach through a series of experiments in Section 4.</Paragraph>
    <Paragraph position="10"> 2 Traditional models for syntactic disambiguation This section reviews the existing models for syntactic disambiguation from the viewpoint of representing parsing results of lexicalized grammars. In particular, we discuss how the models incorporate syntactic/semantic preferences for syntactic disambiguation. The existing studies are based on the decomposition of parsing results into primitive lexical dependencies where syntactic/semantic preferences are combined. This traditional scheme of syntactic disambiguation can be problematic with lexicalized grammars. Throughout the discussion, we refer to the example sentence &amp;quot;What does your student want to write?&amp;quot;, whose parse tree is in Figure 1.</Paragraph>
    <Section position="1" start_page="0" end_page="1" type="sub_section">
      <SectionTitle>
2.1 Lexicalized parse trees
</SectionTitle>
      <Paragraph position="0"> The first successful work on syntactic disambiguation was based on lexicalized probabilistic context-free grammar (LPCFG) (Collins, 1997; Charniak, 1997). Although LPCFG is not exactly classified into lexicalized grammar formalism, we should mention these studies since they demonstrated that lexical dependencies were essential to improving the accuracy of parsing.</Paragraph>
      <Paragraph position="1">  A lexicalized parse tree is an extension of a parse tree that is achieved by augmenting each non-terminal with its lexical head. There is an example of a lexicalized parse tree in Figure 2, which is a lexicalized version of the one in Figure 1. A lexicalized parse tree is represented by a set of branchings in the tree</Paragraph>
      <Paragraph position="3"> the head word of a non-head, and r i a grammar rule corresponding to each branching. LPCFG models yield a probability of the complete parse tree T = {&lt;w</Paragraph>
      <Paragraph position="5"> where e is a condition of the probability, which is usually the nonterminal symbol of the mother node. Since each branching is augmented with the lexical heads of non-terminals in the rule, the model can capture lexical dependencies, which increase the accuracy. This is because lexical dependencies approximately represent the semantic preference of a sentence. As is well known, a syntactic structure is not accurately disambiguated only with syntactic preferences, and the incorporation of approximate  For simplicity, we have assumed parse trees are only composed of binary branchings.</Paragraph>
      <Paragraph position="6"> semantic preferences was the key to improving the accuracy of syntactic disambiguation.</Paragraph>
      <Paragraph position="7"> We should note that this model has the following three disadvantages.</Paragraph>
      <Paragraph position="8"> 1. The model fails to represent some linguistic dependencies, including long-distance dependencies and argument/modifier distinctions. Since an existing study incorporates these relations ad hoc (Collins, 1997), they are apparently crucial in accurate disambiguation. This is also problematic for providing a sufficient representation of semantics.</Paragraph>
      <Paragraph position="9"> 2. The model assumes the statistical independence of branchings, which is apparently not preserved. For example, the ambiguity of PP-attachments should be resolved by considering three words: the modifiee of the PP, its preposition, and the object of the PP.</Paragraph>
      <Paragraph position="10"> 3. The preferences of syntax and semantics are combined in the lexical dependencies of two words, i.e., features for syntactic preference and those for semantic preference are not distinguished in the model. Lexicalized grammars formalize the constraints of the relations between syntax and semantics, but the model does not assume the existence of such constraints. The model prevents further improvements to the syntax/semantics models; in addition to the linguistic analysis of the relation between syntax and semantics.</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="2" type="sub_section">
      <SectionTitle>
2.2 Derivation trees
</SectionTitle>
      <Paragraph position="0"> Recent work on the automatic extraction of LTAG (Xia, 1999; Chen and Vijay-Shanker, 2000) and disambiguation models (Chiang, 2000) has been the first on the statistical model for syntactic disambiguation based on lexicalized grammars. However, the models are based on the lexical dependencies of elementary trees, which is a simple extension of the LPCFG. That is, the models are still based on decomposition into primitive lexical dependencies. null Derivation trees, the structural description in LTAG (Schabes et al., 1988), represent the association of lexical items i.e., elementary trees. In LTAG, all syntactic constraints of words are described in an elementary tree, and the dependencies of elementary trees, i.e., a derivation tree, describe the semantic relations of words more directly than lexicalized parse trees. For example, Figure 3 has a derivation tree corresponding to the parse tree in Figure 1  . The dotted lines represent substitution while the solid lines represent adjunction. We should note that the relations captured by ad-hoc augmentation  The nodes in a derivation tree are denoted with the names of the elementary trees, while we have omitted details. what does student want to  of lexicalized parse trees, such as the distinction of arguments/modifiers and unbounded dependencies (Collins, 1997), are elegantly represented in derivation trees. Formally, a derivation tree is represented as a set of dependencies: D = {&lt;a</Paragraph>
      <Paragraph position="2"> represents a node in a j where substitution/adjunction has occurred, and r i is a label of the applied rule, i.e., adjunction or substitution. A probability of derivation tree D = {&lt;a</Paragraph>
      <Paragraph position="4"> generally defined as follows (Schabes et al., 1988; Chiang, 2000).</Paragraph>
      <Paragraph position="6"> Note that each probability on the right represents the syntactic/semantic preference of a dependency of two lexical items. We can readily see that the model is very similar to LPCFG models.</Paragraph>
      <Paragraph position="7"> The first problem with LPCFG is partially solved by this model, since the dependencies not represented in LPCFG (e.g., long-distance dependencies and argument/modifier distinctions) are elegantly represented, while some relations (e.g., the control relation between &amp;quot;want&amp;quot; and &amp;quot;student&amp;quot;) are not yet represented. However, the other two problems remain unsolved in this model.</Paragraph>
      <Paragraph position="8"> In particular, when we apply Feature-Based LTAG (FB-LTAG), the above probability is no longer consistent because of the non-local constraints caused by feature unification (Abney, 1997).</Paragraph>
    </Section>
    <Section position="3" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
2.3 Dependency structures
</SectionTitle>
      <Paragraph position="0"> A disambiguation model for wide-coverage CCG (Clark et al., 2002) aims at representing deep linguistic dependencies including long-distance dependencies and control relations. This model can represent all the syntactic/semantic dependencies of words in a sentence. However, the statistical model is still a mere extension of LPCFG, i.e., it is based on decomposition into primitive lexical dependencies.</Paragraph>
      <Paragraph position="1"> In this model, a lexicalized grammar defines the mapping from a sentence into dependency structures, which represent all the necessary dependencies of words in a sentence, including long-distance dependencies and control relations. There is an example in Figure 4, which  corresponds to the parse tree in Figure 1. Note that this representation includes a dependency not represented in the derivation tree (the control relation between &amp;quot;want&amp;quot; and &amp;quot;student&amp;quot;). A dependency structure is formally defined as a set of dependencies: S = {&lt;w</Paragraph>
      <Paragraph position="3"> are a head and argument word of the dependency, and e</Paragraph>
      <Paragraph position="5"> is an argument position of the head word filled by the argument word.</Paragraph>
      <Paragraph position="6"> An existing model assigns a probability value to de-</Paragraph>
      <Paragraph position="8"> Primitive probability is approximated by the relative frequency of lexical dependencies of two words in a training corpus.</Paragraph>
      <Paragraph position="9"> Since dependency structures include all necessary dependency relations, the first problem with LPCFG is now completely solved. However, the third problem still remains unsolved. The probability of a complete parse tree is defined as the product of probabilities of primitive dependencies of two words. In addition, the second problem is getting worse; the independence assumption is apparently violated in this model, since the possible dependency structures are restricted by the grammar. The probability model is no longer consistent.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>