<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1104">
  <Title>From Shallow to Deep Parsing Using Constraint Satisfaction</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 Property Grammars
</SectionTitle>
    <Paragraph position="0"> The notion of constraints is of deep importance in linguistics, see for example Maruyama (1990), Pollard (1994), Sag (1999). Recent theories (from the constraint-based paradigm to the principle and parameters one) rely on this notion. One of the main interests in using constraints comes from the fact that it becomes possible to represent any kind of information (very general as well as local or contextual one) by means of a unique device. We present in this section a formalism, called Property Grammars, described in Bes (1999) or Blache (2001), that makes it possible to conceive and represent all linguistic information in terms of constraints over linguistic objects. In this approach, constraints are seen as relations between two (or more) objects: it is then possible to represent information in a flat manner. The first step in this work consists in identifying the relations usually used in syntax.</Paragraph>
    <Paragraph position="1"> This can be done empirically and we suggest, adapting a proposal from Bes (1999), the set of following constraints: linearity, dependency, obligation, exclusion, requirement and uniqueness. In a phrase-structure perspective all these constraints participate to the description of a phrase. The following figure roughly sketches their respective roles, illustrated with some examples for the NP.</Paragraph>
    <Paragraph position="2">  repeated in a phrase In this approach, describing a phrase consists in specifying a set of constraints over some categories that can constitute it. A constraint is specified as follows. Let R a symbol representing a constraint relation between two (sets of) categories. A constraint of the form a R b stipulates that if a and b are realized, then the constraint a R b must be satisfied. The set of constraints describing a phrase can be represented as a graph connecting several categories.</Paragraph>
    <Paragraph position="3"> The following example illustrates some constraints for the NP.</Paragraph>
    <Paragraph position="4">  In this description, one can notice for example a requirement relation between the common noun and the determiner (such a constraint implements the complementation relation) or some exclusion that indicate cooccurrency restriction between a noun and a pronoun or a proper noun and a determiner. One can notice the use of sub-typing: as it is usually the case in linguistic theories, a category has several properties that can be inherited when the description of the category is refined (in our example, the type noun has two sub-types, proper and common represented in feature based notation). All constraints involving a noun also hold for its sub-types. Finally, the dependency relation, which is a semantic one, indicates that the dependent must combine its semantic features with the governor. In the same way as HPSG does now with the DEPS feature as described in Bouma (2001), this relation concerns any category, not necessarily the governed ones. In this way, the difference between a complement and an adjunct is that only the complement is selected by a requirement constraint, both of them being constrained with a dependency relation. This also means that a difference can be done between the syntactic head (indicated by the oblig constraint) and the semantic one (the governor of the dependency relation), even if in most of the cases, these categories are the same. Moreover, one can imagine the specification of dependencies within a phrase between two categories other than the head.</Paragraph>
    <Paragraph position="5"> One of the main advantages in this approach is that constraints form a system and all constraints are at the same level. At the difference of other approaches as Optimality Theory, presented in Prince (1993), there exists no hierarchy between them and one can choose, according to the needs, to verify the entire set of constraints or a subpart of it. In this perspective, using a constraint satisfaction technique as basis for the parsing strategy makes it possible to implement the possibility of verifying only a subpart of this constraint system. What is interesting is that some constraints like linearity provide indications in terms of boundaries, as described for example in Blache (1990). It follows that verifying this subset of constraints can constitute a bracketing technique. The verification of more constraints in addition to linearity allows to refine the parse. In the end, the same parsing technique (constraint satisfaction) can be used both for shallow and deep parsing. More precisely, using the same linguistic resources (lexicon and grammar), we propose a technique allowing to choose the granularity of the parse. 2 Two techniques for parsing</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2 Two techniques for parsing Property Grammars
</SectionTitle>
      <Paragraph position="0"> We describe in this paper different parsing techniques, from shallow to deep one, with this originality that they rely on the same formalism, described in the previous section. In other words, in our approach, one can choose the granularity level of the parse without modifying linguistic resources</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Shallow parsing
</SectionTitle>
      <Paragraph position="0"> In this technique, we get hierarchical and grammatical information while preserving robustness and efficiency of the processing. In this perspective, we make use of a grammar represented in the Property Grammar formalism described above. One of the main interests of this formalism is that it doesn't actually make use of the grammaticality notion, replacing it with a more general concept of characterization.</Paragraph>
      <Paragraph position="1"> A characterization simply consists in the set of the constraint system after evaluation (in other words the set of properties that are satisfied and the set of properties that are not satisfied). A characterization only formed with satisfied properties specifies a grammatical structure. In this sense, characterization subsumes grammaticality. It becomes then possible to propose a description in terms of syntactic properties for any kind of input (grammatical or not). Opening and closing chunks relies here on information compiled from the grammar. This information consists in the set of left and right potential corners, together with the potential constituents of chunks. It is obtained in compiling linear precedence, requirement and exclusion properties described in the previous sections together with, indirectly, that of constituency.</Paragraph>
      <Paragraph position="2"> The result is a compiled grammar which is used by the parser. Two stacks, one of opened categories and a second of closed categories, are completed after the parse of each new word: we can open new categories or close already opened ones, following some rules. This algorithm being recursive, the actions opening, continuing and closing are recursive too. This is the reason why rules must have a strict definition in order to be sure that the algorithm is deterministic and always terminates. This shallow parsing technique can be seen as a set of  production/reduction/cutting rules.</Paragraph>
      <Paragraph position="3"> * Rule 1: Open a phrase p for the current category c if c can be the left corner of p.</Paragraph>
      <Paragraph position="4"> * Rule 2: Do not open an already opened category if it belongs to the current phrase or is its right corner. Otherwise, we can reopen it if the current word can only be its left corner.</Paragraph>
      <Paragraph position="5"> * Rule 3: Close the opened phrases if the more recently opened phrase can neither continue one of them nor be one of their right corner.</Paragraph>
      <Paragraph position="6"> * Rule 4: When closing a phrase, apply rules 1, 2</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Deep parsing
</SectionTitle>
      <Paragraph position="0"> Deep analysis is directly based on property grammars. It consists, for a given sentence, in building all the possible subsets of juxtaposed elements that can describe a syntactic category.</Paragraph>
      <Paragraph position="1"> A subset is positively characterized if it satisfies the constraints of a grammar. These subsets are called edges, they describe a segment of the sentence between two positions.</Paragraph>
      <Paragraph position="2"> At the first step, each lexical category is considered as an edge of level 0. The next phase consists in producing all the possible subsets of edges at level 0. The result is a set of edges of level 1. The next steps work in the same way and produce all the possible subsets of edges, each step corresponding to a level. The algorithm ends when no new edge can be built.</Paragraph>
      <Paragraph position="3"> An edge is characterized by: * an initial and a final position in the sentence, * a syntactic category, * a set of syntactic features * a set of constituents: a unique lexical constituent at the level 0, and one or several edges at the other levels.</Paragraph>
      <Paragraph position="4"> After parsing, a sentence is considered as grammatical if at least one edge covering completely the sentence and labelled by the category S is produce. But even for ungrammatical cases, the set of edges represents all possible interpretations of the sentence: the set of edges contains the set of constraints that describe the input. By another way, in case of ambiguity, the parser generates several edges covering the same part and labelled with the same category. Such similar edges are distinct by their syntactical features (in the case of an ambiguity of features) or by their different constituents (typically an ambiguity of attachment).</Paragraph>
      <Paragraph position="5"> Several heuristics allow to control the algorithm. For example, an edge at level n must contain at least an edge at level n-1. Indeed, if it would contain only edges at levels lower than n-1, it should have been already produced at the level n-1.</Paragraph>
      <Paragraph position="6"> The parse ends in a finite number of steps at the following conditions: * if the number of syntactic categories of the grammar is finite, * if the grammar does not contain a loop of production. We call loop of production, the eventuality that a category c1 can be constituted by an unique category c2, itself constituted by an unique category c3 and so until cn and that one of category c2 to cn can be constituted by the unique category c1.</Paragraph>
    </Section>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Compared complexity
</SectionTitle>
    <Paragraph position="0"> Of course, the difference of granularity of these algorithms does have a cost which has to be known when choosing a technique.</Paragraph>
    <Paragraph position="1"> In order to study their complexity, we parsed a french corpus of 13,236 sentences (from the newspaper Le Monde), tagged by linguists (the CLIF project, headed by Talana).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Shallow parsing with Chinks and
Chunks
</SectionTitle>
      <Paragraph position="0"> With the aim of comparing our techniques, we first built a simple robust chunker. This quick program gives an idea of a bottom complexity for the two techniques based on property grammars. This algorithm relies on the Liberman and Church's Chink&amp;Chunk technique (see Liberman &amp; Church (1992)) and on Di Cristo's chunker (see Di Cristo (1998) and DiCristo &amp; al (2000)). Its mechanism consists in segmenting the input into chunks, by means of a finite-state automaton making use of function words as block borders. An improvement of the notion of chunk is implemented, using conjunctions as neutral elements for chunks being built. This algorithm constitutes an interesting (and robust) tool for example as basis for calculating prosodic units in a Text-to-Speech Synthesizer. Chink/Chunk algorithm is a simple but efficient way to detect syntactic boundaries. In the average, best and worst cases, for M sentences, each sentence consisting of Nw words, its complexity has an order of M*Nw*Constant. That is to say a linear complexity.</Paragraph>
      <Paragraph position="1"> [Figure: instructions / number of words for Chink &amp; Chunk (logarithmic scale)]</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Shallow parsing with PG
</SectionTitle>
      <Paragraph position="0"> With the shallow parser algorithm, we can detect and label more syntactic and hierarchic data: in the average, worst and best cases, for M sentences, each sentence consisting of Nw words; for a set of C precompiled categories, its complexity has an order of M*C*(Nw2+Nw)*Constant. That is to say a polynomial complexity.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Deep parsing with PG
</SectionTitle>
      <Paragraph position="0"> For the evaluation of the deep parser algorithm, we parsed a corpora of 620 sentences of the same corpus. Unlike the two previous algorithms, the dispersal of results is much more important.</Paragraph>
      <Paragraph position="1"> Million instructions / number of words for Deep Parser (logarithmic scale) In the theory, the algorithm is of exponential type but its progress is permanently constrained by the grammar. This control being heavily dependent from the grammatical context, the number of instructions necessary to parse two same size sentences can be very different.</Paragraph>
      <Paragraph position="2"> Nevertheless, in the reality of a corpus, the average complexity observed is of polynomial type. So, if Nw is the number of words of a sentence, the best estimate complexity of its parse corresponds to a polynomial of order 2.4 (Nw2.4*Constant).</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Remarks on complexity
</SectionTitle>
      <Paragraph position="0"> Our study considers the parser complexity as a function of two parameters: - the size of the parsed sentence, - the number of grammatical categories.</Paragraph>
      <Paragraph position="1"> This complexity is relies on the number of &amp;quot;simple instructions&amp;quot; treated by the programs. Comparing the average complexity of each parser is then a good way to know which one is faster. Ranking techniques can then be extracted from the results. It would have been interesting to compare them in terms of maximal complexity, but this is not actually possible because of an important difference between the two first parsers which are deterministic, and the last one which is not: - for the first two techniques, the minimal, average and maximal complexities are polynomial, - the deep parser has an exponential maximal complexity and polynomial minimal and average complexities.</Paragraph>
      <Paragraph position="2"> Moreover, the study of the maximal complexity of the deep parser has to be treated as another problem. Usually, such a study must have to be done taking care of the size of the grammar. But with property grammars, other parameters have to be used: a property grammar is a set of constraints (Linearity, Dependency, Obligation, Exclusion, Requirement, Uniqueness) belonging to two different groups. In a formal terminology, these groups are &amp;quot;reduction constraints&amp;quot; and &amp;quot;increasing constraints&amp;quot;. These groups characterize the behavior of the parser, as a formal system would do for a recognition problem. &amp;quot;Increasing constraints&amp;quot; allow the instanciation of the search space, where &amp;quot;reduction constraints&amp;quot; allow pruning this space. Most of the sentences are fastly parsed because of their well-formedness: the reduction constraints are more frequently used than increasing ones in such sentences. Ambiguous and ill-formed sentences require a greater use of increasing constraints.</Paragraph>
      <Paragraph position="3"> Thus, the size of the grammar is less informative about the theoretical complexity than the relative importance of increasing and reduction constraints: for instance a greater grammar, with  more reduction constraints would have a lower theoretical complexity. The study of such a problem does not belong to this study because it would lead us to a different experimentation.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Different results
</SectionTitle>
    <Paragraph position="0"> Our parsers demonstrate the possibility of a variable granularity within a same approach. We illustrate in this section the lacks and assets of the different techniques with the example below (in French): &amp;quot;Le compositeur et son librettiste ont su creer un equilibre dramatique astucieux en mariant la comedie espiegle voire egrillarde et le drame le plus profond au coeur des memes personnages.&amp;quot; &amp;quot;The composer and his librettist successfully introduced an astute dramatic balance in marrying the mischievous, ribald comedy with the deepest drama for the same characters.&amp;quot;</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Chink/chunk approach
</SectionTitle>
      <Paragraph position="0"> [(sentence) [(chunk)Le compositeur et son librettiste ont su creer] [(chunk)un equilibre dramatique astucieux] [(chunk)en mariant] [(chunk)la comedie espiegle] [(chunk)voire egrillarde] [(chunk)et le drame] [(chunk)le plus profond] [(chunk)au coeur des memes personnages]] This first example shows a non-hierarchical representation of the sentence, divided into chunks. No linguistic information is given.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Shallow parsing approach
</SectionTitle>
      <Paragraph position="0"> This second example gives a hierarchical representation of the sentence, divided into grammatically tagged chunks. Because we used a precompiled version of the grammar (shortened) and because we forced some syntactic choices in order to keep a determinist and finishing parsing, it appears that some errors have been made by the shallow parser: Conjunctions are (badly) distinguished as Adverbial Phrases. In spite of these gaps, cutting is improved and most of the categories are detected correctly.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Deep parsing approach
</SectionTitle>
      <Paragraph position="0"> The last example (next figure) presents two of the maximum coverages produced by the deep parser. This figure, which illustrates the PP attachment ambiguity, only presents for readabiulity reasons the hierarchical structure.</Paragraph>
      <Paragraph position="1"> However, remind the fact that each label represents in fact a description which the state of the constraint system after evaluation.</Paragraph>
      <Paragraph position="2"> Le compositeur et son librettiste ont su creer un equilibre dramatique astucieux en mariant la comedie espiegle voire egrillarde et le drame le plus profond au_coeur des memes personnages</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML