<?xml version="1.0" standalone="yes"?>
<Paper uid="J90-4003">
  <Title>Parsing Discontinuous Constituents in Dependency Grammar</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
TECHNICAL CORRESPONDENCE
PARSING DISCONTINUOUS CONSTITUENTS IN
DEPENDENCY GRAMMAR
</SectionTitle>
    <Paragraph position="0"> Discontinuous constituents--for example, a noun and its modifying adjective separated by words unrelated to them--are common in variable-word-order languages; Figure 1 shows examples. But phrase structure grammars, including ID/LP grammars, require each constituent to be a contiguous series of words. Insofar as standard parsing algorithms are based on phrase structure rules, they are inadequate for parsing such languages.1 The algorithm presented here, however, does not require constituents to be continuous, but merely prefers them to be.</Paragraph>
    <Paragraph position="1"> It can therefore parse languages in which conventional parsing techniques do not work. At the same time, because of its preference for nearby attachments, it prefers to make constituents continuous when more than one analysis is possible. The new algorithm has been used successfully to parse Russian and Latin (Covington 1988, 1990).</Paragraph>
    <Paragraph position="2"> This algorithm uses dependency grammar. That is, instead of breaking the sentence into phrases and subphrases, it establishes links between individual words. Each link connects a word (the &amp;quot;head&amp;quot;) with one of its &amp;quot;dependents&amp;quot; (an argument or modifier). Figure 2 shows how this works.</Paragraph>
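The link structure described above can be sketched in Python (a minimal illustration; the class and function names are mine, not the paper's):

```python
# Minimal sketch of dependency links: each link connects a head to a
# dependent, and a dependent has at most one head.
class Word:
    def __init__(self, form):
        self.form = form
        self.head = None            # the word this word depends on, if any

def link(head, dependent):
    """Establish head -> dependent, enforcing the one-head constraint."""
    if dependent.head is not None:
        raise ValueError("a dependent can have only one head")
    dependent.head = head

# Words from the paper's garden-path example "animalia vident pueri".
vident, pueri, animalia = Word("vident"), Word("pueri"), Word("animalia")
link(vident, pueri)        # pueri depends on vident
link(vident, animalia)     # a head can have many dependents
print(pueri.head.form)     # -> vident
```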
    <Paragraph position="3"> The arrows point from head to dependent; a head can have many dependents, but each dependent can have only one head. Of course the same word can be the head in one link and the dependent in another. 2 Dependency grammar is equivalent to an X-bar theory with only one phrasal bar level (Figure 3)--the dependents of a word are the heads of its sisters. Thus dependency grammar captures the increasingly recognized importance of headship in syntax. At the same time, the absence of phrasal nodes from the dependency representation streamlines the search process during parsing.</Paragraph>
    <Paragraph position="4"> The parser presupposes a grammar that specifies which words can depend on which. In the prototype, the grammar consists of unification-based dependency rules (called D-rules).</Paragraph>
    <Paragraph position="6"> This rule sanctions a dependency relation between any two words whose features unify with the structures shown--in this case, the verb and its subject in a language such as Russian or Latin. The arrow means &amp;quot;can depend on&amp;quot; and the word order is not specified. X and Y are variables.</Paragraph>
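A D-rule of this kind can be sketched as feature-structure unification (the feature names and the flat-dictionary representation here are illustrative assumptions, not the paper's notation):

```python
# Sketch of a D-rule as feature unification.
def unify(a, b):
    """Unify two flat feature dicts; a missing/None value is an unbound variable."""
    out = dict(a)
    for k, v in b.items():
        if k in out and out[k] is not None and v is not None and out[k] != v:
            return None                      # feature clash: unification fails
        out[k] = out[k] if out.get(k) is not None else v
    return out

# D-rule: a nominative noun can depend on a finite verb, with the shared
# variable X enforcing number agreement. Word order is not specified.
def subject_rule(head, dep):
    h = unify(head, {"cat": "verb", "finite": True})
    d = unify(dep,  {"cat": "noun", "case": "nom"})
    if h is None or d is None:
        return False
    return h.get("number") == d.get("number")    # shared variable X

venit = {"cat": "verb", "finite": True, "number": "sg"}
aetas = {"cat": "noun", "case": "nom", "number": "sg"}
print(subject_rule(venit, aetas))   # -> True
```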
    <Paragraph position="7"> D-rules take the place of the phrase structure rules used by Shieber (1986) and others; semantic information can easily be added to them, and the whole power of unification-based grammar is available.</Paragraph>
    <Paragraph position="8"> The parser accepts words from the input string and keeps track of whether or not each word is &amp;quot;independent&amp;quot; (not yet known to depend on another word), indicated by + or -. As each word W is accepted: (1) Search all independent words so far seen, most recent first, for words that can depend on W. If any are found, establish the dependencies and change the marking of the dependents from + to -.</Paragraph>
    <Paragraph position="9"> (2) Search all words so far seen, most recent first, for a word on which W can depend. If one is found, establish the dependency and mark W as -. Otherwise mark W as +.</Paragraph>
    <Paragraph position="10"> Figure 4 shows the process in action. The first three words, ultima Cumaei venit, are accepted without creating any links. Then the parser accepts iam and makes it depend on venit. Next the parser accepts carminis, on which Cumaei, already in the list, depends. Finally it accepts aetas, which becomes a dependent of venit and the head of ultima and carminis.</Paragraph>
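The two-step algorithm and the walkthrough above can be sketched as follows (a runnable simplification without backtracking, using a toy grammar just large enough for the example sentence; the pair list stands in for the D-rules):

```python
# Toy grammar: (dependent, head) pairs licensed for the example
# "ultima Cumaei venit iam carminis aetas".
ALLOWED = {("iam", "venit"), ("Cumaei", "carminis"),
           ("ultima", "aetas"), ("carminis", "aetas"), ("aetas", "venit")}

def can_depend(dep, head):
    return (dep, head) in ALLOWED

def parse(words):
    seen, head = [], {}                    # head[w] = the word w depends on
    for w in words:
        # (1) Most recent first, link independent words that can depend on w.
        for prev in reversed(seen):
            if prev not in head and can_depend(prev, w):
                head[prev] = w
        # (2) Most recent first, find a word on which w can depend.
        for prev in reversed(seen):
            if can_depend(w, prev):
                head[w] = prev
                break
        seen.append(w)
    return head

links = parse("ultima Cumaei venit iam carminis aetas".split())
print(links["aetas"], sorted(d for d, h in links.items() if h == "aetas"))
# -> venit ['carminis', 'ultima']
```

As in the text: the first three words create no links, iam attaches to venit, carminis becomes the head of Cumaei, and aetas attaches to venit while heading ultima and carminis.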
    <Paragraph position="11"> The most-recent-first search order gives the parser its preference for continuous constituents. The search order is significant because it is assumed that the parser can backtrack, i.e., whenever there are alternatives it can back up and try them. This is necessary to avoid &amp;quot;garden paths&amp;quot; such as taking animalia (ambiguously nominative or accusative) to be the subject of animalia vident pueri &amp;quot;boys see animals.&amp;quot; With ordinary sentences, however, backtracking is relatively seldom necessary. Further, there appear to be other constraints on variable word order. Ades and Steedman (1982) propose that all discontinuities can be resolved by a pushdown stack. (For example, pick up ultima, then Cumaei, then put down Cumaei next to carminis, then put down ultima next to aetas. Crossing movements are not permitted.) Moreover, there appears to be an absolute constraint against mixing clauses together.3 If these hypotheses hold true, the parser can be modified to restrict the search process accordingly.</Paragraph>
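The pushdown-stack condition can be sketched as a check on movements (a simplified illustration, not the paper's formulation): the spans of the "pick up ... put down" operations must nest, never interleave, because a stack is last-in, first-out.

```python
# Sketch of the stack constraint on discontinuities: no two movement spans
# may interleave (cross); nested or disjoint spans are fine.
def stack_resolvable(movements):
    """movements: (pick_up_position, put_down_position) pairs."""
    spans = [tuple(sorted(m)) for m in movements]
    return not any(a < c < b < d
                   for (a, b) in spans for (c, d) in spans)

# "ultima Cumaei venit iam carminis aetas": pick up ultima (0), then
# Cumaei (1); put down Cumaei by carminis (4), then ultima by aetas (5).
# Last in, first out -- a stack suffices.
print(stack_resolvable([(0, 5), (1, 4)]))          # -> True
print(stack_resolvable([(0, 4), (1, 5)]))          # -> False (crossing)
```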
    <Paragraph position="12"> Most dependency parsers have followed a &amp;quot;principle of adjacency&amp;quot; that requires every word plus all its direct and indirect dependents to form a contiguous substring (Hays and Ziehe 1960; Starosta and Nomura 1986; Fraser 1989; but not Hellwig 1986 and possibly not Jäppinen et al.</Paragraph>
    <Paragraph position="13"> 1986). This is equivalent to requiring constituents to be continuous. This parser imposes no such requirement. To add the adjacency requirement, one would modify it as follows:</Paragraph>
    <Paragraph position="14"> (1) When looking for the dependents of W, do not skip over an independent word. That is, if an independent word is found that cannot depend on W, then neither can any earlier independent word.</Paragraph>
    <Paragraph position="15"> (2) When looking for the word on which W depends, consider only the previous word, that word's head, the head's head if any, and so on.</Paragraph>
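Restriction (2) can be sketched as a generator over the chain of heads (names are mine; a minimal illustration of the restricted candidate set):

```python
# With the adjacency principle, the only candidate heads for W are the
# previous word, that word's head, the head's head, and so on.
def candidate_heads(prev_word, head):
    """Yield the previous word and its chain of heads, innermost first."""
    w = prev_word
    while w is not None:
        yield w
        w = head.get(w)

# Heads established so far (from the paper's example, at the point where
# the word after Cumaei would be accepted under adjacency).
head = {"Cumaei": "carminis", "carminis": "aetas", "aetas": "venit"}
print(list(candidate_heads("Cumaei", head)))
# -> ['Cumaei', 'carminis', 'aetas', 'venit']
```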
    <Paragraph position="16"> With these requirements added, the algorithm would be the same as one implemented by Hudson (1989).</Paragraph>
    <Paragraph position="17"> Formal complexity analysis has not been carried out, but my algorithm is simpler, at least conceptually, than the variable-word-order parsers of Johnson (1985), Kashket (1986), and Abramson and Dahl (1989). Johnson's parser and Abramson and Dahl's parser use constituency trees with explicit discontinuity (&amp;quot;tangled trees&amp;quot;), with all their inherent unwieldiness. Kashket's parser, though based on GB theory, is effectively a dependency parser since it relies on case assignment and subcategorization rather than tree structure.</Paragraph>
    <Paragraph position="18"> 2. On dependency grammar see Robinson 1970, Hudson 1986, Schubert 1987, Mel'čuk 1988, and Starosta 1988. In Hudson's system, a single word can have two heads provided the grammatical relations connecting it to them are distinct. 3. As pointed out by an anonymous reviewer for Computational Linguistics.</Paragraph>
  </Section>
</Paper>