<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-3007">
  <Title>Lexicalising Word Order Constraints for Implemented Linearisation Grammar</Title>
  <Section position="2" start_page="0" end_page="24" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> It has been some time since the linearisation technique was introduced into HPSG by Reape (1993; 1994) as a way to overcome the inadequacy of conventional phrase-structure-rule-based grammars in handling the 'freer' word order of languages such as German and Japanese. In parallel, in computational linguistics, it has long been proposed that more flexible parsing techniques may be required to handle such languages adequately, but a practical system using linearisation has hitherto eluded large-scale implementation. There are at least two obstacles: the higher computational cost of the non-CFG algorithms it requires, and the difficulty of stating word order information succinctly in a grammar that works well with a non-CFG parsing engine.</Paragraph>
    <Paragraph position="1"> In a recent development, the 'cost' issue has been tackled by Daniels and Meurers (2004), who propose to narrow down the search space while using a non-CFG algorithm. The underlying principle is to give priority to the full generative capacity: let the parser overgenerate by default, but restrict generation for efficiency thereafter. While sharing this principle, I will attempt to further streamline the computation of linearisation, focusing mainly on the issue of grammar formalism.</Paragraph>
    <Paragraph position="2"> Specifically, I would like to show that the lexicalisation of word order constraints is possible with some conservative modifications to standard HPSG (Pollard and Sag, 1987; Pollard and Sag, 1994). This has the benefit of making the representation of linearisation grammar simpler and more parsing-friendly than Reape's influential Word Order Domain theory.</Paragraph>
    <Paragraph position="3"> In what follows, after justifying the need for non-CFG parsing and reviewing Reape's theory, I will propose to introduce into HPSG the Word Order Constraint (WOC) feature for lexical heads. I will then describe the parsing algorithm that refers to this feature to constrain the search for efficiency.</Paragraph>
    <Section position="1" start_page="0" end_page="23" type="sub_section">
      <SectionTitle>
1.1 Limitation of CFG Parsing
</SectionTitle>
      <Paragraph position="0"> One of the main obstacles for CFG parsing is the discontinuity in natural languages caused by 'interleaving' of elements from different phrases (Shieber, 1985). Although there are well-known syntactic techniques to enhance CFG as in GPSG (Gazdar et al., 1985), there remain constructions that show 'genuine' discontinuity of the kind that cannot be properly dealt with by CFG.</Paragraph>
      <Paragraph position="1"> Such 'difficult' discontinuity typically occurs when it is combined with scrambling - another symptomatic phenomenon of free word order languages - of a verb's complements. The following is an example from German, where scrambling and discontinuity co-occur in what is called the 'incoherent' object control verb construction.</Paragraph>
      <Paragraph position="2"> (1) Ich glaube, dass der Fritz dem Frank</Paragraph>
      <Paragraph position="4"> das Buch zu lesen erlaubt.</Paragraph>
      <Paragraph position="5"> the book(Acc) to read allow
'I think that Fritz allows Frank to read the book'
(1') Ich glaube, dass der Fritz [das Buch] dem Frank [zu lesen] erlaubt
Ich glaube, dass dem Frank [das Buch] der Fritz [zu lesen] erlaubt
Ich glaube, dass [das Buch] dem Frank der Fritz [zu lesen] erlaubt
...</Paragraph>
      <Paragraph position="6"> Here (1) is in the 'canonical' word order while the examples in (1') are its scrambled variants. In the traditional 'bi-clausal' analysis according to which the object control verb subcategorises for a zu-infinitival VP complement as well as nominal complements, this embedded VP, das Buch zu lesen, becomes discontinuous in the latter examples (in square brackets).</Paragraph>
      <Paragraph position="7"> One CFG response is to use a 'mono-clausal' analysis, or argument composition (Hinrichs and Nakazawa, 1990), according to which the higher verb and the lower verb (in the above example, erlauben and zu lesen) are combined to form a single verbal complex, which in turn subcategorises for the nominal complements (das Buch, der Fritz and dem Frank). Under this treatment both the verbal complex and the sequence of complements are continuous, making all the above examples CFG-parseable.</Paragraph>
      <Paragraph position="8"> However, this does not quite save CFG parseability, because the lower V + NP can be extraposed, as in the following. (2) Ich glaube, dass der Fritz dem Frank [erlaubt], das Buch [zu lesen].</Paragraph>
      <Paragraph position="9"> Now we have a discontinuity of the 'verbal complex' instead of the complements (the now discontinuous verbal complex is marked with square brackets). Thus, either way, some discontinuity is inevitable. Such discontinuity is by no means a marginal phenomenon limited to German. Parallel phenomena are observed with object control verbs in Korean and Japanese (see Sato (2004) for examples). These languages also show a variety of 'genuine' discontinuities of other sorts, which do not lend themselves to straightforward CFG parsing (Yatabe, 1996). CFG-recalcitrant constructions exist in abundance, pointing to an acute need for non-CFG parsing.</Paragraph>
    </Section>
    <Section position="2" start_page="23" end_page="24" type="sub_section">
      <SectionTitle>
1.2 Reape's Word Order Domain
</SectionTitle>
      <Paragraph position="0"> The most influential proposal to accommodate such discontinuity/scrambling in HPSG is Reape's Word Order Domain, or DOM, a feature that constitutes an additional layer separate from the dominance structure of phrases (Reape, 1993; Reape, 1994). DOM encodes the phonologically realised ('linearised') list of signs: the daughter signs of a phrase in the HD-DTR and NHD-DTRS features are linearly ordered as in Figure 1.</Paragraph>
      <Paragraph position="1"> The feature UNIONED in the daughters indicates whether discontinuity amongst their constituents is allowed. Computationally, the positive ('+') value of the feature requires (the DOMs of) the daughters to be sequence-unioned (represented by the operator ○) into the mother DOM: details apart, this operation essentially merges two lists in a way that allows interleaving of their elements while preserving the order within each list.</Paragraph>
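      <Paragraph> To make the interleaving behaviour of sequence union concrete, the following is a minimal Python sketch, not Reape's own formulation. It assumes that DOM lists can be stood in for by plain lists of strings, and the function name sequence_union is a hypothetical label chosen here for illustration. It enumerates every merge of two lists that keeps each list's internal order intact.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def sequence_union(left: Sequence[T], right: Sequence[T]) -> Iterator[List[T]]:
    """Yield every interleaving of `left` and `right`.

    Illustrative assumption: domains are plain lists; real DOM values
    are lists of whole signs. The relative order inside each input
    list is preserved in every result.
    """
    if not left:
        yield list(right)
        return
    if not right:
        yield list(left)
        return
    # Either the next element of the result comes from `left` ...
    for rest in sequence_union(left[1:], right):
        yield [left[0]] + rest
    # ... or it comes from `right`.
    for rest in sequence_union(left, right[1:]):
        yield [right[0]] + rest


# Hypothetical toy domains: the embedded VP and one other complement.
embedded_vp = ["das Buch", "zu lesen"]
other_complement = ["dem Frank"]

for order in sequence_union(embedded_vp, other_complement):
    print(" ".join(order))
```

Each printed line is one candidate linearisation in which das Buch still precedes zu lesen. The number of candidates grows combinatorially with the lengths of the two lists, which gives a rough sense of why unconstrained interleaving is costly for a parser.</Paragraph>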
      <Paragraph position="2"> In Reape's theory, LP constraints come from an entirely different source. There is nothing as yet that blocks, for instance, the ungrammatical zu lesen das Buch VP sequence. The relevant constraint, i.e. COMPS ≺ ZU-INF-V in German, is stated in the LP component of the theory. Thus with the interaction of the UNIONED feature and LP statements, the grammar rules out the unacceptable sequences while endorsing grammatical ones such as the examples in (1').</Paragraph>
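      <Paragraph> The way an LP statement filters candidate orders can be sketched as follows. This is a simplified illustration under stated assumptions rather than the LP component of Reape's theory: each word carries a single flat category label (real signs are feature structures), and the helper satisfies_lp is a hypothetical name introduced here. The constraint checked is that complements precede the zu-infinitival verb.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# Illustrative assumption: a word is a (form, category) pair.
Word = Tuple[str, str]


def satisfies_lp(order: Sequence[Word], before: str, after: str) -> bool:
    """Return True if no `after` element appears ahead of a `before` element."""
    seen_after = False
    for _form, category in order:
        if category == after:
            seen_after = True
        elif category == before and seen_after:
            # A `before` element follows an `after` element: LP violation.
            return False
    return True


# Hypothetical toy VP from example (1).
vp: List[Word] = [("das Buch", "COMPS"), ("zu lesen", "ZU-INF-V")]

# Generate every order first, then keep only those the LP statement allows.
licensed = [
    list(order)
    for order in permutations(vp)
    if satisfies_lp(order, "COMPS", "ZU-INF-V")
]

for order in licensed:
    print(" ".join(form for form, _category in order))
```

Only das Buch zu lesen survives the filter, while zu lesen das Buch is rejected. Note that this sketch generates all orders before filtering, which mirrors the separation of the two mechanisms described above.</Paragraph>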
      <Paragraph position="3"> One important aspect of Reape's theory is that DOM is a list of whole signs rather than of any part of them such as PHON. This is necessitated by the fact that, in order to determine how DOM should be constructed, the daughters' internal structure, above all the UNIONED feature, needs to be referred to. In other words, the internal features of the daughters must be accessible.</Paragraph>
      <Paragraph position="4"> While this is a powerful system that overcomes the inadequacies of phrase-structure rules, some may feel that it is a rather heavy-handed way to solve the problems. Above all, much information is repeated, as all the signs are effectively stated twice, once in the phrase structure and again in DOM. Also, the fact that discontinuity and linear precedence are handled by two distinct mechanisms seems somewhat questionable, as these two factors are computationally closely related. These are not entirely attractive properties for a computational grammar.</Paragraph>
    </Section>
  </Section>
</Paper>