<?xml version="1.0" standalone="yes"?>
<Paper uid="C86-1027">
  <Title>A Parametric NL Translator</Title>
  <Section position="2" start_page="0" end_page="124" type="metho">
    <SectionTitle>
2. System Structure
</SectionTitle>
    <Paragraph position="0"> The GBT contains a UG component and two language-specific components, one for English and one for Spanish. Exploiting the modular nature of GB (see Wehrli 1983), the UG consists of a phrase structure component based on X-bar syntax (Jackendoff 1977), a transformational component that includes the rule Move Affix (=affix-hopping) and the general rule of Move Alpha (including the subjacency constraint), and a well-formedness component containing constraints on surface representations. The UG, then, is an expression of the theories of X-bar, Case, Theta, Government, Binding, and Bounding (Chomsky 1981) (see Fig. 1).</Paragraph>
    <Paragraph position="1"> The language-specific components consist of a lexicon and a grammar. The lexicon includes (1) the lexical entries, and (2) tables of inflections and contractions (e.g. Spanish del = de + el). The grammar contains the UG parameters and idiosyncratic transformations. Figure 2 lists the parameters and transformations currently implemented for each language.</Paragraph>
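The contraction table mentioned above could be stored and applied along the following lines. This is only a minimal sketch under stated assumptions: the predicate names contraction/3 and expand/3 and the table layout are hypothetical and are not taken from the GBT source; only the entry del = de + el comes from the text.

```prolog
:- use_module(library(lists)).

% contraction(Lang, Surface, Parts): hypothetical table format.
% The single entry is the example given in the text.
contraction(s, del, [de, el]).

% expand(+Lang, +Words, -Expanded): replace each contraction by its parts,
% leaving all other words unchanged.
expand(_, [], []).
expand(Lang, [W|Ws], Expanded) :-
    (   contraction(Lang, W, Parts)
    ->  append(Parts, Rest, Expanded)
    ;   Expanded = [W|Rest]
    ),
    expand(Lang, Ws, Rest).
```

With this table, the query expand(s, [el, libro, del, hombre], X) binds X to [el, libro, de, el, hombre], so that later stages only see the uncontracted forms.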
    <Paragraph position="2"> [Fig. 1. UG: X-bar phrase structure rules; Doubly-filled COMP Filter; WH-Filter; Case Filter; Empty Category Principle; Binding Conditions.]
3. Lexicon
Dictionary entries are represented as Prolog unit clauses:
(3) dict(Lang, Word, Cat, Features).</Paragraph>
    <Paragraph position="3"> where 'Lang' is the language, 'Word' is the lexical unit, 'Cat' is the syntactic category, and 'Features' contains a list of morphological and syntactic features along with the transfer value. Some sample entries are shown below:
(4) dict(e,put,v,[subcat([n,p]),spanish(poner)]).
dict(e,believe,v,[subcat(n),subcat(c),sdel(+),spanish(creer)]).
dict(e,seem,v,[subcat(c),sdel(+),theta(-),spanish(parecer)]).</Paragraph>
    <Paragraph position="4"> dict(s,poner,v,[subcat([n,p]),english(put)]).
dict(s,creer,v,[subcat(n),subcat(c),</Paragraph>
    <Paragraph position="6"> The verbs put and poner subcategorize for an NP and a PP. Believe subcategorizes for either an NP or a clause, as does its Spanish equivalent, creer. The former has the S-bar Deletion property, while the latter does not, allowing an infinitival complement for believe but not for creer (see (2) above).</Paragraph>
    <Paragraph position="7"> Seem and parecer do not assign a thematic role to the subject, as indicated by the feature "theta(-)", thus giving rise to sentences such as It seems John has left/Parece que Juan ha salido. Note that since seem and parecer also have the "sdel(+)" feature, they both exhibit subject-raising: John seems to have left/Juan parece haber salido.</Paragraph>
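The way these features drive the system can be illustrated with a few queries over the lexicon. The following is a minimal sketch, not the GBT source: the entries for put, believe, seem and poner follow (4), the entry for creer is an assumption reconstructed from the prose description, and the helper predicates has_feature/3, transfer/4, takes_infinitival/2 and raising_verb/2 are hypothetical names introduced here for illustration.

```prolog
:- use_module(library(lists)).

% dict(Lang, Word, Cat, Features): entries following (4).
dict(e, put,     v, [subcat([n,p]), spanish(poner)]).
dict(e, believe, v, [subcat(n), subcat(c), sdel(+), spanish(creer)]).
dict(e, seem,    v, [subcat(c), sdel(+), theta(-), spanish(parecer)]).
dict(s, poner,   v, [subcat([n,p]), english(put)]).
% Assumed from the prose: creer takes an NP or a clause and lacks sdel(+).
dict(s, creer,   v, [subcat(n), subcat(c), english(believe)]).

% has_feature(+Lang, +Word, ?Feature): Feature is listed for Word.
has_feature(Lang, Word, Feature) :-
    dict(Lang, Word, _, Features),
    member(Feature, Features).

% transfer(+From, +To, +Word, -Target): look up the transfer value.
transfer(e, s, Word, Target) :- has_feature(e, Word, spanish(Target)).
transfer(s, e, Word, Target) :- has_feature(s, Word, english(Target)).

% takes_infinitival(+Lang, +Verb): clausal complement plus S-bar Deletion.
takes_infinitival(Lang, Verb) :-
    has_feature(Lang, Verb, subcat(c)),
    has_feature(Lang, Verb, sdel(+)).

% raising_verb(+Lang, +Verb): S-bar Deletion and no theta role for the subject.
raising_verb(Lang, Verb) :-
    has_feature(Lang, Verb, sdel(+)),
    has_feature(Lang, Verb, theta(-)).
```

Under these assumptions, transfer(e, s, believe, W) binds W to creer, raising_verb(e, seem) succeeds, and takes_infinitival(s, creer) fails, mirroring the contrast between believe and creer described above.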
    <Paragraph position="8"> Other features in the lexicon include person, number, gender, tense, irregular forms (e.g.</Paragraph>
    <Paragraph position="9"> go/went), [±wh], [±pronoun], and [±anaphor] (Chomsky</Paragraph>
    <Paragraph position="11"> The primary phrase structure rules are given in (5), using X-bar syntax and written in the grammar notation of Clocksin and Mellish (1981):</Paragraph>
    <Paragraph position="13"> complements(L, SUBCATS, PostHD).</Paragraph>
    <Paragraph position="14"> For language L and category C, the x2 rule above constructs a parse tree containing nodal information (category and what will become phrasal features) followed by the specifier, the X1 sub-structure, and any post-adjuncts (e.g. PP modifiers). The x1 rule parses a pre-adjunct, the head, and the complements of the head, the latter taken from the subcategorization features of the head. In this way, the parser is head-driven (Proudian and Pollard 1985); the head determines the course of further parsing.</Paragraph>
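The head-driven behaviour described above can be sketched with a small definite clause grammar. This is a simplified illustration under stated assumptions, not the rule in (5): the toy lexicon word/3, the predicate specifier/2 and the reduced argument lists are hypothetical, and the language parameter L, pre- and post-adjuncts, and feature handling are all omitted.

```prolog
% Toy lexicon (hypothetical): word(Word, Cat, Subcats).
word(the,    det, []).
word(man,    n,   []).
word(saw,    v,   [n]).
word(things, n,   []).

% x2(C, Tree): maximal projection = node label, specifier, X1 substructure.
x2(C, [[C,_], Spec, X1]) -->
    specifier(C, Spec),
    x1(C, X1).

% x1(C, Tree): empty pre-adjunct slot, head, complements.
% The complements are read off the head's subcategorization list,
% so the head determines how parsing continues.
x1(C, ['$e', Head | Comps]) -->
    [Head],
    { word(Head, C, Subcats) },
    complements(Subcats, Comps).

% One maximal projection is parsed per subcategorized category.
complements([], []) --> [].
complements([C|Cs], [T|Ts]) --> x2(C, T), complements(Cs, Ts).

% Only nouns take a specifier in this toy grammar; '$e' marks an empty slot.
specifier(n, Det) --> [Det], { word(Det, det, _) }.
specifier(_, '$e') --> [].
```

The query phrase(x2(v, T), [saw, the, man]) builds a tree of the same general shape as the structures used in the paper: a v node with an empty specifier whose X1 part contains the head saw and an n node for the man. Because complements//2 is driven by the Subcats list of the head, a verb with an empty list would stop after the head.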
    <Paragraph position="15"> The rule in (5) is used for all major phrasal categories, i.e. NP, VP, AP, PP, as well as the clausal phrases COMP (=S-bar) and INFL (=S) (Stowell 1981).</Paragraph>
    <Paragraph position="16"> (It should be pointed out that (5) reflects head-initial grammars. A simple parameter could be inserted to accommodate head-final grammars, such as for SOV languages like German, but this introduces parsing problems for Prolog.) As an example, the following structure is created for the sentence The man had seen many things from his window, where the symbol $e denotes an empty value, and "_" denotes a place-holder for features:
(6) [[c,_], $e, [ $e, $e, [[i,_], [[n,_], the, [ $e, man ]], [ $e, $e, [[v,_], had, [ $e, seen, [[n,_], many, [ $e, things ] ]], [[p,_], $e, [ $e, from, [[n,_], his, [ $e, window ]]]]]]], [mode, decl] ] ]</Paragraph>
  </Section>
  <Section position="3" start_page="124" end_page="125" type="metho">
    <SectionTitle>
5. System Operation
</SectionTitle>
    <Paragraph position="0"> Following the strategy in (1), the GBT reads in a sentence (assumed to be grammatically correct), analyzes the morphology of each word, and applies the phrase structure rule (5) recursively to build the S-structure. Then, all movement transformations are undone and features are percolated to the phrasal node, at which time feature agreement is checked.</Paragraph>
    <Paragraph position="1"> The result is a D-structure in which all (and only) thematic elements are in θ-marked positions, thereby satisfying the Theta Criterion (Chomsky 1981). This also simplifies lexical translation (as opposed to translating between LF representations).</Paragraph>
    <Paragraph position="2">  The transformation stage presents the most interesting aspect of the system, since this is where the principles of UG are applied. (Since the input is assumed to be grammatical, it is not tested for well-formedness.) The high-level Prolog program for this stage is given below:
ecp(L, Sstructure).</Paragraph>
    <Paragraph position="3"> The first action is to transform the D-structure to an S-structure. The "transform" predicate is called recursively on each cyclic node (S-bar), beginning with the most deeply embedded one. The transformations include those listed in Figure 2 above plus the general transformations of Move Affix and Move Alpha.</Paragraph>
    <Paragraph position="4"> The next step involves checking the resulting S-structure for well-formedness. (Note that the well-formedness conditions could execute in parallel, given appropriate machine architecture.) An S-structure that fails to pass any of the conditions forces backtracking into the "transform" predicate. For example, the clause in (8a), which involves no movement, will be generated, but since John cannot be assigned a grammatical Case, it fails the Case Filter. Backtracking to Move Alpha results in John moving to the non-θ-marked subject of seem (8b), resulting in a well-formed structure:
(8) a. *It seems John to have left.</Paragraph>
    <Paragraph position="5"> b. John seems t to have left.</Paragraph>
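The generate-and-test interaction between the transformations and the Case Filter in (8) can be sketched as follows. This is a toy model under stated assumptions, not the program used in the GBT: the term encoding clause/3, subj/1 and inf/1 and the predicates dstructure/1, transform/2, case_filter/1 and generate/1 are hypothetical, and the cyclic application over S-bar nodes is left out.

```prolog
% Toy D-structure for (8) (hypothetical encoding): the matrix subject slot
% is empty and the embedded infinitival clause has the subject john.
dstructure(clause(subj(empty), seem, clause(subj(john), inf(leave)))).

% transform(+D, -S): candidate S-structures, tried in order.
% 1. No movement: the empty matrix subject is spelled out as pleonastic it.
transform(clause(subj(empty), V, Emb),
          clause(subj(it), V, Emb)).
% 2. Move Alpha: the embedded subject raises, leaving a trace t.
transform(clause(subj(empty), V, clause(subj(NP), Inf)),
          clause(subj(NP), V, clause(subj(t), Inf))) :-
    NP \== empty.

% case_filter(+S): in this toy model the subject of an infinitive receives
% no Case, so only a trace is accepted in that position.
case_filter(clause(subj(_), _, clause(subj(Sub), inf(_)))) :-
    Sub == t.

% generate(-S): failure of the filter backtracks into transform/2.
generate(S) :-
    dstructure(D),
    transform(D, S),
    case_filter(S).
```

The query generate(S) first tries the structure corresponding to (8a), which case_filter/1 rejects, and then backtracks into the second transform/2 clause, yielding clause(subj(john), seem, clause(subj(t), inf(leave))), the counterpart of (8b).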
    <Paragraph position="6"> Another example that illustrates how parameters affect generation is given by the Empty Category Principle (ECP), which requires traces to be properly governed. Given the following parameter settings: (9) proper_governor(english,v).</Paragraph>
    <Paragraph position="7"> proper_governor(english, p).</Paragraph>
    <Paragraph position="8"> proper_governor(spanish, v).</Paragraph>
    <Paragraph position="9"> proper_governor(spanish, i).</Paragraph>
    <Paragraph position="10"> where the last statement is interpreted as "INFL is a proper governor if it contains the feature [+tns]", the "ecp" statement in (7) will allow preposition-stranding in English but not in Spanish (10), and allow "that-trace" in Spanish but not in English (11):
(10) Which film did they leave after t?
After which film did they leave t?
Después de cuál película salieron ellos t?
*Cuál película salieron ellos después de t?
(11) Who seems t to have left?
Who does it seem t has left?
*Who does it seem that t has left?
Quién parece t haber salido?
Quién parece que t ha salido?
A similar use of parameters controls subjacency within Move Alpha. To show that S-bar is a bounding node in Spanish, Torrego (1984) notes that Verb Preposing must occur in every clause that contains a wh-phrase or its trace in COMP. In (12a), the trace of con quién causes inversion, whereas (12b) is derived without movement to COMP, obviating inversion:
(12) a. Con quién sabía Juan t que había hablado María t?
b. Con quién sabía Juan que María había hablado t?
('With whom did John know that Mary had spoken?')
The GBT system operates solely on the basis of syntax. In a more complete translation system, issues of semantics, pragmatics, and discourse must be dealt with, ideally by again assuming general principles subject to parametric variation. Nevertheless, the current system illustrates the feasibility of a generalized syntactic component in an overall language processing device.</Paragraph>
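The effect of the proper_governor settings in (9) on the contrasts in (10) and (11) can be sketched as follows. This is a minimal illustration under stated assumptions: the four proper_governor/2 facts are those of (9), while governs_properly/2, ecp_sketch/2 and the encoding of a trace as trace(gov(Cat, Features)) are hypothetical simplifications; the actual "ecp" predicate takes a full S-structure.

```prolog
:- use_module(library(lists)).

% Parameter settings as in (9).
proper_governor(english, v).
proper_governor(english, p).
proper_governor(spanish, v).
proper_governor(spanish, i).

% governs_properly(+Lang, +Governor): Governor is gov(Cat, Features).
% INFL counts only when it carries the feature +tns, as stated for (9).
governs_properly(Lang, gov(i, Features)) :-
    !,
    proper_governor(Lang, i),
    memberchk(+tns, Features).
governs_properly(Lang, gov(Cat, _)) :-
    proper_governor(Lang, Cat).

% ecp_sketch(+Lang, +Traces): every trace(Governor) must be properly governed.
ecp_sketch(_, []).
ecp_sketch(Lang, [trace(Gov)|Rest]) :-
    governs_properly(Lang, Gov),
    ecp_sketch(Lang, Rest).
```

Under this encoding, a trace governed by a preposition passes in English but not in Spanish: ecp_sketch(english, [trace(gov(p, []))]) succeeds while ecp_sketch(spanish, [trace(gov(p, []))]) fails. Conversely, a subject trace governed by tensed INFL, written trace(gov(i, [+tns])), passes in Spanish but not in English, which corresponds to the "that-trace" contrast in (11).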
  </Section>
  <Section position="4" start_page="125" end_page="125" type="metho">
    <SectionTitle>
7. Acknowledgements
</SectionTitle>
    <Paragraph position="0"> This paper is dedicated to the memory of Alfredo Hurtado.</Paragraph>
  </Section>
</Paper>