<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1020"> <Title>Morphology-Syntax Interface for Turkish LFG</Title> <Section position="5" start_page="153" end_page="153" type="metho"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> Güngördü and Oflazer (1995) describe a rather extensive grammar for Turkish using the LFG formalism. Although this grammar had good coverage and handled phenomena such as free constituent order, the underlying implementation was based on pseudo-unification. But most crucially, it employed a rather standard approach to represent the lexical units: words with multiple nested derivations were represented with complex nested feature structures where linguistically relevant information could be embedded at unpredictable depths, which made access to it in rules extremely complex and unwieldy.</Paragraph> <Paragraph position="1"> Bozşahin (2002) employed morphemes overtly as lexical units in a CCG framework to account for a variety of linguistic phenomena in a prototype implementation. The drawback was that morphotactics was explicitly raised to the level of the sentence grammar; hence the categorial lexicon accounted for both constituent order and the morpheme order with no distinction. Oflazer's dependency parser (2003) used IGs as units between which dependency relations were established. 
Another parser based on IGs is Eryiğit and Oflazer's (2006) statistical dependency parser for Turkish.</Paragraph> <Paragraph position="2"> (We use the term &quot;categorial lexicon&quot; to distinguish it from the lexicon used by the morphological analyzer.)</Paragraph> <Paragraph position="3"> Çakıcı (2005) used relations between IG-based representations encoded within the Turkish Treebank (Oflazer et al., 2003) to automatically induce a CCG grammar lexicon for Turkish.</Paragraph> <Paragraph position="4"> In a more general setting, Butt and King (2005) have handled the morphological causative in Urdu as a separate node in c-structure rules using LFG's restriction operator in semantic construction of causatives. Their approach is quite similar to ours yet differs in an important way: the rules explicitly use morphemes as constituents, so it is not clear whether this is just for this case or whether all morphology is handled at the syntax level.</Paragraph> </Section> <Section position="6" start_page="153" end_page="153" type="metho"> <SectionTitle> 3 Inflectional Groups as Sublexical Units </SectionTitle> <Paragraph position="0"> Turkish is an agglutinative language where a sequence of inflectional and derivational morphemes is affixed to a root (Oflazer, 1994). At the syntax level, the unmarked constituent order is SOV, but constituent order may vary freely as demanded by the discourse context. Essentially all constituent orders are possible, especially at the main sentence level, with very minimal formal constraints.</Paragraph> <Paragraph position="1"> In written text, however, the unmarked order is dominant at both the main sentence and embedded clause level.</Paragraph> <Paragraph position="2"> Turkish morphotactics is quite complicated: a given word form may involve multiple derivations and the number of word forms one can generate from a nominal or verbal root is theoretically infinite. 
Turkish words found in typical text average about 3-4 morphemes including the stem, with an average of about 1.23 derivations per word, but given that certain noninflecting function words such as conjunctions, determiners, etc. are rather frequent, this number is rather close to 2 for inflecting word classes. Statistics from the Turkish Treebank indicate that for sentences ranging from 2 to 40 words (with an average of about 8 words), the number of IGs ranges from 2 to 55 (with an average of 10 IGs per sentence) (Eryiğit and Oflazer, 2006).</Paragraph> <Paragraph position="3"> The morphological analysis of a word can be represented as a sequence of tags corresponding to the morphemes. In our morphological analyzer output, the tag ^DB denotes derivation boundaries that we also use to define IGs. If we represent the morphological information in Turkish in the following general form:</Paragraph> <Paragraph position="5"> root+IG_1^DB+IG_2^DB+ ... ^DB+IG_n</Paragraph> </Section> <Section position="7" start_page="153" end_page="155" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> then each IG_i denotes the relevant sequence of inflectional features, including the part-of-speech, for the root (in IG_1) and for any of the derived forms.</Paragraph> <Paragraph position="1"> A given word may have multiple such representations depending on any morphological ambiguity brought about by alternative segmentations of the word, and by ambiguous interpretations of morphemes. For instance, the morphological analysis of the derived modifier cezalandirilacak (literally, &quot;(the one) that will be given punishment&quot;) consists of five IGs. The first IG indicates that the root is a singular noun with nominative case marker and no possessive marker. The second IG indicates a derivation into a verb whose semantics is &quot;to acquire&quot; the preceding noun. The third IG indicates that a causative verb (equivalent to &quot;to punish&quot; in English) is derived from the previous verb. 
The fourth IG indicates the derivation of a passive verb with positive polarity from the previous verb. Finally, the last IG represents a derivation into a future participle, which will function as a modifier in the sentence. (The morphological features other than the obvious part-of-speech features are: +A3sg: 3sg number-person agreement, +Pnon: no possessive agreement, +Nom: nominative case, +Acquire: acquire verb, +Caus: causative verb, +Pass: passive verb, +FutPart: derived future participle, +Pos: positive polarity.)</Paragraph> <Paragraph position="2"> The simple phrase eski kitaplarimdaki hikayeler (the stories in my old books) in Figure 1 will help clarify how IGs are involved in syntactic relations. Here, eski (old) modifies kitap (book) and not hikayeler (stories), and the locative phrase eski kitaplarimda (in my old books) modifies hikayeler with the help of the derivational suffix -ki. (Looking at just the last POS of the words, one sees an +Adj +Adj +Noun sequence, which may imply that both adjectives modify the noun hikayeler.)</Paragraph> <Paragraph position="3"> Morpheme boundaries are represented by the '+' sign, and the morphemes in each solid box define one IG. The dashed box around the solid boxes marks the word boundary. As the example indicates, IGs may consist of one or more morphemes.</Paragraph> <Paragraph position="4"> Example (2) shows the corresponding f-structure for this NP. Supporting the dependency representation in Figure 1, the f-structure of the adjective eski is placed as the adjunct of kitaplarimda, at the innermost level. The semantics of the relative suffix -ki is shown as 'rel⟨↑ OBJ⟩', where the f-structure that represents the NP eski kitaplarimda is the OBJ of the derived adjective. The new f-structure, with a PRED constructed on the fly, then modifies the noun hikayeler. The derived adjective behaves essentially like a lexical adjective. 
The effect of using IGs as the representative units can be seen explicitly in the c-structure, where each IG corresponds to a separate node, as in Example (3). Here, DS stands for derivational suffix. (Note that placing the sublexical units of a word in separate nodes goes against the Lexical Integrity principle of LFG (Dalrymple, 2001). The issue is currently being discussed within the LFG community (T. H. King, personal communication).)</Paragraph> <Paragraph position="5"> Figure 2 shows a more complex example, given in Example (4), where we observe a chain/hierarchy of relations. Examples (5) and (6) show the constituent structure (c-structure) and the corresponding feature structure (f-structure) for this noun phrase. Within the tree representation, each IG corresponds to a separate node. Thus, the LFG grammar rules constructing the c-structures are coded using IGs as units of parsing. If an IG contains the root morpheme of a word, then the node corresponding to that IG is named with one of the syntactic category symbols. The rest of the IGs are given the node name DS (to indicate derivational suffix), no matter what the content of the IG is.</Paragraph> <Paragraph position="6"> The semantic representation of derivational suffixes plays an important role in f-structure construction. In almost all cases, each derivation that is induced by an overt or a covert affix gets an OBJ feature, which is then unified with the f-structure of the preceding stem already constructed, to obtain the feature structure of the derived form, with the PRED of the derived form being constructed on the fly. A PRED feature thus constructed, however, is not necessarily meant to have a precise lexical semantics. Most derivational suffixes have a consistent (lexical) semantics (e.g., the &quot;to acquire&quot; example earlier), but some do not; that is, the precise additional lexical semantics that the derivational suffix brings in depends on the stem it is affixed to. Nevertheless, we represent both cases in the same manner, leaving the determination of the precise lexical semantics aside. 
If we consider Figure 2 in terms of dependency relations, the adjective mavi (blue) modifies the noun renk (color), and then the derivational suffix -li (with) kicks in, although -li is attached to renk only. Therefore, the semantics of the phrase should be with(blue color), not blue with(color). With the approach we take, this difference can easily be represented in both the c-structure, as in the leftmost branch in Example (5), and the f-structure, as in the middle ADJUNCT f-structure in Example (6). Each DS in c-structure gives rise to an OBJect in f-structure. More precisely, a derived phrase is always represented as a binary tree where the right daughter is always a DS. In f-structure, DS unifies with the mother f-structure and inserts a PRED feature which subcategorizes for an OBJ. The left daughter of the binary tree is the original form of the phrase that is derived, and it unifies with the OBJ of the mother f-structure.</Paragraph> </Section> <Section position="8" start_page="155" end_page="158" type="metho"> <SectionTitle> 4 Inflectional Groups in Practice </SectionTitle> <Paragraph position="0"> We have already seen how the IGs are used to construct on the fly PRED features that reflect the lexical semantics of the derivation. 
In this section we describe how we handle phenomena where the derivational suffix in question does not explicitly affect the semantic representation in the PRED feature but determines the semantic role so as to unify the derived form or its components with the appropriate external f-structure.</Paragraph> <Section position="1" start_page="155" end_page="157" type="sub_section"> <SectionTitle> 4.1 Sentential Complements and Adjuncts, and Relative Clauses </SectionTitle> <Paragraph position="0"> In Turkish, sentential complements and adjuncts are marked by productive verbal derivations into nominals (infinitives, participles) or adverbials, while relative clauses with subject and non-subject (object or adjunct) gaps are formed by participles which function as adjectivals modifying a head noun.</Paragraph> <Paragraph position="1"> Example (7) shows a simple sentence that will be used in the following examples.</Paragraph> <Paragraph position="2"> Once the grammar encounters such a sentential complement, everything up to the participle IG is parsed as a normal sentence, and then the participle IG appends nominal features, e.g., CASE, to the existing f-structure. The final f-structure is for a noun phrase, which now is the object of the matrix verb, as shown in Example (9). Since the participle IG has the right set of syntactic features of a noun, no new rules are needed to incorporate the derived f-structure into the rest of the grammar; that is, the derived phrase can be used as if it were a simple NP within the rules. The same mechanism is used for all kinds of verbal derivations into infinitives and adverbial adjuncts, including those derivations encoded by lexical reduplications. Relative clauses also admit a similar mechanism. Relative clauses in Turkish are gapped sentences which function as modifiers of nominal heads. 
Turkish relative clauses have been previously studied (Barker et al., 1990; Güngördü and Engdahl, 1998) and found to pose interesting issues for linguistic and computational modeling.</Paragraph> <Paragraph position="3"> Our aim here is not to address this problem in its generality but to show, with a simple example, how our treatment of IGs encoding derived forms handles the mechanics of generating f-structures for such cases.</Paragraph> <Paragraph position="4"> Kaplan and Zaenen (1988) have suggested a general approach for handling long-distance dependencies. They have extended the LFG notation and allowed regular expressions in place of simple attributes within f-structure constraints so that phenomena requiring infinite disjunctive enumeration can be described with a finite expression. We basically follow this approach: once we derive the participle phrase, we unify it with the appropriate argument of the verb using rules based on functional uncertainty. Example (10) shows a relative clause where a participle form is used as a modifier of a head noun, adam in this case: 'The man the grocer said the girl called'.</Paragraph> <Paragraph position="5"> This time, the sentence is parsed with a gap with an appropriate functional uncertainty constraint, and when the participle IG is encountered, the sentence f-structure is derived into an adjective, and the gap in the derived form, the object here, is then unified with the head word, as marked with co-indexation in Example (11).</Paragraph> <Paragraph position="6"> The example sentence (10) includes Example (8) as a relative clause with the object extracted; hence the similarity in the f-structures can be observed easily. The ADJUNCT in Example (11) is almost the same as the whole f-structure of Example (9), differing in the TNS-ASP and ADJUNCT-TYPE features. At the grammar level, both the relative clause and the complete sentence are parsed with the same core sentence rule. 
To understand whether the core sentence is a complete sentence or not, the finite verb requirement is checked.</Paragraph> <Paragraph position="7"> Since the requirement is met by the existence of the TENSE feature, Example (8) is parsed as a complete sentence. Indeed, the relative clause also includes temporal information, as the 'pastpart' value of the PART feature of the ADJUNCT f-structure, denoting a past event.</Paragraph> </Section> <Section position="2" start_page="157" end_page="158" type="sub_section"> <SectionTitle> 4.2 Causatives </SectionTitle> <Paragraph position="0"> Turkish verbal morphotactics allows the production of multiply causative forms for verbs.</Paragraph> <Paragraph position="1"> Such verb formations are also treated as verbal derivations and hence define IGs. For instance, the morphological analysis for the verb aradi (s/he called) is ara+Verb+Pos+Past+A3sg, and for its causative aratti (s/he made (someone else) call) the analysis is ara+Verb^DB+Verb+Caus+Pos+Past+A3sg. In Example (12) we see a sentence and its causative form, followed by the respective f-structures for these sentences in Examples (13) and (14). The detailed morphological analyses of the verbs are given to emphasize the morphosyntactic relation between the bare and causativized versions of the verb.</Paragraph> <Paragraph position="2"> Passive, reflexive, and reciprocal/collective verb formations are also handled in morphology, though the latter two are not productive due to semantic constraints. On the other hand, it is possible for a verb to have multiple causative markers, though in practice 2-3 seem to be the maximum observed. The end result of processing an IG which has a verb with a causative form is to create a larger f-structure whose PRED feature has a SUBJect, an OBJect and an XCOMPlement. The f-structure of the first verb is the complement in the f-structure of the causative form; that is, its whole structure is embedded into the mother f-structure in an encapsulated way. 
The object of the causative (the causee, i.e., the one who is caused by the causer, the subject of the causative verb) is unified with the subject of the inner f-structure. If the original verb is transitive, the object of the original verb is further unified with the OBJTH of the causative verb. All of the grammatical functions in the inner f-structure, namely XCOMP, are also represented in the mother f-structure and are placed as arguments of caus, since the flat representation is required to enable free word order at the sentence level. Though not explicit in the sample f-structures, the important part is unifying the object and former subject with appropriate case markers, since the functions of the phrases in the sentence are decided with the help of case markers due to free word order. If the verb that is causativized subcategorizes for a direct object in accusative case, then after causative formation, the new object unified with the subject of the causativized verb should be in dative case (Example 15). But if the verb in question subcategorizes for a dative or an ablative oblique object, then this object will be transformed into a direct object in accusative case after causativization (Example 16). That is, the causativization will select the case of the object of the causative verb, so as not to &quot;interfere&quot; with the object of the verb that is causativized. 
In causativized intransitive verbs the causative object is always in accusative case.</Paragraph> <Paragraph position="3"> '(s/he) made the man hit the woman' All other derivational phenomena can be solved in a similar way by establishing the appropriate semantic representation for the derived IG and its effect on the semantic representation.</Paragraph> </Section> </Section> <Section position="9" start_page="158" end_page="158" type="metho"> <SectionTitle> 5 Current Implementation </SectionTitle> <Paragraph position="0"> The implementation of the Turkish LFG grammar is based on the Xerox Linguistic Environment (XLE) (Maxwell III and Kaplan, 1996), a grammar development platform that facilitates the integration of various modules, such as tokenizers, finite-state morphological analyzers, and lexicons. We have integrated into XLE a series of finite-state transducers for morphological analysis and for multi-word processing, the latter handling lexicalized and semi-lexicalized collocations and a limited form of non-lexicalized collocations.</Paragraph> <Paragraph position="1"> The finite-state modules provide the relevant ambiguous morphological interpretations for words and their split into IGs, but do not provide syntactically relevant semantic and subcategorization information for root words. Such information is encoded in a lexicon of root words on the grammar side.</Paragraph> <Paragraph position="2"> The grammar developed so far addresses many important aspects, including free constituent order, subject and non-subject extractions, and all kinds of subordinate clauses mediated by derivational morphology, and it has a very wide-coverage NP subgrammar. As we have also emphasized earlier, the actual grammar rules are oblivious to the source of the IGs, so that the same rule handles an adjective-noun phrase regardless of whether the adjective is lexical or derived. 
So all such relations in Figure 2 are handled with the same phrase structure rule (except the last one, which requires some additional treatment with respect to definiteness).</Paragraph> <Paragraph position="3"> The grammar, however, lacks a treatment of certain interesting features of Turkish such as suspended affixation (Kabak, 2007), in which the inflectional features of the last element in a coordination have phrasal scope; that is, all other coordinated constituents have certain default features which are then &quot;overridden&quot; by the features of the last element in the coordination.</Paragraph> <Paragraph position="4"> A very simple case of such suspended affixation is exemplified in (17a) and (17b) ('the girl called the man and the woman'). Note that although this is not due to the derivational morphology that we have emphasized in the previous examples, it is due to a more general nature of morphology. Suspended affixation is an example of a phenomenon that IGs do not seem directly suitable for. The unification of the coordinated IGs has to be done in such a way that the non-default features of the final constituent are percolated to the upper node in the tree, as is usually done in phrase structure grammars, but unlike the way coordination is handled in such grammars.</Paragraph> </Section> </Paper>