<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2226"> <Title>Translating Idioms</Title> <Section position="3" start_page="0" end_page="1389" type="intro"> <SectionTitle> 2 Compounds and idioms </SectionTitle> <Paragraph position="0"> A two-way partition of MWEs into (i) compounds and (ii) idioms is both convenient and theoretically well-motivated 2. Compounds are defined as MWEs of X⁰ level (i.e. word level), in which the chunks are adjacent, as exemplified in (1), while &quot;idiomatic expressions&quot; correspond to MWEs of phrasal level, in which the chunks need not be adjacent and may undergo various syntactic operations, as exemplified in (2-3).</Paragraph> <Paragraph position="1"> (1)a. pomme de terre 'potato' b. à cause de 'because of' c. dès lors que 'as soon as' The compounds given in (1) function, respectively, as noun, preposition and conjunction. Each corresponds to a single unit, both syntactically and semantically. In contrast, idiomatic expressions do not generally constitute fixed, closed syntactic units. They do, however, behave as semantic units. For instance, the complex syntactic expression casser du sucre sur le dos de quelqu'un, literally 'break some sugar on somebody's back', is essentially synonymous with 'criticize'. 2 This distinction between compounds and idioms is also discussed in Wehrli (1997).</Paragraph> <Paragraph position="2"> (2)a. Jean a forcé la main à Luc.</Paragraph> <Paragraph position="3"> Jean has forced the hand to Luc 'Jean twisted Luc's hand' b. C'est à Luc que Jean a forcé la main.</Paragraph> <Paragraph position="4"> It is to Luc that Jean has forced the hand 'It is Luc's hand that Jean has twisted' c. C'est à Luc que Paul prétend que Jean a voulu forcer la main.</Paragraph> <Paragraph position="5"> It is to Luc that Paul claims that Jean has wanted to force the hand 'It is Luc's hand that Paul claims that Jean has wanted to force' d. 
La main semble lui avoir été un peu forcée.</Paragraph> <Paragraph position="6"> The hand seems to him to have been a little forced 'His hand seems to have been somewhat twisted' The idiom illustrated in (2) is typical of a very large class of idioms based on a verbal head. Syntactically, such idioms correspond to verb phrases with a fixed direct object argument (la main, in our example) and an open indirect object argument. Notice that this verb phrase is completely regular in its syntactic behaviour. In particular, it can undergo syntactic operations such as adverbial modification, raising, passive, dislocation, etc., as exemplified in (2b-d).</Paragraph> <Paragraph position="7"> With example (3), we have a much less common pattern, since the subject argument of the verb constitutes a chunk of the expression. Here, again, various operations are possible, including passive and raising 3. (3)a. Quelle mouche a piqué Paul? 'What has gotten to Paul?' b. Quelle mouche semble l'avoir piqué? 'What seems to have gotten to him' c. Je me demande par quelle mouche Paul a été piqué.</Paragraph> <Paragraph position="8"> 'I wonder what's gotten to him' The extent to which expressions can undergo modifications and other syntactic operations varies tremendously from one expression to the next. In the absence of a general explanation for this fact, each expression must be recorded with the list of its particular properties and constraints 4. 3 Another interesting example of an idiom with a fixed subject is la moutarde monte au nez de NP (&quot;NP loses his temper&quot;), discussed in Abeillé and Schabes (1989).</Paragraph> <Paragraph position="9"> Given the categorial distinction (X⁰ vs. XP) and the other fundamental differences sketched above, compounds and idioms are treated very differently in our system. Compounds are simply listed in the lexicon as complex lexical units. As such, their identification belongs to the lexical analysis component. 
Once a compound has been recognized, its treatment in the ITS-2 system does not differ in any interesting way from the treatment of simple words.</Paragraph> <Paragraph position="10"> While idiomatic expressions must also be listed in the lexicon, their entries are far more complex than those of simple or compound words (cf. section 3.2). Their identification turns out to be a rather complex operation, which cannot be reliably carried out at a superficial level of representation. As we saw in the above examples, idiom chunks can be found far away from the (verbal) head with which they constitute an expression; they can also be modified in various ways, and so on. Preprocessing idioms, for instance during the lexical analysis, might therefore lead to lengthy, inefficient or unreliable treatments. We will argue that in order to drastically simplify the task of identifying idioms, it is necessary to undo whatever syntactic operations they might have undergone. To put it differently, idioms can best be recognized on the basis of a normalized structure, that is, a structure in which constituents occur in their canonical position. In a generative grammar framework, normalized structures correspond to D-structure representations. At that level, for instance, the four sentences in (2) share the common structure in (4).</Paragraph> <Paragraph position="11"> (4) ... [VP forcer [DP la main] [PP à X]] As we will show in the next section, our treatment of idioms relies on the drastic normalization process that our GB-based parser carries out.</Paragraph> </Section> </Paper>