<?xml version="1.0" standalone="yes"?> <Paper uid="I05-2013"> <Title>Automatic recognition of French expletive pronoun occurrences</Title> <Section position="4" start_page="74" end_page="75" type="metho"> <SectionTitle> 3 Realization </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="74" end_page="74" type="sub_section"> <SectionTitle> 3.1 Left context of the lexical head </SectionTitle> <Paragraph position="0"> In (1c), the left context of the lexical head, i.e. the sequence of tokens to the left of difficile (difficult), is reduced to Il est (it is). However, sentences such as (3a) or (3b), in which the left context of the lexical head is more complex, are frequently found in real texts. In (3a), the left context includes (from right to left) the adverb très (very), which modifies the adjective; the verb paraître (seem) in the infinitive, which is a &quot;light verb&quot; for adjectives; the pronoun lui (to him); and finally the modal verb peut (may) preceded by il (it). In (3b), it includes the light verb s'avérer conjugated in a compound tense (s'est avéré) and negated (ne s'est pas avéré).</Paragraph> <Paragraph position="1"> (3)a Il peut lui paraître très difficile de résoudre ce problème (It may seem very difficult to him to solve this problem) b Il ne s'est pas avéré difficile de résoudre ce problème (It didn't turn out to be difficult to solve this problem) As a consequence, for each type of lexical head (adjectival, verbal) that anchors an impersonal clause, all the elements that may occur in the left context have to be determined and integrated into patterns. This raises no real difficulty, though it is time-consuming.
In contrast, the right context raises tough ambiguities, as we will show.</Paragraph> <Paragraph position="2"> In the rest of the paper, patterns are presented with simplified left contexts, as in (2), for the sake of readability.</Paragraph> <Paragraph position="3"> ILIMP can be reused in a tool that aims to identify the lexical head of a clause.</Paragraph> </Section> <Section position="2" start_page="74" end_page="75" type="sub_section"> <SectionTitle> 3.2 Right context of the lexical head </SectionTitle> <Paragraph position="0"> Syntactic ambiguities. There are a number of syntactic ambiguities in the right context since, as is well known, a sequence of parts of speech may receive several syntactic analyses. As an illustration, consider the pattern in (4a), in which the symbol Ω matches any non-empty sequence of tokens. This pattern corresponds to two syntactic analyses: (4b), in which il is impersonal and the infinitival phrase de &lt;V:K&gt; is subcategorized by difficile, and (4c), in which il is anaphoric and the infinitival phrase is part of an NP. These two analyses are illustrated in (4d) and (4e) respectively; these sentences differ only in the adverb ici/juste.</Paragraph> <Paragraph position="1"> (4)a Il est difficile pour Ω de &lt;V:K&gt; b Il[IMP] est difficile pour [Ω] de &lt;V:K&gt; c Il[ANA] est difficile pour [Ω de &lt;V:K&gt;] d Il est difficile pour [les étudiants qui viennent ici] de résoudre ce problème (It is difficult for the students who came here to solve this problem) e Il est difficile pour [les étudiants qui viennent juste de résoudre ce problème] (It is difficult for the students who have just solved this problem)</Paragraph> <Paragraph position="3"> To deal with syntactic ambiguities, one solution is to state explicitly that a pattern such as (4a) is ambiguous by means of the tag [AMB], which is to be interpreted as &quot;ILIMP cannot determine whether il is anaphoric or impersonal&quot;. However, this tag may be of no help for later processing, especially if it is used too often.
Another solution is to rely on heuristics based on frequencies.</Paragraph> <Paragraph position="4"> For example, sentences that follow the pattern in (4a) are more frequently analyzed as (4b) than as (4c). Therefore, il in (4a) can be tagged as [IMP], at the cost of some rare errors. I have adopted this latter solution. The heuristics I use are based on my linguistic knowledge and intuition, on quantitative corpus studies, or on both.</Paragraph> <Paragraph position="5"> Lexical ambiguities. In about ten cases, a lexical item may anchor both impersonal and personal clauses with the same subcategorization frame, e.g. the adjective certain (certain) with a clausal complement, as illustrated in sentence (5a). Since both readings of (5a) seem equally frequent, il in the pattern Other difficulties. A last type of difficulty arises with impersonal clauses that have an extraposed nominal subject. See the pair in (6a-b), in which the only difference is du/de, yet (6a) is impersonal and (6b) personal. Along the same lines, see the pair in (6c-d), in which the only difference is valise/priorité, yet (6c) is impersonal and (6d) personal.</Paragraph> <Paragraph position="6"> (6)a Il manque du poivre (dans cette maison) (There is pepper missing (in this house)) b Il manque de poivre (ce rôti de porc) (It is lacking pepper (this pork roast)) c Il reste la valise du chef (dans la voiture) (There remains the boss' suitcase (in the car)) d Il reste la priorité du chef (le chômage) (It remains the boss' priority (unemployment)) I have tried to set up heuristics to deal with these subtle differences. However, I did not attempt perilous enterprises such as using the feature [+abstract] for nouns.</Paragraph> <Paragraph position="7"> In conclusion, ILIMP relies on a number of heuristics to avoid overusing [AMB]. These heuristics may lead to errors, which will be examined.</Paragraph> </Section> </Section> </Paper>