<?xml version="1.0" standalone="yes"?> <Paper uid="I05-2013"> <Title>Automatic recognition of French expletive pronoun occurrences</Title> <Section position="3" start_page="73" end_page="74" type="intro"> <SectionTitle> 2 Method </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="73" end_page="73" type="sub_section"> <SectionTitle> 2.1 Lexicon-Grammar </SectionTitle> <Paragraph position="0"> As with most linguistic phenomena, the impersonal use of il depends on both lexical and syntactic conditions. For example, the adjective violet (purple) can never be the lexical head of an impersonal clause - see (1a); the adjective probable (likely) followed by a clausal complement anchors an impersonal clause - see (1b); and the adjective difficile (difficult) when followed by an infinitival complement introduced by the preposition de (resp. à) anchors an impersonal (resp. personal) clause - see (1c) and (1d).</Paragraph> <Paragraph position="1"> (1)a Il est violet (It is purple) b Il est probable que Fred viendra (It is likely that Fred will come) c Il est difficile de résoudre ce problème (It is difficult to solve this problem) d Il est difficile à résoudre, ce problème (It is difficult to solve, this problem) Therefore, the French lexicon-grammar developed by Maurice Gross and his group (Gross 1994, Leclère 2003) is an appropriate linguistic resource for ILIMP since it describes, for each lexical head of a clause, its syntactic arguments and the possible alternations. From the lexicon-grammar, I have (manually) extracted all the items that can be the lexical head of an impersonal clause while recording their syntactic arguments. 
See below for a brief overview of the lexical heads that I have recorded.</Paragraph> <Paragraph position="2"> First, one has to distinguish verbal phrases for which the subject can only be the impersonal pronoun il, from those whose surface subject is impersonal il because their deep subject is extraposed in a post-verbal position.</Paragraph> <Paragraph position="3"> Among the former ones, I have compiled 45 meteorological verbal phrases (Il neige (It snows), Il fait beau (It is a nice day)), 21 verbs from Table 17 of (Gross 1975) (Il faut que Fred vienne /du pain) and 38 frozen expressions (Il était une fois (once upon a time), quoi qu'il en soit (whatsoever)).</Paragraph> <Paragraph position="4"> Among the latter ones, one has to distinguish those with a clausal extraposed subject from those with a nominal extraposed subject.</Paragraph> <Paragraph position="5"> Among the former ones, I have compiled 682 adjectives (Il est probable que Fred viendra (it is likely that Fred will come)), 88 expressions of the form Prep X (Danlos 1980) (Il est de règle de faire un cadeau (It is standard practice to make a present)), and around 250 verbs from (Gross 1975) (Il est dit que Fred viendra (It is said that Fred will come)).</Paragraph> <Paragraph position="6"> Among the latter ones with a nominal extraposed subject, some are quite frequent verbs such as rester or manquer, while others are verbs in the passive form only used in a refined register (Il est venu trois personnes (Three persons came)).</Paragraph> </Section> <Section position="2" start_page="73" end_page="74" type="sub_section"> <SectionTitle> 2.2 Unitex </SectionTitle> <Paragraph position="0"> Unitex is a tool which allows us to write linguistic patterns (regular expressions or automata) which are located in the input text, with a possible addition of information when an automaton is in fact a transducer. 
A raw text, when given as input to Unitex, is first preprocessed: it is segmented into sentences, some compound expressions are recognized as such, and each token is tagged with all the parts of speech and inflexion features recorded in its entry (if any) in the French full-form morphological dictionary DELAF (Courtois 2004).</Paragraph> <Paragraph position="1"> There is no disambiguation at all; in other words, the pre-processing in Unitex does not amount to tagging.</Paragraph> <Paragraph position="2"> For ILIMP, the basic idea is to manually write patterns (transducers) such as the one presented in (2) in a simplified linear form. <etre.V:3s> targets the third person singular inflected forms of the verb être; <Adj1:ms> targets the masculine singular adjectives that belong to the class Adj1, which groups together adjectives behaving like difficile; <V:K> targets any verb in the infinitive form. [IMP] is a tag which is added in the input text to the occurrences of il that appear in clauses which follow the pattern in (2). The occurrence of il in (1c) is thereby marked with tag [IMP].</Paragraph> <Paragraph position="3"> Unitex is a GPL open source system, which is similar to Intex (Silberstein 1994). Documentation and downloads for Unitex are available at the following URL: http://ladl.univmlv.fr. Tag [ANA] is the default value: it marks the occurrences of il that have not been tagged with [IMP]. The occurrence of il in (1d) is thereby marked with tag [ANA]. Nevertheless, the matter is slightly more complex, since there is a third tag, [AMB], which is explained in Section 3.2.</Paragraph> <Paragraph position="4"> The output of ILIMP is therefore the input text in which each occurrence of il is marked with one of [IMP], [ANA] and [AMB].</Paragraph> <Paragraph position="5"> After this presentation of the theoretical principles underlying ILIMP, let us examine its realization.</Paragraph> </Section> </Section> </Paper>
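<!--
The tagging scheme of Section 2.2 can be illustrated with a minimal Python sketch. It is not Unitex and not the ILIMP implementation: the word lists ETRE_3S and ADJ1_MS, the suffix test standing in for the infinitive mask, and the function name tag_il are all assumptions made for illustration, and the third tag [AMB] is left out because it is only explained in Section 3.2. The sketch marks il with [IMP] when it is followed by a third person singular form of être, an adjective of the difficile class, the preposition de and an infinitive, and with the default [ANA] otherwise.

```python
import re

# Hypothetical toy lexicon (assumption): the real system relies on the
# lexicon-grammar tables and the DELAF dictionary, not reproduced here.
ETRE_3S = {"est", "était", "sera", "serait", "fut"}
ADJ1_MS = {"difficile", "facile"}                # assumed members of class Adj1
INFINITIVE = re.compile(r"\w+(?:er|ir|re)$")     # crude stand-in for an infinitive mask

TOKEN = re.compile(r"\w+|[^\w\s]")               # words, or single punctuation marks


def tag_il(sentence):
    """Return the sentence with each occurrence of il followed by [IMP] or [ANA]."""
    tokens = TOKEN.findall(sentence)
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if tok.lower() != "il":
            continue
        # Look at the four tokens that follow il.
        window = [t.lower() for t in tokens[i + 1:i + 5]]
        impersonal = (
            len(window) == 4
            and window[0] in ETRE_3S
            and window[1] in ADJ1_MS
            and window[2] == "de"
            and INFINITIVE.match(window[3]) is not None
        )
        # [ANA] is the default tag when the impersonal pattern does not match.
        out.append("[IMP]" if impersonal else "[ANA]")
    return " ".join(out)


if __name__ == "__main__":
    print(tag_il("Il est difficile de résoudre ce problème"))
    print(tag_il("Il est difficile à résoudre, ce problème"))
```

Under these assumptions, example (1c) receives [IMP] and example (1d) receives [ANA]. A real pattern set would need one transducer per class of lexical head listed in Section 2.1, and would operate on the ambiguous dictionary tags produced by the preprocessing step rather than on suffixes.
-->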