File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/04/w04-0403_metho.xml
Size: 10,793 bytes
Last Modified: 2025-10-06 14:09:06
<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0403"> <Title>What is at stake: a case study of Russian expressions starting with a preposition</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The analysis of the structure of </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Russian MWEs </SectionTitle> <Paragraph position="0"> First, a few words on the linguistic features of MWEs in Russian in general and of prepositional phrases in particular. Russian is an inflecting language in which a word inflects for a set of morphological categories and shows a specific combination of these categories in its ending. For instance, a noun in Russian has a fixed gender and inflects for 6 to 9 cases and for number (singular or plural, with relics of the dual, which is relevant for some words). Similarly, an adjective inflects for six cases, two numbers and three genders and agrees with the noun that is the head of the nominal group in the values of these three categories. This means that an approach that treats MWEs as 'words with spaces inside' is not always suitable for English, and cannot work for Russian. There is a certain variation in the number of forms in an MWE like rara avis in English, because rarae aves and rara avises are both possible according to (OED, 1989), even though they are extremely rare (neither is used in the BNC and Internet searches mostly point to entries in dictionaries), but at least it is feasible to list the two extra forms separately. At the same time, the Russian expression belaja vorona (corresponding to rara avis, lit. 'white crow') exists in 10 different forms (see examples in Table 1, the endings are underlined) and the variability of forms applies to any nominal group. Table 1: (1) beloj vorony: genitive, singular; (2) beloj vorone: dative, singular; (3) belye vorony: nominative, plural. 
The situation is even more complicated in the case of MWEs including verbs, given that in addition to several proper verbal forms, a Russian verb can exist in the form of up to four participles, each of which is inflected as an adjective with its own set of forms. At the same time, the large number of forms does not mean that each form can be mapped to a lemma and a set of morphological categories without any ambiguity, because the number of endings is much smaller than the number of possible combinations of features. As lines (1) and (2) in Table 1 suggest, the genitive and dative forms of singular feminine adjectives coincide, as do the genitive singular and nominative plural forms of the noun vorona, see lines (1) and (3). If we consider prepositional phrases, the amount of ambiguity is much smaller, because prepositions govern the case of the nominal group that follows them and do not themselves inflect. [Footnote: ... e.g. nominal groups vs. prepositional phrases, follows (Halliday, 1985).]</Paragraph> <Paragraph position="1"> However, PPs still exhibit the general problem of ambiguity in lemma selection. For instance, the word form tem is ambiguous between the genitive plural form of the noun tema (topic) and the instrumental singular masculine form of the demonstrative pronoun tot (that). What is more, from a purely syntactic viewpoint the prepositional phrase s tem can be interpreted in both ways, because the preposition s can govern either the genitive or the instrumental case. At the same time, the word tem as a component of s tem chtoby (in order to, lit. 'with that to') shows no ambiguity in its part of speech. More frequently, ambiguity concerns the selection of a lemma or morphological properties for the collocate. For instance, the second word in the expression s bol'shim zapasom (with a huge margin, lit. 'with large storage') can be analysed as a form of either of two adjectives, bol'shoj (large) or b'ol'shij (larger). 
Similarly, the last word in the expression do six por (until now, lit. 'before this time') can be analysed as a form of either of two nouns, por'a (time, season) or p'ora (pore). However, the expressions as a whole are not ambiguous and have specific meanings.</Paragraph> <Paragraph position="2"> The second problem with prepositional phrases concerns their syntactic function, in particular the notorious PP attachment problem. Even though MWEs consisting of a preposition followed by a nominal group are often identical in their syntactic structure to fully compositional prepositional phrases, they do not carry the same syntactic function as the latter. Such MWEs function in the syntactic structure of the clause as a single unit with a clearly defined meaning that cannot be decomposed into the meanings of its components. It is therefore better to treat them as adverbs, e.g. v chastnosti (in particular), pod kljuch (turnkey, lit. 'under key'), or as prepositions in their own right, e.g. v techenie ('in the course of'). Multiword expressions starting with a preposition in English have a similar structure, but in Russian the structure of the prepositional group does not change, unlike in some English MWEs, e.g. in line, at large, which lack a determiner. Thus, we cannot use a difference in the PP structure as an indicator of an MWE.</Paragraph> <Paragraph position="3"> The fact that MWEs are not fully compositional means that the meanings of their constituent words change, resulting in a specific idiomatic meaning of the whole construction. In this case we cannot accept the general assumption of one sense per discourse (Gale et al., 1992), because words such as line, large in English or kljuch in Russian can function in the same discourse in totally different senses. 
However, the assumption of one sense per collocation can hold, because an MWE with a prepositional phrase typically has one and the same meaning: even though line, large or techenie are ambiguous, the expressions in line, at large, pod kljuch and v techenie each have a specific meaning.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Methodology </SectionTitle> <Paragraph position="0"> The study starts with the selection of the most frequent prepositions, in order to account for a large number of potential collocations. Information on the frequency of prepositions (Table 2) is taken from the pilot version of the Russian Reference Corpus, which currently consists of about 55 million words (Table 2 lists the relative frequency of prepositions in terms of the number of their instances per million words, ipm).</Paragraph> <Paragraph position="1"> Then for each preposition we extract its most frequent collocations in the same corpus and weight them according to the pointwise mutual information score (MI score) and Student's t test (T score). Two types of collocates are extracted: all lexical items occurring immediately to the right of a preposition, and the longest possible nominal groups, defined as a sequence of adjectives and nouns with the condition that nouns after the first one are in the genitive case. 
This simple pattern captures the majority of Russian nominal groups, except those with elaborations of other clauses or other prepositional phrases embedded inside them.</Paragraph> <Paragraph position="2"> In any case, because of their nature such groups do not belong to the class of fixed expressions under study.</Paragraph> <Paragraph position="3"> The MI score foregrounds collocations in which the second component rarely (almost never) occurs outside of the expression in question, whereas the T score foregrounds the most stable collocations on the basis of their frequency.</Paragraph> <Paragraph position="4"> For every preposition and the list of its most significant collocates we select MWEs on the basis of the lack of compositionality, namely that there is a specific function performed by the expression and this function cannot be automatically derived from the meanings of the words comprising the candidate MWE. The criterion cannot be defined precisely, but in many cases it is immediately obvious whether the candidate MWE is fully compositional or not. For instance, the expression bez vsjakoj svjazi ('for no apparent reason', lit. 'without any connection') is sufficiently frequent (38 instances) and its last element, svjaz', is lexically ambiguous between connection (either physical or logical) and communication. When the MWE is used in texts, it has a specific function, namely to evaluate someone's discourse as lacking continuity. Thus, bez vsjakoj svjazi is treated as an MWE. On the other hand, the expression v Rossii (in Russia) is much more frequent and statistically significant (14557 instances, its T score is 104.21), but the set of locations constitutes an open list, in which other members may also be frequent, e.g. v SSHA (in the USA, 4739 instances), v Evrope (in Europe, 2752), v Parizhe (in Paris, 2087), v Kitae (in China, 1055), and the expressions are fully compositional. None of them is considered to be an MWE. 
At the same time, an expression with a very similar structure, v storone ([to keep] aloof, lit. 'in side', 9690 instances, its T score is 83.95), is considered to be an MWE, because it is not compositional. The vast majority of uses of this expression do not refer to a physical location, but to the fact that a person does not take part in a joint activity.</Paragraph> <Paragraph position="5"> Also, because of the idiomaticity of the meaning of an MWE, it functions as a whole in the syntactic structure of the clause, most typically as an adjunct, and is translated into other languages in a specific way not necessarily related to prepositional phrases. The possibility of translating it into English without the use of a prepositional phrase is another reason for treating the expression as a potential MWE.</Paragraph> <Paragraph position="6"> Finally, an easy test for detecting an MWE concerns the &quot;penetrability&quot; of the expression, i.e. the possibility of inserting another word, most typically an adjective or a determiner, into the candidate MWE. If any insertion is unlikely or the meaning of the components is redefined as a result of the insertion, then the expression in question is an MWE.</Paragraph> <Paragraph position="7"> For instance, even though the MWE v storone can be modified as v drugoj/levoj/protivopolozhnoj storone (on the other/left/opposite side), the resulting expressions refer to physical locations and not to the idiomatic meaning of the MWE v storone.</Paragraph> <Paragraph position="8"> Thus, they are not considered to be MWEs, and the possibility of insertion here does not violate the penetrability test for the MWE in question.</Paragraph> </Section> </Paper>
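The contrast drawn in Section 3 between the MI score and the T score can be illustrated with a minimal sketch. It assumes the definitions commonly used in corpus linguistics, MI = log2(observed / expected) and T = (observed - expected) / sqrt(observed), which may differ from the exact variants used in the study. The function names and all counts below are hypothetical and are not taken from the Russian Reference Corpus.

```python
import math


def expected_freq(freq_x: int, freq_y: int, corpus_size: int) -> float:
    """Co-occurrence frequency expected if x and y were independent."""
    return freq_x * freq_y / corpus_size


def mi_score(pair_freq: int, freq_x: int, freq_y: int, corpus_size: int) -> float:
    """Pointwise mutual information: log2 of observed over expected frequency."""
    return math.log2(pair_freq / expected_freq(freq_x, freq_y, corpus_size))


def t_score(pair_freq: int, freq_x: int, freq_y: int, corpus_size: int) -> float:
    """T score: observed minus expected frequency, divided by sqrt(observed)."""
    expected = expected_freq(freq_x, freq_y, corpus_size)
    return (pair_freq - expected) / math.sqrt(pair_freq)


# Hypothetical counts for illustration only.
N = 1_000_000  # corpus size in words
# A collocate that almost never occurs outside the expression.
rare = dict(pair_freq=40, freq_x=20_000, freq_y=50)
# A frequent pair whose collocate also occurs widely elsewhere.
common = dict(pair_freq=1_000, freq_x=20_000, freq_y=10_000)

for name, counts in (("rare", rare), ("common", common)):
    mi = mi_score(corpus_size=N, **counts)
    t = t_score(corpus_size=N, **counts)
    print(f"{name}: MI = {mi:.2f}, T = {t:.2f}")
```

With these made-up counts the pair with the rare collocate receives the higher MI score, while the frequent pair receives the higher T score, which matches the behaviour of the two measures described in the text.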