<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0403">
<Title>What is at stake: a case study of Russian expressions starting with a preposition</Title>
<Section position="6" start_page="0" end_page="0" type="concl">
<SectionTitle> 5 Conclusions </SectionTitle>
<Paragraph position="0"> This paper reports the first attempt to apply computational methods to the detection and use of multiword expressions (MWEs) in Russian. The study produced a list of about 700 prepositional phrases, which is available from http://www.comp.leeds.ac.uk/ssharoff/frqlist/mwes-en.html. The list offers rough results of MWE selection. It includes proper idioms of the type found in a phraseological dictionary, in particular items that such dictionaries miss or underdescribe, so it can be used as a source for improving them. However, it also includes items on the border between idioms and other types of lexicalised phrases, for instance grammatical constructions or institutionalised phrases.</Paragraph>
<Paragraph position="1"> The study shows that a simple method, using little syntactic knowledge about the structure of PPs in Russian and no semantic resources, can produce a useful list of MWEs. Combining automatic detection of the most significant collocations with manual filtering of the results is not labour intensive and yields many expressions that are not covered in existing Russian dictionaries. The next immediate step would be to use the list to study translation equivalence between English and Russian, because MWEs are also not adequately represented in bilingual dictionaries, while their translation causes significant problems for language learners as well as for machine translation systems. For instance, the Oxford Russian Dictionary lists 13 translations of bez (without), including such idioms as bez uma ('be crazy about something', lit. 'without mind'), but fails to list many other more frequent constructions, such as bez ocheredi (to jostle to the front of the queue, lit. 'without queue'), bez umolku ([to talk] nonstop), bez sleda ([to vanish] without any hint), etc.</Paragraph>
<Paragraph position="2"> The list can also act as a useful resource for morphological and semantic disambiguation. It covers about 2% of the running text in the corpus, yet it reduces semantic ambiguity in the running text by 4-7% and morphological ambiguity by 11%. We did not experiment with the reduction of syntactic ambiguity, because no Russian syntactic parser can robustly parse an unrestricted corpus such as the one used in the study. Also, there is no easy way to force existing parsers to treat the identified MWEs as separate syntactic units at the clause level. However, we expect that parsing accuracy will increase, because the set of identified MWEs reduces the number of PP attachment problems, as each MWE acts as an adjunct unit of its own within the clause.</Paragraph>
<Paragraph position="3"> The domain of prepositional phrases was chosen specifically because their structure is relatively easy to guess from their form by means of shallow parsing. Further experiments may consider the detection of other types of MWEs, in particular those with light verbs, such as brat' primer (to follow the example of someone, lit. 'take example'), which are also very important for translation. Given the free word order in Russian, however, this extension requires syntactic parsing to detect the dependency structure.</Paragraph>
</Section>
</Paper>