<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0112"> <Title>Teaching Computational Linguistics at the University of Tartu: Experience, Perspectives and Challenges</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Background to tuition </SectionTitle> <Paragraph position="0"> Tuition can be efficient only when the lecturers themselves are active researchers.</Paragraph> <Paragraph position="1"> In Estonia, R&amp;D work in computational linguistics and language technology is carried out, in addition to UT, at the Institute of the Estonian Language (IEL, a research institution) and the Institute of Cybernetics of Tallinn Technical University. The IEL focuses on compiling computer lexicons but also on the computer (morphological) processing of Estonian.</Paragraph> <Paragraph position="2"> The topics at the Institute of Cybernetics include the generation of spoken Estonian (together with IEL); the Institute has also started compiling the databases necessary for speech recognition.</Paragraph> <Paragraph position="3"> The R&amp;D at the University of Tartu has focused on the computer analysis of Estonian texts and on compiling the text corpora of Estonian underlying that research. 
The major research topics include: formalising the morphology and syntax of the Estonian language; formalising the semantics of Estonian (incl. compiling a lexico-semantic database); and pragmatics, i.e. modelling Estonian (spoken) dialogue.</Paragraph> <Paragraph position="4"> Based on the results of this research, various linguistic software and resources have been developed at UT: the morphological analyser and generator of Estonian (the university spin-off language software company Filosoft has in turn used these resources to create the Estonian spell checker and hyphenator included in the MS Office package); the Estonian morphological disambiguator and syntactic analyser; various corpora of written Estonian for the period 1890-1990 (3 million words in total), partly morphologically and syntactically tagged; and a corpus of spoken Estonian (300 000 words transliterated), together with a corpus of Estonian dialogues based on it (60 000 words).</Paragraph> <Paragraph position="5"> UT computational linguists have participated in a number of international projects, e.g. GLOSSER, MULTEXT-EAST, TELRI-I, TELRI-II, CONCEDE, EuroWordNet and BABEL, and have carried out numerous projects commissioned by the Estonian Science Foundation and the Estonian Informatics Centre. The plans for the next years are to develop the language software further (incl. the morphological and semantic disambiguator and the syntactic analyser and generator), to study and model the formal structure of dialogues, and to expand the tagged corpora. All these outputs can be used in various language technology applications, from aids to the text compiler (e.g. grammar and style checkers in a text editing programme) to machine translation or man-machine dialogue in Estonian. 
Previously, in the 1960s, UT was engaged in language statistics and automated information retrieval (in the 1970s, an information retrieval system for legal texts was developed at UT).</Paragraph> <Paragraph position="6"> In the second half of the 1960s, a special structural linguistics work group (the so-called generative grammar group) was active at the Department of the Estonian Language. It included lecturers, doctoral students and students. Quite a few of the present computational linguists received their first knowledge of CL from that group.</Paragraph> <Paragraph position="7"> Until 1998, a CL education could be obtained only through individually tailored study plans.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Curriculum </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Earlier experience </SectionTitle> <Paragraph position="0"> While elaborating the CL curriculum and preparing and modifying new courses, we have tried to take into account the experience of other universities.</Paragraph> <Paragraph position="1"> As demonstrated by a questionnaire carried out in March 1999 in 60 European universities where CL is being taught (de Smedt et al., 1999), three options are basically used in teaching CL and At UT, as already mentioned, CL is taught as an independent subject in the Faculty of Philosophy. The study load is measured in credit points (CP), where 1 CP corresponds to 40 hours of work by the student (incl. independent work). The total load of the 4-year bachelor studies is 160 CP. In the current curriculum, the amount of CL is As may be seen, the block of Computer Science is very small (presently, it includes only the subjects &quot;Prolog for linguists&quot;, 2 CP, and &quot;UNIX for linguists&quot;, 1 CP). 
In the new curriculum, we have increased the importance of that block.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 New curriculum </SectionTitle> <Paragraph position="0"> The need for the new curriculum was prompted by the higher education reform in Estonia. The reform was launched in 2000 and is based on various international documents and agreements (Magna Charta Universitatum).</Paragraph> <Paragraph position="1"> The main objectives of the Estonian higher education reform are: to expand the cross-curriculum share of subject areas by widening the opportunities for interdisciplinary studies; to simplify the system of university education levels; and to simplify and expand the opportunities for students from different specialities to continue their studies in other universities (also outside Estonia).</Paragraph> <Paragraph position="2"> The Bachelor level education will be achieved after completing a 3-year curriculum (nominal study period, 120 CP). The Master level education will be achieved upon completing a 5-year curriculum (nominal study period, 200 CP). The Doctoral level education will be achieved upon completing a 9-year curriculum (nominal study period, 360 CP) and defending the doctoral thesis. Bachelor studies will provide general theoretical education at the university level. In the first year, the students study subjects shared by the curricula of one broad field. In the second year, they follow the narrow field module, and the third year is devoted to specialised subjects. Master studies will provide special, professional knowledge and vocational skills. 
The &quot;Estonian and Finno-Ugric linguistics&quot; curriculum that will become operational in the Faculty of Philosophy foresees the possibility of majoring in computational linguistics.</Paragraph> <Paragraph position="3"> In the bachelor studies one has to take two modules of elective subjects (each 16 CP) from the following list: &quot;Estonian language&quot;, &quot;Estonian language and culture for non-Estonians&quot;, &quot;Finnish language and culture&quot;, &quot;Finno-Ugric languages&quot;, &quot;Hungarian language and culture&quot; and &quot;General linguistics&quot;; and electives and optional subjects (20 CP) that may be chosen from any curriculum.</Paragraph> <Paragraph position="4"> The CL speciality module comprises the following subjects: &quot;Mathematics for computational linguists I&quot;, &quot;Programming&quot;, &quot;Data analysis in humanities&quot;, &quot;Linguistic theories for computational linguists&quot;, each amounting to 4 CP. Upon completing bachelor studies, the student will receive the Bachelor degree in Estonian and Finno-Ugric linguistics. As a rule, this education will not guarantee entry to the labour market (at least not as a computational linguist) but has to be followed by master studies. It is assumed that at least 75% of students admitted to bachelor studies will continue to master studies.</Paragraph> <Paragraph position="5"> The master studies curriculum comprises speciality studies (56 CP), the master thesis (20 CP) and optional subjects (4 CP). A prerequisite for starting the master's studies is either a bachelor's degree or an education level corresponding to it. A precondition for entrance to the CL speciality is having taken the speciality module during the bachelor studies. 
Thus every person with a bachelor's degree who has taken the 4 subjects comprising the CL speciality module may enter the CL master's studies.</Paragraph> <Paragraph position="6"> In master studies, the CL speciality studies will consist of compulsory subjects (22 CP) and electives (34 CP). The compulsory subjects are &quot;Introduction to CL&quot;, &quot;Corpus linguistics&quot;, &quot;Language technology&quot;, &quot;Mathematics for computational linguists II&quot; and the master's seminar. The list of electives is open and will be updated according to requirements and possibilities.</Paragraph> <Paragraph position="7"> The present list includes subjects from linguistics, computer science and CL.</Paragraph> <Paragraph position="8"> Linguistics subjects include, for example, &quot;Phonology and morphology&quot;, &quot;Syntax of Estonian&quot;, &quot;Semantics&quot;, &quot;Theories of linguistic communication&quot; and &quot;Pragmatics&quot;. Computer science subjects included in the list are &quot;Artificial Intelligence I and II&quot;, &quot;Applied software: Perl&quot; and &quot;Databases&quot;. Computational linguistics subjects include &quot;Computational morphology&quot;, &quot;Computational lexicology&quot;, &quot;Syntactic analyser&quot;, etc.</Paragraph> <Paragraph position="9"> The majority of the subjects are the same as those that have been and are taught within the existing curriculum, but there will also be several new ones, e.g. &quot;Statistical models of natural languages&quot; and &quot;Introduction to speech technology&quot;.</Paragraph> <Paragraph position="10"> The qualification conferred on a graduate will be that of master of Estonian and Finno-Ugric linguistics (computational linguistics). 
This is a specialist whose computational linguistics education is based on linguistics (this has been the case up to now when studying according to the present curriculum).</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 A new opportunity: language technology </SectionTitle> <Paragraph position="0"> In connection with preparing the new curricula, it became possible to start training computational linguists at UT with a different educational background, one based on computer science: language technology studies. Under the new computer science curriculum that is to come into force in the Faculty of Mathematics and Computer Science, it is possible to choose blocks of linguistics and CL subjects, which have been named language technology modules (to differentiate them from the CL modules of the curriculum of Estonian and Finno-Ugric linguistics in the Faculty of Philosophy).</Paragraph> <Paragraph position="1"> Bachelor studies in computer science will provide general knowledge in the classical branches of mathematics and basic knowledge of computer software, hardware, networks and systems, artificial intelligence, software technologies and data protection, as well as a certain amount of practical skills for work in computer science (incl. programming skills). It is possible to choose between theoretical computer science, software systems and language technology. 
Upon completing bachelor studies, the student will receive the Bachelor degree in Computer Science.</Paragraph> <Paragraph position="2"> The language technology narrow field module (16 CP) comprises the subjects &quot;Language technology&quot;, &quot;Introduction to CL&quot;, &quot;Corpus linguistics&quot;, &quot;Introduction to general linguistics&quot; and &quot;Database theory&quot;.</Paragraph> <Paragraph position="3"> Master studies in computer science will provide profound knowledge in one area of computer science, enabling the person to carry out development activities in that area, as well as consulting, team-work and project management skills. It is possible to major in theoretical computer science, cryptology or language technology.</Paragraph> <Paragraph position="4"> The qualification conferred on a graduate will be that of master of computer science.</Paragraph> <Paragraph position="5"> A prerequisite for entrance to master's studies is a bachelor's degree in computer science (or in a close speciality) and prerequisite subjects amounting to 20 CP (Object-oriented programming, Algorithms and data structures, Introduction to mathematical logics, Elements of discrete mathematics, Algebra I, Data bases).</Paragraph> <Paragraph position="6"> The didactics of informatics and the master seminar are compulsory for all master students (4 CP each); optional subjects may be chosen for the amount of 4 CP.</Paragraph> <Paragraph position="7"> Those who major in language technology will have the following compulsory subjects in master studies: Software technology, Automata, languages and translators, Graphs, Theory of databases, Artificial Intelligence I, Computational morphology, Syntax theories and models, Computational lexicology, Semantics, Statistical models of natural languages, 32 CP in total. 
In addition, 16 CP of electives are taken from an open list that, like the similar list for computational linguistics, will be updated according to needs and opportunities. Although these two lists overlap considerably, they are not the same. In the case of computational linguistics, linguistics subjects take up a major part of the list; in the case of language technology, computer science subjects replace them (e.g. Methods of logical programming, Methods of functional programming, Systems modelling, Formal languages).</Paragraph> <Paragraph position="8"> Therefore, a person who has completed this curriculum is an information scientist who has additionally studied linguistic and computational linguistic subjects to such an extent that he/she will have a systematic picture of the tasks of natural language processing and will be able to solve these tasks in co-operation with linguists.</Paragraph> </Section> </Paper>