<?xml version="1.0" standalone="yes"?> <Paper uid="E99-1035"> <Title>A Cascaded Finite-State Parser for Syntactic Analysis of Swedish</Title> <Section position="3" start_page="0" end_page="245" type="intro"> <SectionTitle> 2 Background </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Cascaded Finite-State Automata </SectionTitle> <Paragraph position="0"> Finite-state technology has had a great impact on a variety of Natural Language Processing applications, as well as on industrial and academic Language Engineering. Attractive properties, such as conceptual simplicity, flexibility, and space and time efficiency, have motivated researchers to create grammars for natural language using finite-state methods: Koskenniemi et al. (1992); Appelt et al. (1993); Roche (1996); Roche &amp; Schabes (1997). The cascaded finite-state mechanism we use in this work is described in Abney (1997): &quot;...a finite-state cascade consists of a sequence of strata, each stratum being defined by a set of regular-expression patterns for recognizing phrases. [...] The output of stratum 0 consists of parts of speech. The patterns at level l are applied to the output of level l-1 in the manner of a lexical analyzer [...] longest match is selected (ties being resolved in favour of the first pattern listed), the matched input symbols are consumed from the input, the category of the matched pattern is produced as output, and the cycle repeats...&quot; (p. 130).</Paragraph> </Section> <Section position="2" start_page="0" end_page="245" type="sub_section"> <SectionTitle> 2.2 Swedish Finite-State Grammars </SectionTitle> <Paragraph position="0"> There have been few attempts in the past to model Swedish grammar using finite-state methods.</Paragraph> <Paragraph position="1"> K. Church at MIT implemented a regular-expression grammar for Swedish, inspired by ideas from Ejerhed &amp; Church (1983). 
Unfortunately, the lexicon and the rules were designed to parse only a very limited set of sentences. <!-- Proceedings of EACL '99 --> In Ejerhed (1985), a very general description of Swedish grammar was presented. Its algorithmic details were unclear, and we are unaware of any descriptions in the literature of large-scale applications or implementations of the models presented. It seems to us that Swedish language researchers are satisfied with the description and, apparently, the small-scale implementation of finite-state methods for noun phrases only (Cooper, 1984; Rauch, 1993). However, large-scale grammars for Swedish do exist, employing other approaches to parsing, either radically different, such as the Swedish Core Language Engine (Gambäck &amp; Rayner, 1992), or slightly different, such as the Swedish Constraint Grammar (Birn, 1998).</Paragraph> </Section> <Section position="3" start_page="245" end_page="245" type="sub_section"> <SectionTitle> 2.3 Pre-Processing </SectionTitle> <Paragraph position="0"> By pre-processing we mean: (i) the recognition of multi-word tokens, phrasal verbs and idioms; (ii) sentence segmentation; (iii) part-of-speech tagging using Brill's (1994) part-of-speech tagger and the EAGLES tagset for Swedish (Johansson-Kokkinakis &amp; Kokkinakis, 1996). The general accuracy of the tagger is at the 96% level (98.7% for the evaluation presented in table (1)). Tagging errors do not critically affect the performance of Cass-SWE (cf. Voutilainen, 1998); (iv) semantic inheritance in the form of named-entity (NE) labels: time sequences, locations, persons, organizations, communication and transportation means, money expressions and body parts. The recognition is performed using finite-state recognizers based on trigger words, typical contexts, and typical predicates associated with the entities. The NE recognition for Swedish achieves 97.4% precision and 93.5% recall, tested within the AVENTINUS domain. 
Cass-SWE has been integrated into the General Architecture for Text Engineering (GATE) (Cunningham et al., 1996).</Paragraph> </Section> </Section> </Paper>