File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/91/e91-1025_abstr.xml

Size: 4,172 bytes

Last Modified: 2025-10-06 13:47:10

<?xml version="1.0" standalone="yes"?>
<Paper uid="E91-1025">
  <Title>Parsing without lexicon: the MorP system</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> MorP is a system for automatic word class assignment on the basis of surface features. It has a very small lexicon of form words (%o entries), and for the rest works entirely on morphological and configurational patterns. This makes it robust and fast, and in spite of the (deliberate) restrictedness of the system, its performance reaches an average accuracy level above 91% when run on unrestricted Swedish text.</Paragraph>
    <Paragraph position="1"> Keywords: parsing, morphology.</Paragraph>
    <Paragraph position="2"> The development of the parser to be presented has been supported by the Swedish Research Council for the Humanities. The parser is called MorP, for morphology based parser, and the hypotheses behind it can be formulated thus: a) It is to a large extent possible to decide the word class of words in running text from pure surface criteria, such as the morphology of the words together with the configurations that they appear in.</Paragraph>
    <Paragraph position="3"> b) These surface criteria can be described so clearly that an automatic identification of word class will be possible.</Paragraph>
    <Paragraph position="4"> c) Surface criteria give signals that will suffice to give a word class identification with a level of around or above 90% correctness, at least for a language with as much inflectional morphology as Swedish.</Paragraph>
    <Paragraph position="5"> A parser was constructed along these lines, which were first presented in Brodda (1982), and the predictions of the hypotheses were found to hold fairly well. The project is reported in publications in Swedish (Källgren 1984a) and English (Källgren 1984b, 1985, 1991a) and the parser has been tested in a practical application in connection with information retrieval (Källgren 1984c, 1991a). We also plan to use the parser in a project aimed at building a large tagged corpus of Swedish (the SUC corpus, Källgren 1990, 1991b). The MorP parser is implemented in a high-level string manipulating language developed at Stockholm University by Benny Brodda. The language is called Beta and fuller descriptions of it can be found in Brodda (1990). The version of Beta that is used here is a PC/DOS implementation written in Pascal (Malkior-Carlvik 1990), but Macintosh and DEC versions also exist.</Paragraph>
    <Paragraph position="6"> The rules of the parser are partitioned between different subprograms that perform recognition of different surface patterns of written language. The first programs work on single words and segments of words and add their analysis directly into the string. Later programs look at the markings in the string and their configurations. The programs can add markings on previously unmarked words, but can also change markings inserted by earlier programs. The units identified by the programs are word classes and two kinds of larger constituents: noun phrases and prepositional phrases. The latter constituents are established mainly as a step in the process of identifying word class from contextual criteria. After the processing, the original string is restored and the final result of the analysis is given in the form of tags, either after or below the words or constituents.</Paragraph>
    <Paragraph position="7"> An interesting feature of the MorP parser is its way of handling non-deterministic situations by simply postponing the decision until enough information is available. The postponing of decisions is partly done with the use of ambiguous word class markers that are inserted wherever the morphological information signals two possible word classes. Hereby, all other word classes are excluded, which reduces the number of possible choices considerably, and later programs can use the information in the ambiguous markers both to perform analysis that does not require full disambiguation and to ultimately resolve the ambiguity.</Paragraph>
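    <Paragraph position="8"> The postponement strategy described above can be illustrated with a minimal sketch in Python. It is not the Beta implementation, and every rule in it is an assumption made for illustration: the two-entry form word lexicon, the single suffix rule (Swedish words ending in -ar may be plural nouns or present-tense verbs), the tag names, and the two context rules are all hypothetical. A first pass inserts a two-way marker from morphology alone, and a later pass resolves it only when the preceding word gives enough information.

```python
# Toy illustration only: the lexicon, suffix rule, tag names, and context
# rules below are hypothetical and are not taken from MorP or from Beta.

FORM_WORDS = {"jag": {"PRON"}, "tre": {"NUM"}}  # tiny closed-class lexicon
SUFFIX_RULES = [("ar", {"NOUN", "VERB"})]        # ending with two readings


def morphological_pass(words):
    """Assign each word a set of candidate tags from its surface form alone."""
    tagged = []
    for word in words:
        if word in FORM_WORDS:
            tags = set(FORM_WORDS[word])
        else:
            tags = set()  # unmarked unless a suffix rule fires
            for suffix, candidates in SUFFIX_RULES:
                if word.endswith(suffix):
                    tags = set(candidates)  # ambiguous marker, decision postponed
                    break
        tagged.append((word, tags))
    return tagged


def contextual_pass(tagged):
    """Resolve two-way markers using the tag of the preceding word."""
    resolved = []
    previous = set()
    for word, tags in tagged:
        if tags == {"NOUN", "VERB"}:
            if previous == {"PRON"}:
                tags = {"VERB"}
            elif previous == {"NUM"}:
                tags = {"NOUN"}
            # with no deciding context, the ambiguous marker is kept as it is
        resolved.append((word, tags))
        previous = tags
    return resolved


def tag(sentence):
    """Run the morphological pass first, then the contextual pass."""
    return contextual_pass(morphological_pass(sentence.split()))
```

In this sketch, tag("jag talar") resolves the second word to a verb and tag("tre bilar") resolves it to a noun, while tag("bilar") alone leaves the two-way marker in place, since no context is available to decide between the readings.</Paragraph>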
  </Section>
</Paper>