<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-3145">
  <Title>A Freely Available Wide Coverage Morphological Analyzer for English*</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper presents a morphological lexicon for English that handles more than 317000 inflected forms derived from over 90000 stems. The lexicon is available in two formats. The first can be used by an implementation of a two-level processor for morphological analysis (Karttunen and Wittenburg, 1983; Antworth, 1990). The second, derived from the first for efficiency reasons, consists of a disk-based database using a UNIX hash table facility (Seltzer and Yigit, 1991). We also built an X Window tool to facilitate the maintenance and browsing of the lexicon. The package is ready to be integrated into a natural language application such as a parser through hooks written in Lisp and C.</Paragraph>
    <Paragraph position="1"> To our knowledge, this package is the only freely available English morphological analyzer with very wide coverage. To improve performance, we used PC-KIMMO as a generator on our lexicons to build a disk-based hashed database with a UNIX database facility (Seltzer and Yigit, 1991). Both formats, PC-KIMMO and database, are now available for distribution. We also provide an X Window tool for the database to facilitate maintenance and access. Each format contains the morphological information for over 317000 English words. The morphological database for English runs under UNIX; PC-KIMMO runs under UNIX and on a PC.</Paragraph>
    <Paragraph position="2"> This package can be easily embedded into a natural language parser; hooks for accessing the morphological database from a parser are provided for both Lucid Common Lisp and C. This morphological database is currently being used in a graphical workbench (XTAG) for the development of tree-adjoining grammars and their parsers (Paroubek et al., 1992).</Paragraph>
  </Section>
</Paper>