<?xml version="1.0" standalone="yes"?>
<Paper uid="A94-1025">
  <Title>A robust category guesser for Dutch medical language</Title>
  <Section position="3" start_page="0" end_page="150" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="150" type="sub_section">
      <SectionTitle>
1.1 NLP in medicine
</SectionTitle>
      <Paragraph position="0"> Medical patient reports consist mainly of free text, combined with results from various laboratories. While numerical data can easily be stored and processed for archiving and research purposes, free text is difficult for a computer to process, although it contains the most relevant information. Several authors have put forward the hypothesis that Natural Language Processing (NLP) and Knowledge Representation (KR) of medical discharge summaries have become the key issues in the domain of intelligent medical information processing (Baud et al., 1992), (Gabrieli and Speth, 1987), (McCray, 1991). However, only a few NLP-driven systems have actually been implemented (Friedman and Johnson, 1992). For Dutch, a limited prototype has been developed (Spyns, 1991), (Spyns and Adriaens, 1992). A broader system covering a larger part of the Dutch grammar and medical vocabulary is currently under development. This activity forms part of the MENELAS project¹. This project comprises a morphological, syntactic, semantic and pragmatic analysis of the medical sub-language for Dutch, English and French (Spyns et al., 1992). The project also focuses on Knowledge Representation by means of Conceptual Graphs (Sowa, 1984), (Volot et al., 1993) and on Production Systems (Bouaud and Zweigenbaum, 1992).</Paragraph>
    </Section>
    <Section position="2" start_page="150" end_page="150" type="sub_section">
      <SectionTitle>
1.2 The Category Guesser for Dutch Medical Language
</SectionTitle>
      <Paragraph position="0"> This paper focuses on the morphological and lexical component of the system, which is a combination of a database application and a Prolog rule interpreter. This component is already functioning and is used continuously during the current extension of the coverage of the Dutch grammar (Spyns et al., 1993). The importance of morphological analysis of medical vocabulary has been widely recognised (Wingert, 1985), (Wolff, 1984), (Dujols et al., 1991), (Pacak and Pratt, 1969), (Pacak and Pratt, 1978), (Norton, 1983). In the following sections, we describe the different parts that interact to identify the word forms of a given sentence, together with the successive stages of the analysis of these word forms. A major distinction can be made between forms &quot;known by the system&quot; (i.e., stored in the dictionary; cf. section 2) and unknown forms, whose linguistic characteristics have to be computed and are therefore hypothetical. For the latter, the computation can be based on morphological knowledge (section 3) or on other heuristics (sections 4, 5 and 6). Each section is illustrated by an example or by some implementation details. A schematic overview of the architecture of the category guesser is presented in section 7. Section 8 is devoted to the evaluation, which will guide the further elaboration of the category guesser described here. The paper ends with a conclusion and discussion (section 9).</Paragraph>
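      <Paragraph position="1"> The cascade outlined above (dictionary lookup for known forms, morphological hypotheses for unknown forms, other heuristics as a last resort) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation, which combines a database application with a Prolog rule interpreter. The lexicon entries, the suffix rules, the function name guess_category and the capitalisation heuristic in stage 3 are all hypothetical and chosen only to show the control flow.

```python
# Hypothetical sketch of a cascaded category guesser (not the original
# database + Prolog implementation). Stage 1 looks a form up in a lexicon,
# stage 2 guesses from suffixes, stage 3 applies a fallback heuristic.

# Hypothetical lexicon of forms "known by the system": word form -> category.
LEXICON = {
    "koorts": "noun",
    "pijn": "noun",
    "en": "conjunction",
}

# Hypothetical suffix rules standing in for morphological knowledge.
# Checked in order, so more specific suffixes should come first.
SUFFIX_RULES = [
    ("ectomie", "noun"),      # e.g. appendectomie
    ("itis", "noun"),         # e.g. bronchitis
    ("isch", "adjective"),    # e.g. klinisch
    ("eren", "verb"),         # e.g. opereren (infinitive)
]


def guess_category(form):
    """Return (category, stage) for a word form; stage names the deciding step."""
    word = form.lower()
    # Stage 1: known forms are read directly from the lexicon.
    if word in LEXICON:
        return LEXICON[word], "dictionary"
    # Stage 2: for unknown forms, hypothesise a category from the suffix.
    for suffix, category in SUFFIX_RULES:
        if word.endswith(suffix):
            return category, "morphology"
    # Stage 3 (assumed heuristic): capitalised forms become proper nouns,
    # everything else defaults to noun.
    if form[:1].isupper():
        return "proper_noun", "heuristic"
    return "noun", "heuristic"


for w in ["koorts", "appendectomie", "klinisch", "opereren", "Leuven", "xyz"]:
    print(w, guess_category(w))
```

Each result records which stage produced the category, mirroring the distinction between dictionary-based categories for known forms and computed, hypothetical categories for unknown ones. A fuller guesser would likely return several candidate categories with morphosyntactic features rather than a single label.</Paragraph>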
    </Section>
  </Section>
</Paper>