<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1028">
  <Title>THE MURASAKI PROJECT: MULTILINGUAL NATURAL LANGUAGE UNDERSTANDING</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> This paper describes a multilingual data extraction system under development for the Department of Defense (DoD). The system, called Murasaki, processes Spanish and Japanese newspaper articles reporting AIDS disease statistics. Key to Murasaki's design is its language-independent and domain-independent architecture. The system consists of processing modules shared across the three languages it currently handles (English, Japanese, and Spanish), shared general and domain-specific knowledge bases, and separate data modules for language-specific knowledge such as grammars, lexicons, morphological data, and discourse data. This data-driven architecture is crucial to the success of Murasaki as a language-independent system: for the most part, Murasaki can be extended to additional languages merely by adding new data. Some of this data can be added with user-friendly tools, and some can be obtained by exploiting existing on-line data or by deriving relevant data from corpora.</Paragraph>
  </Section>
</Paper>