File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/03/n03-2001_intro.xml

Size: 2,113 bytes

Last Modified: 2025-10-06 14:01:44

<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-2001">
  <Title>Ronan.Reilly@may.ie</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Vast amounts of information are now available in electronic form, and accurate, speedy access to that information is becoming increasingly difficult. The growing quantity of information has created a need for intelligent management and retrieval techniques. Many existing information retrieval systems that handle large volumes of documents perform poorly because they exploit only a little of the knowledge contained in the documents. If XML is adopted as a standard document format, content-based queries can exploit the XML structure of the documents. In addition, specific tagged sections of a document can be searched instead of the entire document, which makes retrieval faster and more effective. Furthermore, the logical structure that XML markup gives a document supports several kinds of operations: the same content can be reused in a variety of formats, specific elements can be extracted from XML documents, and full documents satisfying certain structural conditions can be retrieved from a database. These and other advantages make XML a complete solution for content management and intelligent information retrieval. However, despite these advantages and the popularity of XML, large repositories of XML documents are still lacking, because automatic XML markup remains a challenge and manual markup is complex, tedious and expensive. Most existing automatic markup systems are limited to particular domains and do not support general-purpose markup. To address the need for more general automatic markup of text documents, we present a system with a novel hybrid architecture. The system combines the Self-Organizing Map (SOM) algorithm (Kohonen, 1997) with an inductive learning algorithm, C5 (Quinlan, 1993, 2000).</Paragraph>
  </Section>
</Paper>
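The structural operations the introduction lists (searching only a tagged section, extracting a specific element, and retrieving whole documents that satisfy a structural condition) can be sketched with Python's standard `xml.etree.ElementTree` module. This is a minimal illustration of XML-based retrieval in general, not the system described in the paper: the two documents, the tag names `paper`, `title`, `section` and `p`, the `type` attribute, and the helper functions are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up documents; tag and attribute names are illustrative only.
DOCS = {
    "doc1": """<paper>
  <title>Self-organizing maps for text</title>
  <section type="intro"><p>XML markup enables structured retrieval.</p></section>
  <section type="method"><p>We train a SOM on word vectors.</p></section>
</paper>""",
    "doc2": """<paper>
  <title>Decision trees in practice</title>
  <section type="intro"><p>Inductive learning builds decision trees.</p></section>
</paper>""",
}


def search_sections(docs, section_type, term):
    """Return ids of documents whose <section type=...> text contains term.

    Only the tagged sections are searched, not the whole document."""
    hits = []
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for sec in root.iterfind(f".//section[@type='{section_type}']"):
            # itertext() collects the text of the section and all its descendants.
            text = " ".join(sec.itertext())
            if term.lower() in text.lower():
                hits.append(doc_id)
                break  # one matching section is enough for this document
    return hits


def extract_titles(docs):
    """Extract one specific element (<title>) from each document."""
    return {doc_id: ET.fromstring(x).findtext("title") for doc_id, x in docs.items()}


def with_section(docs, section_type):
    """Retrieve whole documents satisfying a structural condition:
    they contain at least one section of the given type."""
    return [
        doc_id
        for doc_id, x in docs.items()
        if ET.fromstring(x).find(f".//section[@type='{section_type}']") is not None
    ]


if __name__ == "__main__":
    print(search_sections(DOCS, "method", "SOM"))
    print(search_sections(DOCS, "intro", "SOM"))
    print(extract_titles(DOCS))
    print(with_section(DOCS, "method"))
```

Restricting the search to `method` sections finds "SOM" in `doc1`, while restricting it to `intro` sections finds nothing, even though the term occurs in `doc1`. This is the precision gain that section-level search is meant to provide.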