File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/02/w02-1701_intro.xml
Size: 5,276 bytes
Last Modified: 2025-10-06 14:01:34
<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1701"> <Title>RDF(S)/XML LINGUISTIC ANNOTATION OF SEMANTIC WEB PAGES</Title> <Section position="3" start_page="2" end_page="12" type="intro"> <SectionTitle> ABSTRACT </SectionTitle> <Paragraph position="0"> Although AI researchers have already done much research on the semantic annotation of web pages within the Semantic Web initiative, linguistic text annotation, including semantic annotation, was originally developed in Corpus Linguistics, and its results have been largely neglected by AI. The purpose of the research presented in this proposal is to prove that integrating the results of both fields is not only possible but also highly useful for making Semantic Web pages more machine-readable. A multi-level (possibly multi-purpose and multi-language) annotation model, based on EAGLES standards and Ontological Semantics and implemented with the latest generation of Semantic Web languages (RDF(S)/XML), is being developed to fit the needs of both communities; the present paper focuses on its semantic level.</Paragraph> </Section> <Section position="4" start_page="12" end_page="12" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> All of us are by now accustomed to making extensive use of the World Wide Web (WWW), which we might consider a great source of information, accessible through computers but, hitherto, understandable only to human beings. In the beginning, web pages were handmade, intended for and oriented towards the exchange of information among human beings. With the astonishing growth of Internet use, new technologies emerged and, with them, machine-aided web page generation. Up to that point, the structure and the editing of these pages fitted only human needs, and this only to some extent. All of these documents contained a huge amount of text, images and even sounds, all meaningless to a computer.</Paragraph> <Paragraph position="1"> In this way, they put on the reader the burden of extracting and interpreting the relevant information in them.</Paragraph>
<Paragraph position="2"> Currently, web page presentation on the WWW is being handled independently of its content, mainly through the use of XML (Bray et al., 1998) or other resource-oriented languages such as XOL (Karp et al., 1999), SHOE (Luke et al., 2000), OML (Kent, 1998), RDF (Lassila et al., 1999), RDF Schema (Brickley et al., 2000), OIL (Horrocks et al., 2000) or DAML+OIL (Horrocks et al., 2001). But even though the automatic processing of information is being eased, the above-mentioned tasks (access to, extraction of and interpretation of relevant information) cannot yet be wholly performed by computers. Hence, the goal of making the meaning (the semantics) of written texts and web pages explicit, so that computers can understand it, is gaining growing relevance. That is the main pillar sustaining the development of what we understand by the Semantic Web: &quot;the conceptual structuring of the web in an explicit machine-readable way&quot; (Berners-Lee et al., 1999). In this context, the semantic annotation of texts makes meaning explicit and has become a key topic. Thus, great efforts are being devoted to the design and application of models and formalisms for the semantic annotation of web pages to make these documents more machine-readable. Following the guidelines of the Semantic Web initiative, much research has already been carried out by AI researchers on the semantic annotation of web pages (Luke et al., 2000), (Benjamins et al., 1999), (Motta et al., 1999), (Staab et al., 2000). However, these researchers have somewhat neglected the decades of work and the results obtained in the field of Corpus Linguistics on corpus annotation, not only at the semantic level but also at other linguistic levels. 
These other linguistic levels, while not intrinsically semantic, can also add some semantic information and help a computer understand a text or, in our case, web pages.</Paragraph> <Paragraph position="3"> This paper presents the results of our research on how linguistic annotation can help computers understand the text contained in a document, such as a Semantic Web document. Special efforts are devoted to finding a way of bringing together the semantic annotation models from AI and the annotations proposed by Corpus Linguistics, and of identifying the complementarities between them. As this paper argues, far from being irreconcilable, the two are very close and may be considered complementary.</Paragraph> <Paragraph position="4"> This paper is organised as follows: firstly, an introduction to the state of the art in semantic text annotation in Corpus Linguistics is presented (section 1). Secondly, in section 2, some brief notes on the use of ontologies in semantic annotation are sketched. Thirdly, in section 3, an example of the integration of both paradigms (AI's and Corpus Linguistics') is presented within the scope of our project goals. The main advantages of this integration are analysed afterwards (section 4) and some conclusions are stated (section 5), followed by the acknowledgments section and, finally, the references.</Paragraph> </Section> </Paper>