<?xml version="1.0" standalone="yes"?> <Paper uid="P03-2019"> <Title>Integrating Information Extraction and Automatic Hyperlinking</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The use of language technology to create hyperlinks has a long history (e.g., Allen et al., 1993). Information extraction (IE) is a technology that can be applied to identifying both the sources and the targets of new hyperlinks. IE systems are becoming commercially viable in supporting diverse information discovery and management tasks. Similarly, automatic hyperlinking is a maturing technology designed to interrelate pieces of information, using ontologies to define the relationships. With ExtraLink, we present a novel information system that integrates both technologies in order to achieve a higher level of informativeness and user comfort. Extraction and link generation occur completely in the background.</Paragraph> <Paragraph position="1"> Entities identified by the IE system are mapped into a domain ontology that relates concepts to a structured selection of predefined hyperlinks, which can be visualized directly on demand using a standard web browser. In this way, the user can, while reading a text, immediately link textual information to the Internet or to any other document base without accessing a search engine.</Paragraph> <Paragraph position="2"> The quality of the link targets is much higher than with standard search engines because, first, only domain-specific interpretations are sought and, second, the ontology provides additional structure, including related information.</Paragraph> <Paragraph position="3"> As its IE system, ExtraLink uses SProUT, a generic multilingual shallow analysis platform that currently provides linguistic processing resources for English, German, Italian, French, Spanish, Czech, Polish, Japanese, and Chinese (Becker et al., 2002).
SProUT is used for tokenization, morphological analysis, and named entity recognition in free text. In Sections 2 to 4, we describe innovative features of SProUT. Section 5 gives details about the ExtraLink demonstrator.</Paragraph> </Section> </Paper>