<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2147"> <Title>The Week at a Glance - Cross-language Cross-document Information Extraction and Translation</Title> <Section position="2" start_page="0" end_page="1007" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Multi-lingual information extraction (MUC6), summarization (AAAI 98), and cross-language information retrieval (Harman, 1995) have over the past three or four years become emerging technologies, driven in part by the development of the WWW and also by the general availability of machine-readable texts in many languages. It appeared to the authors that an interesting application could be built which demonstrated an integrated use of these technologies in combination with a variety of machine translation techniques.</Paragraph> <Paragraph position="1"> The system we describe here has as its central component a multi-lingual information extraction engine, which uses an ontology as its main controlling element.</Paragraph> <Paragraph position="2"> Extraction-based summarization produces summaries, either in tabular form or by generating sentences using structured information derived from texts. Summaries of this type are focused on whatever events the underlying extraction system handles. The summaries are informative in nature. That is, they provide specific facts which may allow a user to gain sufficient information without reference to the original documents. The potential applications are: producing personal profiles, assuming a series of documents on an individual is available over time; tracking complex events, assuming a script is available which describes the event in terms of simpler events; and monitoring single event types in a data stream. 
This is the application we focus on in this paper.</Paragraph> <Paragraph position="3"> The method is particularly promising for texts in multiple languages, as the structured information produced by information extraction is relatively easy to translate. The principal drawback is that an information extraction system of this kind needs such expensive resources as an ontology (one for all languages) and ontological lexicons (one per language). The development of the system itself is not expensive, if we can get these resources.</Paragraph> <Paragraph position="4"> The extraction method described here is based on the Mikrokosmos ontology (Mahesh, 1995), and uses the concepts in the ontology both to define an extraction template and to control the extraction process. At present the only information used from the Mikrokosmos lexicons, which supply language-specific semantic and syntactic subcategorization information, is the mapping from a citation form to an ontological concept.</Paragraph> <Paragraph position="5"> The complete system is composed of many pre-existing components and has been tested using two weeks of news from English, Spanish, Russian, and Japanese newspapers. We first give an overview of the steps used to generate an event-based cross-document summary.</Paragraph> </Section> </Paper>