<?xml version="1.0" standalone="yes"?> <Paper uid="C69-0401"> <Title>Automatic Processing of Foreign Language Documents</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Experiments conducted over the last few years with the SMART document retrieval system have shown that fully automatic text processing methods using relatively simple linguistic tools are as effective for purposes of document indexing, classification, search, and retrieval as the more elaborate manual methods normally used in practice. Up to now, all experiments were carried out entirely with English language queries and documents. The present study describes an extension of the SMART procedures to German language materials. A multi-lingual thesaurus is used for the analysis of documents and search requests, and tools are provided which make it possible to process English language documents against German queries, and vice versa. The methods are evaluated, and it is shown that the effectiveness of the mixed language processing is approximately equivalent to that of the standard process operating within a single language only.</Paragraph> <Paragraph position="1"> 1. Introduction For some years, experiments have been under way to test the effectiveness of automatic language analysis and indexing methods in information retrieval. Specifically, document and query texts are processed fully automatically, and content identifiers are assigned using a variety of linguistic tools, including word stem analysis, thesaurus look-up, phrase recognition, statistical term association, syntactic analysis, and so on.</Paragraph> <Paragraph position="2"> Department of Computer Science, Cornell University, Ithaca, N. Y. 14850. This study was supported in part by the National Science Foundation under grant GN-750.</Paragraph> <Paragraph position="3"> 
The resulting concept identifiers assigned to each document and search request are then matched, and the documents whose identifiers are sufficiently close to the queries are retrieved for the user's attention.</Paragraph> <Paragraph position="4"> The automatic analysis methods can be made to operate in real time, while the customer waits for an answer, by restricting the query-document comparisons to only certain document classes, and interactive user-controlled search methods can be implemented which adjust the search request during the search in such a way that more useful, and less useless, material is retrieved from the file.</Paragraph> <Paragraph position="5"> The experimental evidence accumulated over the last few years indicates that retrieval systems based on automatic text processing methods (including fully automatic content analysis as well as automatic document classification and retrieval) are not in general inferior in retrieval effectiveness to conventional systems based on human indexing and human query formulation.</Paragraph> <Paragraph position="6"> One of the major objections to the practical utilization of the automatic text processing methods has been the inability automatically to handle foreign language texts of the kind normally stored in documentation and library systems. Recent experiments performed with document abstracts and search requests in French and German appear to indicate that these objections may be groundless.</Paragraph> <Paragraph position="7"> In the present study, the SMART document retrieval system is used to carry out experiments using as input foreign language documents and queries. The foreign language texts are automatically processed using a thesaurus (synonym dictionary) translated directly from a previously available English version. 
Foreign language query and document texts are looked up in the foreign language thesaurus, and the analyzed forms of the queries and documents are then compared in the standard manner, after which the most closely matching items are retrieved. The language analysis methods incorporated into the SMART system are first briefly reviewed. Thereafter, the main procedures used to process the foreign language documents are described, and the retrieval effectiveness of the English text processing methods is compared with that of the foreign language material.</Paragraph> </Section> </Paper>