<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1144"> <Title>arantza.casillas@ehu.es</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> This paper presents an approach to Multilingual Document Clustering in comparable corpora. The algorithm is heuristic, and the only evidence it uses for clustering is the identification of cognate named entities between the two sides of the comparable corpus. One of the main advantages of this approach is that it does not depend on bilingual or multilingual resources. It does, however, depend on being able to identify cognate named entities between the languages used in the corpus. A further advantage is that the approach does not need to be told the number of clusters in advance; the algorithm determines it. We have tested this approach on a comparable corpus of news written in English and Spanish.</Paragraph> <Paragraph position="1"> In addition, we have compared the results with those of a system that translates selected document features. The results obtained are encouraging.</Paragraph> </Section> </Paper>