<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1144">
  <Title>arantza.casillas@ehu.es</Title>
  <Section position="6" start_page="1149" end_page="1151" type="concl">
    <SectionTitle>
5 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> We have presented a novel approach to Multilingual Document Clustering based solely on the identification of cognate named entities. One of its main advantages is that it does not depend on multilingual resources such as dictionaries, machine translation systems, thesauri or gazetteers. The only requirement is that the languages in the corpus allow cognate named entities to be identified. Another advantage is that the approach does not need to know the right number of clusters in advance; the algorithm determines it from its threshold values.</Paragraph>
    <Paragraph position="1"> We have tested this approach on a comparable corpus of news written in English and Spanish, obtaining encouraging results. We think that this approach could be particularly appropriate for corpora of news articles, where named entities play an important role, and even more so when there is no prior evidence of the right number of clusters. In addition, we have compared our approach with another based on feature translation, and ours performs slightly better.</Paragraph>
    <Paragraph position="2"> Future work will include the compilation of more corpora and the incorporation of machine learning techniques to obtain more appropriate thresholds for different types of corpora. In addition, we will study whether changing the order of the bilingual and monolingual comparison steps significantly affects performance for different types of corpora.</Paragraph>
  </Section>
</Paper>