<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1138"> <Title>Multilingual and cross-lingual news topic tracking</Title> <Section position="6" start_page="2" end_page="2" type="concl"> <SectionTitle> 7 Conclusion and future work </SectionTitle> <Paragraph position="0"> We have shown that our system can identify clusters of major news per day in five languages rather accurately, and that it can link these clusters to related news over time (topic tracking). The most interesting and novel feature of the system is, however, that it can also identify related news across languages without translating articles or using bilingual dictionaries. This cross-lingual cluster similarity is computed from a combination of three feature sets, which currently have a relative impact of 50%, 30% and 20%, respectively: the main feature set is the mapping onto the multilingual classification scheme Eurovoc; the others are the countries referred to in the articles (by direct mention of the country or of a smaller place name in that country) and the cognates (identical strings used in articles across languages, i.e. mainly named entities). The evaluation has shown that the results are good, but that cross-lingual linking performs less well than the monolingual historical linking of related news clusters. Users felt that the system performs well enough to go online soon, for use by a large community of several thousand people.</Paragraph> <Paragraph position="1"> Improvements to the system will nevertheless be sought.</Paragraph> <Paragraph position="2"> Future work will include testing different settings for the relative impact of the three components, as well as detecting and using more named entities, such as absolute and relative date expressions and proper names. A further aim is to extend the system to another six languages.</Paragraph> <Paragraph position="3"> The use of cognate similarity could be improved.
Currently it does not work for Greek, for instance, except for a few proper names. We would therefore like to experiment with multilingual stemming methods to exploit the existence of similar words across languages, such as English elephant, French éléphant, Spanish and Italian elefante and German Elefant.</Paragraph> <Paragraph position="4"> Several customer groups requested an advanced news analysis that distinguishes between articles about concrete events and articles commenting on these events. We will explore this issue, but it is very likely that this distinction will require a syntactic analysis of the news and cannot be made with our bag-of-words approach.</Paragraph> <Paragraph position="5"> Finally, we intend to work on breaking news detection, i.e. detecting new events, as opposed to detecting major news. This will require working with time windows smaller than the current 24-hour window.</Paragraph> </Section> </Paper>
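<!--
The weighting described in the conclusion (Eurovoc 50%, countries 30%, cognates 20%) can be illustrated with a minimal sketch. It assumes, for illustration only, that each of the three feature sets yields a similarity in [0, 1] and that the three values are combined as a linear weighted sum; the text does not state the exact combination formula, and the names used here (WEIGHTS, combined_similarity) are hypothetical.

```python
# Relative impacts stated in the text: Eurovoc 50%, countries 30%, cognates 20%.
# Treating them as weights of a linear sum is an assumption of this sketch.
WEIGHTS = {"eurovoc": 0.5, "countries": 0.3, "cognates": 0.2}


def combined_similarity(scores, weights=WEIGHTS):
    """Return the weighted sum of per-feature-set similarities.

    `scores` maps each feature-set name to a similarity assumed to lie in [0, 1].
    How those individual similarities are computed is outside this sketch.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing similarity scores: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Made-up similarities for one pair of clusters in two different languages.
example = {"eurovoc": 0.8, "countries": 0.5, "cognates": 0.25}
print(round(combined_similarity(example), 3))
```

Passing a different `weights` mapping is one way to try out the alternative settings for the relative impact of the three components that are mentioned as future work.
-->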