<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2147"> <Title>The Week at a Glance - Cross-language Cross-document Information Extraction and Translation</Title> <Section position="6" start_page="1009" end_page="1009" type="concl"> <SectionTitle> 5 Problems </SectionTitle> <Paragraph position="0"> There are many problems associated with a system of this degree of complexity. Many are related to the quality and coverage of the resources available for processing. Techniques for proper name recognition and classification, for example, are well known. However, good-quality name recognition software is at present freely available only for English. Using general web resources, it is often difficult to discover document creation dates, an important piece of information in a system of this type.</Paragraph> <Paragraph position="1"> Co-reference resolution is not handled in this system at present. This is normally achieved in current information extraction systems by allowing merging of templates from adjacent sentences.</Paragraph> <Paragraph position="2"> The availability of large-scale onomastica (bilingual lists of proper names) is also crucial to the translation of extracted information. Work is currently underway to develop these resources for a variety of languages.</Paragraph> <Paragraph position="3"> The problem of reference in general is a more interesting one. There are currently two people named Boris Berezovsky appearing in the news. One is a pianist, the other the Russian politician. The question is, &quot;How is it possible to let our end user appreciate which person a reference is being made to?&quot;
Perhaps a document classification system needs to be added to allow the automatic detection of document topics, which could be used to provide additional information in the interface, either for display or for filtering.</Paragraph> <Paragraph position="4"> Conclusion The current system demonstrates the feasibility of a knowledge-based approach to information extraction. It appears that it is possible to generate meaningful documents from multi-language sources, although the initial amount of effort required to get reasonable coverage and robust performance is significant, particularly in the area of resource development.</Paragraph> </Section> </Paper>