<?xml version="1.0" standalone="yes"?> <Paper uid="H05-2002"> <Title>Bridging the Gap between Technology and Users: Leveraging Machine Translation in a Visual Data Triage Tool</Title> <Section position="4" start_page="2" end_page="2" type="concl"> <SectionTitle> 4 Evaluation </SectionTitle> <Paragraph position="0"> Since this is a prototype visualization tool, we have yet to conduct formal user evaluations. We have begun field testing the tool with users, who report successful data triage in foreign languages with which they are not familiar. We have also begun evaluations involving parallel corpora. Using Arabic English Parallel News Text (LDC 2004), which contains over 8,000 human-translated documents from various Arabic news sources, we processed the English version in IN-SPIRE to view the document clusters and their labels. We also processed the Arabic version in Arabic according to the description above. The two screenshots below demonstrate that the documents clustered in a similar manner (note that cluster labels have been translated in the Arabic data).</Paragraph> <Paragraph position="1"> To demonstrate that running our clustering algorithm on the native language is an efficient and reliable method for data triage on foreign language data, we also pre-translated the data with CyberTrans and clustered the output. Figure 3 demonstrates that similar clusters arise from this methodology. However, the processing time increased 15-fold, with no clear advantage for data triage.</Paragraph> <Paragraph position="2"> IX. New Orleans, Louisiana.</Paragraph> <Paragraph position="3"> Initial user reports and comparisons with a parallel corpus demonstrate that our visualization environment enables users to search through and cluster massive amounts of data without native speaker competence or dependence on a machine translation system. With this tool, users can identify clusters of potential interest and translate (by human or machine) only the relevant documents.
We have demonstrated that this visualization tool allows users to derive high va</Paragraph> </Section> </Paper>