<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-2002">
  <Title>Bridging the Gap between Technology and Users: Leveraging Machine Translation in a Visual Data Triage Tool</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 IN-SPIRE
</SectionTitle>
    <Paragraph position="0"> IN-SPIRE (Hetzler et al., 2004) is a visual analytics tool developed by Pacific Northwest National Laboratory to facilitate the collection and rapid understanding of large textual corpora. IN-SPIRE compiles a document set by computing a mathematical signature for each document in the set. Document signatures are clustered according to common themes to enable information retrieval and visualization. Information is presented to the user through several visual metaphors that expose different facets of the textual data. The central visual metaphor is a galaxy view of the corpus that allows users to interact intuitively with thousands of documents, examining them by theme.</Paragraph>
    <Paragraph position="1"> Context vectors for documents, such as those produced by LSA (Deerwester et al., 1990), provide a powerful foundation for information retrieval and natural language processing techniques. IN-SPIRE leverages such representations for clustering, projection, and queries-by-example (QBE). IN-SPIRE supports standard Boolean word queries as well as QBE, a process in which a user's query document is converted into a mathematical signature and compared to the multi-dimensional mathematical representation of the document corpus. A spherical distance threshold, adjustable by the end user, controls the query result set. Using IN-SPIRE's group functionality, subsets of the corpus can be identified for more detailed analysis. Information analysts can isolate meaningful document subsets into groups for hypothesis testing and the identification of trends. Depending on the corpus, one or more clusters may be of little interest to users. Removing these documents, called &quot;outliers&quot;, enables the investigator to understand more clearly the relationships among the remaining documents. These tools expose various facets of document text and document interrelationships.</Paragraph>
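    <Paragraph position="2"> As a minimal sketch of the QBE thresholding described above, the following Python fragment compares a query signature against corpus signatures and keeps the documents that fall within a user-adjustable distance. It is illustrative only and is not IN-SPIRE's implementation: the use of cosine distance as the "spherical" measure, the function names, and the toy three-dimensional signatures are all assumptions made for this example.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: an all-zero signature matches nothing
    return 1.0 - dot / (norm_u * norm_v)


def query_by_example(query_signature, corpus_signatures, threshold):
    """Return (doc_id, distance) pairs within the threshold, nearest first."""
    hits = []
    for doc_id, signature in corpus_signatures.items():
        distance = cosine_distance(query_signature, signature)
        if threshold >= distance:
            hits.append((doc_id, distance))
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical signatures, invented for illustration.
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.8, 0.6, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]

narrow = query_by_example(query, corpus, threshold=0.1)
wide = query_by_example(query, corpus, threshold=0.5)
```

Raising the threshold from 0.1 to 0.5 widens the result set from the single closest document to the two documents that share a theme with the query, which mirrors how an adjustable threshold lets the user trade precision for coverage.</Paragraph>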
  </Section>
  <Section position="3" start_page="0" end_page="2" type="metho">
    <SectionTitle>
3 Foreign Language Triage Capabilities
</SectionTitle>
    <Paragraph position="0"> Information analysts need to sift through large datasets quickly and efficiently to identify relevant information for knowledge discovery. Foreign language data complicates this task immensely. The addition of foreign language capabilities to IN-SPIRE addresses this need. We have integrated third-party translators for over 40 languages and third-party software for language identification. Datasets compiled with language detection allow IN-SPIRE to automatically select the most appropriate translator for each document.</Paragraph>
    <Paragraph position="1"> To triage a foreign language dataset, the system clusters the documents in their native language (with no pre-translation required). A user can then view the cluster labels, or peak terms, in the native language, or have them translated via Systran (Senellart et al., 2003) or CyberTrans (not publicly available). The user can then explore the clusters to get a general sense of the thematic coverage of the dataset. The user identifies clusters relevant to their interests, and the tool reclusters to show more subtle themes differentiating the remaining documents. If the user searches for particular words, the clusters and translated labels help distinguish the various contexts in which those words appear. Once a cluster of documents of interest is found, a particular document or set of documents can be viewed and translated on demand. This avoids the need to translate the entire document set; only the documents of interest are translated. The native text is displayed alongside the translation at all stages.</Paragraph>
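    <Paragraph position="2"> The on-demand translation step above can be sketched as follows. This is a hedged illustration under stated assumptions, not the system's actual code: the class name, the dictionary-based document records, and the stand-in translator function are hypothetical, and no Systran or CyberTrans API is used or implied. The sketch picks a translator from each document's detected language, translates only when a document is opened, and returns the native text alongside the translation.

```python
class TriageTranslator:
    """Translate documents lazily, only when a user opens them."""

    def __init__(self, translators):
        # Maps a language code to a callable taking text and returning
        # translated text (hypothetical interface for this sketch).
        self.translators = translators
        self.cache = {}

    def view(self, doc):
        """Return (native_text, translation) for side-by-side display."""
        doc_id = doc["id"]
        if doc_id not in self.cache:
            translate = self.translators.get(doc["lang"])
            # With no translator for the language, show native text only.
            self.cache[doc_id] = translate(doc["text"]) if translate else None
        return doc["text"], self.cache[doc_id]


calls = []  # records which texts were actually sent for translation


def fake_french_to_english(text):
    """Stand-in for a third-party translator; a tiny lookup table."""
    calls.append(text)
    return {"bonjour": "hello", "merci": "thank you"}.get(text, text)


docs = [
    {"id": 1, "lang": "fr", "text": "bonjour"},
    {"id": 2, "lang": "fr", "text": "merci"},
    {"id": 3, "lang": "xx", "text": "???"},
]

triage = TriageTranslator({"fr": fake_french_to_english})
first = triage.view(docs[0])    # translated on first view
again = triage.view(docs[0])    # served from the cache
unknown = triage.view(docs[2])  # no translator registered for "xx"
```

Only the document the user opened reaches the translator; the second French document is never translated unless it is viewed, which is the saving the triage workflow relies on.</Paragraph>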
  </Section>
</Paper>