File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/99/p99-1001_abstr.xml

Size: 1,235 bytes

Last Modified: 2025-10-06 13:49:45

<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1001">
  <Title>Untangling Text Data Mining</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> The possibilities for data mining from large text collections are virtually untapped. Text expresses a vast, rich range of information, but encodes this information in a form that is difficult to decipher automatically. Perhaps for this reason, there has been little work in text data mining to date, and most people who have talked about it have either conflated it with information access or have not made use of text directly to discover heretofore unknown information.</Paragraph>
    <Paragraph position="1"> In this paper I will first define data mining, information access, and corpus-based computational linguistics, and then discuss the relationship of these to text data mining. The intent behind these contrasts is to draw attention to exciting new kinds of problems for computational linguists. I describe examples of what I consider to be real text data mining efforts and briefly outline recent ideas about how to pursue exploratory data analysis over text.</Paragraph>
  </Section>
</Paper>