<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1001">
  <Title>Untangling Text Data Mining</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The nascent field of text data mining (TDM) has the peculiar distinction of having a name and a fair amount of hype but as yet almost no practitioners. I suspect this has happened because people assume TDM is a natural extension of the slightly less nascent field of data mining (DM), also known as knowledge discovery in databases (Fayyad and Uthurusamy, 1999), and information archeology (Brachman et al., 1993). Additionally, there are some disagreements about what actually constitutes data mining. It turns out that &quot;mining&quot; is not a very good metaphor for what people in the field actually do. Mining implies extracting precious nuggets of ore from otherwise worthless rock.</Paragraph>
    <Paragraph position="1"> If data mining really followed this metaphor, it would mean that people were discovering new factoids within their inventory databases. However, in practice this is not really the case.</Paragraph>
    <Paragraph position="2"> Instead, data mining applications tend to be (semi)automated discovery of trends and patterns across very large datasets, usually for the purposes of decision making (Fayyad and Uthurusamy, 1999; Fayyad, 1997). Part of what I wish to argue here is that in the case of text, it can be interesting to take the mining-for-nuggets metaphor seriously.</Paragraph>
    <Paragraph position="3"> The various contrasts discussed below are summarized in Table 1.</Paragraph>
  </Section>
</Paper>