<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1005">
  <Title>Scaling to Very Very Large Corpora for Natural Language Disambiguation</Title>
  <Section position="7" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Conclusions
</SectionTitle>
    <Paragraph position="0"> In this paper, we have looked into what happens when we begin to take advantage of the large amounts of text that are now readily available.</Paragraph>
    <Paragraph position="1"> We have shown that for a prototypical natural language classification task, the performance of learners can benefit significantly from much larger training sets. We have also shown that both active learning and unsupervised learning can be used to attain at least some of the advantage that comes with additional training data, while minimizing the cost of additional human annotation. We propose that a logical next step for the research community would be to direct efforts towards increasing the size of annotated training collections, while deemphasizing the focus on comparing different learning techniques trained only on small training corpora. While it is encouraging that there is a vast amount of on-line text, much work remains to be done if we are to learn how best to exploit this resource to improve natural language processing.</Paragraph>
  </Section>
</Paper>