<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1046">
  <Title>Unsupervised Learning of Field Segmentation Models for Information Extraction</Title>
  <Section position="8" start_page="377" end_page="377" type="concl">
    <SectionTitle>
7 Conclusions
</SectionTitle>
    <Paragraph position="0"> In this work, we have examined the task of learning field segmentation models using unsupervised learning. In two different domains, classified advertisements and bibliographic citations, we showed that by constraining the model class we were able to restrict the search space of EM to models of interest. Using unsupervised learning methods with 400 documents, we obtained field segmentation models of similar quality to those learned using supervised learning with 50 documents. We demonstrated that further refinements of the model structure, including hierarchical mixture emission models and boundary models, produce additional increases in accuracy.</Paragraph>
    <Paragraph position="1"> Finally, we also showed that semi-supervised methods with a modest amount of labeled data can sometimes achieve similarly good results, depending on the nature of the problem.</Paragraph>
    <Paragraph position="2"> While there are enough resources for the citation task that much better numbers than ours can be and have been obtained (with more knowledge- and resource-intensive methods), in domains like classified ads for lost pets or used bicycles, unsupervised learning may be the only practical option. In these cases, we find it heartening that the present systems do as well as they do, even without field-specific prior knowledge.</Paragraph>
  </Section>
</Paper>