File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/97/a97-1004_intro.xml

Size: 1,789 bytes

Last Modified: 2025-10-06 14:06:11

<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1004">
  <Title>A Maximum Entropy Approach to Identifying Sentence Boundaries</Title>
  <Section position="3" start_page="0" end_page="16" type="intro">
    <SectionTitle>
2 Previous Work
</SectionTitle>
    <Paragraph position="0"> To our knowledge, there have been few papers about identifying sentence boundaries. The most recent work will be described in (Palmer and Hearst, To appear). There is also a less detailed description of Palmer and Hearst's system, SATZ, in (Palmer and Hearst, 1994). (We recommend these articles for a more comprehensive review of sentence-boundary identification work than we will be able to provide here.) The SATZ architecture uses either a decision tree or a neural network to disambiguate sentence boundaries. The neural network achieves 98.5% accuracy on a corpus of Wall Street Journal articles using a lexicon which includes part-of-speech (POS) tag information.</Paragraph>
    <Paragraph position="1">  By increasing the quantity of training data and decreasing the size of their test corpus, Palmer and Hearst achieved performance of 98.9% with the neural network. They obtained similar results using the decision tree. All the results we will present for our algorithms are on their initial, larger test corpus.</Paragraph>
    <Paragraph position="2"> In (Riley, 1989), Riley describes a decision-tree based approach to the problem. His performance on the Brown corpus is 99.8%, using a model learned from a corpus of 25 million words. Liberman and Church suggest in (Liberman and Church, 1992) that a system could be quickly built to divide newswire text into sentences with a nearly negligible error rate, but do not actually build such a system.</Paragraph>
  </Section>
</Paper>