File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/97/a97-1004_intro.xml
Size: 1,789 bytes
Last Modified: 2025-10-06 14:06:11
<?xml version="1.0" standalone="yes"?> <Paper uid="A97-1004"> <Title>A Maximum Entropy Approach to Identifying Sentence Boundaries</Title> <Section position="3" start_page="0" end_page="16" type="intro"> <SectionTitle> 2 Previous Work </SectionTitle> <Paragraph position="0"> To our knowledge, there have been few papers about identifying sentence boundaries. The most recent work will be described in (Palmer and Hearst, To appear). There is also a less detailed description of Palmer and Hearst's system, SATZ, in (Palmer and Hearst, 1994). (We recommend these articles for a more comprehensive review of sentence-boundary identification work than we will be able to provide here.) The SATZ architecture uses either a decision tree or a neural network to disambiguate sentence boundaries. The neural network achieves 98.5% accuracy on a corpus of Wall Street Journal articles using a lexicon which includes part-of-speech (POS) tag information.</Paragraph> <Paragraph position="1"> By increasing the quantity of training data and decreasing the size of their test corpus, Palmer and Hearst achieved performance of 98.9% with the neural network. They obtained similar results using the decision tree. All the results we will present for our algorithms are on their initial, larger test corpus.</Paragraph> <Paragraph position="2"> In (Riley, 1989), Riley describes a decision-tree based approach to the problem. His performance on the Brown corpus is 99.8%, using a model learned from a corpus of 25 million words. Liberman and Church suggest in (Liberman and Church, 1992) that a system could be quickly built to divide newswire text into sentences with a nearly negligible error rate, but do not actually build such a system.</Paragraph> </Section> </Paper>