<?xml version="1.0" standalone="yes"?> <Paper uid="A97-1004"> <Title>A Maximum Entropy Approach to Identifying Sentence Boundaries</Title> <Section position="7" start_page="18" end_page="18" type="concl"> <SectionTitle> 6 Conclusions </SectionTitle> <Paragraph position="0"> We have described an approach to identifying sentence boundaries which performs comparably to other state-of-the-art systems that require vastly more resources. For example, Riley's performance on the Brown corpus is higher than ours, but his system is trained on the Brown corpus and uses thirty times as much data as our system. Also, Palmer & Hearst's system requires POS tag information, which limits its use to those genres or languages for which there are either POS tag lexica or POS tag annotated corpora that could be used to train automatic taggers. In comparison, our system does not require POS tags or any supporting resources beyond the sentence-boundary annotated corpus. It is therefore easy and inexpensive to retrain this system for different genres of text in English and text in other Roman-alphabet languages. Furthermore, we showed that a small training corpus is sufficient for good performance, and we estimate that annotating enough data to achieve good performance would require only several hours of work, in comparison to the many hours required to generate POS tag and lexical probabilities.</Paragraph> </Section> </Paper>