File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/98/p98-1076_abstr.xml
Size: 659 bytes
Last Modified: 2025-10-06 13:49:19
<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1076"> <Title>One Tokenization per Source</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> We report in this paper the observation of one tokenization per source. That is, the same critical fragment in different sentences from the same source almost always realizes one and the same of its many possible tokenizations. This observation is shown to be very helpful in sentence tokenization practice, and is argued to have far-reaching implications for natural language processing.</Paragraph> </Section> </Paper>