File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/02/c02-1063_abstr.xml
Size: 1,001 bytes
Last Modified: 2025-10-06 13:42:19
<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1063"> <Title>Hierarchical Orderings of Textual Units</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Text representation is a central task for any approach to automatic learning from texts. It requires a format that allows texts to be interrelated even if they do not share content words but deal with similar topics. Furthermore, measuring text similarities raises the question of how to organize the resulting clusters. This paper presents cohesion trees (CT) as a data structure for the perspective, hierarchical organization of text corpora. CTs operate on alternative text representation models that take lexical organization, quantitative text characteristics, and text structure into account. It is shown that CTs realize text linkages which are lexically more homogeneous than those produced by minimal spanning trees.</Paragraph> </Section> </Paper>