Paragraph-, word-, and coherence-based approaches to sentence ranking:
A comparison of algorithm and human performance
Florian Wolf, Edward Gibson
Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences
fwolf@mit.edu, egibson@mit.edu
Abstract
Sentence ranking is a crucial part of generating text
summaries. We compared human sentence rankings
obtained in a psycholinguistic experiment to the
rankings produced by three classes of approaches: a
simple paragraph-based approach intended as a baseline,
two word-based approaches, and two coherence-based
approaches. In the paragraph-based approach,
sentences at the beginning of paragraphs received
higher importance ratings than other sentences.
The word-based approaches determined sentence
rankings based on relative word frequencies
(Luhn (1958); Salton & Buckley (1988)).
The coherence-based approaches determined sentence
rankings based on some property of the coherence
structure of a text (Marcu (2000); Page et
al. (1998)). Our results suggest poor performance
for the simple paragraph-based approach, whereas
the word-based approaches performed remarkably well.
The best performance was achieved by a
coherence-based approach in which coherence
structures are represented as non-tree structures.
Most approaches also outperformed the
commercially available MSWord summarizer.
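
To make the two simplest families of approaches concrete, the following is a minimal Python sketch, not the implementation evaluated in this paper. It assumes a document is given as a list of paragraphs, each a list of sentence strings. The function names, the tiny stopword list, and the exact scoring rules (a 1/0 score for paragraph-initial sentences, and the mean relative frequency of a sentence's content words) are illustrative assumptions in the spirit of the paragraph-based baseline and of frequency-based ranking.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "are"}


def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def paragraph_scores(paragraphs):
    """Baseline (assumed form): paragraph-initial sentences score 1, others 0."""
    return [1 if i == 0 else 0 for para in paragraphs for i, _ in enumerate(para)]


def word_frequency_scores(paragraphs):
    """Score each sentence by the mean relative frequency of its content words."""
    sentences = [s for para in paragraphs for s in para]
    tokens = [[w for w in tokenize(s) if w not in STOPWORDS] for s in sentences]
    counts = Counter(w for sent in tokens for w in sent)
    total = sum(counts.values())
    scores = []
    for sent in tokens:
        if not sent or total == 0:
            scores.append(0.0)  # no content words: lowest importance
        else:
            scores.append(sum(counts[w] / total for w in sent) / len(sent))
    return scores


def rank(scores):
    """Return sentence indices ordered from most to least important."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Toy document: two paragraphs, three sentences in total.
doc = [
    ["Storms hit the coast.", "Storms closed the coast roads."],
    ["Officials met on Monday."],
]

print(rank(paragraph_scores(doc)))
print(rank(word_frequency_scores(doc)))
```

Ties are broken by document order because Python's sort is stable. A ranking produced this way could then be compared against human rankings of the same sentences.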
