Texplore - exploring expository texts via hierarchical 
representation 
Yaakov Yaari 
Bar Ilau University, Ramat Gan 
Abstract 
Exploring expository texts presents an interest- 
ing and important challenge. They are read 
routinely and extensively in the form of online 
newspapers, web-based articles, reports, techni- 
cal and academic papers. We present a system, 
called Texplore, which assists readers in explor- 
ing the content of expository texts. The system 
provides two mechanisms for text exploration, 
an expandable outline that represents the hi- 
erarchical structure of the text, and a concept 
index, hot-linked to the concept references in 
the text. The hierarchical structure is discov- 
ered using lexical cohesion methods combined 
with hierarchical agglomerative clustering. The 
list of concepts are discovered by n-gram analy- 
sis filtered by part-of-speech patterns. Rather 
than the common presentation of documents 
by static abstracts, Texplore provides dynamic 
presentation of the text's content, where the 
user controls the level of detail.q. 
1 Introduction 
Ever-faster computers, the Internet together 
with large information repositories linked by 
high-speed networks, are combined to provide 
immediate accessibility to large amounts of 
texts. The urgency of exploring these texts 
varies depending on the consumer - students, 
researchers, professionals, decision makers, or 
just anybody. In any case the amounts of texts 
are beyond our ability to digest them. 
Research in information retrieval (IR) has 
been focused until now on the task of present- 
ing relevant documents to the user. Commercial 
tools followed suit, as evident by the many pow- 
erful search engines now available on the Web. 
Typically, the relevant doo,ments are presented 
by some automatically computed abstract. 
Our work focuses on medium size and longer 
documents where the user needs some further 
assistance in exploring the content of the re- 
trieved document. The idea then is to extend 
the applicability of IR methods, beyond the doc- 
ument retrieval, to the task of a reading assis- 
tant. 
We might expect from a reading assistant, or 
a text exploration tool, to provide us with two 
basic capabilities: 
1. A controlled view of the content of the doc- 
ument. 
2. The list of concepts discussed in the text. 
The first capability might be seen as an "elec- 
tronic" table-of-contents, and is the key vehicle 
for efficient text exploration. The second can 
be seen as an "electronic" index, and provides 
an orthogonal access vehicle to the mechanism 
of a table-of-contents. 
We have implemented such a text exploration 
system in Texplore. It is designed for expository 
texts such as informative articles in magazines, 
technical reports and scientific papers. In Sec- 
tion 2 we discuss some characteristics of expos- 
itory texts that make possible the development 
of the above text exploration capabilities. In 
Section 3 we focus on the importance of hier- 
archical representation of texts as a visualiza- 
tion tool. Section 4 details the Texplore system 
itself, ag~dn focusing on the hierarchical con- 
tent representation. We conclude by discussing 
some shortcoming of the system and p\]an.q for 
improvements. 
2 Some characteristics of expository 
texts 
The following subsections consider the linguistic 
evidence that is needed to develop a hierarchi- 
cal representation of expository texts. The next 
25 
+ 
Succession 
Succession 
-Projected 
Narrative 
1. First/Third person 
2. Agent oriented 
3. Accomplished time 
4. Chronological time i 
F~pository 
1. No necessary reference 
2. Subject matter oriented 
3. Time not focal 
4. Logical n~kase 
+Projected 
Procedural 
I. Non-specific person 
2. Patient oriented 
3. Projected time 
4. Chron. linkage 
Horatory 
1. Second person 
2. Addressee oriented 
3. Mode, not time 
4. Logical linlmge 
Table 1: Discourse types 
subsection discusses expository text in general 
and its relation to other discourse types. H.ier- 
archical structure in discourse is discussed next, 
with the paragraph as its basic unit. The final 
subsection considers lexical cohesion as the ba- 
sic technique for identifying structure. 
2.1 Expository text and other 
discourse types 
In order to understand the particular domain 
of expository text is important to see it in 
the larger context of other possible discourse 
types. Longacre (1976) presents a 2 x 2 model of 
four discourse types, Narrative, Procedural (In- 
structive), Expository, and Horatory (Sermon), 
shown in Table 1. 
Expository text is seen to be less modal since 
its discourse is determined by its subject, and 
the logical structure built in its exposition, 
rather than by who the speaker is, the audi- 
ence or the temporal order of the speech acts. 
This is not to say that two authors are expected 
to produce the same text on the same subject. 
Personal style is a factor here as in any human 
writing. However, we can take advantage of the 
modeless character of expository text when cre- 
ating a representation of its content. If the dis- 
course relations between two segments can be 
assumed to be modeless we can expect these re- 
lations to be manifested, to a large extent, in 
their lexical context. In other words, we can 
expect the robust techniques of information re- 
trieval to be useful for identifying the informa- 
tion structure of expository texts. 
2.2 The paragraph unit in hierarchical 
discourse structure 
Hierarchical structure is present in all types of 
discourse. Authors organi~.e their works as trilo- 
gies of books, as chapters in books, sections in 
chapters, then subsections, subsubsections, etc. 
This is true for an instruction manual, the Bible, 
for The Hitchhiker Guide to the Galaxy, War 
and Peace, and, in a completely different cate- 
gory, this humble paper. 
Previous research shows that this hierar- 
chical structure is not just an author's style 
but is inherent in many language phenomena. 
A number of rhetoric structure theories have 
been proposed (Meyer and Rice, 1982; Mann 
and Thompson, 1987) which recognize distinct 
rhetorical structures like problem-solution and 
cause-effect. Applying this model recursively 
forms a hierarchical structure over the text. 
From the cognitive aspect, Giora (1985) pro.. 
poses a hierarchical categorial structure where 
the discourse topic functions as a prototype in 
the cognitive representation of the nnlt, i.e. a 
minimal gener~llzation of the propositions in 
the unit. FFinally, the hierarchical intention 
structure, proposed for a more general, multi- 
ple participants discourse, is a key part of the 
well-accepted discourse theory of Grosz and Sido 
net (1986). 
Hierarchical structure implies some kind of 
basic unit. Many researches (Longacre, 1979; 
Hinds, 1979; Kieras, 1982) have shown that the 
paragraph is a basic unit of coherency, and that 
it functions very slmilarly in many languages of 
vastly different origin (Chafe, 1979). 
Not only the paragraph is a basic unit of 
coherency, its initial position, the first one or 
two sentences of the paragraph, provides key 
information for identifying the discourse topics 
(Yaari et al., ). Again, as Chafe shows, this 
is true for many varied languages. The initial 
position of a paragraph is thus a key heuris- 
tic for general purpose document summariza- 
tion (Paice, 1990). 
2.3 Cohesion 
Lexical cohesion is the most common linguis- 
tic mechanism used for discourse segmenta- 
tion (Hearst, 1997; Yaari, 1997). The basic 
notion comes from the work of Halliday and 
Hasan (1976) and further developed in (HaUl- 
day, 1994). 
Cohesion is defined as the non-structural 
mechanlam by which discourse units of differ- 
ent sizes can be connected across gaps of any 
texts. One of the forms of cohesion is lexical 
cohesion. In this type, cohesion is achieved by 
choosing words that are related in some way - 
26 
lexicaUy, semantically or collocationally. Lexi- 
cal cohesion is important for a practical reason 
- it is relatively easy to identify it computation- 
ally. It is also important for linguistic reasons 
since, unlike other forms of cohesion, this form 
is active over large extents of text. 
3 Hierarchical representation of text 
In the previous section the hierarchical struc- 
ture of a text was established as an inherent 
linguistic phenomena. We have also identified 
linguistic evidence that can be used to uncover 
this structure. 
In this section we focus on the human- 
manhine interaction aspects of this form of rep- 
resentation. From this point of view, hierarchi- 
cal representation answers two kinds of prob- 
lems: how to navigate in free text, and how 
to effectively communicate the content of the 
document to the user. These two issues are dis- 
cussed in the following subsections. 
3.1 Navigating in free text 
The basic approach for free, 1restructured, text 
navigation (and the basis for the whole internet 
explosion} is the hypertext method. Navigation 
follows what may be called a stream of associ- 
ations. At any point in the text the user may 
hyper-jump to one out of a set of available desti- 
nation sites, each determined by its association 
with narrow context around the link anchor. In 
spite of their popularity, the arbitrary hyper- 
jumps create a serious drawback by losing the 
global context. Having lost the global context, 
the navigator is destined to wander aimlessly 
in maze of pages, wasting time and forgetting 
what he/she was looking for in the first place. 
The use of a static ticker frame that allows an 
immediate deliverance from this maze (typically 
placed on the left part of the browser's window) 
is a recognition of this drawback. 
Once NLP methods are applied on the text 
document, more sophisticated methods become 
possible for navigating in unstructured text. An 
important example is the use of lexical cohe- 
sion, implemented by measuring distance be- 
tween term vectors, to decompose the text to 
themes (Salton et al., 1995). Themes are de- 
fined as a set of paragraphs, not necessarily ad- 
jacent, that have strong mutual cohesion be- 
tween them. Navigation through such theme- 
linked paragraphs is a step forward in effective 
text exploration. However, the user navigates 
within the context of a single theme and still 
loses the overall context of the full text. Be- 
cause there is only one hierarchy here, the user 
has to go through a selected theme to its end 
to find out whether it provides the sought in- 
formation. 
The an.~wer proposed in Texplore is to dis- 
cover and present the user with a hierarchical 
representation of the text. Hierarchical struc- 
ture is oriented specifically to present complex 
information. Authors use it explicitly to orga- 
nize large works. Scientists use it to describe 
complex flora and fauna. Manual writers use it 
to describe complex procedures. Our task here 
is somewhat different. We are presented with a 
given unstructured text and want to uncover in 
it some latent hierarchical structure. We claim 
that in so far as the text is coherent, that is, it 
makes sense, there is some structure in it. The 
more coherent the text, the more structure it 
has. 
Combining the capabilities of hypertext and 
hierarchical representation is particularly at- 
tractive. Together they provide two advantages 
not found in other access methods: 
. 
. 
Immediate access to the sought piece of in- 
formation, or quick dismissal if none ex- 
ists. In computer memory jargon we call 
this random access. This is the ability to 
access the required information in a small 
number of steps (bound by the maximum 
depth of the hierarchy). 
User control over the level of details. Most 
navigation tools provide the text as is so 
the user has to scan at the maximum level 
of details at all times. However, for exposi- 
tory texts beyond a couple of pages in size, 
the user needs the ability to skim quickly 
over most of the text and go deeper only 
at few points. There is a need, then, to 
have good interactive control over the level 
of details presented. 
3.2 Communicating document's 
content 
Document summarization systems today are 
concerned with extracting significant, indicative 
sentences or clauses, and combining them as a 
more-or-less coherent abstract. 
27 
This static abstract should answer the ques- 
tion "what the text is about". However, be- 
cause of the underlying technique of sentence 
extraction and its static nature, the answer is 
too elaborated in some of the details and insuf- 
ficient in others. 
Boguraev et al. (1998) discuss extensively this 
drawback of today's summarizers and conclude 
that good content representation requires two 
basic features: (a) presenting the summary ex- 
tracts in their conte.~, and (b) user control 
over the granularity. Their solution is based 
on identifying primitive clauses, called capsules, 
resolving their anaphoric references and provid- 
ing them, through a user interface, at different 
granularities. 
The expandable outline view of Texplore, 
built upon hierarchical representation of the 
text's contents, nicely meets the requirements 
of context and granularity, though the underly- 
ing NLP technology is completely different. In 
the next section we discuss the Texplore system 
in details, the supporting NLP tools as well as 
the front-end visualization system. 
4 Texplore - system description 
The overall data-flow in TexpIore is shown in 
Figure i. It starts with a preprocessing stage, 
a structure and heading analyses, leading to ex- 
pandable outline display. 
A typical screen of Texplore is shown in Fig- 
ure 2 I. It consists of three parts. The original 
text is shown on the right pane, the expandable 
outline on the upper left and the concept index 
on the lower left pane. 
The following subsections describe the differ- 
ent parts of the system, focusing on the visual- 
ization aspects related to content presentation. 
4.1 NLP preprocessing 
The first two preprocessing steps, sentence anal- 
ysis part-of-speech (POS) analysis, are pretty 
standard. The result is a list of POS-tagged 
sentences, grouped in paragraphs. 
In the N-gram analysis, a repeated scan is 
made at each i'th stage, looking for pairs of con- 
secutive candidates from the previous stage. We 
1The text in this screen, as well as in the other screen 
captures, is from the article stargazer8 (Hearst, 1997). It 
deals with the possibility of life on other planets in view 
of the unique combination of earth and its moon 
text file 
H~ding ~wat~ leaioas ~tan~a~ J 
Figure 1: Data flow in Texplore 
filter each stage using mutual information mea- 
sure so the complexity is practically O(N). Fi- 
nally we remove those N-grams whose instances 
are a proper subset of some longer N-grams, and 
then apply part-of-speech filter on the remain- 
ing candidates leaving only noun compounds. 
That last step was found to be extremely use- 
ful, reducing false N-grams to practically nil. 
4.2 Hierarchical structure segmentation 
The core of the system is the hierarchical 
structure segmentation. The method used for 
segmentation, called hierarchical agglomerative 
clustering (HAC), was described in detail by 
Yaari (1997). In HAC the structure is discov- 
ered by repeatedly merging the closest data el- 
ements into a new element. In our application 
we use paragraphs as the elementary segments 
of discourse and apply a lexical cohesion mea- 
sure as the proximity test. The lexical cohesion 
measure, Proximity(si, 8i÷1) , iS adapted from 
the standard Saltoniau term vector distance. It 
computes the cosine between two successive seg- 
ments, si and si+~. 
n tBk,i " Wk,i+ l 
Pro imi U(s ,si+ ) = \[18 l\[ (1) 
k=l 
Here Wk,i is the weight of the k'th term of si, 
and \]lsiH is the length of the vector. The as- 
28 
z.z.z. \]Pzme¢l 
a~v~,ceO l~ 5073 3 
binzm\] ILr~l M~r~t syl4etn~t 8.1 ~,4 4 
dW lava laws 8.2~,4 2 
~vmg toings 0.265 
:luflar samples 9073 2 
IUn ar sudaCe 74~8 2 
islr~ie star 5.787 4 
single star system! 8.5~7 2 
;elax system 0.751 'IZ 
;tar cluster 5C50 2 
~narV IW¢~mS S?~l 5 
~0 ~r s.i~le ¢lar" ¢Glar sy~er~ has Flit bear~ ~L,Ind. W~dle astr~nomars 
|ari4r~ly ~ll~ree tlAt planot| moil pro~.ld~ly form around 8fl stets ~ ~ei~ bitl~, 
~lal~e'~ ©~ Cl~slers ~ ~ Or ~"lree ~ ~Id have ~qzp\] ch~lJc OZl~s. 
Computer ~o~els oe$1~eO ~rom ~ masl a4v~u~ce~ ;sl~'ophys~c~ 4abe 
d~ula~e toe orbits Of planMs ~n blnaej arm ~r~'y syslems so ~ SCM~US~S carl 
: IlJC tJM.~e ~.te ir5'e ct toIS WOUld hive ~ ~e~' i¥OlUIIOn. "l'tle I~l~lsU~Ofls ~veld 
llgflly efli~cal pales wffich take cee~des to mez~.e one revol~on of lee ~zr 
:lUSl~f. "11~S iS ~ecause ofle or ot~le~' OfY~e ~s ~Ine CerC~e ~ftoe c~@lex iS 
~lin 9 an~J alggtm~ a~ IRe plarmts, ci~sTc~g ~eit ~a,r~s. Of~ly wflea ~ey dq~ 
:~se to one Or otoer of toe pam~ sle~ v411N plau~e~s I~k In INs- g~lng 
leery. 
~M even toIl~ "~,/~ll be shoe-lived. Olst~foances Sei gp by ~le C o-~otll~ O 
dilU'~ zd~egt orl~ SO violerzl~y tom plane~ eno up ©elng pulled i~y grav~,, h'lto 
Figure 2: Texplore screen. The original text is shown on the right, the expandable outline in the 
upper left and the concept index on the lower left. The outline is shown collapsed except for one 
section 
sumption that only adjacent segments are com- 
pared is not necessarily the case, see (Salton 
et el., 1995). However, it allows us to cre- 
ate the more conventional 2-D structure of a 
table-of-contents, instead of the arbitrary graph 
structure that would have been formed other- 
wise. Another modification is the way the term 
weights, wk,~, are determined. We found that 
having the weight proportional to the term's 
IDF (Inverse Document Frequency, measuring 
its general significance) and the position of the 
sentence in the paragraph, improves the qual- 
ity of the proximity measure, by giving higher 
weight to terms with higher relevance to inter- 
segment cohesion. 
The result is shown in Figure 3. Inter- 
segment boundaries are set at points where lex- 
ical cohesion f~lls below some specific thresh- 
old. The resulting nesting could be quite deep 
(in this example there are 10 levels). H-man 
authors, however, rarely use a hierarchy depth 
greater than 3 (except possibly in instructional 
discourse). The rather deep nesting is then 
smoothed, between the detected boundaries, to 
fit human convenience, as seen in Figure 4. This 
smoothed structure is superimposed over the 
original text, producing the expandable outline 
shown in the left pane of Figure 2. 
i!!iiiiiiiiii ~i!iiii~i!~i~ii~!~ii!~i~iii~i~i~iiii}i~i!i~i}iii~i!!!~!~i!~iiii~i~iii~!~q~!~!i~ii~iii~!i !iiii~iiiiiii!iii!iiii!}ii~}ii~i~i~i~i~i~i~!i~i~i~i~i~i~i~!i!i 
ii ir b ' lily 
Figure 3: Result of hierarchical segmentation. 
Paragraphs are along the X-axis. Y-axis indi- 
cates prox;rnlty. Higher outline implies closer, 
and thus deeper-nested, adjoin;ng segments. 
Vertical lines indicate inter-segment boundaries. 
The hierarchical structure thus discovered is 
certainly not the only one possible. However, 
experiments with human judges (Hearst, 1997) 
showed that segmentation based on lexical cohe- 
sion is quite accurate compared to manual ones. 
4.3 Heading generation 
The next step, after the hierarchical structure 
of the text is determined, is to compose head- 
;ngs for the identified sections. Figure 5 shows 
the outline pane representing the hierarchical 
structure. 
The generated headings are, at the moment, 
29 
Figure 4: The hierarchical structure in Figure 3 
after smoothing 
=.x.s.A ~,~ affit~m,m i} 
2.1.1.1. l.~mmm ~ ~ ii 1.1.1.I. Rdlect~s mi 
1,1.1.2.1. A~ ~temld ml dkatm mndds 
2.1.L2J. La~m ~ 
2.1=. T~ ~ iii 
2.9.. St~ i~! 
2.2.1. The Hdbble md.&e eartt~ i!/ 
Because of the Improbabte sequence that began when a 
giant asteroid stuck the earkh to form the moon more than ~i~ 
4-billion years ago, scientists say the history off the earth li 
should not be thought of as a model for life elsewhere. It Is a ~:ii 
billion to one chance, they say, that the earth should have i~ 
received just the right blow to set in mo0on a train of events 
that led to the emergence and rapid development of living i~i 
lhings, But suppose that biilion to one chance was repeated ii~ 
~reughout be universe? There could still be 200 clvtllsattons iii 
in our galaxy alone. But most researchers do not think this is iii 
the case, and ~e Hubble 'linalty may put that ~eory to rest ~ii 
2~.2. lqmet~ ~ 
Figure 5: The text' outline representing the its 
hierarchical structure. 
quite shnp\]ified. The saliency of noun com- 
pounds (NPs) is scored for each section by 
their frequency, mutual information (for N- 
grams with N > 1), and position in the section. 
A higher score is ~ven to NPs in initial posi- 
tion, that is, the first one or two sentences in 
the paragraph. 
The syntax of headings follows standard con- 
ventions for human-composed headings. The 
most common case is the shnple NP. Another 
common case is a coordination of two NPs. 
With these guidelines we came up with the foL 
lowing heuristics to compose headings: 
1. Remove from the list any NP that appears 
in an enclosing section. 
2. If the saliency of the first NP in the list is 
much higher than the second, use this NP 
by itself. Otherwise, create a coordination 
of the first two NPs. 
3. Prefix each NP in the heading with the de- 
terminer that first appeared with that NP, 
if any. This rule is not very successful and 
rJiIl be modified in the future. 
The first rule exemplifies the power of this 
kind of content representation. Once an NP ap- 
pears in a heading, it is hnplied in the headings 
of all enclosed sections and thus should not ap- 
pear there explicitly. For example, in Figure 5 
the NP Moon appears in the heading of section 
2. Without the first rule it would appear in a 
few of the enclosed subsections because of its 
saliency. We, as readers, would see this as re- 
dundant information. 
4.4 Expandable outline display 
Figure 5 also ~ustrates the importance of con- 
text and granularity, mentioned earlier as key 
points in dynamic content presentation. The 
out\]me functions as a dynamic abstract with 
user control over the level of granularity. The 
user is able to read section 2.2.1, The Hubble 
and the earth in its full context. He/she is 
not lost anymore. 
In fact, the two panes with the original text 
and its outline are synchronized so that the out- 
line can act as a ticker-only pane viewing the 
text on the larger right pane, or be used as a 
combined text and heading pane. 
The outline pane also supports the standard 
controUed-aperture metaphor. Double-clicking 
on a heading alternately expands and collapses 
the underlying text. The user can thus easily 
increase the granularity of the outline to see fur- 
ther detail, or close the aperture to see only the 
general topics. 
The heading acts cognitively as the surrogate 
of its text. Thus if a heaAing of a collapsed 
section is selected, the full text corresponding 
to this heading is selected on the right. This 
strengthens the concept of the out\]me as a true, 
thought compact, representation of the original 
text. Figure 6 shows the same out\]me, this time 
with reduced granularity, highlighting the cor- 
respondence between a selected heading and its 
text. The concept index, shown in the lower 
left pane of the window, is discussed in the next 
section. 
30 
Figure 6: Collapsed outline showing correspon- 
dence between a heading and its underlying 
text. 
4.5 Concept index 
The N-gram analysis, combined with part-of- 
speech filtering, identifies a set of noun com- 
pounds that are used as a concept index for 
the text. They are termed concepts because 
the information they carry reveals a lot about 
the text, much more than simple one word 
nouns. Consider, for example, the first three N- 
grams: lunar samples, living things, and 
dry lava lakes. In contrast, the composh~ 
words of each N-gram, e.g. lava, lake, living, 
or things, reveal very little. 
The high information content of the concept 
index makes it a very concise representation of 
what the text is about, though certainly sec- 
ondary to the outline. Also, having these "con- 
cepts" hot-linked to their references in the text 
forms a hot-llnk index of key topics of the text. 
5 Conclusions and future plans 
We propose a new approach for dynamic presen- 
tation of the content of expository text based on 
uncovering and vis~aliT.ing its hierarchical struc- 
ture. Using this "electronic" table-of-contents 
the user has the advautage of exploring the text 
while staying within the full context of the ex- 
ploration path. The user has also full control 
over the granularity of the displayed informa- 
tion. These characteristics are beneficial both 
for navigating in the text as well as communi- 
catiug its content, while overcoming drawbacks 
of existing snmmarization methods. 
The weakest point in Texplore is the genera- 
tion of headings. The current approach is too 
shnplistic, both in the criteria for selecting NPs 
and in the way they are composed to headings. 
We have analyzed the way headings are formed 
by human authors (Yaari et al., ) and the re- 
sults were used to form a machlne-learnlng sys- 
tem which identifies the NPs of a given section 
using multiple sources of information. The sys- 
tem constructs headings for the text hierarchy 
using a fixed set of syntactic formats (found to 
be common in heading syntax). We are in the 
process of integrating this system into Texplore. 
The hierarchical structure segmentation is 
also too simplistic, based solely on the prox- 
imity of term vectors. Again, we are working 
on a machine learning system that uses a set of 
structured articles to learn segmentation rules. 
The basic approach is to divide the task into two 
steps, determining the boundaries and forming 
the hierarchy. We are using various cohesion 
cues, associated with each paragraph, as the 
learning attributes: lexical similarity, cue tags, 
cue words, number of starting and continuing 
lexical chains, etc. 
Using machine lemming has the advantage of 
a built-in evaluation against the segmentation 
done by human subjects. We also plan to eval- 
uate the usefulness of the hierarchical presenta- 
tion in terms of reading effectiveness. 

References 
B. Boguraev, Y.Y. Wong, C. Kennedy, R. Bel- 
lamy, S. Brawer, and J. Swartz. 1998. Dy- 
namic presentation of document content for 
rapid on-line skimming. To be published in 
the Spring 1998 AAAI Symposium on InteUio 
gent Text Summarization. 
W.L. Chafe. 1979. The flow of thought and the 
flow of language. In T. Giv'on, editor, Syntax 
and Semantics: Discourse and Syntaz, vol- 
ume 12, pages 159--182. Academic Press. 
R. Giora. 1985. A text based analysis of 
non-narrative texts. Theoretical Linguistics,, 
12:115-135. 
B.J. Grosz and C.L. Sidner. 1986. Attention, 
intentions aud the structure of discourse. 
Computational Linguistics,, 12(3):175--204. 
M.A.K. HMliday and R. Hasau. 1976. Cohesion 
in English. New York: Longman Group. 
M.A.K. Halliday. 1994. Introduction to Func- 
tional Grammar, second edition. London: 
Edward Araold. 
M. A. Hearst. 1997. Texttiling: Seganent- 
ing text into multi-paragraph subtopic pas- 
sages. Computational Linguistics, 23(1):33- 
64, March. 
J. Hinds. 1979. Organizational patterns in dis- 
course. In T. Giv'on, editor, Syntax and Se- 
mantics:: Discourse and Syntax, volume 12, 
pages 135-158. Academic Press. 
D.E. Kieras. 1982. A model of reader strategy 
for abstracting main ideas from simple tech- 
nical prose. Text, 2(1-3):47-81. 
R. Longacre. 1976. An anatomy of Speech No- 
tions. Peter de Ridder Press, Lisse. 
R.E. Longacre. 1979. The paragraph as a gram- 
matical unit. In T. Giv'on, editor, Syntax 
and Semantics: Discourse and Syntax,, vol- 
ume 12, pages 115-134. 
W.C. Mann and S.A. Thompson. 1987. Rhetor- 
ical structure theory: A theory of text organi- 
zation. Technical Report ISI/RS-87-190, ISI. 
B.J.F. Meyer and G.E. Rice. 1982. The interac- 
tion of reader strategies and the organization 
of text. Text, 2(1-3):155-192. 
C.D. Paice. 1990. Constructing literature 
abstracts by computer : techniques and 
prospects. Information Processing and Man- 
agement,, 26(1):171-186. 
G. Salton, A. Singhal, C. Buckley, and M. Mi- 
tra. 1995. Automatic text decomposition us- 
ing text segments and text themes. Tech- 
nical Report TR95-1555, Cornell Unievrsity, 
November. 
Y. Yaari, Y. Choueka, and M. Elhadad. Analy- 
sis of h~dings and sections' structure in ex- 
pository texts. Not yet published. 
Y. YaarL 1997. Segmentation of expository 
texts by hierarchical agglomerative cluster- 
ing. In Proceedings o\] RANLP, pages 59-65, 
Tzigov Chark, Bulgaria. 
