Meta-discourse markers and problem-structuring in scientific articles 
Simone Teufel 
Centre for Cognitive Science 
University of Edinburgh 
S.Teufel@ed.ac.uk
Abstract 
Knowledge about the argumentative structure of sci- 
entific articles can, amongst other things, be used to 
improve automatic abstracts. We argue that the
argumentative structure of scientific discourse can be
automatically detected because reasoning about problems,
research tasks and solutions follows predictable pat- 
terns. 
Certain phrases explicitly mark the rhetorical status 
(communicative function) of sentences with respect to 
the global argumentative goal. Examples of such
meta-discourse markers are "in this paper, we have
presented..." or "however, their method fails to". 
We report on work in progress about recognizing such 
meta-comments automatically in research articles from 
two disciplines: computational linguistics and medicine 
(cardiology). 
1 Motivation 
We are interested in a formal description of the docu- 
ment structure of scientific articles from different dis- 
ciplines. Such a description could be of practical use 
for many applications in document management; our 
specific motivation for detecting document structure is 
quality improvement in automatic abstracting. 
Researchers in the field of automatic abstracting
largely agree that it is currently not technically fea- 
sible to create automatic abstracts based on full text 
understanding (Sparck Jones 1994). As a result, many 
researchers have turned to sentence extraction (Kupiec, 
Pedersen, & Chen 1995; Brandow, Mitze, & Rau 1995; 
Hovy & Lin 1997). Sentence extraction, which does 
not involve any deep analysis, has the huge advantage 
of being robust with respect to individual writing style, 
discipline and text type (genre). Instead of producing
abstracts, this method produces only extracts: document
surrogates consisting of a number of sentences
selected verbatim from the original text. 
We consider a concrete document retrieval (DR) sce- 
nario in which a researcher wants to select one or more 
scientific articles from a large scientific database (or 
even from the Internet) for further inspection. The 
main task for the searcher is relevance decision for each 
paper: she needs to decide whether or not to spend 
more time on a paper (read or skim-read it), depending 
on how useful it presumably is to her current informa- 
tion needs. Traditional sentence extracts can be used 
as rough-and-ready relevance indicators for this task, 
but they are not doing a great job at representing the 
contents of the original document: searchers often get 
the wrong idea about what the text is about. Much 
of this has to do with the fact that extracts are typically
incoherent texts, consisting of potentially unrelated
sentences which have been taken out of their context.
Crucially, extracts offer no handle on the text's logical
and semantic organisation.
More sophisticated, user-tailored abstracts could help 
the searcher make a fast, informed relevance decision by 
taking factors like the searcher's expertise and current 
information need into account. If the searcher is deal- 
ing with research she knows well, her information needs 
might be quite concrete: during the process of writing 
her own paper she might want to find research which 
supports her own claims, find out if there are contra- 
dictory results to hers in the literature, or compare her 
results to those of researchers using a similar methodol- 
ogy. A different information need arises when she wants 
to gain an overview of a new research area - as only a
"partially informed user" in this field (Kircz 1991) she
will need to find out about specific research goals, the
names of the researchers who have contributed the main
research ideas in a given time period, along with
information on methodology and results in this research field.
There are new functions these abstracts could fulfil. 
In order to make an informed relevance decision, the 
searcher needs to judge differences and similarities be- 
tween papers, e.g. how a given paper relates to similar 
papers with respect to research goals or methodology, 
so that she can place the research described in a given 
paper in the larger picture of the field, a function we 
call navigation between research articles. A similar op- 
eration is navigation within a paper, which supports 
searchers in non-linear reading and allows them to find 
relevant information faster, e.g. numerical results. 
We believe that a document surrogate that aims at 
supporting such functions should characterize research 
articles in terms of the problems, research tasks and 
solutions/methodology presented in the specific paper, 
but it should also represent other researchers' problems,
research tasks and solutions mentioned in the paper. 
Our long-term goal is to automatically reconstruct this 
problem-solution structure from unrestricted text for 
the searcher in the form of a problem-structured ab- 
stract. 
But how can we find problems, research questions, 
research tasks and solutions in text without fully un- 
derstanding the text? We take sentence extraction as 
a starting point due to its inherent robustness. Using 
additional information about each sentence, viz. its
rhetorical status with respect to the entire paper, we 
are in a better position to perform shallow, but guided 
information extraction, in order to find the information 
units we are interested in. 
In the next section we introduce the level of docu- 
ment structure we are talking about, and the kind of 
meta-comments we employ in discovering it. In the 
rest of the paper, we report on ongoing work on auto- 
matically filtering meta-comments from annotated and 
unannotated text. Finding meta-comments in text is
an attractive task because it would allow for the automatic
adaptation of systems using such phrases to new
domains.
2 Discourse structure and argumentation in scientific articles
Discourse linguistic theory suggests that texts serv- 
ing a common purpose among a community of users 
eventually take on a predictable structure of presenta- 
tion (Kintsch & van Dijk 1978) - and scientific articles 
certainly serve a well-defined communicative purpose: 
they "present, retell and refer to the results of spe- 
cific research" (Salager-Meyer 1992). Particularly in 
the life and experimental sciences, a rigid building plan 
for research articles has evolved over the years, where 
rhetorical divisions tend to be very clearly marked in 
section headers. Prototypical rhetorical divisions include
Introduction, Purpose, Experimental Design, Results,
Discussion, Conclusions. One of the reasons for
this rigidly-defined structure seems to be that the sci- 
entific community in these fields has more or less agreed 
on how to do research: methodologies and evaluation 
methods are long-lived research entities that do not 
change often. 
One of the corpora we are using is a good example 
of such texts. It consists of 129 articles in cardiology, 
taken from the American Heart Journal, which have a
fixed structure with respect to rhetorical divisions and 
section headers. The other corpus, in contrast, consist- 
ing of 123 (mostly conference) articles in computational 
linguistics (CL), displays a heterogeneous mixture of
methodologies and traditions of presentation one would 
expect in an interdisciplinary field. Most of the articles 
cover more than one single discipline, but as a rough 
estimate one can say that about 45% of the articles 
in the collection are predominantly technical in style, 
describing implementations (i.e. engineering solutions); 
about 25% report on research in theoretical linguis- 
tics, with an argumentative tenet; the remaining 30% 
are empirical (psycholinguistic or psychological experi- 
ments or corpus studies). Even though most of the ar- 
ticles have an introduction and conclusions (sometimes 
occurring under headers with different names), and al- 
most all of them cite previous work, the presentation of 
the problem and the methodology/solution are idiosyn- 
cratic and depend on individual writing style. Very few 
of the headers in the computational linguistics articles 
correspond to prototypical rhetorical divisions; the rest 
contain content-specific terminology (cf. Figure 1, which
compares relative frequencies of headers for the two cor- 
pora). 

Computational Linguistics          Cardiology
Header                 Freq.       Header                    Freq.
Introduction           85%         Introduction              100%
Conclusion             46%         Results                   94%
Conclusions            22%         Discussion                94%
Acknowledgments        17%         Methods                   92%
Discussion             12%         Tables                    79%
Results                11%         Statistics                40%
Experimental Results   9%          Patients                  29%
Related Work           7%          Limitations               25%
Implementation         7%          Conclusions               25%
Evaluation             7%          Statistical Analysis      20%
Example                6%          Conclusion                17%
Background             6%          Patient Characteristics   9%
Summary                4%

Figure 1: Highest-frequency headers (with relative occurrence frequencies)

Because the type of research reported in the compu- 
tational linguistics corpus differs so much, the descrip- 
tion of document structure we were looking for had to 
be flexible enough to generalize over differences in pre- 
sentation, yet formal enough for the extraction of the 
information units which are useful for automatic abstracts.
We base our model of argumentation in scientific
articles on Swales' (1990) CARS model ("Create a
Research Space"). Swales' claim is that the main com- 
municative goal of an author of a research article is to 
convince readers (potential reviewers) that the research 
described in the paper constitutes an actual contribu- 
tion to science, in order to have the paper reviewed 
positively and thus published; this is the case whether 
or not the paper tries to give the impression that it 
reports research in an objective, disinterested way. In 
order to successfully present their case, authors argue 
in a goal-directed and prototypical way about problem- 
solving activities -- their own and other researchers'. 
Swales identified prototypical rhetorical building plans 
of introduction sections, along with linguistic surface
cues that signal rhetorical moves. Examples of rhetorical
moves include the claim that the paper addresses a
new problem or, if the problem is well known, the claim
that the presented solution is better than that of other
researchers.
A first analysis of the corpora confirmed many of 
the rhetorical building blocks suggested by Swales. We 
adapted Swales' scheme to the one shown in Figure 8 
(at the end of this paper). In the medical corpus, al- 
most all of the moves we found were of type I (Explicit 
mention); the rigid document structure seems to have 
replaced much of the "argument about problem-solving 
activities" (types II-V) for which we found ample evi- 
dence in the computational linguistics corpus. 
We are interested in identifying these moves auto- 
matically and shallowly in text, and we believe that 
this is technically feasible, because the stereotypical, 
predictable overall structure of the argument can be 
exploited in doing so. 
3 Meta-discourse markers 
In this paper we focus on the linguistic realisations of 
rhetorical moves, i.e. the surface signals of the argumentative
status of a given sentence. Consider the strings in
boldface on the right hand side of Figure 8. They cover 
the activities of reporting about research (reporting and 
presenting verbs), the problem-solving process (prob- 
lems, solutions, tasks); they also include other semantic 
links like necessity, causality and contrast. Due to ex- 
plicit or implicit argumentation, many of these strings 
are evaluative ("efficient, elegant, innovative, insightful"
vs. "impossible, inadequate, inconclusive, insufficient").
We call them meta-comments because they
talk about information units, as opposed to being sub- 
ject matter (scientific content). Such meta-comments 
are very frequent in our collection. 
Our meta-comments are similar to Paice's (1981) indicator
phrases (he was the first to use such phrases for
abstracting); they are less similar to cue phrases, the 
discourse markers usually studied in discourse analysis, 
because they are not sentence connectives (with some 
exceptions), and because they are typically consider- 
ably longer and far more varied. 
The fact that the computational linguistics texts 
stem from an unmoderated medium (i.e. they are nei- 
ther chosen for publication nor edited by a central au- 
thority), means that there were no external restrictions 
on how exactly to say things. Authors use idiosyncratic 
style, which can vary from formal to informal. There 
are meta-comments that tend to get used in a fixed, for- 
mulaic way, but interestingly, we observed a wide range 
of linguistic variability with respect to the realization of 
some of the meta-comments (whereas their semantics is 
usually perfectly unambiguous). This effect makes the 
meta-comments in this text type interesting linguistic
objects to study. 
We observed that there are certain meta-comments 
which are restricted to certain moves, mostly the evalu- 
ative and contrastive phrases and the phrases occurring 
in moves of type I (Explicit mention). Others occur frequently
across moves, particularly general argumentative
phrases and relevance markers such as "important",
"in this paper, we". Argumentative phrases like "we argue
that" appeared with solution and problem-related
moves almost as often as with claims and conclusions. 
These phrases seemed to be the ones that were most 
formulaic/fixed across texts. 
Our goal is to automatically find meta-markers and 
associate them with rhetorical moves (where this makes 
sense). In the next section, we report on a first experi- 
ment in that direction. 
4 Our experiment 
If it is true that most meta-comments are formulaic, re- 
curring expressions, then frequency information should 
help us separate meta-comments from domain-specific 
parts of the sentence. Those strings which occur rarely 
in the corpus will most likely be domain-specific and 
will appear low on frequency listings of strings, whereas 
meta-comments should appear high on the lists.
We also used a lexicon of 433 lexical seeds. Lexi- 
cal seeds are words which are semantically related to 
the activities of reporting, problem-solving, arguing
or evaluating, or expressions of deixis ("we...") or
other textual cues (e.g. literature references in text were
marked up using the symbol [REF], which is a signal
for mentions of other researchers' solutions, tasks or 
problems). 
The computational linguistics corpus was drawn 
from the computation and language archive 
(http://xxx.lanl.gov/cmp-lg) and contains 123
articles; the 129 articles of the cardiology corpus 
appeared in the American Heart Journal. The medical 
corpus is smaller in overall size (436,909 words vs. 
654,477; 14,770 sentences vs. 23,072). 
For the computational linguistics corpus, we addi- 
tionally had a collection of 948 sentences that had been 
identified as relevant by a human annotator in a prior 
experiment (Teufel & Moens 1997). A human judge an- 
notated these with respect to the 23 rhetorical moves 
introduced in Figure 8. 
4.1 Filtering 
First, we compiled the two corpora into those bigrams, 
trigrams, 4-grams, 5-grams and 6-grams which did not 
cross sentence boundaries. We worked with a short 
stop-list compiled from the corpus (the 60 most frequent
words) from which we had excluded those which we ex- 
pected to be important in an argumentative domain, 
e.g. personal and demonstrative pronouns. We low- 
ercased all words and counted punctuation (including 
brackets) as a full word. 
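
The compilation step described above can be sketched roughly as follows. This is a minimal illustration and not the implementation used in the experiments: the tokenizer regular expression and the names `tokenize` and `count_ngrams` are assumptions, and the stop-list is left out because the text does not say how it was applied.

```python
import re
from collections import Counter

# Assumed tokenizer: bracketed markers such as [ref], runs of word
# characters, or single punctuation marks (each counted as a full word).
TOKEN_RE = re.compile(r"\[\w+\]|\w+|[^\w\s]")


def tokenize(sentence):
    """Lowercase a sentence and split it into word and punctuation tokens."""
    return TOKEN_RE.findall(sentence.lower())


def count_ngrams(sentences, n_min=2, n_max=6):
    """Count bigrams up to 6-grams, one sentence at a time, so that no
    n-gram crosses a sentence boundary."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in range(n_min, n_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


# Two invented example sentences.
sentences = ["In this paper, we argue that X.", "In this paper, we show Y."]
counts = count_ngrams(sentences)
print(counts[("in", "this", "paper", ",")])
```

Counting per sentence is what keeps n-grams from spanning boundaries; a pair such as the final full stop of one sentence followed by the first word of the next is never counted.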
We then filtered the n-grams through our seed lexi- 
con, i.e. we retained those expressions which contain at 
least one of the words of the seed-lexicon (or a morpho- 
logical variant of it). We also compiled and counted 
n-grams for the 948 computational linguistics target 
sentences, to see how similar these phrases from the annotated
parts were to the filtered or unfiltered n-grams
from the entire corpus.
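
A sketch of the seed-lexicon filter is given below. The six-entry lexicon and the suffix list are hypothetical stand-ins: the lexicon described above has 433 entries, and the text does not say how morphological variants were matched, so plain suffix attachment is assumed here. The first two counts are taken from Figure 2; the third is invented for illustration.

```python
# Hypothetical miniature seed lexicon (the real one has 433 entries).
SEEDS = {"paper", "argue", "show", "better", "[ref]", "[cref]"}

# Assumed stand-in for morphological analysis: a few inflectional suffixes.
SUFFIXES = ("", "s", "es", "d", "ed", "ing")


def matches_seed(token, seeds=SEEDS):
    """True if the token is a seed or a seed plus one assumed suffix."""
    return any(token == seed + suffix for seed in seeds for suffix in SUFFIXES)


def filter_ngrams(ngram_counts, seeds=SEEDS):
    """Retain only n-grams containing at least one seed word or variant."""
    return {
        ngram: count
        for ngram, count in ngram_counts.items()
        if any(matches_seed(token, seeds) for token in ngram)
    }


ngram_counts = {
    ("in", "this", "paper", ","): 118,
    ("on", "the", "other", "hand"): 139,
    ("we", "argued", "that"): 5,  # invented count
}
filtered = filter_ngrams(ngram_counts)
for ngram, count in sorted(filtered.items(), key=lambda item: -item[1]):
    print(count, " ".join(ngram))
```

With these toy seeds, "on the other hand" is dropped although it is the most frequent of the three, while "in this paper ," is kept.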

Before filtering                   After filtering
354  [ref] , [ref] ,               118  in this paper ,
3or  [ref] ) ,                     111  in ( [cref] )
297  , [ref] , [ref]               110  can be used to
175  [cref] ) .                    106  in figure [cref] .
144  [ref] , [ref] .               100  ( [cref] ) ,
139  on the other hand              99  on the basis of
134  for example , the              83  shown in figure [cref]
118  in this paper ,                83  in section [cref] .
116  the other hand ,               75  this paper , we
111  in ( [cref] )                  75  in section [cref] ,
110  can be used to                 71  it is possible to

Figure 2: 4-grams in entire CL corpus

The list of 4-grams for the computational linguis- 
tics corpus shows a typical picture of the outcome of 
this process (Figure 2). Before filtering, the frequent 
corpus n-grams contain general comments and expres- 
sions like "for example" but content specific expressions 
are already filtered out. (Very rarely, there are some 
content-specific phrases like "natural language" in the
lists -- this is due to the fact that the corpus, even 
though interdisciplinary in nature, is composed of pa- 
pers focusing on language.) After filtering, the meta- 
comments on the lists have two properties: (a) they 
are frequent, and (b) they contain lexical items that could
be related to argumentation about problems, research 
tasks, solutions -- in the computational linguistics cor- 
pus, these conditions seem to be enough to produce ex- 
pressions that are good candidates for meta-comments. 
Unfortunately, condition (a) means that a large number 
of meta-comments were lost, because they were of low
frequency. 

Target sentences
49  in this paper ,          10  can be used to
37  this paper , we           9  in this paper is
25  [ref] , [ref] ,           7  [ref] , [ref] .
22  , [ref] , [ref]           7  described in this paper
18  in this paper we          7  this paper ~ i

Figure 3: 4-grams in CL target sentences

How similar are the lists for annotated text and entire
corpus in the computational linguistics domain? Figure
3 shows that they look very similar apart from minor
differences, e.g. the fact that the list gained from the
entire corpus contains more [CREF] items (internal cross
references like "section [CREF]") which tend to appear
frequently in the sentences where authors state the
organisation of the paper. The expressions used in such
organisation statements are typically formulaic and
recurrent, but not many of these sentences were considered
relevant when the 948 target sentences were determined.

Before filtering                            After filtering
120  p < 0.05 )                             85  on the basis of
102  p < 0.01 )                             66  of this study was
 93  left ventricular ejection fraction     64  in this study ,
 86  p < 0.001 )                            58  this study was to
 86  p < 0.0001 )                           57  patients with heart failure
 85  on the basis of                        45  the purpose of this
 72  at the time of                         44  there were no significant
 66  of this study was                      40  new york heart association
 64  in this study ,                        39  purpose of this study
 59  95 % confidence interval               35  there was no significant
 58  this study was to                      32  were no significant differences
 55  coronary artery disease .              31  p = not significant
 49  acute myocardial infarction .          30  has been shown to

Figure 4: 4-grams in medical corpus

The medical corpus shows significant differences (cf.
Figure 4). Firstly, unfiltered n-grams do not separate
content matter from meta-discourse. In these
lists, there are phrases pertaining to statistical analyses
("p < 0.01") and several domain-specific phrases. Filtering
(right hand side of Figure 4) forces the few meta-comments
that are being used to the top of the list; they
are linguistically invariant. For instance, "study" seems
the only acceptable expression used for the current research,
whereas the range is much wider in the other
corpus ("paper, article, study, work, research..."), and
all the meta-comment candidates in the top part of the
list belonged to one single rhetorical move, viz. Explicit
Mention of the Research Task (Ex-T in Figure 8).
A certain amount of noise has been introduced 
through the seed-lexicon because word senses were not 
disambiguated: "failure" was included in the seed lexicon
to indicate mentions of the failure of other researchers'
solutions. Because this term obviously has the different
meaning of "heart failure" in the cardiology context, the
desired distinction between subject matter strings and
meta-comments got lost; similarly "New York" was included
because the word "new" could potentially point
to novel approaches. This might mean that it is neces- 
sary to use different stop-lists and/or seed lexicons for 
different domains. 
As we have seen before, associating meta-comments 
with rhetorical moves is a more difficult task for some 
meta-comments than for others. We tried to anchor the 
probable rhetorical move of a phrase in the lexical seed 
it contains, a simplification we are forced to make due 
to the small amount of annotated text we have available 
(which is reflected in the low numbers). We are thus in 
the process of working on a larger scale annotation. 
We used the human judgements to count how often 
each word contained in the target sentences appears 
with a certain rhetorical move. If the difference in fre- 
quency between the best-scoring moves for that word 
was large enough, we assumed it was a good indicator 
for the highest-scoring move, and we then manually
associated the given rhetorical move with the word if it
was contained in the seed lexicon, or with semantically
similar seeds. For example, the seeds most likely to be
associated with the OWN SOLUTION BETTER move (53
examples of this move in the target sentences) were
"than" (39), "better" (36), "results" (21), "method"
(19), "using" (15), "significantly" (14), "outperforms"
(12) and "more" (12). Filtered meta-comments are
then assigned the rhetorical move predicted by the first 
seed they contain. Figure 5 shows the meta-comments 
filtered for the seed "better" from both corpora. In the 
medical corpus, there is less argument about method- 
ology/solutions, and as a result the phrases found are 
unfortunately not meta-comments but contain medical 
terminology. 

Computational Linguistics          Cardiology
64  better than                    11  a better
50  a better                        7  better than
23  better than the                 6  to better
20  much better                     6  significantly better
19  the better                      6  better in
19  is better                       5  better left
16  significantly better            5  better left ventricular
16  be better                       5  and better
13  better performance              4  better preserved
11  are better                      4  better in smokers
10  better the                      3  to be somewhat better tolerated
 9  significantly better than       3  failure symptoms in spite of better
 6  better suited                   3  better preserved left ventricular systolic function
 6  better than that of             3  better in smokers than in nonsmokers

Figure 5: Potential meta-comments with "better" from both corpora

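
The association of seeds with rhetorical moves, and of filtered phrases with the move of their first seed, can be sketched as below. Several points are assumptions made for illustration: the move labels, the toy annotated sentences, the margin threshold `min_margin` (the text only says the difference must be "large enough"), and counting a word once per sentence. The manual step and the restriction to seed-lexicon words are replaced here by a purely automatic rule.

```python
from collections import Counter, defaultdict


def word_move_counts(annotated):
    """Count, for every word, how many target sentences of each move contain it.
    `annotated` is an iterable of (tokens, move) pairs."""
    counts = defaultdict(Counter)
    for tokens, move in annotated:
        for word in set(tokens):  # assumed: count each word once per sentence
            counts[word][move] += 1
    return counts


def indicator_moves(counts, min_margin=2):
    """Map a word to its best-scoring move if that move beats the runner-up
    by at least `min_margin` sentences (hypothetical threshold)."""
    indicators = {}
    for word, by_move in counts.items():
        ranked = by_move.most_common(2)
        best_move, best = ranked[0]
        second = ranked[1][1] if len(ranked) > 1 else 0
        if best - second >= min_margin:
            indicators[word] = best_move
    return indicators


def move_for_phrase(phrase_tokens, indicators):
    """Return the move predicted by the first indicator word in the phrase."""
    for token in phrase_tokens:
        if token in indicators:
            return indicators[token]
    return None


# Invented toy annotation with hypothetical move labels.
annotated = [
    (["our", "method", "is", "better", "than", "[ref]"], "OWN_BETTER"),
    (["results", "are", "better", "than", "before"], "OWN_BETTER"),
    (["in", "order", "to", "parse", "we", "tag"], "STEP"),
    (["we", "tag", "in", "order", "to", "parse"], "STEP"),
]
indicators = indicator_moves(word_move_counts(annotated))
print(move_for_phrase(["significantly", "better", "than"], indicators))
```

Anchoring the move in the first seed only is the simplification mentioned above; a phrase without any indicator word receives no move.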
Also, we observed that it is not easy to predict the optimal
length of a meta-comment which is indicative
of a certain rhetorical move. For moves containing
other problems/solutions/tasks the very short string
"[REF]" is contained in all successful meta-comments,
whereas for explicit mention of research goals, the maximal
length of 6 which we chose for meta-comments in these
experiments might even be too short. As another example,
for the STEP move ("goal is achieved by doing solution"),
the best indicator we found was "in order to".
4.2 Evaluation 
We evaluate the quality of these automatically gener- 
ated meta-comment lists by comparing them to a man- 
ually created meta-comment list used by a summarisa- 
tion system, cf. (Teufel & Moens To Appear). The per- 
formance of the system - with the two different meta- 
comment lists - is measured by precision and recall val- 
ues of co-selection with the target extracts defined by 
human annotators mentioned earlier. The summarisa- 
tion process consists of two consecutive steps, sentence 
extraction and rhetorical classification, and uses other 
heuristics like location and term frequency. 
The summarisation system requires a list of meta- 
comments of arbitrary length, containing a quality score 
for each phrase which estimates how predictive these 
phrases are in pointing to extract-worthy sentences, and 
the most likely rhetorical label that sentences with this 
meta-comment will receive. 

Quality
Class     Meta-comment
2         paper ,
3         this paper presents a
2         in order to
3         in this paper , we will
2         in this paper we have
1         unlike [ref]
1         this paper is to
3         in this paper , we describe
2         paper is
2         paper we
1         this paper has presented
1         , we propose a method
1         in passage ( [cref]
1         , we argue that
1         argue that
2         method for
1         we show that the
1         show how
1         property and the number
1         the advantages of
1         the wall street journal
1         the importance of
1         however , we
1         be used to

Rhetorical Move (column entries in order): STEP, Ex-T, Co-S, Ex-T, Ex-T, Ex-P, Ex-T, Ex-T, Ex-T, Ex-C, Ex-T, Ex-T, Ex-E, NEc-S-T, Co-C, Ex-T

Figure 6: Extracts from automatic list of meta-comments

We automatically built the meta-comment list in Fig- 
ure 6 (containing 318 entries). We started from all 
n-grams compiled from the target sentences and took 
the following heuristics into account: Firstly, choose 
phrases with a high ratio of target frequency to cor- 
pus frequency, because these are indicative phrases. 
Set the quality value accordingly. Secondly, exclude 
phrases with a low overall frequency, or decrease their 
quality score, because including/overestimating them 
might construct a model that is over-fitted to the data. 
Thirdly, associate each phrase with its most likely 
rhetorical move, by taking the ratio between frequency 
in each rhetorical class and the frequency of the rhetor- 
ical label itself into account. If below a certain thresh- 
old, don't associate any move at all (e.g. "paper ," in 
Figure 6). 
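
The three heuristics can be sketched as follows. All thresholds (`min_freq`, `move_threshold`, and the cut-offs that map the frequency ratio to quality classes 1 to 3), the function name `build_list`, and the counts in the example are assumptions made for illustration; the text does not report the values actually used.

```python
def build_list(target_counts, corpus_counts, phrase_move_counts, move_totals,
               min_freq=3, move_threshold=0.05):
    """Build (phrase, quality, move) entries from n-gram statistics."""
    entries = []
    for phrase, target_freq in target_counts.items():
        # Heuristic 2: drop rare phrases to limit over-fitting.
        if target_freq < min_freq:
            continue
        # Heuristic 1: ratio of target frequency to corpus frequency.
        ratio = target_freq / corpus_counts[phrase]
        quality = 3 if ratio >= 0.6 else 2 if ratio >= 0.3 else 1
        # Heuristic 3: frequency within each move, relative to how often
        # that move occurs at all; below the threshold, assign no move.
        best_move, best_score = None, 0.0
        for move, freq in phrase_move_counts.get(phrase, {}).items():
            score = freq / move_totals[move]
            if score > best_score:
                best_move, best_score = move, score
        if best_score < move_threshold:
            best_move = None
        entries.append((phrase, quality, best_move))
    return entries


# Invented counts, chosen only to exercise each branch.
target_counts = {"in this paper , we": 30, "paper ,": 60, "the wall street": 1}
corpus_counts = {"in this paper , we": 40, "paper ,": 150, "the wall street": 20}
phrase_move_counts = {
    "in this paper , we": {"Ex-T": 20, "Ex-C": 4},
    "paper ,": {"Ex-T": 3, "Ex-C": 2},
}
move_totals = {"Ex-T": 100, "Ex-C": 80}

entries = build_list(target_counts, corpus_counts, phrase_move_counts, move_totals)
for entry in entries:
    print(entry)
```

In this toy run "paper ," keeps a quality score but receives no move, because none of its per-move ratios reaches the assumed threshold.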
The manual meta-comment list, in contrast, was 
compiled in an extremely labour intensive manner and 
refined over the months. It consists of 1791 meta- 
comments (some of which are much longer than the 
maximum of 6 words that the automatic phrases con- 
sisted of), along with their most plausible rhetorical 
moves and quality scores. 

                  Manual    Automatic
Extraction        66.4%     52.5%
Classification    66.3%     54.3%

Figure 7: Evaluation results: precision and recall of co-selection

As Figure 7 shows, using the automatic meta- 
comment list instead of the manually created one de- 
creased the summarizer's performance from 66.4% to 
52.5% precision and recall for extraction, and from 
66.3% to 54.3% precision and recall for classification. 
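
For reference, precision and recall of co-selection can be computed as in the sketch below, where sentences are represented by identifiers. The function name `co_selection` and the example sets are invented. The two values coincide whenever the system selects exactly as many sentences as the target extract contains; whether this is why a single figure is reported above is an assumption.

```python
def co_selection(selected, target):
    """Precision and recall of the system's selection against the target extract."""
    selected, target = set(selected), set(target)
    hits = len(selected & target)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(target) if target else 0.0
    return precision, recall


# Invented sentence identifiers: four target sentences, four selected.
target = {1, 4, 7, 9}
selected = {1, 4, 8, 12}
precision, recall = co_selection(selected, target)
print(precision, recall)  # both 0.5: two hits, and both sets have four members
```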
5 Discussion and further work 
The evaluation indicates that the quality of the au- 
tomatic meta-comment list is not yet high enough to 
replace the manual list in our summarization system. 
However, a look at the automatic list itself shows 
that, even though it is far from perfect, most of the 
high-frequency strings found are plausible candidates for
meta-comments (or parts of meta-comments). In most
cases, subject matter can be successfully filtered out. 
We regard the simple method for automatic meta- 
comment identification discussed in this paper as a 
baseline for further work. We have simplified the prob- 
lem of finding meta-comments enormously by only con- 
sidering verbatim substrings. By doing so, we have 
ignored discontinuous strings, morphological variation 
and statistical interaction between the words in the 
string. In addition, the phrases considered so far have 
been short, and we have not collected many of them, 
because we wanted to rely only on the ones with rea- 
sonably high frequencies. 
The biggest problem for now is that highly indicative,
but infrequent meta-comments cannot be found
with a simple method like ours. Therefore, it is essen- 
tial to perform some generalization over similar phrases. 
One way would be the automatic clustering of similar 
concepts, e.g. to find out that "argue" and "show" are 
presentational verbs with similar semantics. Another 
idea would be to allow for more flexible patterns con- 
sisting of short n-grams and other words, in order to 
skip over intervening words like adjectives and adverbs. 
This might avoid the data sparseness problems encoun- 
tered with the longer n-grams. 
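
One possible reading of such flexible patterns is sketched below: a pattern is a sequence of short n-grams that must occur in order, with a bounded number of intervening tokens between them. This is only an illustration of the idea, not something evaluated in this paper; the function name `match_gapped` and the parameter `max_gap` are hypothetical.

```python
def match_gapped(tokens, parts, max_gap=2):
    """True if the token lists in `parts` occur in `tokens` in order, with at
    most `max_gap` intervening tokens between consecutive parts."""

    def find_from(start, part_index, first):
        if part_index == len(parts):
            return True
        part = parts[part_index]
        latest = len(tokens) - len(part)
        if not first:
            # Later parts must begin within max_gap tokens of the previous part.
            latest = min(latest, start + max_gap)
        for i in range(start, latest + 1):
            if tokens[i:i + len(part)] == part:
                if find_from(i + len(part), part_index + 1, False):
                    return True
        return False

    return find_from(0, 0, True)


pattern = [["we", "propose"], ["method", "for"]]
short = "we propose a method for parsing".split()
long_ = "we propose a novel and efficient method for parsing".split()

print(match_gapped(short, pattern))             # one intervening token
print(match_gapped(long_, pattern))             # four intervening tokens, gap limit 2
print(match_gapped(long_, pattern, max_gap=4))  # gap limit raised to 4
```

A single pattern of this kind covers variants that would otherwise be counted as separate, individually rare 5-grams or 6-grams.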
6 Summary 
We have presented some baseline results from our on- 
going work concerning automatic filtering of meta- 
comments (indicator phrases) from scientific papers. 
Meta-comments can vary considerably from one domain 
to another, as the comparison of the two corpora we 
considered shows. In the computational linguistics ar- 
ticles, authors argue explicitly about problems, solu- 
tions and research tasks, whereas this is less the case 
in the medical domain, where meta-comments are less 
frequent and more formulaic. 
We have shown that lists of meta-comments acquired
in a simple automatic process can be used to automati- 
cally identify a shallow document structure in scientific 
text, albeit with a certain quality loss when compared 
to manually constructed resources. 
7 Acknowledgements 
Data collection of the computational linguistics corpus 
took place collaboratively with Byron Georgantopolous. 
We thank Kathy McKeown, Vasileios Hatzivassiloglou
and Olga Merport from Columbia University, NY, for 
kindly allowing us to use their cardiography corpus. 

References 
Brandow, R.; Mitze, K.; and Rau, L. F. 1995. Automatic 
condensation of electronic publications by sentence selec- 
tion. Information Processing and Management 31(5):675- 
685. 
Hovy, E. H., and Lin, C. Y. 1997. Automated text sum- 
marization in SUMMARIST. ACL/EACL-97 Workshop 
'Intelligent Scalable Text Summarization'. 
Kintsch, W., and van Dijk, T. A. 1978. Toward a model of 
text comprehension and production. Psychological Review 
85(5):363-394.
Kircz, J. G. 1991. The rhetorical structure of scientific arti- 
cles: the case for argumentational analysis in information 
retrieval. Journal of Documentation 47(4):354-372. 
Kupiec, J.; Pedersen, J. O.; and Chen, F. 1995. A trainable 
document summarizer. In Proceedings of the 18th ACM-SIGIR
Conference, Association for Computing Machinery,
Special Interest Group Information Retrieval, 68-73. 
Salager-Meyer, F. 1992. A text-type and move analysis 
study of verb tense and modality distributions in medical 
English abstracts. English for Specific Purposes 11:93-113. 
Sparck Jones, K. 1994. Discourse modelling for automatic 
summarising. Technical report TR-290, Computer Labo- 
ratory, University of Cambridge. 
Swales, J. 1990. Genre analysis: English in academic and 
research settings. Cambridge University Press. 
Teufel, S., and Moens, M. 1997. Sentence extraction as a 
classification task. In ACL/EACL-97 Workshop 'Intelligent
Scalable Text Summarization', 58-65.
Teufel, S., and Moens, M. To Appear. Argumentative clas- 
sification of extracted sentences as a first step towards flex- 
ible abstracting. In Mani, I., and Maybury, M., eds., Ad- 
vances in automatic text summarization. MIT Press. 
