The HOLJ Corpus: supporting summarisation of legal texts
Claire Grover Ben Hachey Ian Hughson
School of Informatics, University of Edinburgh
2 Buccleuch Place
Edinburgh, EH8 9LW
Scotland, UK
{grover,bhachey}@inf.ed.ac.uk, S.I.Hughson@sms.ed.ac.uk
Abstract
We describe an XML-encoded corpus of texts in
the legal domain which was gathered for an auto-
matic summarisation project. We describe two dis-
tinct layers of annotation: manual annotation of the
rhetorical status of sentences and an entirely auto-
matic annotation process incorporating a host of in-
dividual linguistic processors. The manual rhetor-
ical status annotation has been developed as train-
ing and testing material for a summarisation sys-
tem based on the work of Teufel and Moens, while
the automatic layer of annotation encodes linguistic
information as features for a machine learning ap-
proach to rhetorical status classification.
1 Project Overview
The main aim of our project is to explore techniques
for automatic summarisation of texts in the legal do-
main. A somewhat simplistic characterisation of the
field of automatic summarisation is that there are
two main approaches, fact extraction and sentence
extraction. The former uses Information Extraction
techniques to fill predefined templates which serve
as a summary of the document; the latter compiles
summaries by extracting key sentences with some
smoothing to increase the coherence between the
sentences. Our approach to summarisation is based
on that of Teufel and Moens (1999, 2002, henceforth
T&M). T&M work on summarising scientific
articles and they use the best aspects of sentence ex-
traction and fact extraction by combining sentence
selection with information about why a certain sen-
tence is extracted—i.e. its rhetorical role: is it, for
example, a description of the main result, or is it a
criticism of someone else’s work? This approach
can be thought of as a more complex variant of tem-
plate filling, where the slots in the template are high-
level structural or rhetorical roles (in the case of sci-
entific texts, these slots express argumentative roles
such as goal and solution) and the fillers are sen-
tences extracted from the source text using a vari-
ety of statistical and linguistic techniques. With this
combined approach the closed nature of the fact ex-
traction approach is avoided without giving up its
flexibility: summaries can be generated from this
kind of template without the need to reproduce ex-
tracted sentences out of context. Sentences can be
reordered or suppressed depending on the rhetorical
role associated with them.
Taking the work of T&M as our point of depar-
ture, we explore the extent to which their approach
can be transferred to a new domain, legal texts. We
focus our attention on a subpart of the legal domain,
namely law reports, for three main reasons: (a) the
existence of manual summaries means that we have
evaluation material for the final summarisation sys-
tem; (b) the existence of differing target audiences
allows us to explore the issue of tailored summaries;
and (c) the texts have much in common with the
scientific articles that T&M worked with, while
remaining challengingly different in many respects.
In this paper we describe the corpus of legal texts
that we have gathered and annotated. The texts in
our corpus are judgments of the House of Lords1,
which we refer to as HOLJ. These texts contain a
header providing structured information, followed
by a sequence of Law Lords’ judgments consisting
of free-running text. The structured part of the doc-
ument contains information such as the respondent,
appellant and the date of the hearing. The decision
is given in the opinions of the Law Lords, at least
one of which is a substantial speech. This often
starts with a statement of how the case came before
the court. Sometimes it will move to a recapitula-
tion of the facts, moving on to discuss one or more
points of law, and then offer a ruling.
1 http://www.parliament.uk/judicial_work/judicial_work.cfm
2 http://www.lawreports.co.uk/
We have gathered a corpus of 188 judgments
from the years 2001–2003 from the House of Lords
website. (For 153 of these, manually created
summaries are available2 and will be used for system
evaluation.) The raw HTML documents are
processed through a sequence of modules which
automatically add layers of annotation. The first stage
converts the HTML to an XML format which we re-
fer to as HOLXML. A House of Lords Judgment is
defined as a J element whose BODY element is com-
posed of a number of LORD elements (usually five).
Each LORD element contains the judgment of one
individual lord and is composed of a sequence of
paragraphs (P elements) inherited from the original
HTML. The total number of words in the BODY ele-
ments in the corpus is 2,887,037 and the total num-
ber of sentences is 98,645. The average sentence
length is approx. 29 words. A judgment contains an
average of 525 sentences while an individual LORD
speech contains an average of 105 sentences.
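The document structure just described can be illustrated with a minimal sketch. The element names J, BODY, LORD and P come from the description above; the sample content, the function name and the whitespace-based word count are assumptions made for illustration and are not the project's own tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature HOLXML document: a J element whose BODY
# holds LORD elements, each made up of P (paragraph) elements.
SAMPLE = """
<J>
  <BODY>
    <LORD><P>My Lords, I would allow the appeal.</P><P>I agree with the reasons given.</P></LORD>
    <LORD><P>I would also allow the appeal.</P></LORD>
  </BODY>
</J>
"""

def body_statistics(xml_text):
    """Count LORD speeches, paragraphs and whitespace-separated words in BODY."""
    root = ET.fromstring(xml_text.strip())
    body = root.find("BODY")
    lords = body.findall("LORD")
    paragraphs = [p for lord in lords for p in lord.findall("P")]
    # Assumption: a "word" is any whitespace-delimited token.
    words = sum(len("".join(p.itertext()).split()) for p in paragraphs)
    return {"lords": len(lords), "paragraphs": len(paragraphs), "words": words}

stats = body_statistics(SAMPLE)
```

Counts produced this way would differ from the figures reported above, which depend on the project's own tokenisation and sentence identification.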
There will be three layers of annotation in the fi-
nal version of our corpus, with work on the first two
well under way. The first layer is manual annotation
of sentences for their rhetorical role. The second
layer is automatic linguistic annotation. The third
layer is annotation of sentences for ‘relevance’ as
measured by whether they match sentences in hand-
written summaries. We describe the first two layers
in Sections 2 and 3, and in Section 4 we discuss
possible approaches to the third annotation layer.
2 Rhetorical Status Annotation
2.1 Rhetorical Roles for Law Reports
The rhetorical roles that can be assigned to sen-
tences will naturally vary from domain to domain
and will reflect the argumentative structure of the
texts in the domain. In designing an annotation
scheme, decisions must be made about how fine-
grained the labels can be and an optimal balance
has to be found between informational richness and
human annotator reliability. In this section we dis-
cuss some of the considerations involved in design-
ing our annotation scheme.
Teufel and Moens’ scheme draws on the CARS
(Create a Research Space) model of Swales (1990).
A key factor in this, for the purposes of summari-
sation, is that each rhetorical move or category de-
scribes the status of a unit of text with respect to
the overall communicative goal of a paper, rather
than relating it hierarchically to other units, as in
Rhetorical Structure Theory (Mann and Thompson,
1987), for example. In scientific research, the goal
is to convince the intended audience that the work
reported is a valid contribution to science (Myers,
1992), i.e. that it is in some way novel and original
and extends the boundaries of knowledge.
Legal judgments are very different in this regard.
They are more strongly performative than research
reports, the fundamental act being decision. In par-
ticular, the judge aims to convince his professional
and academic peers of the soundness of his argu-
ment. Therefore, a judgment serves both a declara-
tory and a justificatory function (Maley, 1994). In
truth, it does more even than this, for it is not enough
to show that a decision is justified: it must be shown
to be proper. That is, the fundamental communica-
tive purpose of a judgment is to legitimise a deci-
sion, by showing that it derives, by a legitimate pro-
cess, from authoritative sources of law.
Table 1 provides an overview of the rhetorical
annotation scheme that we have developed for our
corpus. The set of labels follows almost directly
from the above observations about the communica-
tive purpose of a judgment. The initial parts of
a judgment typically restate the facts and events
which caused the initial proceedings and we label
these sentences with the rhetorical role FACT. By
the time the case has come to the House of Lords it
will have passed through a number of lower courts
and there are further details pertaining to the previ-
ous hearings which also need to be restated: these
sentences are labelled PROCEEDINGS. In consider-
ing the case the law lord discusses precedents and
legislation and a large part of the judgment consists
in presenting these authorities, most frequently by
direct quotation. We use the label BACKGROUND
for this rhetorical role. The FRAMING rhetorical
role captures all aspects of the law lord’s chain of ar-
gumentation while the DISPOSAL rhetorical role is
used for sentences which indicate the lord’s agree-
ment or disagreement with a previous ruling: since
this is a court of appeal, the lord’s actual decision,
either allowing or dismissing the appeal, is anno-
tated as DISPOSAL. The TEXTUAL rhetorical role
is used for sentences which indicate structure in the
ruling, while the OTHER category is for sentences
which cannot be fitted into the annotation scheme.
As the frequency column in Table 1 shows, PRO-
CEEDINGS, BACKGROUND and FRAMING make up
about 75% of the sentences with the other categories
being less frequently attested.
2.2 Manual Annotation of Rhetorical Status
The manual annotation of rhetorical roles is work
in progress and so far we have 40 documents fully
annotated. The frequency figures in Table 1 are
taken from this manually annotated subset of the
corpus and the classifiers described in Section 2.3
have been trained and evaluated on the same subset.
This subset of the corpus is similar in size to the cor-
pus reported in (Teufel and Moens, 2002): the T&M
corpus consists of 80 conference articles while ours
consists of 40 HOLJ documents. The T&M corpus
contains 12,188 sentences and 285,934 words while
ours contains 10,169 sentences and 290,793 words.
Label         Freq.          Description
FACT          862 (8.5%)     The sentence recounts the events or circumstances which gave rise to legal proceedings.
                             E.g. On analysis the package was found to contain 152 milligrams of heroin at 100% purity.
PROCEEDINGS   2434 (24%)     The sentence describes legal proceedings taken in the lower courts.
                             E.g. After hearing much evidence, Her Honour Judge Sander, sitting at Plymouth County Court, made findings of fact on 1 November 2000.
BACKGROUND    2813 (27.5%)   The sentence is a direct quotation or citation of source of law material.
                             E.g. Article 5 provides in paragraph 1 that a group of producers may apply for registration . . .
FRAMING       2309 (23%)     The sentence is part of the law lord’s argumentation.
                             E.g. In my opinion, however, the present case cannot be brought within the principle applied by the majority in the Wells case.
DISPOSAL      935 (9%)       A sentence which either credits or discredits a claim or previous ruling.
                             E.g. I would allow the appeal and restore the order of the Divisional Court.
TEXTUAL       768 (7.5%)     A sentence which has to do with the structure of the document or with things unrelated to a case.
                             E.g. First, I should refer to the facts that have given rise to this litigation.
OTHER         48 (0.5%)      A sentence which does not fit any of the above categories.
                             E.g. Here, as a matter of legal policy, the position seems to me straightforward.
Table 1: Rhetorical Annotation Scheme for Legal Judgments
The 40 judgments in our manually annotated sub-
set were annotated by two annotators using the NITE
XML toolkit annotation tool (Carletta et al., 2003).
Annotation guidelines were developed by a team in-
cluding a law professional. Eleven files were dou-
bly annotated in order to measure inter-annotator
agreement. We used the kappa coefficient of agree-
ment as a measure of reliability. This showed that
the human annotators distinguish the seven cate-
gories with a reproducibility of K=.83 (N=1,955,
k=2; where K is the kappa coefficient, N is the
number of sentences and k is the number of anno-
tators). This is slightly higher than that reported by
T&M and above the .80 mark which Krippendorff
(1980) suggests is the cut-off for good reliability.
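As an illustration of the agreement measure, the following is a minimal sketch of a kappa computation for two annotators. The text does not say which variant of kappa was computed; the sketch assumes Cohen's kappa, and the label sequences are toy data rather than corpus annotations.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: proportion of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Toy data, not taken from the corpus.
ann1 = ["FACT", "FACT", "FRAMING", "FRAMING", "DISPOSAL", "BACKGROUND"]
ann2 = ["FACT", "FRAMING", "FRAMING", "FRAMING", "DISPOSAL", "BACKGROUND"]
kappa = cohen_kappa(ann1, ann2)
```

Here the annotators agree on five of six items, but kappa is lower than the raw agreement of 0.83 because part of that agreement is expected by chance.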
2.3 Experiments with Rhetorical Role
Classification
Using the manually annotated subset of the corpus
we have performed a number of preliminary exper-
iments to determine which classifier and which fea-
ture set would be appropriate for rhetorical role clas-
sification. A brief summary of the micro-averaged
F-score3 results is given in Table 2 (detailed results
in Hachey and Grover, 2004).
3 Micro-averaging weights categories by their prior probability,
as opposed to macro-averaging, which puts equal weight
on each class regardless of how sparsely populated it might be.
Classifier   Features   F-score
C4.5         L          65.4
NB           LCESQ      51.8
Winnow       LCESQT     41.4
SVM          LCESQT     60.6
Table 2: Micro-averaged F-score results for rhetorical classification
The features with which we have been experimenting
for the HOLJ domain are broadly similar
to those used by T&M and include many of the fea-
tures which are typically used in sentence extrac-
tion approaches to automatic summarisation as well
as certain other features developed specifically for
rhetorical role classification. Briefly, the feature set
includes such features as: (L) location of a sentence
within the document and its subsections and para-
graphs; (C) cue phrases; (E) whether the sentence
contains named entities; (S) sentence length; (T) average
tf*idf term weight; and (Q) whether the sentence
contains a quotation or is inside a block quote.
While we still expect to achieve gains over these
preliminary scores, our system already exhibits an
improvement over baseline similar to that achieved
by the T&M system, which is encouraging given
that we have not invested any time in developing
the hand-crafted cue phrases that proved the most
useful feature for T&M, but rather have attempted
to simulate these through fully automatic, largely
domain-independent linguistic information.
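The scores in Table 2 are micro-averaged. The difference between micro- and macro-averaging can be made concrete with a small sketch, assuming single-label classification and F1 as the F-score; the function name and the toy labels are illustrative only.

```python
from collections import Counter

def f_scores(gold, predicted):
    """Return (micro_f, macro_f) for single-label classification."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative

    def f1(true_pos, false_pos, false_neg):
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    labels = set(gold) | set(predicted)
    # Micro: pool the counts over all classes, then compute one F-score.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute an F-score per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro

# Toy data: a classifier that always predicts the majority class.
gold = ["FRAMING"] * 4 + ["OTHER"]
pred = ["FRAMING"] * 5
micro, macro = f_scores(gold, pred)
```

On this toy input the micro-averaged score stays high because the frequent class dominates, while the macro-averaged score is pulled down by the sparsely populated class that is never predicted.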
We plan further experiments to investigate the effect
of other cue phrase features. For example, subject
and main verb hypernyms should allow better
generalisation over agent and type of action information.
We will also experiment with maximum entropy, a machine
learning method that allows the integration of a very
large number of diverse information sources and has
proved highly effective in other natural language tasks,
in both classification and sequence modelling frameworks.
[Figure 1 diagram: HTML document -> Conversion to HOLXML -> TOKENISATION MODULE (Tokenisation; POS Tagging & Sentence Identification) -> LINGUISTIC ANALYSIS MODULE (Lemmatisation; Named Entity Recognition; Chunking & Clause Identification; Verb & subject features) -> Automatically annotated HOLXML document]
Figure 1: HOLJ Processing Stages
3 Automatic Linguistic Markup
One of the aims of our project is to create an anno-
tated corpus of legal texts which will be available
to NLP researchers. We encode all the results of
linguistic processing as HOLXML annotations. Fig-
ure 1 shows the broad details of the automatic pro-
cessing that we perform, with the processing di-
vided into an initial tokenisation module and a later
linguistic analysis module. The architecture of
our system is one where a range of NLP tools is used
in a modular, pipelined way to add linguistic knowl-
edge to the XML document markup.
In the tokenisation module we convert from the
source HTML to HOLXML and then pass the data
through a sequence of calls to a variety of XML-
based tools from the LT TTT and LT XML toolsets
(Grover et al., 2000; Thompson et al., 1997). The
core program is the LT TTT program fsgmatch, a
general purpose transducer which processes an in-
put stream and adds annotations using rules pro-
vided in a hand-written grammar file. The other
main LT TTT program is ltpos, a statistical combined
part-of-speech (POS) tagger and sentence boundary
disambiguation module (Mikheev, 1997). The first
step in the tokenisation module uses fsgmatch to
segment the contents of the paragraphs into word el-
ements. Once the word tokens have been identified,
the next step uses ltpos to mark up the sentences and
add part of speech attributes to word tokens.
The motivation for the module that performs fur-
ther linguistic analysis is to compute information to
be used to provide features for the sentence classi-
fier. However, the information we compute is gen-
eral purpose and makes the data useful for a range
of NLP research activities.
The first step in the linguistic analysis module
lemmatises the inflected words using Minnen et al.’s
(2000) morpha lemmatiser. As morpha is not XML-
aware, we use xmlperl (McKelvie, 1999) as a wrap-
per to incorporate it in the XML pipeline. We use a
similar method for other non-XML components.
The next stage, described in Figure 1 as Named
Entity Recognition (NER), is in fact a more complex
layering of two kinds of NER. Our documents con-
tain the standard kinds of entities familiar from the
MUC and CoNLL competitions (Chinchor, 1998;
Daelemans and Osborne, 2003), such as person, or-
ganisation, location and date but they also contain
domain-specific entities. Table 3 shows examples
of the entities we have marked up in the corpus (in
our annotation scheme these are noun groups (NG)
with specific type and subtype attributes). In the
top two blocks of the table are examples of domain-
specific entities such as courts, judges, acts and
judgments, while in the third block we show exam-
ples of non-domain-specific entity types. We use
different strategies for the identification of the two
classes of entities: for the domain-specific ones we
use hand-crafted LT TTT rules, while for the non-
domain-specific ones we use the C&C named en-
tity tagger (Curran and Clark, 2003) trained on the
MUC-7 data set. For some entities, the two ap-
proaches provide competing analyses, in which case
the domain-specific label is to be preferred since it
provides finer-grained information. Wherever there
is no competition, C&C entities are marked up and
labelled as subtype=‘fromCC’.
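The preference for domain-specific analyses can be sketched as a simple merge over entity spans. This is an illustrative assumption about how such a merge might work, not the project's LT TTT implementation: the tuple representation, the token offsets and the function name are all hypothetical.

```python
def merge_entities(domain_entities, cc_entities):
    """Prefer domain-specific entities; keep C&C entities only where
    they do not overlap a domain-specific one.

    Each entity is a (start, end, type, subtype) tuple over token
    offsets, with end exclusive (a hypothetical representation).
    """
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    merged = list(domain_entities)
    for ent in cc_entities:
        if not any(overlaps(ent, dom) for dom in domain_entities):
            start, end, etype, _ = ent
            # Uncontested C&C entities are marked with subtype 'fromCC'.
            merged.append((start, end, etype, "fromCC"))
    return sorted(merged)

# Toy spans: the first C&C entity competes with a rule-based court entity.
domain = [(0, 3, "enamex-org", "court")]
cc = [(1, 3, "enamex-loc", None), (7, 9, "enamex-pers", None)]
result = merge_entities(domain, cc)
```

In the toy input the competing location analysis is dropped in favour of the finer-grained court label, and the uncontested person entity is kept.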
During the rule-based entity recognition phase,
an ‘on-the-fly’ lexicon is built from the document
header. This includes the names of the lords judg-
ing the case as well as the respondent and appellant
and it is useful to mark these up explicitly when
they occur elsewhere in the document. We create
an expanded lexicon from the ‘on-the-fly’ lexicon
containing ordered substrings of the original entry
in order to perform a more flexible lexical look-up.
Thus the entity Commission is recognised as
an appellant substring entity in the document where
Northern Ireland Human Rights Commission occurs
in the header as an appellant entity.
<NG type='enamex-pers' subtype='committee-lord'>
    Lord Rodger of Earlsferry
    Lord Hutton
<NG type='caseent' subtype='appellant'>
    Northern Ireland Human Rights Commission
<NG type='caseentsub' subtype='appellant'>
    Commission
<NG type='caseent' subtype='respondent'>
    URATEMP VENTURES LIMITED
<NG type='caseentsub' subtype='respondent'>
    Uratemp Ventures

<NG type='enamex-pers' subtype='judge'>
    Collins J
    Potter and Hale LJJ
<NG type='enamex-org' subtype='court'>
    European Court of Justice
    Bristol County Court
<NG type='legal-ent' subtype='act'>
    Value Added Tax Act 1994
    Adoption Act 1976
<NG type='legal-ent' subtype='section'>
    section 18(1)(a)
    para 3.1
<NG type='legal-ent' subtype='judgment'>
    Turner J [1996] STC 1469
    Apple and Pear Development Council v Commissioners of Customs and Excise (Case 102/86) [1988] STC 221

<NG type='enamex-loc' subtype='fromCC'>
    Oakdene Road
    Kuwait Airport
<NG type='enamex-pers' subtype='fromCC'>
    Irfan Choudhry
    John MacDermott
<NG type='enamex-org' subtype='fromCC'>
    Powergen
    Grayan Building Services Ltd
Table 3: Named Entities in the Corpus
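The expanded lexicon of ordered substrings can be sketched as follows. The sketch assumes that "ordered substrings" means contiguous word sequences and that look-up is case-insensitive; both are assumptions, as are the function name and the dictionary layout. The type names follow Table 3.

```python
def expand_lexicon(entries):
    """Map every contiguous word sequence of each entry to an entity label.

    entries: dict from full entity string to (type, subtype).
    Returns a dict from lower-cased word sequence to (type, subtype);
    full entries keep their type, proper substrings get type + "sub".
    """
    expanded = {}
    for full, (etype, subtype) in entries.items():
        words = full.split()
        for i in range(len(words)):
            for j in range(i + 1, len(words) + 1):
                key = " ".join(words[i:j]).lower()
                is_full = (i == 0 and j == len(words))
                label = etype if is_full else etype + "sub"
                # Do not let a substring overwrite a full header entry.
                if is_full or key not in expanded:
                    expanded[key] = (label, subtype)
    return expanded

# Toy header entries (in the corpus these come from different documents).
header = {
    "Northern Ireland Human Rights Commission": ("caseent", "appellant"),
    "URATEMP VENTURES LIMITED": ("caseent", "respondent"),
}
lexicon = expand_lexicon(header)
```

A practical version would probably filter out uninformative substrings such as single common words, which this sketch does not attempt.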
The next stage in the linguistic analysis module
performs noun group and verb group chunking us-
ing fsgmatch with specialised hand-written rule sets.
The noun group and verb group mark-up plus POS
tags provide the relevant features for the next pro-
cessing step. Elsewhere (Grover et al., 2003), we
showed that information about the main verb group
of the sentence may provide clues to the rhetorical
status of the sentence (e.g. a present tense active
verb correlates with BACKGROUND or DISPOSAL).
In order to find the main verb group of a sentence,
however, we need to establish its clause structure.
We do this with a clause identifier (Hachey, 2002)
built using the CoNLL-2001 shared task data (Sang
and Déjean, 2001). Clause identification is performed
in three steps. First, two maximum entropy
classifiers are applied, where the first predicts clause
start labels and the second predicts clause end labels.
In the third step clause segmentation is
inferred from the predicted starts and ends using a
maximum entropy model whose sole purpose is to
provide confidence values for potential clauses.
The final stages of linguistic processing use hand-
written LT TTT components to compute features of
verb and noun groups. For all verb groups, attributes
encoding tense, aspect, modality and negation are
added to the mark-up: for example, might not have
been brought is analysed as <VG tense='pres', aspect='perf',
voice='pass', modal='yes', neg='yes'>.
In addition, subject noun groups are identified and
lemma information from the head noun of the sub-
ject and the head verb of the verb group are propa-
gated to the verb group attribute list.
4 Automatic Relevance Annotation
In addition to completing the annotation of rhetor-
ical status, in order to make this a useful corpus
for sentence extraction, we also need to annotate
sentences for relevance. Many approaches to rel-
evance annotation use human judges, but there are
some automatic approaches which pair up sentences
from manually created abstracts with sentences in
the source text. We survey these here.
As mentioned earlier, our corpus includes hand-
written summaries from domain experts. This
means that we have the means to relate one to the
other to create a gold standard relevance-annotated
corpus. The aim is to find sentences in the document
that correspond to sentences in the summary, even
though they are likely not to be identical in form.
Table 4 summarises five approaches to automatic
relevance annotation from the literature. The approaches
fall into three basic paradigms based on
the methods they use to match abstract content to
sentences from the source document: longest common
subsequence matching, IR-based matching,
and hidden Markov model (HMM) decoding. The
approaches also differ in the basic alignment unit.
The first three operate at the sentence level in the
source document, while the fourth and fifth operate
at clause level and word level respectively, but can
be generalised to sentence-level annotation.
Authors                      Paradigm                                                          Level
Teufel and Moens (1997)      Longest common subsequence matching                               Sentence
Mani and Bloedorn (1998)     IR (|overlapping words| + cosine-based similarity metric)         Sentence
Banko et al. (1999)          IR (|overlapping words| w/ extra weight for proper nouns)         Sentence
Marcu (1999)                 IR (cosine-based similarity metric)                               Clause
Jing and McKeown (1999)      HMM (prefers ordered, contiguous words, sentences)                Word
Table 4: Methods for automatic alignment of abstracts with their source documents.
The first approach (Teufel and Moens, 1997)
uses a simple surface similarity measure (longest
common subsequence of non-stop-list words) for
matching abstract sentences with sentences from the
source document. This approach accounts for order
and length of match, but does not incorporate se-
mantic weight of match terms. Nor does it allow for
reordering of matched terms.
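A minimal sketch of longest common subsequence matching over non-stop-list words is given below. The stop list, the tokenisation and the function names are illustrative assumptions and do not reproduce the cited implementation.

```python
# Tiny illustrative stop list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "that"}

def content_words(sentence):
    """Lower-cased, punctuation-stripped tokens that are not stop words."""
    tokens = (t.strip(".,;:()\"'").lower() for t in sentence.split())
    return [t for t in tokens if t and t not in STOP_WORDS]

def lcs_length(xs, ys):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(ys) + 1)
    for x in xs:
        curr = [0]
        for j, y in enumerate(ys, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def best_match(abstract_sentence, source_sentences):
    """Index of the source sentence with the longest shared subsequence,
    together with the score of every source sentence."""
    target = content_words(abstract_sentence)
    scores = [lcs_length(target, content_words(s)) for s in source_sentences]
    return max(range(len(scores)), key=scores.__getitem__), scores

abstract = "The appeal was allowed and the order restored."
source = [
    "The package contained heroin.",
    "I would allow the appeal and restore the order of the court.",
]
index, scores = best_match(abstract, source)
```

Because only exact word forms match, "allowed" and "allow" do not count as a match here, which illustrates why surface measures of this kind miss close variants.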
Mani and Bloedorn (1998) discuss an IR method
using a cosine-based similarity metric over tf*idf
scores with an additional term that counts the num-
ber of abstract words present in a source sentence.
The entire abstract is treated as a query, effectively
circumventing the level-of-alignment question dur-
ing matching. In the end the top c% of sentences are
labelled as relevant extract sentences.
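The general shape of this IR-style labelling can be sketched as below. How the cited method combines the cosine score with the word-count term is not stated here, so the sketch simply adds them; that choice, the idf formula, the function names and the parameter top_fraction (standing in for c%) are all assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(token_lists):
    """One tf*idf vector (as a dict) per token list; idf = log(N / df)."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(tokens).items()}
            for tokens in token_lists]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def label_relevant(abstract_tokens, sentence_token_lists, top_fraction):
    """Score each source sentence against the whole abstract and return
    the indices of the top-scoring fraction of sentences."""
    vectors = tfidf_vectors(sentence_token_lists + [abstract_tokens])
    query, sentence_vectors = vectors[-1], vectors[:-1]
    abstract_vocab = set(abstract_tokens)
    scores = []
    for tokens, vec in zip(sentence_token_lists, sentence_vectors):
        overlap = sum(1 for t in tokens if t in abstract_vocab)
        # Assumption: the two terms are combined by plain addition.
        scores.append(cosine(query, vec) + overlap)
    keep = max(1, round(top_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])

sentences = [["appeal", "allowed"], ["heroin", "package"],
             ["court", "order"], ["weather", "fine"]]
abstract = ["appeal", "allowed", "order"]
top_quarter = label_relevant(abstract, sentences, 0.25)
top_half = label_relevant(abstract, sentences, 0.5)
```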
The third approach (Banko et al., 1999) treats
terms from abstract sentences as query terms. These
are matched against sentences from the original
document. Proper nouns get double weight when
computing overlap. This method was found to be
better than a version which took the top tl*tf (see footnote 4) words
from each summary sentence and used relative fre-
quency to identify matching source sentences.
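The two variants described above can be sketched in a few lines. The sketch assumes pre-tokenised sentences, a given set of proper nouns and set-based overlap; the function names are hypothetical, and the cited system may differ in all of these respects.

```python
from collections import Counter

def weighted_overlap(summary_tokens, source_tokens, proper_nouns):
    """Count shared terms, giving proper nouns double weight."""
    shared = set(summary_tokens) & set(source_tokens)
    return sum(2 if t in proper_nouns else 1 for t in shared)

def align(summary_sentences, source_sentences, proper_nouns):
    """For each tokenised summary sentence, the index of the source
    sentence with the highest weighted overlap."""
    return [max(range(len(source_sentences)),
                key=lambda i: weighted_overlap(summ, source_sentences[i], proper_nouns))
            for summ in summary_sentences]

def top_tl_tf(tokens, k):
    """Top-k terms by term length * term frequency (ties broken alphabetically)."""
    tf = Counter(tokens)
    return sorted(tf, key=lambda t: (-len(t) * tf[t], t))[:k]

proper = {"Powergen"}
summary = [["Powergen", "lost", "appeal"]]
source = [["the", "appeal", "was", "lost", "today"],
          ["Powergen", "appeal", "dismissed"]]
alignment = align(summary, source, proper)
keywords = top_tl_tf(["tax", "tax", "commissioners", "act"], 2)
```

In the toy input both source sentences share two terms with the summary sentence, and the double weight on the proper noun is what decides the alignment.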
Marcu (1999) also describes an IR-based ap-
proach. Like Mani and Bloedorn, a cosine-based
similarity metric is used. However, the score is used
to determine the similarity between the full abstract
and the extract. The extract is initialised as the
full source document, then clauses are removed in
a greedy fashion until the maximum agreement be-
tween the abstract and the extract is achieved. Ad-
ditionally, several post-matching heuristics are ap-
plied to remove over-generous matches (e.g. clauses
with fewer than three non-stop words).
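The greedy procedure can be sketched as follows. For brevity the sketch uses raw term counts rather than any particular term weighting and omits the post-matching heuristics; these simplifications, the stopping rule and the function names are assumptions rather than details of the cited work.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two bags of words (Counters)."""
    dot = sum(c * v[t] for t, c in u.items())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def greedy_extract(abstract_tokens, clauses):
    """Start from all clauses and repeatedly drop the clause whose removal
    most increases similarity to the abstract; stop when no removal helps."""
    abstract = Counter(abstract_tokens)
    kept = list(range(len(clauses)))

    def score(indices):
        bag = Counter(t for i in indices for t in clauses[i])
        return cosine(abstract, bag)

    best = score(kept)
    while len(kept) > 1:
        candidates = [(score([i for i in kept if i != drop]), drop) for drop in kept]
        new_best, drop = max(candidates)
        if new_best <= best:
            break
        best = new_best
        kept.remove(drop)
    return kept, best

abstract = ["appeal", "allowed"]
clauses = [["appeal", "allowed"], ["weather", "fine"], ["heroin", "package"]]
kept, similarity = greedy_extract(abstract, clauses)
```

Each iteration rescans all remaining clauses, so the sketch is quadratic in the number of clauses; that is acceptable for illustration but would need care on full judgments.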
4 tl*tf (term length * term frequency) is used as an efficient
estimation of tf*idf based on the assumption that frequent words
in any language tend to be short (i.e. term length is proportional
to inverse document frequency).
A shortcoming of the bag-of-words IR ap-
proaches is the fact that they do not encode order
preferences. Another approach that accounts for or-
dering is reported by Jing and McKeown (1999),
who use an HMM with hand-coded transition
weights. On the other hand, Jing and McKeown
do not include features encoding semantic weight
such as tf*idf. As in Marcu (1999), correction heuristics are
employed to remove certain matches (e.g. sentences
that contribute fewer than two non-stop words).
Before drawing conclusions, we should consider
how appropriate our data is for automatic relevance
annotation. Teufel and Moens (2002) discuss the
difference between abstracts created by document
authors and those created by professional abstrac-
tors noting that the former tend to be less systematic
and more “deep generated” while the latter are more
likely to be created by sentence extraction. T&M
quantify this effect by measuring the proportion of
abstract sentences that appear in the source docu-
ment (either as a close variant or in identical form).
They report 45% for their corpus of author-created
abstracts. Kupiec et al. (1995), by contrast, report
79% for their corpus of professional abstracts. Our
summaries are not author created, so we would ex-
pect a higher rate of close-variant matches. On
the other hand, though our summaries are created
by domain experts, they are not necessarily professional
abstractors, so we might expect more variation
in summarisation strategy.
Ultimately, human supervision may be required,
as in Teufel and Moens (2002); however, we can
make some observations about the automatic anno-
tation methods above. While IR approaches and ap-
proaches that model order and distance constraints
have proved effective, it would be interesting to
test a model that incorporates both a measure of
the semantic weight of matching terms and surface
constraints. Since we have named entities marked
in our corpus, we could modify the Banko et al. (1999)
method by matching terms at the entity-level or sim-
ply apply extra weighting to terms inside entities.
We might also match entity types or even grounded
entities in the case of appellant and respondent.
Also, it may be desirable to annotate sentences
with a weight indicating the degree of relevance.
Then, a numerical prediction method might be used
in place of classification, avoiding information loss
in the model due to discretisation. Also, if the an-
notation is weighted, then we might incorporate de-
gree of relevance into the evaluation metric.
5 Conclusions
We have presented a new corpus of UK House of
Lords judgments which we are in the process of
producing. The current version of the corpus can
be downloaded from http://www.ltg.ed.ac.uk/SUM/.
The final version will contain three layers:
rhetorical status annotation, detailed linguistic
markup, and relevance annotation. The linguistic
markup is fully automatic and we anticipate that
relevance annotation can be achieved automatically
with a relatively high reliability.
Our rhetorical status annotation derives from
Swales’ (1990) CARS model where each category
describes the status of a unit of text with respect
to the overall communicative goal of a document.
Preliminary experiments using automatic linguistic
markup to extract cue phrase features for rhetorical
role classification give encouraging results.
Acknowledgments
This work is supported by EPSRC grant GR/N35311.
The corpus annotation was carried out by Vasilis
Karaiskos and Hui-Mei Liao with assistance using the
NITE XML Toolkit (NXT) from Jonathan Kilgour.

References
Michele Banko, Vibhu Mittal, Mark Kantrowitz, and
Jade Goldstein. 1999. Generating extraction-based
summaries from hand-written summaries by aligning
text spans. In Proceedings of PACLING’99.
Jean Carletta, Stefan Evert, Ulrich Heid, Jonathan Kil-
gour, Judy Robertson, and Holger Voormann. 2003.
The NITE XML Toolkit: flexible annotation for multi-
modal language data. Behavior Research Methods,
Instruments, and Computers, special issue on Mea-
suring Behavior, 35(3).
Nancy A. Chinchor. 1998. Proceedings of MUC-7. Fair-
fax, Virginia.
James R. Curran and Stephen Clark. 2003. Language
independent NER using a maximum entropy tagger. In
Proceedings of CoNLL-2003.
Walter Daelemans and Miles Osborne. 2003. Proceed-
ings of CoNLL-2003. Edmonton, Canada.
Claire Grover, Colin Matheson, Andrei Mikheev, and
Marc Moens. 2000. LT TTT—a flexible tokenisation
tool. In Proceedings of LREC-2000.
Claire Grover, Ben Hachey, Ian Hughson, and Chris Ko-
rycinski. 2003. Automatic summarisation of legal
documents. In Proceedings of ICAIL’03.
Ben Hachey and Claire Grover. 2004. A rhetorical sta-
tus classifier for legal text summarisation. In Proceed-
ings of the ACL-2004 Text Summarization Branches
Out Workshop.
Ben Hachey. 2002. Recognising clauses using symbolic
and machine learning approaches. Master’s thesis,
University of Edinburgh.
Hongyan Jing and Kathleen R. McKeown. 1999. The
decomposition of human-written summary sentences.
In Proceedings of SIGIR’99.
Klaus Krippendorff. 1980. Content analysis: An Intro-
duction to its Methodology. Sage Publications.
Julian Kupiec, Jan O. Pedersen, and Francine Chen.
1995. A trainable document summarizer. In Proceed-
ings of SIGIR’95.
Yon Maley. 1994. The language of the law. In John Gib-
bons, editor, Language and the Law. Longman.
Inderjeet Mani and Eric Bloedorn. 1998. Machine learn-
ing of generic and user-focused summarization. In
Proceedings of AAAI’98.
William Mann and Sandra Thompson. 1987. Rhetorical
structure theory: Description and construction of text
structures. In Gerard Kempen, editor, Natural Lan-
guage Generation. Marinus Nijhoff Publishers.
Daniel Marcu. 1999. The automatic construction of
large-scale corpora for summarization research. In
Proceedings of SIGIR’99.
David McKelvie. 1999. XMLPERL 1.0.4 XML
processing software. http://www.cogsci.ed.ac.uk/~dmck/xmlperl.
Andrei Mikheev. 1997. Automatic rule induction for
unknown word guessing. Computational Linguistics,
23(3).
Guido Minnen, John Carroll, and Darren Pearce. 2000.
Robust, applied morphological generation. In Pro-
ceedings of INLG’2000.
Greg Myers. 1992. In this paper we report... - speech
acts and scientific facts. Journal of Pragmatics, 17(4).
Erik Tjong Kim Sang and Hervé Déjean. 2001.
Introduction to the CoNLL-2001 shared task: clause
identification. In Proceedings of CoNLL-2001.
John M. Swales. 1990. Genre Analysis: English in Aca-
demic and Research Settings. Cambridge University
Press.
Simone Teufel and Marc Moens. 1997. Sentence ex-
traction as a classification task. In Proceedings of the
ACL/EACL’97 Workshop on Intelligent and scalable
Text summarization.
Simone Teufel and Marc Moens. 1999. Argumenta-
tive classification of extracted sentences as a first step
towards flexible abstracting. In Inderjeet Mani and
Mark T. Maybury, editors, Advances in Automatic Text
Summarization. MIT Press.
Simone Teufel and Marc Moens. 2002. Summarising
scientific articles—experiments with relevance and
rhetorical status. Computational Linguistics, 28(4).
Henry Thompson, Richard Tobin, David McKelvie, and
Chris Brew. 1997. LT XML—software API and
toolkit for XML processing. http://www.ltg.ed.ac.uk/software/.
