File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/w04-1007_intro.xml
Size: 4,530 bytes
Last Modified: 2025-10-06 14:02:32
<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1007"> <Title>A Rhetorical Status Classifier for Legal Text Summarisation</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Law reports form an interesting domain for automatic summarisation. They are texts which record the proceedings of a court and, due to the role that precedents play in English law, easy access to them is essential for a wide range of people. For this reason, they are frequently manually summarised by legal experts, with summaries varying according to target audience (e.g. students, solicitors).</Paragraph> <Paragraph position="1"> In the SUM project, we are exploring methods for generating flexible summaries of legal documents, taking as our point of departure the Teufel and Moens (2002; 1999a; 1999b) approach to automatic summarisation (henceforth T&M). We have chosen to work with law reports for three main reasons: (a) the existence of manual summaries means that we have evaluation material for the final summarisation system; (b) the existence of differing target audiences allows us to explore the issue of tailored summaries; and (c) the texts have much in common with the academic papers that T&M worked with, while remaining challengingly different in many respects.</Paragraph> <Paragraph position="2"> Our general aims are comparable with those of the SALOMON project (Moens et al., 1997), which also deals with summarisation of legal texts, but our choice of methodology is designed to test the portability of the T&M approach to a new domain.</Paragraph> <Paragraph position="3"> The T&M approach is an instance of what Spärck Jones (1999) terms text extraction where a summary typically consists of sentences selected from the source text, with some smoothing to increase the coherence between the sentences. 
Since the academic texts they use are rather long and the aim is to produce flexible summaries of varying length and for various audiences, T&M go beyond simple sentence selection and classify source sentences according to their rhetorical status (e.g. a description of the main result, a criticism of someone else's work). With sentences classified in this manner, different kinds of summaries can be generated.</Paragraph> <Paragraph position="4"> Sentences can be reordered, since they have rhetorical roles associated with them, or they can be suppressed if a user is not interested in certain types of rhetorical roles.</Paragraph> <Paragraph position="5"> In the second stage of our project we will explore techniques for sentence selection. Following the T&M methodology, we will annotate sentences in the corpus for 'relevance'. For our corpus we hope to be able to compute relevance by using automatic techniques to pair up sentences from manually created abstracts with sentences in the source text. The addition of this layer of annotation will provide the training and testing material for sentence extraction, with the rhetorical role labels helping to constrain the type of summary generated.</Paragraph> <Paragraph position="6"> In this paper we focus on our rhetorical status classifier. This is a key part of the summarisation process and our work can be thought of as a test of portability of the T&M approach to a new domain.</Paragraph> <Paragraph position="7"> At the same time, our methods differ in important respects from those of T&M and in reporting our work we will attempt to draw comparisons wherever possible.</Paragraph> <Paragraph position="8"> In Section 2 we describe the House of Lords corpus we have gathered and annotated. We explain the rhetorical role annotation scheme that we have developed and contrast it with the T&M scheme for academic articles. We provide inter-annotator agreement results for the annotation scheme. 
In Section 2.3 we give an overview of the tools and techniques we have used in the automatic linguistic processing of the judgments. Section 3 describes our sentence classifier. In Section 3.1 we review the kinds of features that can be used by a classifier and describe the set of features used in our experiments.</Paragraph> <Paragraph position="9"> In Section 3.2 we present the results of experiments with four classifiers and discuss the relative effectiveness of the methods and the feature sets. Finally, in Section 4 we draw some conclusions and outline future work.</Paragraph> </Section> </Paper>