<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0505"> <Title>Summarising Legal Texts: Sentential Tense and Argumentative Roles</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Automatic Summarisation 2.1 Background </SectionTitle> <Paragraph position="0"> Much of the previous NLP work in the legal domain concerns Information Retrieval (IR) and the computation of simple features such as word frequency. In order to perform summarisation, it is necessary to look at other features which may be characteristic of texts in general and legal texts in particular. These can then serve to build a model for the creation of legal summaries (Moens and Busser, 2002). In our project, we are developing an automatic summarisation system based on the approach of Teufel and Moens. The core component of this is a statistical classifier which categorises sentences in order that they might be seen as candidate text excerpts to be used in a summary. Useful features might include standard IR measures such as word frequency but other highly informative features are likely to be ones which reflect linguistic properties of the sentences.</Paragraph> <Paragraph position="1"> The texts we are currently exploring are judgments of the House of Lords, a domain we refer to here as HOLJ1.</Paragraph> <Paragraph position="2"> These texts contain a header providing structured information, followed by a sequence of sometimes lengthy judgments consisting of free-running text. Each Law Lord gives his own opinion, so in later phases of this project we will create a strategy for what is effectively multi-document summarisation. The structured part of the document contains information such as the respondent, appellant and the date of the hearing. While this might constitute some part of a summary, it is also necessary to pick out an appropriate number of relevant informative sentences from the unstructured text in the body of the document. This paper focuses on the mixture of statistical and linguistic techniques which aid the determination of the function or importance of a sentence.</Paragraph> <Paragraph position="3"> Previous work on summarisation has concentrated on the domain of scientific papers. This has lent itself to automatic text summarisation because documents of this genre tend to be structured in predictable ways and to contain formalised language which can aid the summarisation process (e.g. cue phrases such as 'the importance of', 'to summarise', 'we disagree') (Teufel and Moens, 2002), (Teufel and Moens, 2000). Although there is a significant distance in style between scientific articles and legal texts, we have found it useful to build upon the work of Teufel and Moens (Teufel and Moens, 2002; Teufel and Moens, 1997) and to pursue the methodology of investigating the usefulness of a range of features in determining the argumentative role of a sentence.</Paragraph> <Paragraph position="4"> Sp&quot;arck Jones (1999) has argued that most practically oriented work on automated summarisation can be classified as either based on text extraction or fact extraction. 1Accessible on the House of Lords website, http://www.</Paragraph> <Paragraph position="5"> parliament.uk/judicial_work/judicial_work.cfm When automated summarisation is based on text extraction, an abstract will typically consist of sentences selected from the source text, possibly with some smoothing to increase the coherence between the sentences. 
The advantage of this method is that it is a very general technique, which will work without the system needing to be told beforehand what might be interesting or relevant information. But general methods for identifying abstract-worthy sentences are not very reliable when used in specific domains, and can easily result in important information being overlooked. When summarisation is based on fact extraction, on the other hand, the starting point is a predefined template of slots and possible fillers. These systems extract information from a given text and fill out the agreed template. These templates can then be used to generate shorter texts: material in the source text not of relevance to the template will have been discarded, and the resulting template can be rendered as a much more succinct version of the original text. The disadvantage of this methodology is that the summary only reflects what is in the template.</Paragraph> <Paragraph position="6"> For long scientific texts, it does not seem feasible to define templates with a wide enough range; sentence selection, however, does not offer much scope for re-generating the text into different types of abstracts. For these reasons, Teufel and Moens experimented with ways of combining the best aspects of both approaches by coupling sentence selection with information about why a certain sentence is extracted--e.g. is it a description of the main result, or an important criticism of someone else's work? This approach can be thought of as a more complex variant of template filling, where the slots in the template are high-level structural or rhetorical roles (in the case of scientific texts, these slots express argumentative roles like main goal and type of solution) and the fillers are sentences extracted from the source text using a variety of statistical and linguistic techniques exploiting indicators such as cue phrases. With this combined approach the closed nature of the fact extraction approach is avoided without giving up its flexibility: summaries can be generated from this kind of template without the need to reproduce extracted sentences out of context. Sentences can be reordered, since they have rhetorical roles associated with them; some can be suppressed if a user is not interested in certain types of rhetorical roles.</Paragraph> <Paragraph position="7"> The argumentative roles which Teufel and Moens settled upon for the scientific domain (Teufel and Moens, 1999) consist of three main categories: BACKGROUND: sentences which describe some (generally accepted) background knowledge.</Paragraph> <Paragraph position="8"> OTHER: sentences which describe aspects of some specific other research in a neutral way.</Paragraph> <Paragraph position="9"> OWN: sentences which describe any aspect of the work presented in the current paper.</Paragraph>
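To make the combined approach concrete, here is a minimal sketch (ours, not Teufel and Moens' implementation) of such a role-based template: slots are the rhetorical roles just listed, fillers are extracted sentences with classifier scores, and a summary is rendered by reordering roles and suppressing unwanted ones. The class, method names and example sentences are invented for illustration.

```python
from collections import defaultdict

# Hypothetical role-based summary template: slots are rhetorical roles,
# fillers are sentences extracted from the source text.
ROLE_ORDER = ["BACKGROUND", "OTHER", "OWN"]  # presentation order in the summary

class RoleTemplate:
    def __init__(self):
        self.slots = defaultdict(list)

    def fill(self, role, sentence, score):
        # Each extracted sentence is stored with its classifier confidence.
        self.slots[role].append((score, sentence))

    def render(self, suppress=()):
        # Reorder by rhetorical role and drop roles the user is not
        # interested in; within a role, prefer high-confidence sentences.
        parts = []
        for role in ROLE_ORDER:
            if role in suppress:
                continue
            for score, sent in sorted(self.slots[role], reverse=True):
                parts.append(sent)
        return " ".join(parts)

template = RoleTemplate()
template.fill("BACKGROUND", "Section 47 governs trust dispositions.", 0.9)
template.fill("OWN", "For these reasons I would allow the appeal.", 0.8)
print(template.render(suppress=("OTHER",)))
```

Because each filler carries its role, reordering and suppression come for free, which is exactly the flexibility the combined approach is after.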
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Summarisation of HOLJ Texts </SectionTitle> <Paragraph position="0"> Judgments of the House of Lords are based on facts that have already been settled in the lower courts, so they constitute a genre given over to largely unadulterated legal reasoning. Furthermore, being products of the highest court in England,2 they are of major importance for determining the future interpretation of English law. The meat of a decision is given in the opinions of the Law Lords, at least one of which is a substantial speech. This often starts with a statement of how the case came before the court, sometimes moves to a recapitulation of the facts, goes on to discuss one or more points of law, and then offers a ruling.</Paragraph> <Paragraph position="1"> The methodology we implement is based on the approach used for the summarisation of scientific papers as described above, the first two steps of which can be summarised as follows: Task 1. Decide which argumentative roles are important in the source text and are of use in the abstract. Task 2. In a collection of relevant texts, decide for every sentence which argumentative role best describes it; this process is called &quot;argumentative zoning&quot;.</Paragraph> <Paragraph position="2"> Our annotation scheme, like our general approach, is motivated by the successful incorporation of rhetorical information in the domain of scientific articles. Teufel et al. (1999) argue that regularities in the argumentative structure of a research article follow from the authors' primary communicative goal. In scientific texts, the authors' goal is to convince their audience that they have provided a contribution to science. From this goal follow highly predictable sub-goals, the basic scheme of which was introduced in Section 2.1. For the legal domain, the communicative goal is slightly different: the author's primary communicative goal is to convince his or her peers that the position taken is legally sound, having considered the case with regard to all relevant points of law. A different set of sub-goals follows (refer to Table 1).3</Paragraph> <Paragraph position="3"> 3 This scheme turns out to be similar to one which was conceived of for work on legal summarisation of Chinese judgment texts (Cheung et al., 2001).</Paragraph> <Paragraph position="4"> We annotated five randomly selected appeals cases for the purpose of a preliminary analysis of our linguistic features. These were marked up by a single annotator, who assigned a rhetorical label to each sentence. As well as providing a top-level label, we asked the annotator to consider a number of sub-moves for our initial study of the HOLJ domain. These form a hierarchy of rhetorical content, allowing the annotator to 'fall back' to the basic scheme if they cannot place a sentence in a particular sub-move.</Paragraph> <Paragraph position="5"> Table 1: The top-level rhetorical categories distinguished in our preliminary annotation experiments. BACKGROUND: Generally accepted background knowledge: sentences containing law, summary of law, history of law, and legal precedents. CASE: Description of the case, including the events leading up to legal proceedings and any summary of the proceedings and decisions of the lower courts. OWN: Statements that can be attributed to the Lord speaking about the case; these include interpretation of BACKGROUND and CASE, argument, and any explicit judgment as to whether the appeal should be allowed.</Paragraph> <Paragraph position="6"> The following describes the sub-categories we posit in the HOLJ domain and believe will be of use in flexible abstracting (a sketch of the scheme as a data structure follows the list): BACKGROUND * PRECEDENT - Does the sentence describe a previous case or judgment apart from the proceedings for the current appeal? E.g. &quot;This was recognised in Lord Binning, Petitioner 1984 SLT 18 when the First Division held that for the purposes of section 47, the date of the relevant trust disposition or settlement or other deed of trust was the date of its execution....&quot; * LAW - Does the sentence contain public statutes? Does the sentence contain a summary or speak to the history of statutes? E.g.
&quot;Section 12 (3A) begins with the words: &quot;In determining for the purposes of this section whether to provide assistance by way of residential accommodation to a person....&quot; CASE * EVENT - Does the sentence describe the events that led up to the beginning of the legal proceedings? E.g. &quot;The appellant lived at 87 Main Street, Newmills until about April 1998.&quot; * LOWER COURT DECISION - Does the sentence describe or summarise decisions or proceedings from the lower courts? E.g. &quot;Immediately following Mr Fitzgerald's dismissal IMP brought proceedings and obtained a Mareva injunction against....&quot; OWN * DISPOSAL - Does the sentence contain an explicit judgment as to whether the appeal should be allowed? E.g. &quot;For the reasons already given I would hold that VAT is payable in the sum of £1.63 in respect of postage and I would allow the appeal.&quot; * INTERPRETATION - Does the sentence contain an interpretation of BACKGROUND or CASE items? E.g. &quot;The expression 'aids' in section 33(1) is a familiar word in everyday use and it bears no technical or special meaning in this context.&quot; * ARGUMENT - Does the sentence state the question at hand, apply points of law to the current case, or otherwise present argument which is to form the basis of a ruling? E.g. &quot;The question is whether the direction which it contains applies where the local authority are considering whether to provide a person with residential accommodation with nursing under section 13A.&quot;</Paragraph> </Section> </Section>
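As promised above, here is a minimal sketch of the scheme as a data structure: top-level categories map to their sub-moves, and a fall-back function models the annotator's retreat to the basic scheme when a sub-move does not fit. The encoding is our own illustration; the labels follow the list above (including the reconstructed DISPOSAL sub-move).

```python
# Hypothetical encoding of the HOLJ rhetorical scheme: top-level
# categories map to the sub-moves posited above.
SCHEME = {
    "BACKGROUND": ["PRECEDENT", "LAW"],
    "CASE": ["EVENT", "LOWER COURT DECISION"],
    "OWN": ["DISPOSAL", "INTERPRETATION", "ARGUMENT"],
}

def fall_back(label):
    """Map a sub-move to its top-level category. A label that is already
    top-level (or cannot be placed) is returned unchanged, modelling the
    annotator's fall-back to the basic scheme."""
    for top, sub_moves in SCHEME.items():
        if label == top or label in sub_moves:
            return top
    return label

assert fall_back("PRECEDENT") == "BACKGROUND"
assert fall_back("OWN") == "OWN"
```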
<Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Linguistic Analysis </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Processing with XML-Based Tools </SectionTitle> <Paragraph position="0"> As described in Section 2.2, the sentences in our small pilot corpus were hand-annotated with labels reflecting their rhetorical type. This annotation was performed on XML versions of the original HTML texts downloaded from the House of Lords website. In this section we describe the use of XML tools in the conversion from HTML and in the linguistic annotation of the documents.</Paragraph> <Paragraph position="1"> A wide range of XML-based tools for NLP applications lend themselves to a modular, pipelined approach to processing, whereby linguistic knowledge is computed and added as XML annotations in an incremental fashion. In processing the HOLJ documents we have built a pipeline whose key components are the programs distributed with the LT TTT and LT XML toolsets (Grover et al., 2000; Thompson et al., 1997) and the xmlperl program (McKelvie, 1999). The overall processing stages contained in our pipeline are shown in Figure 1.</Paragraph> <Paragraph position="2"> In the first stage of processing we convert from the source HTML to an XML format defined in a DTD, hol.dtd, which we refer to as HOLXML in Figure 1. The DTD defines a House of Lords Judgment as a J element whose BODY element is composed of a number of LORD elements. Each LORD element contains the judgment of one individual lord and is composed of a sequence of paragraphs (P elements) inherited from the original HTML.</Paragraph> <Paragraph position="3"> Once the document has been converted to this basic XML structure, we start the linguistic analysis by passing the data through a pipeline composed of calls to a variety of XML-based tools from the LT TTT and LT XML toolsets. The core program in our pipelines is the LT TTT program fsgmatch, a general purpose transducer which processes an input stream and rewrites it using rules provided in a hand-written grammar file, where the rewrite usually takes the form of the addition of XML mark-up. Typically, fsgmatch rules specify patterns over sequences of XML elements or use a regular expression language to identify patterns inside the character strings (PCDATA) which are the content of elements. The other main LT TTT program is ltpos, a statistical combined part-of-speech (POS) tagger and sentence identifier (Mikheev, 1997).</Paragraph> <Paragraph position="4"> The first step in the linguistic annotation process uses fsgmatch to segment the contents of the paragraphs into word tokens encoded in the XML as W elements. Once the word tokens have been identified, the next step uses ltpos to mark up the sentences as SENT elements and to add part-of-speech attributes to word tokens (e.g. <W C='NN'>opinion</W> is a word of category noun). Note that the tagset used by ltpos is the Penn Treebank tagset (Marcus et al., 1994).</Paragraph> <Paragraph position="5"> The following step performs a level of shallow syntactic processing known as &quot;chunking&quot;. This is a method of partially identifying constituent structure which stops short of the fully connected parse trees typically produced by traditional syntactic parsers/grammars. The output of a chunker contains &quot;noun groups&quot;, which are similar to the syntactician's &quot;noun phrases&quot; except that post-head modifiers are not included. It also contains &quot;verb groups&quot;, which consist of contiguous verbal elements such as modals, auxiliaries and main verbs. To illustrate, the sentence &quot;I would allow the appeal and make the order he proposes&quot; is chunked in this way:4 <NG>I</NG> <VG>would allow</VG> <NG>the appeal</NG> and <VG>make</VG> <NG>the order</NG> <NG>he</NG> <VG>proposes</VG> The method we use for chunking is another application of fsgmatch, utilising a specialised hand-written rule set for noun and verb groups.</Paragraph> <Paragraph position="6"> Once verb groups have been identified, we use another fsgmatch grammar to analyse them and encode information about tense, aspect, voice and modality in attributes on the VG elements. Table 2 gives some examples of verb groups and their analysis.</Paragraph> <Paragraph position="7"> The final stage in the process is the step described in detail in Section 3.2, namely identifying which verb group is the main verb group in the sentence. We call this process from our pipeline using xmlperl to pass each sentence in turn to the main verb identifier, receive its verdict back, and encode it in the XML as the value of the MV attribute on sentence elements. Figure 2 shows a small part of one of our documents after it has been fully processed by the pipeline.5</Paragraph> </Section>
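As a rough illustration of the verb group analysis step (the actual fsgmatch grammar is not reproduced here), the sketch below derives TENSE, ASPECT, VOICE and MODAL attributes from the Penn Treebank tags of a verb group's tokens. The rules shown are simplified assumptions of ours, not the hand-written rule set actually used.

```python
# Illustrative analysis of a verb group from its POS-tagged tokens.
# Tags are Penn Treebank: MD=modal, VBD=past, VBP/VBZ=present,
# VB=base form, VBG=gerund/participle, VBN=past participle.
BE_FORMS = {"be", "is", "are", "was", "were", "been", "being"}

def analyse_vg(tagged):
    tags = [t for _, t in tagged]
    modal = "YES" if "MD" in tags else "NO"
    # Passive: past participle preceded by a form of "be".
    voice = "PASSIVE" if tags[-1] == "VBN" and any(
        w.lower() in BE_FORMS for w, _ in tagged[:-1]) else "ACT"
    if "VBD" in tags:
        tense = "PAST"
    elif "VBP" in tags or "VBZ" in tags:
        tense = "PRES"
    else:
        tense = "INF"  # e.g. modal + base form
    if "VBG" in tags:
        aspect = "CONT"
    elif tags[-1] == "VBN" and voice == "ACT":
        aspect = "PERF"  # e.g. "has intended"
    else:
        aspect = "SIMPLE"
    return {"TENSE": tense, "ASPECT": aspect, "VOICE": voice, "MODAL": modal}

# "would allow" -> infinitival base form under a modal, simple aspect
print(analyse_vg([("would", "MD"), ("allow", "VB")]))
# "was dismissed" -> past, simple, passive
print(analyse_vg([("was", "VBD"), ("dismissed", "VBN")]))
```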
<Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Clause and Main Verb Identification </SectionTitle> <Paragraph position="0"> The primary method for identifying the main verb, and thus the tense of a sentence, is through the clause structure. We employ a probabilistic clause identifier to do this. This section gives an overview of the clause identification system and then describes how this information is incorporated into the main verb identification algorithm.</Paragraph> <Paragraph position="1"> The clause identifier was built as part of a post-conference study (Hachey, 2002) of the CoNLL-2001 shared task (Sang and Déjean, 2001). CoNLL (Conference on Natural Language Learning) is a yearly meeting of researchers interested in using machine learning to solve problems in natural language processing. Each year an outstanding issue in NLP is the focus of the shared task portion of the conference. The organisers make a data set available to all participants and specify how systems are to be evaluated. This allows a direct comparison of a number of different learning approaches to a specific problem. As we will report, the system we have built ranks among the top systems designed for the 2001 shared task of clause identification.</Paragraph> <Paragraph position="2"> The clause identification task is divided into three phases. The first two are classification problems similar to POS tagging, where a label is assigned to each word depending on the sentential context. In phase one, we predict for each word whether it is likely that a clause starts at that position in the sentence. In phase two, we predict clause ends. In the final step, phase three, an embedded clause structure is inferred from these start and end predictions.</Paragraph> <Paragraph position="3"> The first two phases are approached as straightforward classification in a maximum entropy framework (Berger et al., 1996). The maximum entropy algorithm produces a distribution p(x, c) based on a set of labelled training examples, where x is the vector of active features. In evaluation mode, we select the class label c that maximises p.</Paragraph> <Paragraph position="4"> The features we use include words, part-of-speech tags, and chunk tags within a set window. The classifier also incorporates features that generalise over long distance dependencies, such as sequential patterns of individual attributes. Consider the task of predicting whether a clause starts at the word &quot;which&quot; in the following sentence:6 Part IV ... is of obvious importance if the Act is to have the teeth which Parliament doubtless intended it should. The fact that there is a subordinating conjunction at the current position followed by a verb group (intended) to the right gives much stronger evidence than if we only looked at the word and its immediate context. A sketch of such a classifier is given below.</Paragraph> <Paragraph position="5"> The more difficult part of the task is inferring clause segmentation from the predicted starts and ends. This does not translate to a straightforward classification task, as the resulting structure must be properly embedded and more than one actual clause may begin (or terminate) at a start (or end) position predicted in the previous two phases. Because of the limited amount of labelled training material, we run into data sparsity problems if we try to predict 3 or more starts at a position.</Paragraph>
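The sketch below illustrates the kind of phase-one classifier described above, using word, POS and chunk-tag features in a window plus one long-distance feature. scikit-learn's logistic regression stands in for the maximum entropy implementation actually used, and the toy sentence, labels and feature names are invented.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def window_features(sent, i, size=2):
    # Word/POS/chunk features in a window around position i, plus a
    # long-distance feature: is there a verb group to the right?
    feats = {}
    for off in range(-size, size + 1):
        j = i + off
        if 0 <= j < len(sent):
            w, pos, chunk = sent[j]
            feats[f"w{off}={w.lower()}"] = 1
            feats[f"p{off}={pos}"] = 1
            feats[f"c{off}={chunk}"] = 1
    feats["vg_to_right"] = int(any(tok[2] == "VG" for tok in sent[i + 1:]))
    return feats

# Toy training data: (word, POS, chunk) triples with clause-start labels.
sent = [("the", "DT", "NG"), ("teeth", "NNS", "NG"), ("which", "WDT", "O"),
        ("Parliament", "NNP", "NG"), ("intended", "VBD", "VG")]
labels = [0, 0, 1, 0, 0]  # a clause starts at "which"

X = [window_features(sent, i) for i in range(len(sent))]
vec = DictVectorizer()
clf = LogisticRegression(C=10, max_iter=1000).fit(vec.fit_transform(X), labels)
# Predicted clause-start label for "which"; with so little data this is
# only a demonstration of the interface, not of accuracy.
print(clf.predict(vec.transform([window_features(sent, 2)])))
```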
<Paragraph position="6"> To deal with this, we created a maximum entropy model whose sole purpose was to provide confidence values for potential clauses. This model uses features similar to those described above to assign a probability to each clause candidate (defined as all ordered combinations of phase one start points and phase two end points). The actual segmentation algorithm then chooses clause candidates one by one in order of confidence. Remaining candidates that have crossed brackets with the chosen clause are removed from consideration at each iteration.</Paragraph> <Paragraph position="7"> We obtained a further improvement (our F score increased from 73.94 to 76.99) by training on hand-annotated POS and chunk data from the Treebank. Table 3 compares precision, recall, and F scores for our system with CoNLL-2001 results, training on sections 15-18 of the Penn Treebank and testing on section 21 (Marcus et al., 1993). Our F score is more than 10 points above the average and is surpassed only by the best performing CoNLL system.</Paragraph> <Paragraph position="8"> [Table 3 (values not reproduced): PRECISION, RECALL and F scores for clause identification, compared with the CoNLL-2001 systems.]</Paragraph> <Paragraph position="9"> Once clause boundaries have been determined, they are used to identify a sentence's main verb group. A verb group that is at the top level according to the clause segmentation is considered a stronger candidate than any embedded verb group (i.e. a verb group that is part of a subordinate clause). In addition, there are several other heuristics encoded in the algorithm. These sanity checks watch for cases in which the complex clause segmentation algorithm described above misses certain strong formal indicators of subordination. First, we consider whether or not a verb group is preceded by a subordinating conjunction (e.g. that, which) with no other verb group between the subordinator and the current verb group. Second, we consider whether a verb group starts with a participle or infinitival to (e.g. provided in &quot;accommodation provided for the purpose of restricting liberty&quot;, to in &quot;counted as a relevant period to be deducted&quot;). These heuristics are applied in the following ranked order, those closer to the beginning of the list being more likely characteristics of a main verb group (see the sketch after the list): 1. Does not occur within an embedded clause, is not preceded by a subordinating conjunction, does not start with a participial or infinitival verb form. 2. Does occur within an embedded clause, is not preceded by a subordinating conjunction, does not start with a participial or infinitival verb form. 3. Does not occur within an embedded clause, is preceded by a subordinating conjunction. 4. Does not occur within an embedded clause, does start with a participial or infinitival verb form. 5. Does occur within an embedded clause, is preceded by a subordinator. 6. Does occur within an embedded clause, does start with a participial or infinitival verb form.</Paragraph>
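A compact sketch of this ranking follows: rank() returns the category number (1 best, 6 worst) for a candidate verb group, and the position preference described in the next paragraph appears as a tie-breaker. The dictionary representation of a verb group is our own invention for illustration.

```python
# Illustrative ranking of main verb group candidates (1 = most likely
# main). 'embedded', 'after_subordinator' and 'participial' correspond
# to the three indicators used in the heuristics above.
def rank(vg):
    if not vg["embedded"]:
        if vg["after_subordinator"]:
            return 3
        if vg["participial"]:
            return 4
        return 1
    if vg["after_subordinator"]:
        return 5
    if vg["participial"]:
        return 6
    return 2

def main_verb_group(vgs):
    # Prefer the best-ranked category; within a category, prefer verb
    # groups nearer the start of the sentence (see the next paragraph).
    return min(enumerate(vgs), key=lambda iv: (rank(iv[1]), iv[0]))[1]

vgs = [
    {"id": "would allow", "embedded": False,
     "after_subordinator": False, "participial": False},
    {"id": "proposes", "embedded": True,
     "after_subordinator": False, "participial": False},
]
print(main_verb_group(vgs)["id"])  # -> "would allow"
```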
<Paragraph position="10"> We also observed in the corpus that verb groups closer to the beginning of a sentence are more likely to be the main verb group. Therefore we weight verb groups slightly according to their sentence position, in order to prefer those closer to the beginning of a sentence within a given category. Scores for main verb group identification are presented in the results section below.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Results </SectionTitle> <Paragraph position="0"> As mentioned above, the current work has concentrated on identifying the rhetorical structure of the HOLJ domain. In studying this structure, we have begun looking for formal indicators of rhetorical categories. The linguistic analysis described in the previous sections is motivated by an observation that tense may be a useful feature.</Paragraph> <Paragraph position="1"> Specifically, it was observed in the corpus that sentences belonging to the CASE rhetorical role are nearly always in the past tense, while sentences belonging to the other rhetorical categories are very seldom in the past tense.</Paragraph> <Paragraph position="2"> Here, we report a preliminary analysis of this relationship. An empirical study of the annotated files reported in Section 2.2 provides the starting point for this analysis.</Paragraph> <Paragraph position="3"> Our identification of the inflection for a sentence depends on the tools described in Sections 3.1 and 3.2 above. These consist of (1) identifying the tense of verb groups, and (2) identifying the main verb group. Results for these two steps of automatic linguistic analysis, calculated from a sample of 100 sentences from the HOLJ corpus, are summarised in Table 4.7 For the evaluation of verb group tense identification, we report scores for identifying past and present, defined by the tense, aspect, and modality features on verb groups as follows: past: TENSE=PAST, ASPECT=SIMPLE, MOD=NO; pres: TENSE=PRES, ASPECT=SIMPLE, MOD=NO. Errors in tense identification are mainly due to errors in the POS tagging and chunking phases. In the case of past tense, the POS tagger has difficulty identifying past participles because of their similarity to simple past tense verbs. Performance for present tense verbs is lower because they are more easily mistaken for, say, nouns with the same spelling. For example, there were two errors in our sample where the verb &quot;falls&quot; was tagged as a noun and assigned to a noun group chunk instead of a verb group.</Paragraph> <Paragraph position="4"> 7 For main verb group identification, we report scores that take points away for missing coordinated main verbs. This is probably too strict an evaluation, as like constituents tend to be coordinated, meaning that the tense of a sentence can normally be identified from just one of the top-level main verb phrases.</Paragraph> <Paragraph position="5"> The main verb group identification algorithm considers only verb groups assigned by the chunker, whether they are true verb groups or not. Thus, these scores also reflect the algorithm's ability to deal with noise introduced in earlier stages.8 One obvious problem is that the algorithm cannot identify a verb group as main if the chunker does not identify it at all. The remaining errors are also propagated from earlier stages in the pipeline: the six cases where the algorithm did not identify the main verb group can be attributed to bad part-of-speech tags, bad chunk tags, or poor clause segmentation.</Paragraph> <Paragraph position="6"> 8 We ignore sentences that are not properly segmented (i.e. part of a sentence is missing or more material is included in a sentence than there should be). In these cases, the actual main verb group may or may not be present when the main verb identification algorithm is run. Sentence segmentation is an interesting problem in its own right. A state-of-the-art approach is included in our XML pipeline (Mikheev, 2002). Though we might get slightly better performance if we tailored the segmentation algorithm to our domain, in a random sample of 100 sentences there were only 4 cases of bad segmentation.</Paragraph>
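Expressed as code, the two evaluation definitions above amount to simple checks on the attributes of the main verb group; the dictionary representation of those attributes is assumed for illustration.

```python
# A sentence is represented here by the attributes of its main verb
# group, as produced by the verb group analysis and main verb
# identification stages described above.
def is_past(mv):
    return (mv.get("TENSE") == "PAST" and mv.get("ASPECT") == "SIMPLE"
            and mv.get("MODAL") == "NO")

def is_pres(mv):
    return (mv.get("TENSE") == "PRES" and mv.get("ASPECT") == "SIMPLE"
            and mv.get("MODAL") == "NO")

mv = {"TENSE": "PAST", "ASPECT": "SIMPLE", "MODAL": "NO"}
assert is_past(mv) and not is_pres(mv)
```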
<Paragraph position="7"> Teufel et al. (1999) do not explicitly use tense information in their heuristic categories. They also point out that their process of identifying indicator phrases is completely manual. Our integration of linguistic analysis techniques allows us to derive automatically certain linguistic features we think will be useful in sentence extraction and rhetorical classification.</Paragraph> <Paragraph position="8"> Our analysis makes available not only information about the tense of the main verb, but all the annotation acquired in intermediate steps: part-of-speech tags, chunk tags, clause structure, and tense information for all verb groups. To illustrate the utility of tense information, we will look at the relationship between our main rhetorical categories and simple present and past tense.</Paragraph> <Paragraph position="9"> The correlation coefficient is a statistical measure of 'relatedness'. Values fall in the range [-1.0, 1.0], where -1 means the variables are always different, 0 means the variables are not correlated, and 1 means the variables are always the same. Table 5 presents correlation scores between our basic rhetorical scheme and verb tense.</Paragraph> <Paragraph position="10"> [Table 5 (values not reproduced): correlation between the basic rhetorical scheme (BACKGROUND, CASE, OWN) and sentential tense information.]</Paragraph> <Paragraph position="11"> For illustrative purposes, we will focus on identifying the CASE rhetorical move. There is a moderate positive correlation between sentences determined to be past tense and sentences marked as belonging to the CASE rhetorical category. Also, present tense and the CASE rhetorical move have a moderate negative correlation. This suggests two features based on our linguistic analysis that will help a statistical classifier identify the CASE rhetorical move: (1) the sentence is past tense, and (2) the sentence is not present tense. Furthermore, comparing rows indicates that these are both good discriminative indicators. In the case of past tense, there is a positive correlation with the CASE rhetorical move, while there is a very weak negative correlation with BACKGROUND and a slightly stronger negative correlation with OWN.</Paragraph> <Paragraph position="12"> These results also illustrate the complexity of tense information. In order to identify simple past tense sentences, we look to see if the TENSE attribute of the main verb group has the value PAST, the ASPECT attribute has the value SIMPLE and the MODAL attribute has the value NO.</Paragraph>
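For binary indicators such as "sentence is CASE" and "main verb is simple past", a correlation of this kind can be computed as the Pearson coefficient of two 0/1 vectors (the phi coefficient). The sketch below uses made-up labels; we do not know which exact variant was used to produce Table 5.

```python
import math

def phi(xs, ys):
    # Pearson correlation of two equal-length 0/1 vectors (the phi
    # coefficient): 1 = always the same, 0 = uncorrelated,
    # -1 = always different.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Made-up labels: 1 = sentence is CASE / 1 = main verb is simple past.
is_case = [1, 1, 0, 0, 1, 0, 0, 1]
is_past = [1, 1, 0, 1, 1, 0, 0, 0]
print(round(phi(is_case, is_past), 2))  # 0.5: a moderate positive correlation
```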
<Paragraph position="13"> Feature construction techniques offer a means for the automatic discovery of complex features of higher relevance to the concept being learned. Employing machine learning approaches that are capable of modelling dependencies among features (e.g. maximum entropy) is another way to deal with this.</Paragraph> </Section> </Section> </Paper>