<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1007">
  <Title>A Rhetorical Status Classifier for Legal Text Summarisation</Title>
  <Section position="5" start_page="0" end_page="0" type="concl">
    <SectionTitle> 4 Conclusions and Future Work </SectionTitle>
    <Paragraph position="0"> We have presented new work on the summarisation of legal texts, for which we are developing a new corpus of UK House of Lords judgments with detailed linguistic markup in addition to rhetorical status and sentence extraction annotation.</Paragraph>
    <Paragraph position="1"> We have laid the groundwork for detailed experiments with robust and generic methods for capturing cue phrase information. This approach is attractive because it can be ported automatically to new text summarisation domains wherever tools for linguistic analysis are available, rather than relying on cue phrase lists that must be hand-crafted for each domain. Hand-crafted cue phrase lists are necessarily more fragile and more susceptible to over-fitting in large-scale applications.</Paragraph>
    <Paragraph position="2"> Future experiments will use maximum entropy modelling to incorporate our diverse range of sparse linguistic and textual features. We plan to experiment with maximum entropy for sentence-level rhetorical status prediction in both standard classification and sequence modelling frameworks.</Paragraph>
    <Paragraph position="3"> We also intend to incorporate bootstrapped named entity recognition systems. While generic linguistic analysis tools (e.g. part-of-speech tagging, chunking) are readily available for many languages, domain-specific named entity recognition is not. We have invested a considerable amount of time in writing named entity rules by hand for the HOLJ domain. However, current research is investigating methods for bootstrapping named entity systems from small amounts of seed data. Effective methods will make our linguistic features fully domain-independent for domains and languages where linguistic analysis tools are available. For future work, we are considering active learning and co-training. Active learning (Cohn et al., 1994) would seem the appropriate starting point for our task, as we currently have no gold standard data but do have annotation resources. We may also benefit from co-training (Blum and Mitchell, 1998) and rule induction (Riloff and Jones, 1999), using the seed data set produced by the initial annotation for active learning.</Paragraph>
    <Paragraph position="4"> We have also performed a preliminary experiment with hypernym features for subject and verb lemmas, which should allow better generalisation over cue phrase information. This is a rather noisy feature, as we do not perform word sense disambiguation but instead add all WordNet hypernyms of the first three senses as features. Nevertheless, it improves the naïve Bayes classifier's score from 24.75 with the cue phrase feature sets (minus lemma features) to 27.45 when hypernyms are included. Future work will further investigate hypernym features.</Paragraph>
  </Section>
</Paper>
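
As a minimal sketch of the maximum entropy set-up described in the third paragraph above, the following Python fragment trains a multinomial logistic regression (equivalent to a maximum entropy model) over sparse sentence features. The feature names, rhetorical status labels, and the scikit-learn pipeline are illustrative assumptions, not the paper's actual implementation.

# Sketch: sentence-level rhetorical status prediction with a maximum entropy
# model (multinomial logistic regression) over sparse feature dictionaries.
# Feature names and labels below are invented for illustration only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Each sentence is represented as a sparse dictionary of linguistic/textual features.
train_sentences = [
    {"first_pos": "PRP", "main_verb_lemma": "dismiss", "position": 0.92, "in_quote": False},
    {"first_pos": "IN",  "main_verb_lemma": "argue",   "position": 0.15, "in_quote": True},
]
train_labels = ["DISPOSAL", "FACT"]  # example rhetorical status labels

maxent = Pipeline([
    ("vec", DictVectorizer(sparse=True)),        # maps feature dicts to a sparse matrix
    ("clf", LogisticRegression(max_iter=1000)),  # maximum entropy = multinomial logistic regression
])
maxent.fit(train_sentences, train_labels)

test_sentence = {"first_pos": "PRP", "main_verb_lemma": "allow", "position": 0.95, "in_quote": False}
print(maxent.predict([test_sentence])[0])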
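
The active learning proposal in the fourth paragraph could begin with pool-based uncertainty sampling: train on a small labelled seed set, then send the sentences the model is least confident about to the annotators. The loop below is a generic sketch of that idea, not the authors' procedure; request_annotation is a hypothetical stand-in for the human annotation step, and the model is assumed to accept the raw example representation (e.g. the feature-dict pipeline sketched above).

# Sketch: pool-based active learning via uncertainty sampling (cf. Cohn et al., 1994).
# Assumes a classifier with predict_proba, a small labelled seed set, and a large
# pool of unlabelled examples; request_annotation() is a hypothetical placeholder.
import numpy as np

def uncertainty_sampling_loop(model, seed_X, seed_y, pool_X, request_annotation,
                              rounds=10, batch_size=20):
    X, y = list(seed_X), list(seed_y)
    pool = list(pool_X)
    for _ in range(rounds):
        model.fit(X, y)
        if not pool:
            break
        probs = model.predict_proba(pool)            # class probabilities for each pool item
        confidence = probs.max(axis=1)               # confidence = probability of the top label
        worst = np.argsort(confidence)[:batch_size]  # least confident examples first
        for i in sorted(worst, reverse=True):        # pop from the back so indices stay valid
            example = pool.pop(i)
            X.append(example)
            y.append(request_annotation(example))    # human supplies the label
    return model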
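
The hypernym features in the final paragraph can be approximated with NLTK's WordNet interface: look up a subject or verb lemma, take its first three senses, and add every hypernym of those senses as a feature, with no word sense disambiguation. The paper does not say whether direct or inherited hypernyms are used; the sketch below takes the full hypernym closure, and the feature naming is an assumption.

# Sketch: all hypernyms of a lemma's first three WordNet senses, no disambiguation.
# Requires: pip install nltk; nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def hypernym_features(lemma, pos, max_senses=3):
    """Hypernym features from the lemma's first `max_senses` senses (full closure)."""
    features = set()
    for synset in wn.synsets(lemma, pos=pos)[:max_senses]:
        # closure() walks the whole hypernym chain of each sense.
        for hyper in synset.closure(lambda s: s.hypernyms()):
            features.add("hyper=" + hyper.name())
    return features

# Example: main verb lemma and subject noun lemma of a sentence.
verb_feats = hypernym_features("dismiss", wn.VERB)
subj_feats = hypernym_features("appellant", wn.NOUN)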