<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0203"> <Title>Parser Errors</Title> <Section position="5" start_page="1" end_page="20" type="metho"> <SectionTitle> 2 Definitions in Legal Language </SectionTitle> <Paragraph position="0"> Two central kinds of knowledge contained in the statutes of a code law system are normative knowledge, connecting legal consequences to descriptions of certain facts and situations, and terminological knowledge, consisting in definitions of some of the concepts used in these descriptions (Valente and Breuker 1994).</Paragraph> <Paragraph position="1"> Normative content is exemplified by (1), parts of section 324a of the German criminal law. The legal consequence, consisting in the specified punishment, is connected to the precondition of soil pollution: (1) Whoever (...) allows to penetrate or releases substances into the soil and thereby pollutes it or otherwise detrimentally alters it: 1. in a manner that is capable of harming (...) property of significant value or a body of water (...) shall be punished with imprisonment for not more than five years or a fine.</Paragraph> <Paragraph position="2"> Terminological knowledge consists in definitions of concepts used to describe the sanctioned facts. E.g., soil is defined in article 2 of the German soil protection law as follows: (2) Soil within the meaning of this Act is the upper layer of the earth's crust (...) including its liquid components (soil solution) and gaseous components (soil air), except groundwater and beds of bodies of water.</Paragraph> <Paragraph position="3"> If the definitions contained in statutes fully specified how the relevant concepts are to be applied, cases could be solved (once the relevant statutes have been identified) by mechanically checking which of some given concepts apply, and then deriving the appropriate legal consequences in a logical conclusion. However, such a simple procedure is never possible in reality. Discussions in courts (and consequently in all legal texts that document court decisions) are in large part devoted to pinning down whether certain concepts apply. Controversies often arise because not all relevant concepts are defined in statutes at all, and because the terms used in legal definitions are often in need of clarification themselves. For instance, it may be unclear in some cases what exactly counts as the bed of a body of water mentioned in Example (2). Additionally, reality is complex and constantly changing, and these changes also pertain to the applicability of formerly clear-cut concepts. While this is especially true of social reality, even rather physical concepts may be affected. An often-cited example is a case where the German Reichsgericht had to decide whether electricity was to be counted as a thing.</Paragraph> <Paragraph position="4"> At the heart of these difficulties lies the fact that statutes are written in natural language, not in a formalized or strongly restricted specialized language. It is widely assumed in the philosophical literature that most natural language concepts do not lend themselves to definitions fixing all potential conditions of applicability a priori.
From the point of view of legal theory, this open-textured character of natural language concepts is often seen as essential for the functioning of any legal system (the term open texture was introduced into this discussion by Hart (1961)).</Paragraph> <Paragraph position="5"> The use of natural language expressions allows for a continuous re-adjustment of the balance between precision and openness. This possibility is needed to provide regulations that are on the one hand reliable and on the other hand flexible enough to serve as a common ground for all kinds of social interaction. For the solution of concrete cases, the concepts made available within statute texts are supplemented by further definitions (in a wide sense, covering all kinds of modification and adaptation of concepts) given in the courts' decisions (in particular within the reasons for judgement). Such definitions for instance fix whether a certain stretch of sand counts as the bed of a body of water, or whether something is of significant value in the case at hand. These definitions are generally open for later amendment or revision. Still, they almost always remain binding beyond the case at hand.</Paragraph> <Paragraph position="6"> Easy access to definitions in decisions is therefore of great importance to the legal practitioner. Sections 3 and 4 show how computational linguistic analysis helps to answer this need by enabling an accurate search for definitions in a large collection of court decisions. Accurate definition extraction is a prerequisite for building up an information system that allows for concept-centred access to the interpretational knowledge spread over the tens of thousands of documents produced by courts every year.</Paragraph> <Paragraph position="7"> Definitions are, however, not only of direct value as a source of information in legal practice. They also provide contexts with a particularly high density of relevant terminology, and are therefore a good place to search for concepts to be integrated into a domain ontology. Given the importance and frequency of definitions in legal text, such an approach seems particularly promising for this domain. Section 5 describes how automatically extracted definitions improve the results of a standard ontology learning method.</Paragraph> </Section> <Section position="6" start_page="20" end_page="21" type="metho"> <SectionTitle> 3 Structure of Definitions </SectionTitle> <Paragraph position="0"> Our current work is based on a collection of more than 6000 verdicts in environmental law.</Paragraph> <Paragraph position="1"> As a starting point, however, we conducted a survey based on a random selection of 40 verdicts from various legal fields (none of which is in our present test set), which together contained 130 definitions.</Paragraph> <Paragraph position="2"> Inspection of these definitions has revealed a range of common structural elements and has allowed us to identify their typical linguistic realizations. We will illustrate this with the example definition given in (3): (3) Bei Einfamilienreihenhäusern liegt unzureichender Schallschutz dann vor, wenn die Haustrennwand einschalig errichtet wurde (...). (One-family row-houses have insufficient noise insulation if the separating wall is one-layered.) This definition contains: 1. The definiendum, i.e. the element that is defined (unzureichender Schallschutz - insufficient noise insulation).</Paragraph> <Paragraph position="3"> 2. The definiens, i.e.
the element that fixes the meaning to be given to the definiendum (die Haustrennwand einschalig errichtet wurde - the separating wall is one-layered).</Paragraph> <Paragraph position="4"> Apart from these constitutive parts, it contains: 3. A connector, indicating the relation between definiendum and definiens (liegt ... vor, wenn; have ..., if).</Paragraph> <Paragraph position="5"> 4. A qualification specifying a domain area of applicability, i.e. a restriction in terms of the part of reality that the regulation refers to (bei Einfamilienreihenhäusern - one-family row-houses). 5. Signal words that cannot be assigned any clear function with regard to the content of the sentence, but serve to mark it as a definition (dann). The connector normally contains at least the predicate of the main clause, often together with further material (subjunction, relative pronoun, determiner). It not only indicates the presence of a definition; it also determines how definiens and definiendum are realized linguistically, and often contains information about the type of the given definition (full, partial, by examples, ...). The linguistic realization of definiendum and definiens thus depends on the connector. One common pattern realizes the definiendum as the subject, and the definiens within a subclause. The domain area is often specified by a PP introduced by bei ("in the field of", for), as seen in the example. Further possibilities are other PPs or certain subclauses. Signal words are certain particles (dann in the example), adverbs (e.g. begrifflich - conceptually) or nominal constructions containing the definiendum (e.g. der Begriff des ..., the concept of ...).</Paragraph> <Paragraph position="6"> Of course, many definitions also contain further structural elements that are not present in Example (3). For instance, certain adverbials or modal verbs modify the force, validity or degree of commitment to a definition (e.g. restricting it to typical cases). The field of law within which the given definition applies is often specified as a PP containing a formal reference to sections of statutes, or simply the name of a statute, a document, or even a complete legal field (e.g. Umweltrecht - environmental law). Citation information for definitions is standardly included in brackets as a reference to another verdict by date, court, and reference number.</Paragraph> </Section> <Section position="7" start_page="21" end_page="24" type="metho"> <SectionTitle> 4 Automatic extraction of definitions </SectionTitle> <Paragraph position="0"> The corpus-based pilot study discussed in the last section has, on the one hand, shown broad linguistic variation among definitions in reasons for judgement. No simple account, for instance in terms of keyword spotting or pattern matching, will suffice to extract the relevant information from a significant number of occurrences.</Paragraph> <Paragraph position="1"> On the other hand, our survey has shown a range of structural uniformities across these formulations, summarized in the sketch below.
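To make these uniformities concrete, the structural elements of Section 3 can be gathered into a simple record type. The following is a minimal sketch in Python; the class and field names are our own illustration, not part of any system described in this paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Definition:
    """Structural elements of a definition in reasons for judgement."""
    definiendum: str                   # the element that is defined
    definiens: str                     # the element fixing its meaning
    connector: str                     # e.g. "liegt ... vor, wenn"
    domain_area: Optional[str] = None  # applicability restriction, often a bei-PP
    signal_words: List[str] = field(default_factory=list)  # e.g. ["dann"]
    legal_field: Optional[str] = None  # statute reference or name of a legal field
    citation: Optional[str] = None     # bracketed reference to another verdict

# Example (3) rendered in this structure:
example_3 = Definition(
    definiendum="unzureichender Schallschutz",
    definiens="die Haustrennwand einschalig errichtet wurde",
    connector="liegt ... vor, wenn",
    domain_area="bei Einfamilienreihenhäusern",
    signal_words=["dann"],
)
```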
This section discusses computational linguistic analysis techniques that are useful for identifying and segmenting definitions based on these uniformities.</Paragraph> <Section position="1" start_page="21" end_page="22" type="sub_section"> <SectionTitle> 4.1 Linguistic Analysis </SectionTitle> <Paragraph position="0"> Our current work is based on a collection of more than 6000 verdicts in environmental law that were parsed using the Preds-parser (Preds stands for partially resolved dependency structure), a semantically oriented parsing system developed in the Saarbrücken Computational Linguistics Department within the COLLATE project. It was used there for information extraction from newspaper text (Braun 2003, Fliedner 2004). The Preds-parser balances depth of linguistic analysis with robustness of the analysis process and is therefore able to provide relatively detailed linguistic information even for large amounts of syntactically complex text.</Paragraph> <Paragraph position="1"> It generates a semantic representation for its input by a cascade of analysis components. Starting with a topological analysis of the input sentence, it continues by applying a phrase chunker and a named entity recognizer to the contents of the topological fields. The resulting extended topological structure is transformed into a semantic representation (called Preds, see above) by a series of heuristic rules. The Preds format encodes semantic dependencies and modification relations within a sentence using abstract categories such as deep subject and deep object. In this way it provides a common normalized structure for various surface realizations of the same content (e.g. in active or passive voice).</Paragraph> <Paragraph position="2"> The Preds-parser makes use of syntactic underspecification to deal with the problem of ambiguity. It systematically prefers low attachment in case of doubt and marks the affected parts of the result as default-based. Later processing steps can then resolve ambiguities based on further information, but this is not necessary in general.</Paragraph> <Paragraph position="3"> Common parts of multiple readings can be accessed without having to enumerate and search through alternative representations. Figure 1 shows the parse for the definition in Example (3).</Paragraph> <Paragraph position="4"> The parser returns an XML tree that contains this structure together with the full linguistic information accumulated during the analysis process.</Paragraph> </Section> <Section position="2" start_page="22" end_page="22" type="sub_section"> <SectionTitle> 4.2 Search and processing </SectionTitle> <Paragraph position="0"> The structures produced by the Preds-parser provide a level of abstraction that allows us to turn typical definition patterns into declarative extraction rules. Figure 2 shows one such extraction rule. It specifies (abbreviated) XPath expressions describing definitions such as Example (3). The field query contains an expression characterising a sentence with the predicate vorliegen and a subclause introduced by the subjunction wenn (if). This expression is evaluated on the Preds of the sentences within our corpus to identify definitions. Other fields specify the locations of the structural elements (such as definiendum, definiens and domain area) within the Preds of the identified definitions. The field filters specifies a set of XSLT scripts used to filter out certain results.
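The application of such a rule can be sketched roughly as follows. This is a hypothetical illustration: the XPath expressions, element and attribute names below are invented for the example, and the actual Preds schema and rule syntax differ.

```python
from lxml import etree

# Hypothetical rule in the spirit of Figure 2: a query identifying
# sentences with the predicate vorliegen and a wenn-subclause, plus
# paths locating the structural elements within each hit.
RULE = {
    "query": ".//sentence[pred/@lemma='vorliegen']"
             "[.//subclause[@subjunction='wenn']]",
    "definiendum": "pred/deep_subject",
    "definiens": ".//subclause[@subjunction='wenn']",
    "domain_area": "pp[@prep='bei']",
}

def apply_rule(preds: etree._Element, rule: dict = RULE):
    """Evaluate one declarative extraction rule on a Preds parse tree."""
    for hit in preds.xpath(rule["query"]):
        # Map each structural slot to the matching subtree(s); the
        # XSLT-based filters would be applied to these results afterwards.
        yield {slot: hit.xpath(path)
               for slot, path in rule.items() if slot != "query"}
```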
In the rule of Figure 2, we exclude definienda that are either pronominal (because we do not presently resolve anaphoric references) or definite (because these are often also anaphoric, or indicate that the sentence at hand is valid for that particular case only). Figure 3 shows how the definition in Example (3) is analyzed by this rule.</Paragraph> </Section> <Section position="3" start_page="22" end_page="23" type="sub_section"> <SectionTitle> 4.3 Evaluation </SectionTitle> <Paragraph position="0"> We currently use 33 such extraction rules based on the connectors identified in our pilot study, together with various kinds of filters.</Paragraph> <Paragraph position="1"> When applied to the reasons for judgement in all 6000 decisions (containing 237935 sentences) in our environmental law corpus, these rules yield 5461 hits before filtering (since not all patterns are mutually exclusive, these hits fall within 4716 sentences). After the exclusion of pronominal and in some cases definite definienda (see above), as well as definienda containing stop words (certain very common adjectives and nouns), the number of remaining hits decreases to 1486 (in 1342 sentences).</Paragraph> <Paragraph position="2"> A selection of 492 hits (in 473 sentences; all hits for rules with fewer than 20 hits, at least 20 hits for the others) was checked for precision by two annotators. The evaluation was based on a very inclusive concept of definition, covering many cases of doubt such as negative applicability conditions, legal preconditions or elaborations on the use of evaluative terms. Clear "no" judgements were given, e.g., for statements referring only to one particular case without any general elements, and for purely contingent statements. The overall agreement between the judgements was relatively high, with an overall κ of 0.835.</Paragraph> <Paragraph position="3"> Precision values within the checked hits vary considerably. However, for both annotators more than 50% of all hits stem from patterns that together reach a precision of well above 70% (Table 1).</Paragraph> </Section> <Section position="4" start_page="23" end_page="24" type="sub_section"> <SectionTitle> 4.4 Discussion </SectionTitle> <Paragraph position="0"> So far, our focus in selecting rules and filters has been on optimizing precision. As our present results show, it is possible to extract definitions at an interesting degree of precision and still obtain a reasonable number of hits. However, we have not yet addressed the issue of recall systematically. The assessment of recall poses greater difficulties than the evaluation of the precision of search patterns. To our knowledge, no reference corpus with annotated definitions exists. Building up such a corpus is time-intensive, in particular because of the large amount of text that has to be examined for this purpose. Within the 3500 sentences of the 40 decisions examined in our pilot study mentioned above, we found only about 130 definitions. While this number is significant from the perspective of information access, it is quite small from the annotator's point of view. Moreover, our pilot study has made clear that a considerable number of definitions cannot be identified by purely linguistic features, and that many of these are unclear cases of particular difficulty for the annotator.
The proportion of such problematic cases will obviously be much higher in free-text annotation than in the evaluation of our extraction results, which were generated by looking for clear linguistic cues.</Paragraph> <Paragraph position="1"> Taking the ratio observed in our pilot study (130 definitions in 3500 sentences) as an orientation, the set of rules we are currently using is clearly far from optimal in terms of recall. It seems that a number of relatively simple improvements can be made in this respect. A variety of obviously good patterns is still missing from our working set. We are currently testing a bootstrapping approach based on a seed of various noun combinations taken from extracted definitions in order to acquire further extraction patterns. We hope to be able to iterate this procedure in a process of mutual bootstrapping similar to that described in (Riloff and Jones 1999).</Paragraph> <Paragraph position="2"> Moreover, all presently employed rules use patterns that correspond to the connector parts (cf. Section 3) of definitions. Accumulations of, e.g., certain signals and modifiers may turn out to indicate definitions with equal precision. We have identified a range of adverbial modifiers that are highly associated with definitions in the corpus of our pilot study, but we have not yet evaluated the effect of integrating them into our extraction patterns.</Paragraph> <Paragraph position="3"> We also assume that there is great potential in more fine-grained and linguistically sensitive filtering, such that comparable precision is achieved without losing so many results.</Paragraph> <Paragraph position="4"> Even with all of the discussed improvements, however, the problem of definitions without clear linguistic indicators will remain. Heuristics based on domain-specific information, such as citation and document structure (e.g. the first sentence of a paragraph is often a definition), may be of additional help in extending the recall of our method to such cases.</Paragraph> <Paragraph position="5"> Apart from integrating further features into our extractors and using bootstrapping techniques for identifying new patterns, another option is to train classifiers for the identification of definitions based on parse features, such as dependency paths. This approach has, for instance, been used successfully for hypernym discovery (cf. Snow et al., 2005). For that task, WordNet could be used as a reference in the training and evaluation phases. The fact that no comparable reference resource is available in our case presents a great difficulty for the application of machine learning methods.</Paragraph> </Section> </Section> <Section position="8" start_page="24" end_page="31" type="metho"> <SectionTitle> 5 Ontology Extraction </SectionTitle> <Paragraph position="0"> Occurrence of a concept within a definition is likely to indicate that the concept is important for the text at hand. Moreover, in court decisions a great deal of the important concepts (legal as well as subject-domain) will in fact have at least some occurrences within definitions. This can be assumed because legal argumentation (as discussed in Section 2) characteristically proceeds by adducing explicit definitions for all relevant concepts. Definition extraction therefore seems to be a promising step for identifying concepts, in particular within legal text. This section discusses how extracted definitions can be used to improve the quality of text-based ontology learning from court decisions.
For this purpose, we first examine the results of a standard method (the identification of terms and potential class-subclass relations through weighted bigrams) and then look at the effect of combining this method with a filter based on occurrence within definitions.</Paragraph> <Section position="1" start_page="24" end_page="24" type="sub_section"> <SectionTitle> 5.1 Bigram Extraction </SectionTitle> <Paragraph position="0"> Adjective-noun bigrams are often taken as a starting point in text-based ontology extraction because in many cases they contain two concepts and one relation (see e.g. Buitelaar et al. 2004).</Paragraph> <Paragraph position="1"> The nominal head represents one concept, while adjective and noun together represent another concept that is subordinate to the first one. There are, however, obvious limits to the applicability of this concept-subconcept rule: (1) It may happen that the bigram, or even already the nominal head on its own, does not correspond to a relevant concept, i.e. that one or both of the denoted classes are of no particular relevance for the domain.</Paragraph> <Paragraph position="2"> (2) Not all adjective-noun bigrams refer to a subclass of the class denoted by the head noun. Adjectives may, e.g., be used redundantly, making explicit a part of the semantics of the head noun, or the combination may be non-compositional and therefore relatively unrelated to the class referred to by the head noun.</Paragraph> <Paragraph position="3"> For these reasons, extracted bigrams generally need to be hand-checked before the corresponding concepts can be integrated into an ontology. This time-intensive step can be facilitated by providing a relevance ranking of the candidates to be inspected. Such rankings use association measures known from collocation discovery (like χ2, pointwise mutual information or log-likelihood ratios). But while the elements of a collocation are normally associated by virtue of their meaning, this alone does not make them correspond to a domain concept. Moreover, many collocations are non-compositional. An association-based ranking therefore cannot solve Problem (2) just mentioned, and only partially solves Problem (1). However, it seems likely that the definiendum of a definition is a domain concept, and for the reasons discussed in Section 2 it can be assumed that particularly many concepts will in fact occur within definitions in the legal domain. In order to investigate this hypothesis, we extracted all head-modifier pairs with nominal head and adjectival modifier from all parsed sentences in our corpus. We then restricted this list to only those bigrams occurring within at least one identified definiendum, and compared the proportion of domain concepts following the concept-subconcept rule on both lists.</Paragraph> </Section> <Section position="2" start_page="24" end_page="25" type="sub_section"> <SectionTitle> 5.2 Unfiltered Extraction and Annotation </SectionTitle> <Paragraph position="0"> We found a total of 165422 bigram occurrences of 73319 types (in the following, we use bigrams to refer to types, not to occurrences) within the full corpus. From this list we deleted combinations with 53 very frequent adjectives that are mainly used to establish uniqueness for definite reference (such as vorgenannt - mentioned above).
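The ranking step described next is based on the standard log-likelihood ratio for contingency tables (Dunning 1993). A minimal sketch with our own variable names follows; the actual computation in this work was done with the Ngram Statistics Package (see below).

```python
from math import log

def llr(c12: int, c1: int, c2: int, n: int) -> float:
    """Log-likelihood ratio of observed vs. independent occurrence.

    c12: joint count of the adjective-noun bigram; c1, c2: marginal
    counts of adjective and noun; n: total bigram occurrences.
    """
    # Observed contingency table and its expectation under independence.
    obs = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
    exp = [c1 * c2 / n, c1 * (n - c2) / n,
           (n - c1) * c2 / n, (n - c1) * (n - c2) / n]
    return 2 * sum(o * log(o / e) for o, e in zip(obs, exp) if o > 0)

# Bigram types are then sorted by this score, highest first.
```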
All types with more than 5 occurrences were then ranked by the log-likelihood ratio of the observed compared to the independent occurrence of the bigram elements.</Paragraph> <Paragraph position="1"> The resulting list contains 4371 bigrams on 4320 ranks. Each bigram on the first 600 ranks of this list (601 bigrams; two bigrams share rank 529) was assigned one of the following five categories: 1. Environmental domain: Bigrams encoding concepts from the environmental domain (e.g. unsorted construction waste). These occur because our corpus deals with environmental law. 2. Legal domain: Bigrams encoding concepts from the legal domain. These range from concepts that are more or less characteristic of environmental law (e.g. various kinds of town-planning schemes) to very generic legal concepts (such as statutory prerequisite). 3. No subconcept: Bigrams that would be categorized as 1. or 2., but (typically for one of the reasons explained above) do not encode a subconcept of the concept associated with the head noun. An example is öffentliche Hand ("public hand", i.e. public authorities - a non-compositional collocation).</Paragraph> <Paragraph position="2"> 4. No concept: All bigrams that - as a bigram - do not stand for a domain concept (although the nominal head alone may stand for a concept).</Paragraph> <Paragraph position="3"> 5. Parser error: Bigrams that were obviously misanalysed due to parser errors.</Paragraph> <Paragraph position="4"> Figure 4 shows the distribution of categories among the 600 top-ranked bigrams, as well as within an additionally annotated 100 ranks towards the end of the list (ranks 3400-3500). The ranking was calculated with the Ngram Statistics Package described in (Banerjee and Pedersen 2003). For selecting the two categories of central interest, namely legal and environmental concepts to which the concept-subconcept rule applies, the ranking is most precise on the first few hundred ranks and loses much of its effect on lower ranks. The percentage of such concepts decreases from 56% among the first 100 ranks to 51% among the first 200, but is roughly the same within the first 500 and 600 ranks (with even a slight increase, 45.6% compared to 46.8%). Even the segment from rank 3400 to 3500 still contains 39% relevant terminology. There are no bigrams of the "no subconcept" category within this final segment. The explanation for this is probably that such bigrams (especially the non-compositional ones) are mostly established collocations and therefore show a particularly high degree of association.</Paragraph> <Paragraph position="5"> It must be noted that the results of our annotation have to be interpreted cautiously. They have not yet been double-checked, and during the annotation process a certain degree of uncertainty emerged, especially in the subclassification of the various categories of concepts (1, 2 and 3). A further category for concepts with generic attributes (e.g. permissible, which combines with a whole range of one-word terms) would probably cover many cases of doubt.
The binary distinction between concepts and non-concepts, in contrast, was less difficult to make, and it is surely safe to draw conclusions about general tendencies from our annotation.</Paragraph> </Section> <Section position="3" start_page="25" end_page="31" type="sub_section"> <SectionTitle> 5.3 Filtering and Combined Approach </SectionTitle> <Paragraph position="0"> By selecting only those bigrams that occur within definienda, the 4371 items on the original list were reduced to 227 (to allow for comparison, these were kept in the same order and annotated with their ranks from the original list).</Paragraph> <Paragraph position="1"> Figure 5 shows how the various categories are distributed within the items selected from the top segments of the original list, as well as within the complete 227 filtering results.</Paragraph> <Paragraph position="2"> The proportion of interesting concepts reaches about 80% within the selected top segments and is higher than 60% on the complete selection. This is still well above the 56% precision within the top-100 segment of the original list. However, the restriction to a total of 227 results on our filtered list (of which only 145 are useful) means a dramatic loss in recall. This problem can be alleviated by leaving a top segment of the original list in place (e.g. the top 200 or 500 ranks, where precision is still at a tolerably high level) and supplementing it with the lower ranks from the filtered list until the desired number of items is reached. Another option is to apply the filtering to the complete list of extracted bigrams, not only to those that occur more than 5 times. We assume that a concept that is explicitly defined is likely to be of particular relevance for the domain regardless of its frequency. Hence our definition-based filter should still work well on concept candidates that are too infrequent to be considered at all in a log-likelihood ranking, and should allow us to include such candidates in our selection, too.</Paragraph> <Paragraph position="3"> We investigated the effect of a combination of both methods just described. For this purpose, we first extracted all noun-adjective bigrams occurring within any of the identified definienda, regardless of their frequency within the corpus.</Paragraph> <Paragraph position="4"> After completing the annotation of the 627 resulting bigrams, they were combined with various top segments of our original unfiltered list.</Paragraph> <Paragraph position="5"> Figure 6 shows the distribution of the annotated categories among the 627 bigrams from definienda, as well as on two combined lists.</Paragraph> <Paragraph position="6"> Cutoff 200/750 is the result of cutting the original list at rank 200 and filling it up with the next 550 items from the filtered list. For cutoff 500/1000 we cut the original list at rank 500 and filled it up with the following 500 items from the filtered one. The distribution of categories among the original top 200 is repeated for comparison.</Paragraph> <Paragraph position="7"> Precision among the 627 filtering results is higher than among the original top 200 (almost 56% compared to 51%), and only slightly smaller even for the 1000 results in the cutoff 500/1000 setting. Using definition extraction as an additional knowledge source, the top 1000 retrieved results are thus of a quality that can otherwise only be achieved for the top 200 results.</Paragraph>
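The combination just described amounts to a simple list merge. A minimal sketch, assuming bigrams ranked as in Section 5.2 and a definition-filtered sublist in the same order (function and parameter names are ours, for illustration):

```python
def combined_list(ranked, filtered, cutoff, total):
    """Keep the top `cutoff` ranks of the unfiltered ranking and fill
    up with definition-filtered bigrams until `total` items are
    reached (e.g. cutoff=200, total=750 for the 200/750 setting)."""
    top = ranked[:cutoff]
    seen = set(top)
    fill = [b for b in filtered if b not in seen]
    return top + fill[: total - cutoff]
```

</Section> </Section> </Paper>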