File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/93/w93-0112_metho.xml
Size: 30,160 bytes
Last Modified: 2025-10-06 14:13:29
<?xml version="1.0" standalone="yes"?> <Paper uid="W93-0112"> <Title>Structural Methods for Lexical/Semantic Patterns</Title> <Section position="4" start_page="128" end_page="129" type="metho"> <SectionTitle> 3. Lexical Semantics </SectionTitle> <Paragraph position="0"> The structure of the denotational representation is important not only for its expressiveness, but also in its relationship to the structure of the language it is to be derived from. In part, the structure of the language is determined by the semantic constraints of relations that are conveyed by its use. If the model is accurate enough, these constraints will be reflected in the representation.</Paragraph> <Paragraph position="1"> Many, if not most, semantic theories used in computational linguistics today assume some degree of functionality in language -- words act as operators, or take arguments, or act as logical quantifiers over the things denoted in their context. The corresponding grammatical theories (e.g. CFG, LFG, HPSG, GB) assume a parallel functional structure, incorporating notions of combinational categories, argument structure, or selectional frames into the lexical representation. These structures use individual words or constituent phrases as functional objects, projecting expectations or constraints to subcatcgorize for arguments to be incorporated into the function. null 3.1. Structurally specified semantic relations The functional semantics of the operators, then, specify the nature and lexical appearance of the arguments. The appearance of a particular head will generate expectations for the number and kind of arguments to be found, and dictate the semantic relation to be applied to them -- because we have seen the operator, we can expect to find its operands in the vicinity. Further, if these operands do not have distinct types, we will need some other mechanism, such as position or order, to be ablc to distinguish them. In this way, the need for syntactic structure is driven by typing ambiguities in the semantics.</Paragraph> <Paragraph position="2"> There is an immediate parallel between the semantic specification of function/argument structure and the specification of the reference-relation representations: the function is analogous to the predicate relation, while the arguments are the referenced components of the relation. In computational linguistic models, this sort of functional semantics has proved very useful in providing a mechanism for deriving frame-like denotations when processing language (predicate logic and unification frames, two of the more popular denotation schemes, can both be transformed to general RR models). In fact, it is often the case that the relations of the RR model are the same as the semantic relations specified by the language. (Whether this is because of a desire for representational efficiency or for other reasons I will leave unexplored.) Semantically specified structural interpretation: We can rephrase the relation between a functional head and its arguments in the following way: since the head requires a particular semantic relation to its arguments, an argument with an otherwise ambiguous interpretation must be treated as being of the type required by the head. 
Because we know the interpretation of the operator, we can constrain the various arguments to have a corresponding and consistent interpretation.</Paragraph> <Paragraph position="3"> This type of argument disambiguation is exhibited in the phenomenon of type coercion (\[16\]).</Paragraph> <Paragraph position="4"> 3.2. The syntax-semantics boundary In terms of the function-argument structure or reference-relation representations, words or categories with similar type ambiguities and similar argument number are described as being syntactically similar, while differing in interpretation. On the other side, categories with similar functional or relational type are said to have similar semantics, even though the number and typical realization of arguments might differ considerably.</Paragraph> <Paragraph position="5"> As the specificity of the relational constraints varies, the distinction between the two can also vary. Some highly cased languages (e.g. Japanese and Latin) have loose syntactic constraints; the case marking develops constraints for the consistent semantic incorporation of the various arguments within the functional scope of the heads. Other languages, such as English, have a much more definite word order, where the configuration of arguments and heads constrains their semantic relationships. Some constructions, such as idiomatic expressions, have both a completely fixed syntax and semantics. Poetic use has both a freedom of word order and a loose interpretation. Each form of linguistic construction, however, has a consistency of interpretation derived from its components.</Paragraph> <Paragraph position="6"> By using a mechanism of language interpretation that explicitly examines the degree of specificity in argument position and in argument type, and especially their interaction with one another in use, one should be better able to achieve the goals of interpretation; that is, to relate the text to a particular denotation.</Paragraph> </Section> <Section position="5" start_page="129" end_page="130" type="metho"> <SectionTitle> 4. The Generative Lexicon </SectionTitle> <Paragraph position="0"> Theoretical approaches to lexical semantics have begun to incorporate this merging of syntactic and semantic description. The incorporation of argument structure or selectional frames is a large step in this direction. While the notion of argument structure is usually reserved for verbs, some theories, such as Pustejovsky's generative lexicon (GL), extend the idea to include all lexical categories (\[16, 17\]). For the purposes of this discussion, we can consider the GL lexicon to carry two sorts of selectional information with every term: * Qualia structure, which provide semantic type constraints on arguments. These constraints are used both in deriving expectations for the syntactic form of arguments, and in coercing ambiguous or polysemous arguments into the required types (\[16\]).</Paragraph> <Paragraph position="1"> * Cospecifications, which constrain the syntactic realizations and ordering of arguments in relation to the lexical entry and to each other.
These constraints are specified much like regular expressions, and can provide varying degrees of 'fit' to the syntax.</Paragraph> <Paragraph position="2"> In addition to these selectional constraints, each term has a mapping from the arguments to a predicate logic denotation, detailing the relationship in which the arguments participate.</Paragraph> <Paragraph position="3"> These three together embody what Pustejovsky calls a lexical-conceptual paradigm (LCP), a representation of the expression of a particular concept, and the paradigmatic usage in context of the lexical entry to express that concept (\[19\]).</Paragraph> <Paragraph position="4"> It is easy to see how a theoretical approach such as GL can be operationalized: A local grammar, corresponding to the cospecifications, and indexed off the lexical entry, could be used in conjunction with a type matching system which imposes the semantic constraints of the qualia structure. The resulting mechanism, when matched against text, could place the matching arguments appropriately in the predicate denotation to return an interpretation of the text.</Paragraph> <Paragraph position="5"> This system, which by conjoining argument type and positional information avoids making a distinction between separate syntactic and semantic analysis, would be a pattern system.</Paragraph> <Paragraph position="6"> This system has been implemented, in part, in the DIDEROT information extraction system (\[4\]).</Paragraph> </Section> <Section position="6" start_page="130" end_page="131" type="metho"> <SectionTitle> 5. Patterns </SectionTitle> <Paragraph position="0"> Pattern-based extraction systems combine syntactic and semantic processing through the use of patterns. Patterns consist of lexically specified syntactic templates that are matched to text in much the same way as regular expressions, applied along with type constraints on substrings of the match. These patterns are lexically indexed local grammar fragments, annotated with semantic relations between the various arguments and the knowledge representation. In the most general system, the units of matching could range from single lexical items to phrasal components or variables with arbitrary type constraints. The variables in the pattern can be mapped directly into the knowledge representation, or, through type constraints, used as abstract specifications on the syntax. Pattern-based systems operate by combining numerous local parses, without relying on a full syntactic analysis.</Paragraph> <Paragraph position="1"> 5.1. DIDEROT, a pattern example For example, in the DIDEROT project (\[4\]), a pattern is represented as a GL structure (GLS) which gives the syntactic context along with mappings from text variables to a predicate logic knowledge representation. A typical set of patterns used to extract joint-venture events, indexed here from the word 'establish', is given in figure 1.</Paragraph> <Paragraph position="4"> The GL cospecification information is contained in the cospec field. The index variable 'self' is used to refer to an appearance of any of the morphological forms of 'establish'. These forms are given in the syn(...) field (omitted here for brevity). Literals, such as 'venture' or 'agreement', must match the text exactly. The args field indicates that argument variables A1 and A2 must be realized syntactically as type(np), where np designates a class of strings which are heuristically noun phrases. The argument variables are further restricted to the semantic type path \[code_2, joint_organ\]. The type path establishes a region in a type hierarchy which must contain the type of the argument (\[20\]). The last component of the cospec, '*', is a Kleene star over all tokens -- anything or nothing may appear in this position.</Paragraph> <Paragraph position="5"> Because of the difficulty and expense of deriving patterns, GLSs cannot be produced for every term of importance. Rather, large segments of the lexicon are statically typed in a sublexicon less intricate than the GLS lexicon. When the GLS is applied to text, the matching of argument variables is accomplished either by calls to GLSs of the appropriate type, or by the invocation of small heuristic grammars. These small grammars combine the type information of their constituents to match the constraints of the governing GLS.</Paragraph> <Paragraph position="6"> These grammars are used especially for proper name recognition. Both company names and human names are matched using small grammars based on part-of-speech tags and the sublexicon typing. Some company names are keyed from semantic indicators such as 'Corp.' and 'Inc.', while many human and place names are identified from a large fixed name lexicon.</Paragraph> <Paragraph position="7"> Overall, other pattern-based systems operate in much the same manner, varying somewhat in the amount of machinery for pattern-matching, and the richness of the typing systems.</Paragraph>
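To make the mechanism described in this section concrete, the sketch below (in Python) shows one way a cospec-style pattern might be applied to tokenized text. The field names, the toy sublexicon, the list of morphological forms, the treatment of type(np) arguments as single typed tokens, and the example pattern itself are all illustrative assumptions of the sketch; they are not the DIDEROT implementation.

# Minimal sketch of a GLS-like pattern applied to tokenized text.
# All names and cost-free simplifications here are hypothetical.

SELF_FORMS = {"establish", "establishes", "established", "establishing"}

SUBLEXICON = {                       # toy static semantic typing of tokens
    "ibm": "company",
    "motorola": "company",
    "venture": "joint_organ",
}

# cospec elements: ("arg", name) typed argument variable, ("self",) any
# morphological form of the index word, ("lit", w) a literal token,
# ("star",) a Kleene star over the remaining tokens.
PATTERN = {
    "cospec": [("arg", "A1"), ("self",), ("lit", "a"), ("lit", "joint"),
               ("lit", "venture"), ("lit", "with"), ("arg", "A2"), ("star",)],
    "args": {"A1": "company", "A2": "company"},     # required semantic types
    "pred": "establish_joint_venture(A1, A2)",
}

def match_at(pattern, tokens, start):
    """Match the cospec left to right against tokens[start:]; return the
    variable bindings on success, None on failure."""
    bindings, i = {}, start
    for elem in pattern["cospec"]:
        kind = elem[0]
        if kind == "star":                    # anything or nothing may follow
            return bindings
        if i >= len(tokens):
            return None
        tok = tokens[i].lower()
        if kind == "self" and tok in SELF_FORMS:
            i += 1
        elif kind == "lit" and tok == elem[1]:
            i += 1
        elif kind == "arg" and SUBLEXICON.get(tok) == pattern["args"][elem[1]]:
            bindings[elem[1]] = tokens[i]     # a real system would bind an NP span
            i += 1
        else:
            return None
    return bindings if i == len(tokens) else None

def extract(pattern, tokens):
    """Scan for the first position where the pattern matches."""
    for start in range(len(tokens)):
        bindings = match_at(pattern, tokens, start)
        if bindings is not None:
            return pattern["pred"], bindings
    return None

if __name__ == "__main__":
    text = "Last week IBM established a joint venture with Motorola in Austin".split()
    print(extract(PATTERN, text))
    # ('establish_joint_venture(A1, A2)', {'A1': 'IBM', 'A2': 'Motorola'})

A full system would replace the single-token argument test with calls to noun-phrase grammars or to other GLSs of the appropriate type, as described above.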
</Section> <Section position="7" start_page="131" end_page="135" type="metho"> <SectionTitle> 6. The current state of Pattern Acquisition </SectionTitle> <Paragraph position="0"> The TIPSTER and MUC projects have provided a wealth of knowledge about building pattern-based systems. The hardest and most time-consuming task involved is certainly the acquisition of patterns, which is still done primarily by tedious hand analysis. Working backwards from the key templates (hand-generated knowledge representations of texts as interpreted by the project sponsors), one can, by careful reading of the text, usually find those segments of text which correspond to the representation entries. Although the key templates are originally created by a researcher doing a careful reading, the correspondence between text segments and the key templates has not been recorded, making the process error prone and leaving the text open for reinterpretation.</Paragraph> <Paragraph position="1"> The next step, that of correlating the text with the representation and deriving a pattern which captures the relation, is the most tedious and difficult part of the task.</Paragraph> <Paragraph position="2"> Typing constraints for each class of predicate must be remembered by the researcher performing the task, and interactions between patterns must be identified and analyzed for possible interference.</Paragraph> <Paragraph position="3"> Here is a short (and most likely incomplete) review of the state-of-the-art in pattern acquisition, as it exists in the IE community: CIRCUS (Lehnert et al. \[11\]) -- Handwritten CN (concept node) patterns for partial template extraction.</Paragraph> <Paragraph position="4"> Many man-hours were spent reading text, extracting all possibly relevant contexts. Patterns were checked by running the system. A knowledge-poor method with good coverage due to large numbers of trials.</Paragraph> <Paragraph position="5"> Shogun (Jacobs et al.) -- Handwritten AWK scripts.
Derived from compiled lists of company names, locations, and other semi-regular semantic types. Also from researcher analysis of these in context. Designed to augment or replace previous methods with similar functionality. FASTUS (Hobbs, Appelt, et al. \[8\]) -- Handwritten regular expression-like patterns for partial template extraction. Years of linguistic system building expertise improved pattern generality and helped avoid interactions between patterns.</Paragraph> <Paragraph position="6"> DIDEROT (Cowie, Pustejovsky, et al. \[4\]) -- Patterns for full template extraction. Initial patterns automatically derived from structured dictionary entries \[2, 25\] give moderately effective high-level patterns. Partly automated tuning to corpus usage. Hand analysis of contexts and addition of patterns was used to complete coverage.</Paragraph> <Paragraph position="7"> CIRCUS + AutoSlog (Lehnert et al. \[12\]) -- Automated reference from template to text, using machine learning inference techniques, gives much of the coverage previously provided by hand-analysis. Patterns must still be corrected by the researcher.</Paragraph> <Paragraph position="8"> The AutoSlog approach has obtained the most significant benefit from automated acquisition. In this system, a sentence containing a string which corresponds to a template entry is analyzed for part-of-speech and major phrasal boundaries. If the string entry from the template aligns with one of the phrases, a pattern is generated corresponding to the observed syntactic structure.</Paragraph> <Paragraph position="9"> However, since the generated AutoSlog patterns are produced from single occurrences of context patterns, they are not likely to capture patterns generalizing to varying contexts. In addition, the acquisition method is so closely tied only to specific parts of the knowledge representation (in that string entries only are matched) that extending the coverage or generalizing the domain appears to be as difficult as porting to entirely new domains. 7. Structural Similarity Clustering The pattern systems described here attempt to relate the use of terms in context to corresponding denotations.</Paragraph> <Paragraph position="10"> One of the major assumptions made here, as well as in all algorithmic computational linguistic systems, is one of consistency of use and meaning -- that a term or phrase (or any linguistic structure) used in a particular fashion will give rise to a particular denotation. The goals of any grammar induction or lexical semantic acquisition problem are to define those particulars -- to find the distinguishing features of the usage as they relate to the features of the denotation.</Paragraph> <Paragraph position="11"> The approach given here chooses to focus only on the structural features of usage and denotation. By classifying features relevant to the text-to-denotation mapping, the aim is to provide a vocabulary and mechanism for deriving and evaluating interpretation procedures.</Paragraph> <Paragraph position="12"> It has been noted already that there exist paradigmatic usages of terms to express particular concepts (the LCPs). It is not a large leap to venture also that particular concepts have paradigmatic expressions in words -- idiomatic expressions, 'stock phrases', and proper names being the most obvious examples.
The relationship between the two can be approached from both directions: by classifying the uses of a word in terms of their conventional expression of a concept, or by classifying the expressions of a concept in terms of the words used. These classifications create a vocabulary that can be used to compare and relate words with concepts.</Paragraph> <Paragraph position="13"> This work provides a step in forming such a vocabulary by examining methods for classifying the structural properties of the words and denotations separately, and in suggesting methods by which they could be unified. Classification methods for both lexical and semantic structure are outlined here. An experimental implementation of the lexical approach is presented in the latter sections of the paper.</Paragraph> <Paragraph position="14"> 7.1. Lexical structure Without considering its semantics, the use of a word can be expressed solely by its lexical environment, or context. Grammar-driven systems as well as pattern systems achieve their performance by relying on the expected structural properties of the language. We can express the consistencies and paradigms in the usage of a word in explicit terms of the similarities and common structural properties of the lexical environment in which that word appears. A large collection of usages could be analyzed to find natural classes of context, defined purely in terms of the lexical environment, to give a vocabulary of context types that can be used to compare and relate differing words. The similarities of context would be determined by the structural similarities of their component strings of words. The presence and relative ordering of identical words, words belonging to the same structural similarity classes, or phrasal components, recursively defined in terms of context types, would be the environment features necessary for determining these classes.</Paragraph> <Paragraph position="15"> Groups of contexts could be organized into context types based on these similarity measures, with group membership determined by similarity. The contexts could be assembled into a hierarchical structure, in which groups of high similarity combine to form higher-order clusters encompassing the structural features of their component groups.</Paragraph> <Paragraph position="16"> Word classes could be defined inductively on this tree of context types by classifying words according to the sets of context types in which they have appeared. The hierarchy of context types and word classes encodes the specificity of the relation to the category. Lower levels of the hierarchy have strict context constraints, while higher levels, combining the classes beneath them, place looser constraints on context patterns. By studying the lexical context classes in relation to the semantic properties of the terms, we could illuminate those features of context which correlate with, and in theory constrain, their semantic properties.</Paragraph> <Paragraph position="17"> An experimental method for performing these sorts of classification is presented in the later part of this paper, using string edit distance as a metric of similarity, and agglomerative clustering techniques to provide the classification structure.</Paragraph>
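As a rough illustration of the clustering step just described, the following sketch groups a handful of contexts by average-linkage agglomerative clustering. The distance function (difflib's sequence ratio), the merge threshold, and the example contexts are stand-ins chosen for brevity; the experiments described later use a weighted string edit distance (section 8) rather than this substitute, and none of this code is the paper's implementation.

# Sketch of agglomerative clustering over word contexts, assuming some
# pairwise distance on token sequences (here a stand-in for edit distance).

from difflib import SequenceMatcher
from itertools import combinations

def distance(a, b):
    """Token-sequence distance in [0, 1]; lower means more similar."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def average_linkage(c1, c2):
    """Mean pairwise distance between two clusters of contexts."""
    return sum(distance(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

def agglomerate(contexts, threshold=0.5):
    """Repeatedly merge the two closest clusters until the closest pair is
    farther apart than the threshold.  Recording the order of merges would
    give the hierarchy of context types discussed above."""
    clusters = [[c] for c in contexts]
    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), average_linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1])
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return clusters

if __name__ == "__main__":
    # Each context is the tokenized lexical environment of some target word.
    contexts = [
        "a joint venture with".split(),
        "a joint venture agreement with".split(),
        "the manufacture of devices based on".split(),
        "the production of devices based on".split(),
    ]
    for group in agglomerate(contexts):
        print(group)    # two groups: the 'venture' contexts and the 'devices' contexts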
<Paragraph position="18"> 7.2. Semantic structure In an analogous way, the predicate denotations of text could be classified purely from their structural properties. In exactly the same manner as for context classes, relation predicates could be grouped hierarchically based on their structural features. The features one could use to derive predicate classes include predicate arity, specificity, argument types, and structure depth, as well as a semantic type hierarchy or lattice defined for a specific domain.</Paragraph> <Paragraph position="19"> The large databases of parallel text and denotations that would be necessary for this are not as freely available as text corpora for study. Representations would have to be generated by hand. However, the work in template filling and analysis contributed by the research community to the TIPSTER effort has shown that deriving a sufficient volume is not out of the question.</Paragraph> <Paragraph position="20"> This classification of predicate structure would provide a basis for examining the constraints which predicate structure enforces on lexical realization.</Paragraph> <Section position="1" start_page="132" end_page="134" type="sub_section"> <SectionTitle> 7.3. Integration </SectionTitle> <Paragraph position="0"> The natural integration of these two lines of study would result in a vocabulary of semantic and lexical classes that would enable the correlation of the lexical structure of a text with its denotational structure, and the derivation of structural mappings between the two.</Paragraph> <Paragraph position="1"> As an example of the benefits this integration might give to interpretation or IE systems, consider the following example, from the TIPSTER/MUC-5 domain: Imagine a researcher developing the domain-dependent vocabulary for an IE system. Assume that the system has a classification of the structural properties of general text, and has also a type hierarchy for general and domain-specific representations.</Paragraph> <Paragraph position="2"> The researcher has annotated a short segment of text with its interpretation in the problem domain. (See fig. 2). In the figure, the indices relate segments of text to their corresponding denotations. SMALL CAPS are used in the denotation to indicate known quantities in the domain-specific type hierarchy; mixed case is used for unknown types.</Paragraph> <Paragraph position="3"> \[A \[B IBM\]B is jointly developing \[C practical X-ray tools for \[D the manufacture of \[G devices\]G \[E based on 0.25 micron or smaller geometries\]E \]D \]C with interpretation Now that the researcher has provided a connection between text and denotation, the system can use the classifications of context and mapping types as a vocabulary to describe the relation. For instance, it is now known that 'IBM', and also 'Motorola', can be AGENT arguments, and specifically the AGENT arguments of a DEVELOPMENT predicate. The system probably has an LCP encoding the co-agentive functionality of 'with', but now learns specifically that the DEVELOPMENT predicate allows this behavior, and that a configuration giving that interpretation is: \[A1 ... PRODUCT with A2\] This knowledge can augment both the LCP for 'with' and the mapping structures for DEVELOPMENT relations. Once the system has been provided with more text-denotation pairs particular to the domain, it may find a correlation between lexical structures containing the word 'developing' and DEVELOPMENT predicate structures, and then postulate mappings between the two, building an LCP for 'developing'.
Or, relying more heavily on general structural knowledge, the system could use an existing LCP for the word 'is' (where the word 'X' is correlated with the predicate X-PRED). This general mapping for 'is' could be used to postulate a correlation between 'developing' and DEVELOPMENT. Only through the development of a catalog and vocabulary of structural descriptions, however, could one hope to build a system such as this.</Paragraph> 8. Edit Distance One method for judging the similarity between strings of lexical items (tokens) is the edit distance formulated by Levenshtein (\[13\]). This is a similarity measure based on the minimum number of token insertions, deletions, and substitutions (mutations) required to transform one string into another. A generalization of this edit distance can be made by assigning differing weights to insertions of particular tokens or classes of tokens, and by also assigning weights to token substitution pairs. Straightforward computational methods for finding the edit distance between two strings (\[22, 24\]) have been used on a variety of problems in biology, genetics, speech and handwriting analysis (\[21\]), as well as in syntactic analysis of formal languages (\[14\]). (For a good introduction with applications to many domains, see \[21\].) To demonstrate the generalized edit distance, consider the two strings:
the path that is the path
the way that is not the way
The first string can be transformed into the second by a number of insertion, deletion, and substitution operations. Substitutions are commonly counted as two operations, since they give the same effect as a deletion-insertion combination. In this example, 'not' could be inserted; 'path' could be substituted by 'way', then the second 'path' deleted at the end, then 'way' inserted; 'that' could be deleted then reinserted, and then 'not' inserted; etc. Many different sequences lead to the same result, but there will be a minimum number of operations required for the transformation.</Paragraph> <Paragraph position="5"> After a short inspection, we could expect a minimum of 5 operations in this case -- two for each change from 'path' to 'way', and one for the insertion of 'not'.</Paragraph> <Paragraph position="6"> This distance measure can be generalized to compensate for different similarities between types of tokens. For instance, if one decides that 'way' and 'path' are more similar to each other than either is, say, to 'is' or 'the', then it would be good to have the substitution of 'path'-'way' amount to less than the possible substitution 'path'-'is'. To accomplish this, a cost can be associated with each operation, perhaps even a different cost for each sort of insertion or substitution. Then a transformation of minimum cost, rather than minimum operations, can be defined. If one makes the simple assumption that a substitution costs no more than the corresponding deletion-insertion pair, then this minimum cost can be shown to obey metric properties, and defines the generalized edit distance between the two strings, with larger distances corresponding to less similar strings.</Paragraph> <Paragraph position="7"> There is a straightforward method for computing edit distance. In a prime example of dynamic programming, the edit distance is computed for every pair of initial substrings of the two strings under study, with results for shorter substrings combining to give results for longer substrings.</Paragraph> <Paragraph position="8"> More explicitly, let our two strings be A = (a0, a1, ..., am) and B = (b0, b1, ..., bn), where ai is the ith token in string A, starting with token 1. We let the first component of the string, a0, be a null token, representing an empty position into which we can insert. Define also the initial substring Ai = (a0, a1, ..., ai) of a string to be the first i tokens, including the null token at the beginning.</Paragraph> <Paragraph position="9"> The computation starts by assigning D(A0, B0) = 0, the cost of transforming a0 to b0, the null token to itself. Each subsequent step in the computation proceeds with the simple rule: D(Ai, Bj) = min( D(Ai-1, Bj) + Dinsert(ai), D(Ai, Bj-1) + Dinsert(bj), D(Ai-1, Bj-1) + Dsubstitute(ai, bj) ), where Dinsert(x) is the cost of inserting x, and Dsubstitute(x, y) is the cost of substituting x for y.</Paragraph> <Paragraph position="12"> Starting with D(0,0), one can fill each D(i,j) in a table, ending at D(m, n), the edit distance between the two strings. The table is filled from upper left to lower right, as each entry is computed from its upper, leftward, and diagonal neighbors using the minimum rule above.</Paragraph> <Paragraph position="13"> Figure 3 gives this table for the example strings.</Paragraph>
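The table-filling procedure can be written out directly. The sketch below assumes uniform insertion and deletion costs and a small illustrative substitution-cost function that makes 'path'-'way' cheaper than an unrelated substitution; these particular cost values are assumptions of the sketch, not figures from the paper.

# Sketch of the generalized edit distance: a dynamic-programming table D,
# filled from upper left to lower right, with per-token insertion costs and
# per-pair substitution costs.  The cost values below are illustrative only.

def insert_cost(token):
    return 1.0                       # uniform insertion/deletion cost

def substitute_cost(x, y):
    if x == y:
        return 0.0                   # identical tokens cost nothing
    if {x, y} == {"path", "way"}:
        return 1.0                   # related tokens: cheaper than delete + insert
    return 2.0                       # default: same as a deletion-insertion pair

def edit_distance(a, b):
    """D[i][j] = cost of transforming the first i tokens of a into the first
    j tokens of b (row and column 0 correspond to the null token)."""
    m, n = len(a), len(b)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):        # reduce a[:i] to the empty string
        D[i][0] = D[i - 1][0] + insert_cost(a[i - 1])
    for j in range(1, n + 1):        # build b[:j] from the empty string
        D[0][j] = D[0][j - 1] + insert_cost(b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j] + insert_cost(a[i - 1]),            # delete a_i
                D[i][j - 1] + insert_cost(b[j - 1]),            # insert b_j
                D[i - 1][j - 1] + substitute_cost(a[i - 1], b[j - 1]),
            )
    return D[m][n]

if __name__ == "__main__":
    s1 = "the path that is the path".split()
    s2 = "the way that is not the way".split()
    print(edit_distance(s1, s2))     # 3.0 with these costs

With unit operation counts instead (substitutions counted as two operations), the same table yields the minimum of 5 operations noted above.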
</Section> <Section position="2" start_page="134" end_page="135" type="sub_section"> <SectionTitle> 8.1. String alignments </SectionTitle> <Paragraph position="0"> As a by-product of the edit distance computation, one can create an alignment of the two strings. This alignment matches the elements of the two sequences in linear order and shows the correspondence between tokens and substrings of the two matched strings. An alignment can be generated directly from the table created in the edit distance computation by following the path of minima chosen during the computation from the upper left corner to the lower right. Rightward travel along this path corresponds to insertion of a token from string A, downward travel to tokens from string B, and diagonal paths to substitutions. (Multiple minimum paths may result, giving alternate but equivalent alignments.) The alignment created from our two example strings (figure 4) gives the correspondence between the tokens of the two initial strings. From the figure, it is easy to see the structural similarities of the two strings.</Paragraph> <Paragraph position="1"> Alignments can be created for sets of more than two strings. These can be expressed in terms of extended alignment tables, with added rows corresponding to the additional strings. These alignment tables could further be abstracted to probabilistic descriptions of the sequences, using either a zero-order or Markov chain description. Chan and Wang (\[3\]) have used syntheses, zero-order probabilistic descriptions of alignment tables, in order to generalize the edit distance and capture the notion of distance between two sets of sequences. Techniques such as this may prove useful in later work.</Paragraph> </Section> </Section> class="xml-element"></Paper>