<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1050">
<Title>SMOOTHING OF AUTOMATICALLY GENERATED SELECTIONAL CONSTRAINTS</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle> 1. INTRODUCTION </SectionTitle>
<Paragraph position="0"> Semantic (selectional) constraints are necessary for the accurate analysis of natural language text. Accordingly, the acquisition of these constraints is an essential yet time-consuming part of porting a natural language system to a new domain.</Paragraph>
<Paragraph position="1"> Several research groups have attempted to automate this process by collecting co-occurrence patterns (e.g., subject-verb-object patterns) from a large training corpus. These patterns are then used as the source of selectional constraints in analyzing new text.</Paragraph>
<Paragraph position="2"> However, the patterns collected in this way involve specific word combinations from the training corpus. Unless the training corpus is very large, this will provide only limited coverage of the range of acceptable semantic combinations, even within a restricted domain. In order to obtain better coverage, it will be necessary to generalize from the patterns collected so that patterns with semantically related words will also be considered acceptable. In most cases this has been done by manually assigning words to semantic classes and then generalizing from specific words to their classes. This approach still implies a substantial manual burden in moving to a new domain, since at least some of the semantic word classes will be domain-specific.</Paragraph>
<Paragraph position="3"> In order to fully automate the process of semantic constraint acquisition, we would like to be able to automatically identify semantically related words. This can be done using the co-occurrence data, by identifying words which occur in the same contexts (for example, verbs which occur with the same subjects and objects). From the co-occurrence data one can compute a similarity relation between words, and then cluster words of high similarity. This approach was taken by Sekine et al. at UMIST, who then used these clusters to generalize semantic patterns [6]. A similar approach to word clustering was reported by Hirschman et al. in 1975 [5].</Paragraph>
<Paragraph position="4"> For our current experiments, we have adopted a slightly different approach. We compute from the co-occurrence data a confusion matrix, which also measures the interchangeability of words in particular contexts. We then use the confusion matrix directly to generalize the semantic patterns.</Paragraph>
</Section>
</Paper>
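
As an illustration of the confusion-matrix idea described in the final paragraph, the Python sketch below estimates how freely one word can replace another in the contexts observed in co-occurrence data. The (head, relation, argument) triple format, the function names, and the particular scoring formula (summing P(context | w1) * P(w2 | context) over the contexts of w1) are assumptions made for this sketch; they are not taken from the paper itself.

from collections import defaultdict

def build_counts(triples):
    """Count how often each head word occurs in each (relation, argument) context."""
    context_counts = defaultdict(lambda: defaultdict(float))  # word -> context -> count
    word_totals = defaultdict(float)                          # word -> total count
    for head, relation, arg in triples:
        context = (relation, arg)
        context_counts[head][context] += 1.0
        word_totals[head] += 1.0
    return context_counts, word_totals

def interchangeability(w1, w2, context_counts, word_totals):
    """Score how well w2 substitutes for w1: weight each of w1's contexts by
    P(context | w1), times the share of that context's occurrences taken by w2."""
    total_w1 = word_totals.get(w1, 0.0)
    if total_w1 == 0.0:
        return 0.0
    score = 0.0
    for context, count_w1 in context_counts[w1].items():
        p_context_given_w1 = count_w1 / total_w1
        context_total = sum(c.get(context, 0.0) for c in context_counts.values())
        if context_total > 0.0:
            p_w2_given_context = context_counts.get(w2, {}).get(context, 0.0) / context_total
            score += p_context_given_w1 * p_w2_given_context
    return score

# Example: a toy set of parsed co-occurrence triples from a training corpus.
triples = [("eat", "object", "apple"), ("eat", "object", "bread"),
           ("devour", "object", "apple"), ("drink", "object", "water")]
counts, totals = build_counts(triples)
print(interchangeability("eat", "devour", counts, totals))  # positive: shared contexts
print(interchangeability("eat", "drink", counts, totals))   # 0.0: no shared contexts

Under this formulation the matrix is asymmetric (a rare word may be almost fully covered by a frequent near-synonym but not the reverse), which is one natural way such a matrix could be used to extend observed patterns to unseen but interchangeable word combinations.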