<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1050">
  <Title>SMOOTHING OF AUTOMATICALLY GENERATED SELECTIONAL CONSTRAINTS</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. THE NATURE OF THE CONSTRAINTS
</SectionTitle>
    <Paragraph position="0"> The constraints we wish to acquire are local semantic constraints; more specifically, constraints on which words can occur together in specific syntactic relations. These include head-argument relations (e.g., subject-verb-object) and head-modifier relations. Some constraints may be general (domain independent), but others will be specific to a particular domain. Because it is not practical to state all the allowable word combinations, we normally place words into (semantic) word classes and then state the constraints in terms of allowable combinations of these classes.</Paragraph>
    <Paragraph position="1"> When these constraints were encoded by hand, they were normally stated as absolute constraints--aparticular combination of words was or was not acceptable. With corpus-derived constraints, on the other hand, it becomes possible to think in terms of a probabilistic model. For example, based on a training corpus, we would estimate the probability that a particular verb occurs with a particular subject and object (or with subject and object from particular classes), or that a verb occurs with a particular modifier. Then, using the (obviously crude) assumption of independent probabilities, we would estimate the probability of a particular sentence derivation as the product of the probabilities of all the operations (adding arguments to heads, adding modifiers to heads) required to produce the sentence, and the probability of a sentence as the sum of the probabilities of its derivations.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="254" type="metho">
    <SectionTitle>
3. ACQUIRING SEMANTIC PATTERNS
</SectionTitle>
    <Paragraph position="0"> Based on a series of experiments over the past year (as reported at COLING-92) we have developed the following procedure for acquiring semantic patterns from a text corpus:  1. Using unsupervised training methods, create a stochastic grammar from a (non-stochastic) augmented context-free grammar. Use this stochastic grammar to parse the training corpus, taking only the most probable parse(s) of each sentence.</Paragraph>
    <Paragraph position="1"> 2. Regularize the parses to produce something akin to an LFG f-structure, with explicitly labeled syntactic relations such as SUBJECT and OBJECT. l 3. Extract from the regularized parse a series of triples of the form  head syntactic-relation arg where arg is the head of the argument or modifier. We will use the notation &lt; wi r wj &gt; for such a triple, and &lt; r w i &gt; for a relation-argument pair.</Paragraph>
    <Paragraph position="2"> 4. Compute the frequency F of each head and each triple in the corpus. If a sentence produces N parses, a triple generated from a single parse has weight 1/N in the total. For example, the sentence Mary likes young linguists from Limerick.</Paragraph>
    <Paragraph position="3"> would produce the regularized syntactic structure (s like (subject (np Mary)) (object (np linguist (a-pos young) (from (np Limerick))))) from which the following four triples are generated: like subject Mary like object linguist linguist a-pos young linguist from Limerick Given the frequency information F, we can then estimate the probability that a particular head wi appears with a particular argument or modifier &lt; v wj &gt;:2 F(&lt; wi r wj &gt;) F(wi appears as a head in a parse tree) This probability information would then be used in scoring alternative parse trees. For the evaluation below, however, we will use the frequency data F directly.</Paragraph>
    <Paragraph position="4"> 1 But with somewhat more regulafization than is done in LFG; in particular, passive structures are converted to corresponding active forms. 2Note that F(wl appears as a head in a parse tree) is different from F(wi appears as a head in a triple) since a single head in a parse tree may produce several such triples, one for each argument or modifier of that head. Step 3 (the triples extraction) includes a number of special  if a verb has a separable particle (e.g., &amp;quot;out&amp;quot; in &amp;quot;carry out&amp;quot;), this is attached to the head (to create the head carry-out) and not treated as a separate relation. Different particles often correspond to very different senses of a verb, so this avoids conflating the subject and object distributions of these different senses.</Paragraph>
    <Paragraph position="5"> if the verb is &amp;quot;be&amp;quot;, we generate a relation becomplement between the subject and the predicate complement.</Paragraph>
    <Paragraph position="6"> triples in which either the head or the argument is a pronoun are discarded triples in which the argument is a subordinate clause are discarded (this includes subordinate conjunctions and verbs taking clausal arguments) triples indicating negation (with an argument of &amp;quot;not&amp;quot; or &amp;quot;never&amp;quot;) are ignored</Paragraph>
  </Section>
  <Section position="6" start_page="254" end_page="255" type="metho">
    <SectionTitle>
4. GENERALIZING SEMANTIC PATTERNS
</SectionTitle>
    <Paragraph position="0"> The procedure described above produces a set of frequencies and probability estimates based on specific words. The &amp;quot;traditional&amp;quot; approach to generalizing this information has been to assign the words to a set of semantic classes, and then to collect the frequency information on combinations of semantic classes \[7,3\].</Paragraph>
    <Paragraph position="1"> Since at least some of these classes will be domain specific, there has been interest in automating the acquisition of these classes as well. This can be done by clustering together words which appear in the same context. Starting from the file of triples, this involves: 1. collecting for each word the frequency with which it occurs in each possible context; for example, for a noun we would collect the frequency with which it occurs as  the subject and the object of each verb 2. defining a similarity measure between words, which reflects the number of common contexts in which they appear 3. forming clusters based on this similarity measure  Such a procedure was performed by Sekine et al. at UMIST \[6\]; these clusters were then manually reviewed and the resulting clusters were used to generalize selectional patterns.  A similar approach to word cluster formation was described by Hirschman et al. in 1975 \[5\].</Paragraph>
    <Paragraph position="2"> Cluster creation has the advantage that the clusters are amenable to manual review and correction. On the other hand, our experience indicates that successful cluster generation depends on rather delicate adjustment of the clustering criteria. We have therefore elected to try an approach which directly uses a form of similarity measure to smooth (generalize) the probabilities.</Paragraph>
    <Paragraph position="3"> Co-occurrence smoothing is a method which has been recently proposed for smoothing n-gram models \[4\] .3 The core of this method involves the computation of a co-occurrence matrix (a matrix of confusion probabilities) Pc(wj \]wi), which indicates the probability of word wj occurring in contexts in which word wi occurs, averaged over these contexts.</Paragraph>
    <Paragraph position="5"> where the sum is over the set of all possible contexts s. For an n-gram model, for example, the context might be the set of n - 1 prior words. This matrix can be used to take a basic trigram model PB (wn Iw,~-2, wn-0 and produce a smoothed model</Paragraph>
    <Paragraph position="7"> We have used this method in a precisely analogous way to compute smoothed semantic triples frequencies, Fs. In triples of the form wordl relation word2 we have initially chosen to smooth over wordl , treating relation and word2 as the context.</Paragraph>
    <Paragraph position="9"> F(w~ appears as a head of a triple) ! rs(&lt; wi r &gt;) = Pc(wil ) ..r(&lt; &gt;) In order to avoid the generation of confusion table entries from a single shared context (which quite often is the result of an incorrect parse), we apply a filter in generating Pc: for i C/ j, we generate a non-zero Pc(wj Iwi) only if the wi and wj appear in at least two common contexts, and there is some common context in which both words occur at least 3We wish to thank Richard Schwartz of BBN for referring us to this method and article.</Paragraph>
    <Paragraph position="10"> twice. Furthermore, if the value computed by the formula for Pc is less than some threshold ~'c, the value is taken to be zero; we have used rc = 0.001 in the experiments reported below. (These filters are not applied for the case</Paragraph>
    <Paragraph position="12"> always computed exactly.) Because these filters may yeild an un-normalized confusion matrix (i.e., ~oj Pc(wj Iwi) &lt; 1), we renormalize the matrix so that ~w Pc(wj Iwi) = 1.</Paragraph>
  </Section>
class="xml-element"></Paper>