<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1325">
  <Title>Statistical Filtering and Subcategorization Frame Acquisition</Title>
  <Section position="4" start_page="199" end_page="201" type="intro">
    <SectionTitle>
2 Method
2.1 Framework for SCF Acquisition
</SectionTitle>
    <Paragraph position="0"> Briscoe and Carroll's (1997) verbal acquisition system distinguishes 163 SCFs and returns relative frequencies for each SCF found for a given predicate. The SCFs are a superset of classes found in the Alvey NL Tools (ANLT) dictionary (Boguraev et al., 1987) and the COMLEX Syntax dictionary (Grishman et al., 1994).</Paragraph>
    <Paragraph position="1"> They incorporate information about control of predicative arguments, as well as alternations such as extraposition and particle movement. The system employs a shallow parser to obtain the subcategorization information. Potential SCF entries are filtered before the final SCF lexicon is produced. The filter is the only component of this system which we experiment with here. The three filtering methods which we compare are described below.</Paragraph>
    <Section position="1" start_page="199" end_page="201" type="sub_section">
      <SectionTitle>
2.2 Filtering Methods
</SectionTitle>
      <Paragraph position="0"> Briscoe and Carroll (1997) used a binomial hypothesis test (BHT) to filter the acquired SCFs. They applied BHT as follows. The system recorded the total number of sets of SCF cues ($n$) found for a given predicate, and the number of these sets for a given SCF ($m$). The system estimated the error probability ($p_e$) that a cue for an SCF ($scf_i$) occurred with a verb which did not take $scf_i$. $p_e$ was estimated in two stages, as shown in equation 1.</Paragraph>
      <Paragraph position="1"> Firstly, the number of verbs which are members of the target SCF in the ANLT dictionary was extracted. This number was converted to a probability of class membership by dividing it by the total number of verbs in the dictionary. The complement of this probability provided an estimate of the probability of a verb not taking $scf_i$. Secondly, this probability was multiplied by an estimate of the probability of observing the cue for $scf_i$. This was estimated as the number of cues for $scf_i$ extracted from the SUSANNE corpus (Sampson, 1995), divided by the total number of cues.</Paragraph>
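The two-stage estimate of $p_e$ just described can be sketched in Python as follows. This is a minimal illustration that assumes the four counts are already available as integers; the function name, its parameter names and the counts in the usage lines are hypothetical and are not taken from the ANLT dictionary or the SUSANNE corpus.

```python
def estimate_error_probability(verbs_in_class: int, total_verbs: int,
                               cues_for_scf: int, total_cues: int) -> float:
    """Two-stage estimate of p_e for one SCF, following equation (1).

    verbs_in_class / total_verbs: dictionary counts (hypothetical inputs).
    cues_for_scf / total_cues: corpus cue counts (hypothetical inputs).
    """
    # Stage 1: complement of the class-membership probability,
    # i.e. the probability that a verb does NOT take the SCF.
    p_not_member = 1.0 - verbs_in_class / total_verbs
    # Stage 2: probability of observing a cue for this SCF at all.
    p_cue = cues_for_scf / total_cues
    return p_not_member * p_cue


# Hypothetical counts, for illustration only.
p_e = estimate_error_probability(verbs_in_class=400, total_verbs=5000,
                                 cues_for_scf=300, total_cues=10000)
print(round(p_e, 4))  # 0.0276
```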
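      <Paragraph position="2"> $$p_e = \left(1 - \frac{|\text{verbs in class } i|}{|\text{verbs}|}\right) \frac{|\text{cues for } i|}{|\text{cues}|} \qquad (1)$$ The probability of an event with probability $p$ happening exactly $m$ times out of $n$ attempts is given by the following binomial distribution: $$P(m, n, p) = \frac{n!}{m!\,(n-m)!}\, p^{m} (1-p)^{n-m}$$</Paragraph>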
      <Paragraph position="4"> The probability of the event happening $m$ or more times is: $$P(m+, n, p) = \sum_{k=m}^{n} \frac{n!}{k!\,(n-k)!}\, p^{k} (1-p)^{n-k}$$</Paragraph>
      <Paragraph position="6"> Finally, $P(m+, n, p_e)$ is the probability that $m$ or more occurrences of cues for $scf_i$ will occur with a verb which is not a member of $scf_i$, given $n$ occurrences of that verb. The threshold on this probability was set at 0.05: the verb was assigned $scf_i$ when $P(m+, n, p_e) \leq 0.05$. This yielded 95% or better confidence that a high enough proportion of cues for $scf_i$ had been observed for the verb to be legitimately assigned $scf_i$.</Paragraph>
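The binomial filter can be sketched in Python as below. This is an illustration under stated assumptions, not Briscoe and Carroll's implementation: the function names are hypothetical, the tail probability $P(m+, n, p_e)$ is computed by direct summation of binomial terms (adequate for the modest cue counts of a single verb), and the 0.05 threshold is the one given in the text.

```python
from math import comb


def binomial_pmf(m: int, n: int, p: float) -> float:
    """P(m, n, p): probability of exactly m events in n attempts."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)


def binomial_tail(m: int, n: int, p: float) -> float:
    """P(m+, n, p): probability of m or more events in n attempts."""
    return sum(binomial_pmf(k, n, p) for k in range(m, n + 1))


def bht_accepts(m: int, n: int, p_e: float, alpha: float = 0.05) -> bool:
    """Assign scf_i to the verb when P(m+, n, p_e) is at most alpha.

    m: cue sets for scf_i; n: all cue sets for the verb; p_e: error probability.
    """
    return alpha >= binomial_tail(m, n, p_e)


# Hypothetical values: n = 100 cue sets for the verb, p_e = 0.0276.
print(bht_accepts(8, 100, 0.0276))  # True
print(bht_accepts(3, 100, 0.0276))  # False
```

With roughly 2.8 spurious cues expected by chance out of 100, eight cues are unlikely enough to pass the test, whereas three are well within what noise alone would produce.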
      <Paragraph position="7"> Other approaches which use a binomial filter differ in how they calculate the error probability. Brent (1993) estimated the error probabilities for each SCF experimentally from the behaviour of his SCF extractor, which detected simple morpho-syntactic cues in the corpus data. Manning (1993) increased the number of available cues at the expense of their reliability. To maintain high levels of accuracy, Manning applied higher bounds on the error probabilities for certain cues; these bounds were determined experimentally. A similar approach was taken by Briscoe, Carroll and Korhonen (1997) in a modification to the Briscoe and Carroll system: overall performance was improved by adjusting the estimates of $p_e$ according to the system's performance on the target SCF. In the work described here, we use the original BHT proposed by Briscoe and Carroll.</Paragraph>
      <Paragraph position="8"> 2.2.2 Log Likelihood Ratio as a Statistical Filter
Dunning (1993) demonstrates the benefits of the log-likelihood ratio (LLR) statistic, compared to Pearson's chi-squared, on the task of ranking bigram data.</Paragraph>
      <Paragraph position="9"> The binomial log-likelihood ratio test is simple to calculate. For each verb and SCF combination, four counts are required: the number of times that
1. the target verb occurs with the target SCF ($k_1$);
2. the target verb occurs with any other SCF ($n_1 - k_1$);
3. any other verb occurs with the target SCF ($k_2$);
4. any other verb occurs with any other SCF ($n_2 - k_2$).</Paragraph>
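The four counts can be gathered in a single pass over (verb, SCF) observations. The Python sketch below assumes the parser output has already been reduced to a list of such pairs; the function name, the input format and the verb and SCF labels in the usage lines are hypothetical placeholders.

```python
def contingency_counts(pairs, target_verb, target_scf):
    """Return (k1, n1, k2, n2) for one verb/SCF combination.

    pairs: iterable of (verb, scf) observations (hypothetical input format).
    k1: target verb with target SCF; n1 - k1: target verb with other SCFs;
    k2: other verbs with target SCF; n2 - k2: other verbs with other SCFs.
    """
    k1 = n1 = k2 = n2 = 0
    for verb, scf in pairs:
        hit = int(scf == target_scf)
        if verb == target_verb:
            n1 += 1
            k1 += hit
        else:
            n2 += 1
            k2 += hit
    return k1, n1, k2, n2


observations = [
    ("verb1", "scf1"), ("verb1", "scf2"), ("verb1", "scf1"),
    ("verb2", "scf1"), ("verb2", "scf3"), ("verb3", "scf2"),
]
print(contingency_counts(observations, "verb1", "scf1"))  # (2, 3, 1, 3)
```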
      <Paragraph position="11"> The LLR statistic provides a score that reflects the difference between (i) the number of bits it takes to describe the observed data, using $p_1 = p(SCF \mid verb)$ and $p_2 = p(SCF \mid \neg verb)$, and (ii) the number of bits it takes to describe the expected data using the probability $p = p(SCF \mid \text{any verb})$.</Paragraph>
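The statistic can be computed from the four counts with Dunning's (1993) formulation, $-2 \log \lambda = 2[\log L(p_1, k_1, n_1) + \log L(p_2, k_2, n_2) - \log L(p, k_1, n_1) - \log L(p, k_2, n_2)]$, where $\log L(p, k, n) = k \log p + (n - k) \log(1 - p)$, $p_1 = k_1/n_1$, $p_2 = k_2/n_2$ and $p = (k_1 + k_2)/(n_1 + n_2)$. The Python sketch below also applies the acceptance rule described in the next paragraph, counting only positive associations where $p_1$ exceeds $p_2$. It assumes a threshold of 3.841, the chi-squared critical value for one degree of freedom at the 95% level; the text does not state which table entry was used, so that value is an assumption, and the function names are hypothetical.

```python
from math import log

# Assumed threshold: chi-squared critical value, 1 degree of freedom, 95% level.
CHI2_95_1DF = 3.841


def log_l(p: float, k: int, n: int) -> float:
    """Binomial log-likelihood k*log(p) + (n-k)*log(1-p), taking 0*log(0) as 0."""
    total = 0.0
    if k > 0:
        total += k * log(p)
    if n - k > 0:
        total += (n - k) * log(1 - p)
    return total


def llr(k1: int, n1: int, k2: int, n2: int) -> float:
    """Dunning's -2 log lambda for the 2x2 table (assumes n1 and n2 are positive)."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)
    return 2 * (log_l(p1, k1, n1) + log_l(p2, k2, n2)
                - log_l(p, k1, n1) - log_l(p, k2, n2))


def llr_accepts(k1: int, n1: int, k2: int, n2: int,
                threshold: float = CHI2_95_1DF) -> bool:
    """Accept only positive associations (p1 above p2) whose statistic exceeds the threshold."""
    return k1 / n1 > k2 / n2 and llr(k1, n1, k2, n2) > threshold


# Hypothetical counts: the SCF occurs with half the target verb's tokens,
# but with only a tenth of other verbs' tokens.
print(llr_accepts(50, 100, 10, 100))  # True
```

The direction check is kept separate from the statistic because $-2 \log \lambda$ is large for strong negative associations too.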
      <Paragraph position="12"> The LLR statistic detects differences between $p_1$ and $p_2$. The difference could potentially be in either direction, but we are interested in LLRs where $p_1 &gt; p_2$, i.e. where there is a positive association between the SCF and the verb. For these cases, we compared the value of $-2 \log \lambda$ to the threshold value obtained from Pearson's chi-squared table, to see if it was significant at the 95% level (footnote 2).
2.2.3 Using a Threshold on the Relative Frequencies as a Baseline
In order to examine the baseline performance of this system without employing any notion of the significance of the observations, we used a threshold on relative frequencies. This was done by extracting the SCFs and ranking them in order of the probability of their occurrence with the verb. The probabilities were estimated using a maximum likelihood estimate (MLE) from the observed relative frequencies. A threshold, determined empirically, was applied to these probability estimates to filter out the low-probability entries for each verb. ....</Paragraph>
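The relative-frequency baseline amounts to an MLE followed by a cut-off. A minimal Python sketch, assuming the SCFs observed with one verb are available as a list of labels; the threshold is chosen empirically in the text and its value is not given here, so the 0.2 below is purely illustrative, as are the function name and SCF labels.

```python
from collections import Counter


def relative_frequency_filter(scf_observations, threshold):
    """Keep SCFs whose MLE probability with the verb meets the threshold.

    scf_observations: SCF labels observed with one verb (hypothetical format).
    threshold: empirically chosen cut-off (illustrative value in the usage below).
    Returns (scf, probability) pairs ranked by probability, highest first.
    """
    counts = Counter(scf_observations)
    total = sum(counts.values())
    # MLE: relative frequency of each SCF among all SCFs seen with the verb.
    ranked = sorted(((scf, c / total) for scf, c in counts.items()),
                    key=lambda item: item[1], reverse=True)
    return [(scf, prob) for scf, prob in ranked if prob >= threshold]


verb_scfs = ["scf1"] * 6 + ["scf2"] * 3 + ["scf3"]
print(relative_frequency_filter(verb_scfs, threshold=0.2))
# [('scf1', 0.6), ('scf2', 0.3)]
```

Unlike the two hypothesis tests, this filter ignores how many observations the verb has, so a single cue for a rare verb can pass as easily as many cues for a frequent one.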
    </Section>
  </Section>
</Paper>