<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2247">
  <Title>Detecting Verbal Participation in Diathesis Alternations</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 SCF Identification
</SectionTitle>
    <Paragraph position="0"> The SCFs applicable to each verb are extracted automatically from corpus data using the system of Briscoe and Carroll (1997). This comprehensive verbal acquisition system distinguishes 160 verbal SCFs. It produces a lexicon of verb entries each organised by SCF with argument head instances enumerated at each slot.</Paragraph>
    <Paragraph position="1"> The hand-crafted diathesis alternation classification links Levin's (1993) index of alternations with the 160 SCFs to indicate which classes are involved in alternations.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="1493" type="metho">
    <SectionTitle>
3 Selectional Preference Acquisition
</SectionTitle>
    <Paragraph position="0"> Selectional preferences can be obtained for the subject, object and prepositional phrase slots for any specified SCF classes. The input data includes the target verb, SCF and slot along with the noun frequency data and any prepo- null sition (for PPs). Selectional preferences are represented as Association Tree Cut Models (ATCMS) aS described by Abe and Li (1996).</Paragraph>
    <Paragraph position="1"> These are sets of classes which cut across the WordNet hypernym noun hierarchy (Miller et al., 1993) covering all leaves disjointly. Association scores, given by ~ are calculated for p(c) ' the classes. These scores are calculated from the frequency of nouns occurring with the target verb and irrespective of the verb. The score indicates the degree of preference between the class (c) and the verb (v) at the specified slot. Part of the ATCM for the direct object slot of build is shown in Figure 1. For another verb a different level for the cut might be required. For example eat might require a cut at the FOOD hyponym of OBJECT.</Paragraph>
    <Paragraph position="2"> Finding the best set of classes is key to obtaining a good preference model. Abe and Li use MDL to do this. MDL is a principle from information theory (Rissanen, 1978) which states that the best model minimises the sum of i the number of bits to encode the model, and ii the number of bits to encode the data in the model.</Paragraph>
    <Paragraph position="3"> This makes the compromise between a simple model and one which describes the data efficiently. null Abe and Li use a method of encoding tree cut models using estimated frequency and probability distributions for the data description length. The sample size and number of classes in the cut are used for the model description length.</Paragraph>
    <Paragraph position="4"> They provide a way of obtaining the ATCMS using the identity p(clv ) = A(c, v) x p(c). Initially a tree cut model is obtained for the marginal probability p(c) for the target slot irrespective of the verb. This is then used with the conditional data and probability distribution p(clv ) to obtain an ATCM aS a by-product of obtaining the model for the conditional data. The actual comparison used to decide between two cuts is calculated as in equation 1 where C represents the set of classes on the cut model currently being examined and Sv represents the sample specific to the target verb. 2.</Paragraph>
    <Paragraph position="6"> In determining the preferences the actual en-SAil logarithms are to the base 2  coding in bits is not required, only the relative cost of the cut models being considered. The WordNet hierarchy is searched top down to find the best set of classes under each node by locally comparing the description length at the node with the best found beneath. The final comparison is done between a cut at the root and the best cut found beneath this. Where detail is warranted by the specificity of the data this is manifested in an appropriate level of generalisation. The description length of the resultant cut model is then used for detecting diathesis alternations.</Paragraph>
  </Section>
  <Section position="5" start_page="1493" end_page="1494" type="metho">
    <SectionTitle>
4 Evidence for Diathesis
Alternations
</SectionTitle>
    <Paragraph position="0"> For verbs participating in an alternation one might expect that the data in the alternating slots of the respective SCFs might be rather homogenous. This will depend on the extent to which the alternation applies to the predominant sense of the verb and the majority of senses of the arguments. The hypothesis here is that if the alternation is reasonably productive and could occur for a substantial majority of the instances then the preferences at the corresponding slots should be similar. Moreover we hypothesis that if the data at the alternating slots is combined then the cost of encoding this data in one ATCM will be less than the cost of encoding the data in separate models, for the respective slot and SCF.</Paragraph>
    <Paragraph position="1"> Taking the causative-inchoative alternation as an example, the object of the transitive frame switches to the subject of the intransitive frame: The boy broke the window ~ The window broke.</Paragraph>
    <Paragraph position="2"> Our strategy is to find the cost of encoding the data from both slots in separate ATCMS and compare it to the cost of encoding the combined data. Thus the cost of an ATCM for / the sub- null true positives begin end Change swing false positives cut true negatives choose like help</Paragraph>
  </Section>
class="xml-element"></Paper>