File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/p06-2043_metho.xml
Size: 1,555 bytes
Last Modified: 2025-10-06 14:10:25
<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2043"> <Title>Improving English Subcategorization Acquisition with Diathesis Alternations as Heuristic Information</Title> <Section position="5" start_page="331" end_page="332" type="metho"> <SectionTitle> 3 The MLE Filtering Method </SectionTitle> <Paragraph position="0"> The present SCF acquisition system for English verbs employs an MLE filter to test the automatically generated SCF hypotheses. Because noise accumulates while tagging, lemmatizing, and parsing the corpus, the hypothesis generator does not perform as efficiently as hoped, even though some typical errors are corrected when the extracted patterns are classified.</Paragraph> <Paragraph position="1"> Sampling analysis of the unfiltered hypotheses in Korhonen's evaluation corpus indicates that about 74% of the incorrectly proposed and rejected SCF types stem from defects of the MLE filtering method.</Paragraph> <Paragraph position="2"> Performance of the MLE filter is closely related to the actual distributions p(scf_i | v) of predicates and SCF types in the input corpus.</Paragraph> <Paragraph position="5"> First, a training set is drawn randomly from the overall corpus; it must be large enough to ensure a similar distribution. Second, the frequency with which a subcategorization frame scf_i occurs with a verb v is recorded and used to estimate the probability p(scf_i | v). Third, an empirical threshold th is determined, which ensures that a maximum</Paragraph> </Section> </Paper>