<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1302"> <Title>On Statistical Parameter Setting</Title> <Section position="4" start_page="14" end_page="15" type="concl"> <SectionTitle> 3 Conclusion </SectionTitle> <Paragraph position="0"> Evaluations of two related morphology systems show that, with a restrictive setting of the parameters in the described algorithm, approximately 99% precision can be reached, with recall above 60% on the portion of the Brown corpus used and even higher recall on the Peter corpus.</Paragraph> <Paragraph position="1"> We are able to identify phases in the generation of rules, which for English turn out to be: a. initially, inflectional morphology on verbs, together with the plural &quot;s&quot; on nouns, and b. subsequently, other types of morphemes. We believe that this ordering is driven purely by the frequency of these morphemes in the corpora. At the token level, the manually segmented portion of the Brown corpus contains 11.3% inflectional morphemes, 6.4% derivational morphemes, and 82.1% stems; inflectional morphemes are thus nearly twice as frequent as derivational ones (a ratio of about 1.8).</Paragraph> <Paragraph position="2"> Given very strict parameters that emphasize the description length of the grammar, our system would need a long time to discover prefixes, let alone infixes. By relaxing the weight of the description length, we can facilitate the generation and identification of prefixing rules, although at the cost of precision.</Paragraph> <Paragraph position="3"> Given these results, we can claim that inflectional paradigms are extractable even with an incremental approach. This means that central parts of the lexicon can be induced very early along the timeline.</Paragraph> <Paragraph position="4"> The signatures already computed for each morpheme can be used as simple clustering criteria.</Paragraph> <Paragraph position="5"> Clustering will separate dependent morphemes (affixes) from independent morphemes (stems). 
The basic distinction is that affixes usually have a long signature, i.e. many elements they co-occur with, as well as a high frequency, whereas stems tend to have a short signature and a low frequency.</Paragraph> <Paragraph position="6"> Along these lines, morphemes with similar signatures can be replaced by symbols that express the same type information, compressing the grammar further. This type information, especially for rare morphemes, is essential for the subsequent induction of syntactic structure. Due to space limitations, we cannot discuss the subsequent steps of the cross-level induction procedures in detail. Nevertheless, the model presented here provides an important pointer to the mechanics of how grammatical parameters might come to be set.</Paragraph> <Paragraph position="7"> Additionally, we provide a method for testing the roles that different statistical algorithms play in this process. By adjusting the weights of the contributions made by the various constraints, we can work toward an understanding of the optimal ordering of the algorithms that play a role in the computational framework of language acquisition.</Paragraph> <Paragraph position="8"> This is but a first step toward what we hope will eventually become a platform for the detailed study of various induction algorithms and evaluation metrics.</Paragraph> </Section> </Paper>
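<!--
A minimal sketch of the signature-based separation of affixes from stems described in the conclusion. It assumes, hypothetically, that the input is a list of already segmented words given as tuples of morphemes, that a morpheme's signature is the set of other morphemes it co-occurs with inside a word, and that a fixed threshold on signature size and frequency (the hypothetical parameters min_sig and min_freq) stands in for a real clustering step. It is an illustration, not the authors' implementation.

```python
from collections import defaultdict


def signatures(segmented_words):
    """Return (sig, freq) for every morpheme.

    sig[m]  : set of other morphemes that m co-occurs with inside a word
              (a hypothetical reading of the paper's "signature").
    freq[m] : number of occurrences of m across all word tokens.
    """
    sig = defaultdict(set)
    freq = defaultdict(int)
    for word in segmented_words:
        for m in word:
            freq[m] += 1
            # Creates the key even for single-morpheme words.
            sig[m].update(x for x in word if x != m)
    return sig, freq


def split_affixes_stems(segmented_words, min_sig=3, min_freq=3):
    """Label morphemes with a long signature and high frequency as affixes.

    min_sig and min_freq are hypothetical cutoffs standing in for a
    proper clustering step; every other morpheme is labelled a stem.
    """
    sig, freq = signatures(segmented_words)
    affixes, stems = set(), set()
    for m in sig:
        if len(sig[m]) >= min_sig and freq[m] >= min_freq:
            affixes.add(m)
        else:
            stems.add(m)
    return affixes, stems


# Toy input roughly mimicking real corpora: many stems, few suffixes.
corpus = [
    ("walk", "ed"), ("walk", "s"),
    ("talk", "ed"), ("talk", "s"),
    ("jump", "ed"), ("jump", "s"),
    ("play", "ed"), ("play", "s"),
    ("look", "ed"),
    ("cat", "s"), ("dog", "s"),
]

affixes, stems = split_affixes_stems(corpus)
print(sorted(affixes))  # ['ed', 's']
print(sorted(stems))    # ['cat', 'dog', 'jump', 'look', 'play', 'talk', 'walk']
```

In the toy input each suffix co-occurs with five or six stems while each stem co-occurs with at most two suffixes, which is the contrast the conclusion relies on. On real corpora the cutoffs would have to be tuned or replaced by an actual clustering method (for example, k-means over signature size and frequency).
-->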