<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0106"> <Title>Relating Turing's Formula and Zipf's Law</Title> <Section position="7" start_page="770" end_page="770" type="concl"> <SectionTitle> 5 Conclusions </SectionTitle> <Paragraph position="0"> The relationship between Turing's formula and Zipf's law, both of which concern population frequencies, was explored in the present article. The asymptotic behavior of the relative frequency as a function of rank implicit in one interpretation of Turing's local reestimation formula was derived and compared with Zipf's law. While the latter relates the rank and relative frequency as asymptotically inversely proportional, the former states that the frequency declines exponentially with rank. This means that while Zipf's law implies a finite total population, Turing's formula yields a proper probability distribution even for infinite populations.</Paragraph> <Paragraph position="1"> In fact, it is tempting to interpret Turing's formula as smoothing the relative-frequency estimates towards a geometric distribution. This could potentially be used to improve sparse-data estimates by assuming a geometric distribution (tail), and introducing a ranking based on direct frequency counts, frequency counts when backing off to more general conditionings, order of appearance in the training data, or, to break any remaining ties, lexicographical order.</Paragraph> <Paragraph position="2"> Conversely, a local reestimation formula in the vein of Turing's formula was derived from Zipf's law. Although the two equations are similar, Turing's formula shifts the frequency mass towards more frequent species.
The two cases were generalized to a single spectrum of reestimation formulas and corresponding asymptotes, parameterized by one real-valued parameter.</Paragraph> <Paragraph position="3"> Furthermore, the two cases correspond to the upper and lower bounds of this parameter for which the cumulative of the frequency function converges as rank tends to infinity.</Paragraph> <Paragraph position="4"> These results are in sharp contrast to common belief in the field; in [Baayen 1991], for example, we read: &quot;Other models, such as Good (1953) ... have been put forward, all of which have Zipf's law as some special or limiting form.&quot; All of the Zipf-Simon-Mandelbrot distributions exhibit the same basic asymptotic behavior, $f(r) = C / r^{\beta}$, parameterized by the positive real-valued parameter $\beta$. Comparing this with Eq. (15), we find that $\beta = \frac{1}{\theta - 1} > 0$ and thus $\theta = 1 + \frac{1}{\beta} > 1$. In view of the established exponentially declining asymptote of the ideal Turing distribution, corresponding to $\theta = 1$, we can conclude that the latter is qualitatively different.</Paragraph> </Section></Paper>