Dependence of exponents on text length versus finite-size scaling for word-frequency distributions

Álvaro Corral, Francesc Font-Clos

    Research output: Contribution to journal › Article › Research › peer-review

    4 Citations (Scopus)

    Abstract

    © 2017 American Physical Society. Some authors have recently argued that a finite-size scaling law for the text-length dependence of word-frequency distributions cannot be conceptually valid. Here we give solid quantitative evidence for the validity of this scaling law, using both careful statistical tests and analytical arguments based on the generalized central-limit theorem applied to the moments of the distribution (and obtaining a novel derivation of Heaps' law as a by-product). We also find that the picture of word-frequency distributions with power-law exponents that decrease with text length [X. Yan and P. Minnhagen, Physica A 444, 828 (2016), doi:10.1016/j.physa.2015.10.082] does not withstand rigorous statistical analysis. Instead, we show that the distributions are perfectly described by power-law tails with stable exponents, whose values are close to 2, in agreement with the classical Zipf's law. Some misconceptions about scaling are also clarified.
    Original language: English
    Article number: 022318
    Journal: Physical Review E
    Volume: 96
    Issue number: 2
    Publication status: Published - 22 Aug 2017
