Topic-based classification and identification of global trends for startup companies

Ivan Savin*, Kristina Chukavina, Andrey Pushkarev

*Corresponding author for this work

Research output: Contribution to journal › Article › Research › peer-review

5 Citations (Scopus)


To foresee global economic trends, one needs to understand the present-day startup companies that may soon become new market leaders. In this paper, we explore textual descriptions of more than 250,000 startups in the Crunchbase database. We analyze the 2009–2019 period using topic modeling. We propose a novel classification of startup companies that is free from expert bias, contains 38 topics, and quantifies the weight of each of these topics for every startup. Taking the year of establishment and geographical location of the startups into account, we measure which topics increased or decreased their share over time, and which of them were predominantly present in Europe, North America, or other regions. We find that the share of startups focused on data analytics, social platforms, financial transfers, and time management has risen, while the opposite trend is observed for mobile gaming, online news, and online social networks, as well as legal and professional services. We also identify strong regional differences in topic distribution, suggesting a certain concentration of the startups. For example, sustainable agriculture is more strongly represented in South America and Africa, while pharmaceutics is more strongly represented in North America and Europe. Furthermore, we explore which pairs of topics tend to co-occur more often, quantify how multisectoral the startups are, and examine which startup classes attract more investment. Finally, we compare our classification to the one existing in the Crunchbase database, demonstrating how we improve on it.

Original language: English
Number of pages: 31
Journal: Small business economics (Print)
Early online date: 1 Mar 2022
Publication status: Published - 1 Mar 2022


  • Crunchbase
  • Entrepreneurship
  • Investments
  • Machine learning
  • Natural language processing

