Current limitations in predicting mRNA translation with deep learning models

Niels Schlusser, Asier González, Muskan Pandey, Mihaela Zavolan

Research output: Contribution to journal › Article › Research › peer-review

Abstract

The design of nucleotide sequences with defined properties is a long-standing problem in bioengineering. An important application is protein expression, be it in the context of research or the production of mRNA vaccines. The rate of protein synthesis depends on the 5’ untranslated region (5’UTR) of the mRNAs, and recently, deep learning models were proposed to predict the translation output of mRNAs from the 5’UTR sequence. At the same time, large data sets of endogenous and reporter mRNA translation have become available. In this study we use complementary data obtained in two different cell types to assess the accuracy and generality of currently available models of translation. We find that while performing well on the data sets on which they were trained, deep learning models do not generalize well to other data sets, in particular of endogenous mRNAs, which differ in many properties from reporter constructs. These differences limit the ability of deep learning models to uncover mechanisms of translation control and to predict the impact of genetic variation. We suggest directions that combine high-throughput measurements and machine learning to unravel mechanisms of translation control and improve construct design.

Original language: English
Article number: 227
Number of pages: 29
Journal: Genome Biology
Volume: 25
Issue number: 1
Publication status: Published - 22 Jan 2024

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Deep learning
  • Explainable AI
  • Systems biology
  • Translation control
