
Machine learning for the analysis of healthy lifestyle data: a scoping review protocol

Tony Estrella, Carla Alfonso, Lluis Capdevila, Josep-Maria Losilla

Research output: Book/Report (Commissioned report)


Abstract

BACKGROUND Advances in data science and technology have transformed lifestyle studies by enabling the integration of multimodal information and the generation of large volumes of data. Despite the growing interest in machine learning (ML) in health behaviour research, significant methodological gaps remain.

OBJECTIVE The study aims to systematically review the applications of supervised ML algorithms in analyzing healthy lifestyle (HL) data, with a specific focus on the methodological approach employed. The specific objectives are to explore the types and sources of data used to study health outcomes, examine the ML processes employed, including explainable artificial intelligence (XAI) methods, and review the software tools utilized. Additionally, this review aims to provide practical guidelines to enhance the quality and transparency of future ML research in health.

METHODS Following the PRISMA-ScR recommendations, the search was conducted across PubMed, PsycINFO, and Web of Science, resulting in 48 studies that met the inclusion criteria.

RESULTS Most studies (37, 77.08%) integrated multidomain data from physical activity, diet, sleep, and stress. Data sources were split between self-acquired data (25, 52.08%) and health repositories (23, 47.92%). Single-item measurements were common, particularly for physical activity, diet, and sleep. Although 28 studies took a multimodel approach, random forest was the most frequently used algorithm. Only 10 studies (20.83%) employed XAI methods, with 9 using SHapley Additive exPlanations (SHAP) values and 1 using Local Interpretable Model-agnostic Explanations (LIME). R was the most widely used software, with variations in the libraries employed.

CONCLUSIONS This review highlights methodological gaps in the application of supervised ML to HL data. The ML workflow should span from data acquisition to explainability, with iterative steps to improve the process. Multidomain approaches to data acquisition enhance understanding of lifestyle-related health issues but are constrained by low data representativeness due to methodological limitations in acquisition. While random forest was prevalent, a multimodel approach is recommended for comprehensive comparison. Lifestyle components consistently ranked among the top features in studies that incorporated XAI. Integrating XAI methods into the ML pipeline can support personalized interventions, provided the data are accurately collected. The R metapackage tidymodels facilitates process evaluation through a unified syntax, improving replicability. Methodological and reporting guidelines are provided to enhance transparency and replicability in multidisciplinary ML research.
Original language: English
Number of pages: 5
Publication status: Published - 18 Mar 2023

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being
