Multicenter validation of an artificial intelligence (AI)-based platform for the diagnosis of acute appendicitis

Waleed M. Ghareeb*, Eman Draz, Xianqiang Chen, Junrong Zhang, Pengsheng Tu, Khaled Madbouly, Miriam Moratal, Ahmed Ghanem, Mohamed Amer, Ahmed Hassan, Ahmed H. Hussein, Haitham Gabr, Mohammed Faisal, Islam Khaled, Haidi Abd El Zaher, Mona Hany Emile, Eloy Espin-Basany, Gianluca Pellino, Sameh Hany Emile

*Corresponding author for this work

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Background: The current scores used to help diagnose acute appendicitis have a “gray” zone in which the diagnosis is usually inconclusive. Furthermore, the universal use of CT scanning is limited because of the radiation hazards and/or limited resources. Hence, it is imperative to have an accurate diagnostic tool to avoid unnecessary, negative appendectomies.
Methods: This was an international, multicenter, retrospective cohort study. The diagnostic accuracy of the artificial intelligence platform was assessed by sensitivity, specificity, negative predictive value, the area under the receiver curve, precision curve, F1 score, and Matthews correlation coefficient. Moreover, calibration curve, decision curve analysis, and clinical impact curve analysis were used to assess the clinical utility of the artificial intelligence platform. The accuracy of the artificial intelligence platform was also compared to that of CT scanning.
Results: Two data sets were used to assess the artificial intelligence platform: a multicenter real data set (n = 2,579) and a well-qualified synthetic data set (n = 9,736). The platform showed a sensitivity of 92.2%, specificity of 97.2%, and negative predictive value of 98.7%. The artificial intelligence had good area under the receiver curve, precision, F1 score, and Matthews correlation coefficient (0.97, 86.7, 0.89, 0.88, respectively). Compared to CT scanning, the artificial intelligence platform had a better area under the receiver curve (0.92 vs 0.76), specificity (90.9 vs 53.3), precision (99.8 vs 98.9), and Matthews correlation coefficient (0.77 vs 0.72), comparable sensitivity (99.2 vs 100), and lower negative predictive value (67.6 vs 99.5). Decision curve analysis and clinical impact curve analysis intuitively revealed that the platform had a substantial net benefit within a realistic probability range from 6% to 96%.
Conclusion: The current artificial intelligence platform had excellent sensitivity, specificity, and accuracy exceeding 90% and may help clinicians in decision making on patients with suspected acute appendicitis, particularly when access to CT scanning is limited.
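
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how they are conventionally computed from confusion-matrix counts, together with the standard net-benefit formula used in decision curve analysis. The function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the study, and this is not the authors' code.

```python
from math import sqrt


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient: ranges from -1 to 1, 0 means chance level
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "f1": f1,
        "mcc": mcc,
    }


def net_benefit(tp, fp, n, threshold):
    """Net benefit at a given threshold probability (decision curve analysis)."""
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical counts for illustration only (NOT data from the study).
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
benefit = net_benefit(tp=90, fp=5, n=200, threshold=0.5)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"net benefit at threshold 0.5: {benefit:.3f}")
```

A decision curve is obtained by evaluating the net benefit over a range of threshold probabilities, with the counts recomputed at each threshold, and comparing the result with the treat-all and treat-none strategies.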

Original language: English
Pages (from-to): 569-576
Number of pages: 8
Journal: Surgery (United States)
Volume: 176
Issue number: 3
Early online date: 22 Jun 2024
Publication status: Published - Sept 2024

Keywords

  • Acute Disease
  • Adolescent
  • Adult
  • Aged
  • Appendectomy/methods
  • Appendicitis/diagnostic imaging
  • Artificial Intelligence
  • Child
  • Female
  • Humans
  • Male
  • Middle Aged
  • Predictive Value of Tests
  • ROC Curve
  • Retrospective Studies
  • Sensitivity and Specificity
  • Tomography, X-Ray Computed/methods
  • Young Adult
