TY - JOUR
T1 - Multicenter validation of an artificial intelligence (AI)-based platform for the diagnosis of acute appendicitis
AU - Ghareeb, Waleed M.
AU - Draz, Eman
AU - Chen, Xianqiang
AU - Zhang, Junrong
AU - Tu, Pengsheng
AU - Madbouly, Khaled
AU - Moratal, Miriam
AU - Ghanem, Ahmed
AU - Amer, Mohamed
AU - Hassan, Ahmed
AU - Hussein, Ahmed H.
AU - Gabr, Haitham
AU - Faisal, Mohammed
AU - Khaled, Islam
AU - El Zaher, Haidi Abd
AU - Emile, Mona Hany
AU - Espin-Basany, Eloy
AU - Pellino, Gianluca
AU - Emile, Sameh Hany
N1 - Copyright © 2024 Elsevier Inc. All rights reserved.
PY - 2024/9
Y1 - 2024/9
AB - Background: The current scores used to help diagnose acute appendicitis have a “gray” zone in which the diagnosis is usually inconclusive. Furthermore, the universal use of CT scanning is limited because of the radiation hazards and/or limited resources. Hence, it is imperative to have an accurate diagnostic tool to avoid unnecessary, negative appendectomies. Methods: This was an international, multicenter, retrospective cohort study. The diagnostic accuracy of the artificial intelligence platform was assessed by sensitivity, specificity, negative predictive value, the area under the receiver curve, precision curve, F1 score, and Matthews correlation coefficient. Moreover, calibration curve, decision curve analysis, and clinical impact curve analysis were used to assess the clinical utility of the artificial intelligence platform. The accuracy of the artificial intelligence platform was also compared to that of CT scanning. Results: Two data sets were used to assess the artificial intelligence platform: a multicenter real data set (n = 2,579) and a well-qualified synthetic data set (n = 9736). The platform showed a sensitivity of 92.2%, specificity of 97.2%, and negative predictive value of 98.7%. The artificial intelligence had good area under the receiver curve, precision, F1 score, and Matthews correlation coefficient (0.97, 86.7, 0.89, 0.88, respectively). Compared to CT scanning, the artificial intelligence platform had a better area under the receiver curve (0.92 vs 0.76), specificity (90.9 vs 53.3), precision (99.8 vs 98.9), and Matthews correlation coefficient (0.77 vs 0.72), comparable sensitivity (99.2 vs 100), and lower negative predictive value (67.6 vs 99.5). Decision curve analysis and clinical impact curve analysis intuitively revealed that the platform had a substantial net benefit within a realistic probability range from 6% to 96%. Conclusion: The current artificial intelligence platform had excellent sensitivity, specificity, and accuracy exceeding 90% and may help clinicians in decision making on patients with suspected acute appendicitis, particularly when access to CT scanning is limited.
KW - Acute Disease
KW - Adolescent
KW - Adult
KW - Aged
KW - Appendectomy/methods
KW - Appendicitis/diagnostic imaging
KW - Artificial Intelligence
KW - Child
KW - Female
KW - Humans
KW - Male
KW - Middle Aged
KW - Predictive Value of Tests
KW - ROC Curve
KW - Retrospective Studies
KW - Sensitivity and Specificity
KW - Tomography, X-Ray Computed/methods
KW - Young Adult
UR - http://www.scopus.com/inward/record.url?scp=85196659542&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/b6651dd5-bc6d-3445-a2a0-e7598d74ce25/
UR - https://portalrecerca.uab.cat/en/publications/8576e7f8-aa33-4f19-a010-8feabce51412
U2 - 10.1016/j.surg.2024.05.007
DO - 10.1016/j.surg.2024.05.007
M3 - Article
C2 - 38910047
AN - SCOPUS:85196659542
SN - 0039-6060
VL - 176
SP - 569
EP - 576
JO - Surgery (United States)
JF - Surgery (United States)
IS - 3
ER -