Evaluating Advanced AI in Oncology Education and Clinical Knowledge Assessment

Authors

  • Yasar Ahmed, Department of Medical Oncology, St. Vincent’s University Hospital, Ireland
  • Hatim Ibrahim, Department of Medical Oncology, St. Vincent’s University Hospital, Ireland
  • Simaa Hamid, Independent Researcher, Dublin, Ireland

DOI:

https://doi.org/10.47709/ijmdsa.v5i2.5343

Keywords:

Artificial Intelligence, Medical Oncology, Multimodal Large Language Model, ChatGPT

Abstract

The rapid advancement of artificial intelligence (AI) has produced powerful tools such as multimodal large language models (MLLMs), which have the potential to reshape medical practice, including oncology. This study investigates the performance of two MLLMs, GPT-4o and Gemini Advanced, on oncology examination questions from the American Society of Clinical Oncology Self-Evaluation Program (ASCO-SEP) Question Bank. We extracted 832 multiple-choice questions from this bank, covering oncological tasks such as diagnosis, treatment recommendations, and basic science knowledge. Both models were presented with these questions, and their responses were scored against the official answer key. Gemini Advanced outperformed GPT-4o overall, achieving 74.84% accuracy compared with 60% for GPT-4o. Further analysis showed that Gemini Advanced outperformed GPT-4o in every task category, particularly in making diagnoses, ordering and interpreting test results, and recommending treatment or patient care. Both models had the most difficulty with questions on pathophysiology and basic science knowledge. These findings suggest that, while both MLLMs demonstrate substantial oncological knowledge, there remains room for improvement, particularly in handling complex clinical scenarios and integrating basic science knowledge. This study adds to the growing body of evidence on the capabilities and limitations of AI in medical oncology and highlights its potential role in augmenting clinical practice and medical education.
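The scoring described above, comparing each model response with the answer key and reporting accuracy overall and by task category, can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the function name `score_by_category`, the dictionary layout, and the toy questions are all hypothetical, and the sketch assumes that a missing response counts as incorrect.

```python
from collections import defaultdict


def score_by_category(answer_key, responses, categories):
    """Return (overall_accuracy, per_category_accuracy) as percentages.

    answer_key, responses: dicts mapping question id -> chosen option letter.
    categories: dict mapping question id -> task category label.
    All three structures are hypothetical; the source does not describe its data format.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, key in answer_key.items():
        cat = categories[qid]
        total[cat] += 1
        # Assumption: an unanswered question is scored as incorrect.
        if responses.get(qid) == key:
            correct[cat] += 1
    per_category = {c: 100.0 * correct[c] / total[c] for c in total}
    overall = 100.0 * sum(correct.values()) / sum(total.values())
    return overall, per_category


# Toy data for illustration only; these are not ASCO-SEP items or study results.
answer_key = {1: "A", 2: "C", 3: "B", 4: "D"}
categories = {1: "diagnosis", 2: "diagnosis", 3: "treatment", 4: "basic science"}
responses = {1: "A", 2: "B", 3: "B", 4: "D"}

overall, per_category = score_by_category(answer_key, responses, categories)
print(f"overall: {overall:.2f}%")
for cat, acc in per_category.items():
    print(f"{cat}: {acc:.2f}%")
```

Running the same function once per model on a shared answer key would give directly comparable overall and per-category figures.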

Published

2026-02-23

How to Cite

Ahmed, Y., Ibrahim, H., & Hamid, S. (2026). Evaluating Advanced AI in Oncology Education and Clinical Knowledge Assessment. International Journal of Multidisciplinary Sciences and Arts, 5(2), 320–326. https://doi.org/10.47709/ijmdsa.v5i2.5343