Systematic Review of Reporting guidelines for large language models used in healthcare research

Keywords

Large Language Models
Reporting Guidelines
CHART
TRIPOD

How to Cite

Alryalat, S. A., & Sultan, I. (2025). Systematic Review of Reporting guidelines for large language models used in healthcare research. High Yield Medical Reviews, 3(2). https://doi.org/10.59707/hymrUXPX7081

Abstract

This systematic review synthesizes existing reporting guidelines for large language models (LLMs) in healthcare research and evaluates how adequately they address gaps in transparency, reproducibility, and clinical applicability. We searched the PubMed database for studies on reporting guidelines for LLMs used in healthcare research and included 18 studies. These studies primarily aimed to develop or evaluate reporting frameworks that improve transparency, reproducibility, and methodological rigor in LLM applications, and several focused on creating structured reporting checklists. The Chatbot Assessment Reporting Tool (CHART) was developed across multiple studies. Similarly, TRIPOD-LLM extended the TRIPOD+AI framework with 19 main items and 50 subitems, emphasizing modular reporting for diverse LLM tasks. Although existing reporting guidelines are an important step toward standardizing LLM research, their long-term impact will depend on broad adoption and iterative refinement to meet the evolving challenges of artificial intelligence.



Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2025 Saif Aldeen Alryalat, Iyad Sultan