Abstract
AI-assisted workflows are transforming the way systematic reviews are conducted, converting complex evidence synthesis processes into rapid, high-throughput outputs. This shift significantly reduces the time required for evidence synthesis, making systematic reviews more scalable and accessible. A critical comparison of existing appraisal tools, including AMSTAR/AMSTAR 2, ROBIS, JBI, CASP, MECIR, and GRADE, highlights that these frameworks focus primarily on transparent reporting and retrospective methodological quality; they fail to capture the integrity of the review process, to measure reproducibility, or to adequately assess the automation checkpoints inherent in modern hybrid workflows. To address this gap and support critical appraisal, we introduce the High Yield Med Quality Evaluation Tool (HYMQET), a novel framework designed to provide a structured, quantitative assessment of workflow quality, methodological rigor, and automation transparency in both human-led and hybrid human-AI systematic reviews. HYMQET employs a stepwise, workflow-based scoring system across five core domains: Query Development, Screening Quality, Field Selection for Data Extraction, Full-Text Data Extraction, and Manuscript Writing. Its quantitative, workflow-based structure makes it an essential tool for the external validation, quality control, and reliable benchmarking of emerging automated and hybrid systematic review methodologies.
References
Moosapour, H., F. Saeidifard, M. Aalaa, A. Soltani, and B. Larijani, The rationale behind systematic reviews in clinical medicine: a conceptual framework. J Diabetes Metab Disord, 2021. 20(1): p. 919-929.
van der Braak, K., P. Heus, C. Orelio, F. Netterström-Wedin, K.A. Robinson, H. Lund, and L. Hooft, Perspectives on systematic review protocol registration: a survey amongst stakeholders in the clinical research publication process. Syst Rev, 2023. 12(1): p. 234.
Kwong, J.C.C., A. Khondker, K. Lajkosz, M.B.A. McDermott, X.B. Frigola, M.D. McCradden, M. Mamdani, G.S. Kulkarni, and A.E.W. Johnson, APPRAISE-AI Tool for Quantitative Evaluation of AI Studies for Clinical Decision Support. JAMA Netw Open, 2023. 6(9): p. e2335377.
Cabello, J.B., M. Torralba, M. Maldonado Fernandez, M. Ubeda, E. Ansuategui, L. Ramos-Ruperto, J. Emparanza, I. Urreta, M. Iglesias, and J. Pijoan, Critical appraisal tools for artificial intelligence clinical studies: a scoping review. Journal of Medical Internet Research, 2025.
Page, M.J., J.E. McKenzie, P.M. Bossuyt, I. Boutron, T.C. Hoffmann, C.D. Mulrow, L. Shamseer, J.M. Tetzlaff, E.A. Akl, S.E. Brennan, R. Chou, J. Glanville, J.M. Grimshaw, A. Hróbjartsson, M.M. Lalu, T. Li, E.W. Loder, E. Mayo-Wilson, S. McDonald, L.A. McGuinness, L.A. Stewart, J. Thomas, A.C. Tricco, V.A. Welch, P. Whiting, and D. Moher, The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ, 2021: p. n71.
Shea, B.J., B.C. Reeves, G. Wells, M. Thuku, C. Hamel, J. Moran, D. Moher, P. Tugwell, V. Welch, E. Kristjansson, and D.A. Henry, AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ, 2017: p. j4008.
Whiting, P., J. Savović, J.P. Higgins, D.M. Caldwell, B.C. Reeves, B. Shea, P. Davies, J. Kleijnen, and R. Churchill, ROBIS: A new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol, 2016. 69: p. 225-34.
Shea, B.J., J.M. Grimshaw, G.A. Wells, M. Boers, N. Andersson, C. Hamel, A.C. Porter, P. Tugwell, D. Moher, and L.M. Bouter, Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol, 2007. 7: p. 10.
Hilton, M., JBI critical appraisal checklist for systematic reviews and research syntheses (product review). Journal of the Canadian Health Libraries Association / Journal de l'Association des bibliothèques de la santé du Canada, 2024. 45(3).
Stone, J.C., K. Glass, J. Clark, M. Ritskes-Hoitinga, Z. Munn, P. Tugwell, and S.A.R. Doi, The MethodologicAl STandards for Epidemiological Research (MASTER) scale demonstrated a unified framework for bias assessment. J Clin Epidemiol, 2021. 134: p. 52-64.
Moher, D., D.J. Cook, S. Eastwood, I. Olkin, D. Rennie, and D.F. Stroup, Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet, 1999. 354(9193): p. 1896-900.
Guyatt, G.H., A.D. Oxman, G.E. Vist, R. Kunz, Y. Falck-Ytter, P. Alonso-Coello, and H.J. Schünemann, GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ, 2008. 336(7650): p. 924-6.
Musleh, A., N. Alwisi, H. Abu Serhan, A. Toubasi, L. Malkawi, and S.A. Alryalat, Artificial Intelligence Powered Research Automation (AIPRA) Versus Human Expert: A Two-Arm Ophthalmology Comparative Study. medRxiv, 2025: p. 2025.10.27.25338904.

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2025 Nouran Alwisi, Fatima R Alsharif, Ayman Musleh
