Abstract
Large language model (LLM) tools are transforming the way evidence is retrieved by converting natural language prompts into quick, synthesized outputs. These platforms significantly reduce the time required for literature searches, making evidence retrieval more accessible to users unfamiliar with formal search strategies. A close evaluation of four prominent platforms (Undermind.ai, Scite.ai, Consensus.app, and OpenEvidence) highlights both notable advantages and ongoing limitations. Undermind and Consensus draw on the extensive Semantic Scholar database of over 200 million records, Scite enhances results with "Smart Citations" that indicate whether citing articles support or contrast the referenced work, and OpenEvidence applies a medically focused LLM trained on licensed sources, including the complete NEJM archive. Despite these benefits, key limitations persist: opaque algorithms, inconsistent responses to identical queries, paywalls or sign-up barriers, and incomplete recall that may compromise systematic reviews. To support critical appraisal, we outline essential information-retrieval metrics (recall, precision, F1-score, mean average precision, and specificity) and provide open-source code for computing them. Until transparent, validated evaluations demonstrate consistently high recall, these tools should be viewed as rapid, first-pass aids rather than replacements for the structured database searches required by PRISMA-compliant methodologies.
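
As a rough illustration of the metrics named above, the sketch below computes recall, precision, F1-score, specificity, and per-query average precision from a ranked list of retrieved records and a gold-standard set of relevant records. It is a minimal Python example assuming binary relevance judgments; the function name and the sample inputs are hypothetical, and this is not the open-source code released with the article, only an illustration of the same quantities.

# Minimal sketch of the information-retrieval metrics discussed above,
# assuming binary relevance judgments and a ranked list of retrieved IDs.
# Illustrative only; not the article's released code.

def retrieval_metrics(retrieved, relevant, total_records):
    """Compute recall, precision, F1, specificity, and average precision
    for one query.

    retrieved     : ranked list of record IDs returned by the tool
    relevant      : set of gold-standard relevant record IDs
    total_records : size of the searched collection (needed for specificity)
    """
    retrieved_set = set(retrieved)
    tp = len(retrieved_set & relevant)   # relevant records found
    fp = len(retrieved_set - relevant)   # irrelevant records returned
    fn = len(relevant - retrieved_set)   # relevant records missed
    tn = total_records - tp - fp - fn    # irrelevant records correctly excluded

    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # Average precision: mean of precision@k at each rank k where a relevant
    # record appears, divided by the number of relevant records. Averaging
    # this value across queries gives mean average precision (MAP).
    hits, precisions_at_hits = 0, []
    for k, record_id in enumerate(retrieved, start=1):
        if record_id in relevant:
            hits += 1
            precisions_at_hits.append(hits / k)
    avg_precision = (sum(precisions_at_hits) / len(relevant)
                     if relevant else 0.0)

    return {"recall": recall, "precision": precision, "f1": f1,
            "specificity": specificity, "average_precision": avg_precision}

# Hypothetical example: a tool returns 4 ranked records, 3 of which appear
# in a 5-record gold standard, from a collection of 1,000 records.
print(retrieval_metrics(["a", "x", "b", "c"], {"a", "b", "c", "d", "e"}, 1000))

In this example recall is 0.60, precision 0.75, F1 about 0.67, specificity about 0.999, and average precision about 0.48, illustrating how a tool can look precise while still missing records that a systematic review would require.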

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2025 High Yield Medical Reviews