Comparative Analysis of the Bibliographic Data Sources Using PubMed, Scopus, Web of Science, and Lens

Alaa Tarazi

Introduction

Bibliometric analysis is a quantitative approach that uses statistical tools to assess cooperation and the impact of publications within a specific field.1 It offers a comprehensive overview of academic literature and effectively identifies influential articles, authors, journals, institutes, and countries over time.2 Before 2004, researchers conducting bibliometric analysis would retrieve data from the Institute for Scientific Information (ISI) in the United States of America (USA), which is now integrated into the Core Collection of the Web of Science (WOS).3 In 2004, the Google Scholar and Elsevier Scopus databases emerged, challenging the dominance of WOS. Subsequently, WOS and Scopus became competitors, leading to numerous studies aimed at differentiating between them for bibliometric analysis.4,5 One study stated that neither is superior to the other,6 while another stated that they are complementary.7

However, WOS and Scopus remain the most commonly used databases across various fields,8 even though they are accessible only through subscription. Many other databases have since emerged to provide bibliographic data for publications, each offering distinct features and services, such as Microsoft Academic, Dimensions, and Crossref. In addition, PubMed stands out as a freely accessible database, particularly renowned for its coverage of the biomedical and life sciences.9

Additionally, the Lens database, a free platform facilitating the discovery of both scholarly and patent literature, started in 1998 as Patent Lens, offering access to patent literature.10 Established through a partnership between the Queensland University of Technology and Cambia,11 the database has evolved to encompass over 225 million scholarly works and more than 127 million global patent records. This unique compilation of both scholarly works and patent data serves as a powerful resource for exploring the connections between research and innovation.12 Lens has been prominently used in several bibliometric analysis studies.13,14

This study aims to elucidate the methods for using PubMed, Scopus, WOS, and Lens in bibliometric analysis. It encompasses a detailed comparison of their characteristics, advantages, disadvantages, and variations in their applications across different bibliometric science mapping tools, including VOSviewer. Bibliometric analysis, which is considered a valuable tool for the analysis of research output, is particularly relevant to researchers seeking to analyze publication trends and hotspots.

Methods and material

Sources of data

In this study, we compared the search strategy across four main databases (PubMed, Scopus, WOS, and Lens), providing a detailed description and a practical example of the use of each database.

Data collection

To provide a practical example and assess the differences in search results between the included databases, this study analyzed the publication trends of the “University of Jordan” between 2019 and 2023. An institution was chosen because bibliometric analysis of institutes is more complex than analysis of specific subjects. Moreover, the University of Jordan is poorly studied bibliometrically, as other institutions in Jordan incorporate “University of Jordan” or similar wording in their names. These include “Jordan University of Science and Technology”, “Al-Zaytoonah University of Jordan”, and “Jordan University Hospital”. This paper shows strategies to overcome the challenge posed by such naming overlaps in the search process.

The following describes the search method and the search example used in each database to produce the required information:

1. PubMed

  1. Choose advanced search from the PubMed homepage (www.ncbi.nlm.nih.gov/pubmed).

  2. Enter the desired search term in the search field. Choose the search terms from the Medical Subject Headings (MeSH) database.

  3. Add as many fields as needed and choose the relation between these fields (AND, OR, or NOT).

  4. Click search.

  5. Refine the search results further using the available filters (e.g. documents by specific years, publications per author, publications per country). Note that the final search will be saved in the history of the advanced search.

  6. Export the results to further analyze them.

  7. Perform the following steps to analyze the University of Jordan research output during a 5-year period (2019-2023), using PubMed:

As a result, the search query used in PubMed was: (((University of Jordan[Affiliation]) NOT (Jordan University of Science and technology[Affiliation])) NOT (Al-Zaytoonah University of Jordan[Affiliation])) NOT (Jordan University Hospital[Affiliation]).
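The nested exclusion pattern in the PubMed query above can be generated programmatically, which helps when the list of look-alike institution names grows. The following is a minimal sketch, not part of the study's workflow; the function name `build_pubmed_affiliation_query` is hypothetical, and the sketch only builds the query string without contacting PubMed.

```python
def build_pubmed_affiliation_query(include, exclude):
    """Chain NOT clauses on the [Affiliation] field, nesting left to right."""
    query = f"{include}[Affiliation]"
    for name in exclude:
        # Wrap what has been built so far, then subtract the next look-alike name.
        query = f"({query}) NOT ({name}[Affiliation])"
    return query


query = build_pubmed_affiliation_query(
    "University of Jordan",
    [
        "Jordan University of Science and technology",
        "Al-Zaytoonah University of Jordan",
        "Jordan University Hospital",
    ],
)
print(query)
```

The resulting string can be pasted into the PubMed query box; the publication-year restriction is applied separately through the result filters, as described in the steps above.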

2. Scopus

  1. Open the Scopus website (www.scopus.com), then register in order to get access to the full search capabilities. Check if your institution is already registered and has access to the database.

  2. Click on the “Advanced search” option, then enter the search term in the search field.

  3. Specify the fields in the article to be searched for. These fields include: all fields, titles, abstracts, keywords, authors, affiliations, and others.

  4. Add other desired fields to search for, and indicate the relation between these fields using (AND, OR, or AND NOT).

  5. Refine the search results directly from the results window by choosing from the filters provided by Scopus, such as documents per year, documents per journal, documents per author or institute, and document type, among other options.

  6. Choose either to analyze the results directly in Scopus by selecting “Analyze Search Results”, or to export the results in one of several formats, which include CSV, RIS, BibTeX, and plain text.

  7. Perform the following steps to analyze the University of Jordan’s research output during five years (2019-2023) using Scopus:

The following search query was used in Scopus:

AFFILORG (University of Jordan AND NOT Al-Zaytoonah University of Jordan AND NOT Jordan University of Science and Technology).

3. WOS

  1. Register to get access to the full search capabilities of the WOS database. Users can register through their institute if it is already registered with WOS.

  2. Open the WOS home page (www.webofknowledge.com). This opens the website’s basic search, with the Web of Science Core Collection as the selected database.

  3. Choose the Advanced Search option, which offers specific fields of the article to search in, such as organization, authors, title with abstract, and others.

  4. Select the field to search in, then enter the desired search term.

  5. Add another field, with an operator between them (AND, OR, NOT), then click search.

  6. After completing the search, sort or filter the results by date, a certain duration, journal type, or research area.

  7. To view and analyze the results, click “Analyze results” in the options provided by the WOS results window.

  8. Export the results. WOS provides several export formats (e.g. plain text, RIS, Excel). Notably, WOS limits the number of exported documents, allowing only 5,000 documents to be downloaded at a time, so exporting more than 5,000 documents requires more than one batch.

  9. Perform the following steps to analyze the University of Jordan research output during a 5-year period (2019-2023), using WOS:

The following search query was used in WOS:

OG=“University of Jordan” NOT OG=“Jordan University of Science and Technology” NOT OG=“Al-Zaytoonah University of Jordan” NOT OG=“Jordan University Hospital”.

4. Lens

  1. Choose a structured search from the Lens homepage (www.lens.org).

  2. Choose the field to search within, then type the desired search term. Lens also provides the choice to show the results according to a certain priority (e.g. a certain date) or a certain flag (e.g. has abstract or full text). The Lens database provides a “Query text editor”, which can suggest edits to the query, and a profile search, which can search for authors and inventors by their name or ORCID ID.

  3. Type the search term in the search field. Lens provides the operators “AND” and “OR” to be inserted between the terms. Then click on search.

  4. Refine the results using the different filters provided by Lens, which include date range, certain journals, document type, etc.

  5. Export the results to further analyze them. Lens provides various export formats, such as CSV, RIS, and BibTeX. A subscription is needed to export more than 1,000 records, up to a maximum of 50,000 records.

  6. Perform the following steps to analyze University of Jordan research output during a 5-year period (2019-2023), using Lens:

The following search query was used in Lens:

“University of Jordan” NOT “Jordan University of Science and Technology” NOT “Al-Zaytoonah University of Jordan” NOT “Jordan University Hospital”.
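The Scopus, WOS, and Lens queries above share the same include/exclude logic and differ only in field codes and operators (`AND NOT` in Scopus, `NOT` in WOS and Lens). The sketch below renders one set of names into each syntax. The function names are hypothetical, the code uses straight quotation marks where the text shows typographic ones, and it only builds strings rather than calling any database.

```python
def scopus_query(include, exclude):
    """AFFILORG field code with AND NOT exclusions inside one parenthesis."""
    terms = [include] + [f"AND NOT {name}" for name in exclude]
    return f"AFFILORG ({' '.join(terms)})"


def wos_query(include, exclude):
    """OG= (organization) field tag repeated for every quoted name."""
    parts = [f'OG="{include}"'] + [f'NOT OG="{name}"' for name in exclude]
    return " ".join(parts)


def lens_query(include, exclude):
    """Quoted phrases joined with NOT, without a field tag."""
    parts = [f'"{include}"'] + [f'NOT "{name}"' for name in exclude]
    return " ".join(parts)


target = "University of Jordan"
# The Scopus example in the text excludes two names; WOS and Lens exclude three.
scopus_excluded = [
    "Al-Zaytoonah University of Jordan",
    "Jordan University of Science and Technology",
]
excluded = [
    "Jordan University of Science and Technology",
    "Al-Zaytoonah University of Jordan",
    "Jordan University Hospital",
]

print(scopus_query(target, scopus_excluded))
print(wos_query(target, excluded))
print(lens_query(target, excluded))
```

Keeping the institution names in one list makes it easier to apply the same exclusions consistently across databases.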

Data Analysis

For this study, Microsoft Excel 2019 was employed for duplication checking and exporting the analyzed data. To conduct the required quantitative analysis and visualize the collected literature data from PubMed, Scopus, WOS, and Lens, VOSviewer 1.6.20 (Centre for Science and Technology Studies, Leiden University, The Netherlands) was used. Using this tool, collaboration networks of authors, countries, institutions, keywords, journals, and the identification of the most cited documents were generated. Various formats were used to extract data from these databases to suit VOSviewer. PubMed data was exported in the PubMed format, and WOS data in the form of full records with cited references, both stored as TXT files. On the other hand, Scopus and Lens data were exported in CSV format.
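The duplication check described above was performed in Microsoft Excel. As an alternative illustration only, the same idea can be sketched in Python using the standard library. The column names `Title` and `DOI`, the sample records, and the matching rule (DOI first, normalized title as a fallback) are assumptions made for this example and are not taken from the study.

```python
import csv
import io


def normalize(text):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def drop_duplicates(rows, doi_field="DOI", title_field="Title"):
    """Keep the first record per DOI, falling back to the title when no DOI exists."""
    seen = set()
    unique = []
    for row in rows:
        doi = normalize(row.get(doi_field) or "")
        key = ("doi", doi) if doi else ("title", normalize(row.get(title_field) or ""))
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique


# Hypothetical sample standing in for an exported CSV file.
sample = io.StringIO(
    "Title,DOI\n"
    "A study of X,10.1000/abc\n"
    "A Study of X,10.1000/ABC\n"
    "Another paper,\n"
    "Another  paper,\n"
)
rows = list(csv.DictReader(sample))
unique_rows = drop_duplicates(rows)
print(len(rows), "records read,", len(unique_rows), "kept")
```

Real exports name their columns differently in each database, so the field names would need to be adjusted to the file at hand.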

Results

Data collection

The data was extracted from PubMed, Scopus, WOS, and Lens databases. A total of 2739, 7777, 7518, and 4326 publications were retrieved from these databases, respectively. The data collection was completed on January 21, 2024. All types of published scientific output were considered for this study.
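To put these totals on a common scale, each count can be expressed as a share of the largest result set. The short sketch below uses only the four totals reported above. The shares describe relative size only and say nothing about how much the result sets overlap, which this study did not measure.

```python
# Totals retrieved per database, as reported in the text.
counts = {"PubMed": 2739, "Scopus": 7777, "WOS": 7518, "Lens": 4326}

largest = max(counts.values())
shares = {name: n / largest for name, n in counts.items()}

for name, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<7}{n:>6}{shares[name]:>8.1%}")
```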

Literature growth/trends

Supplementary table 1 illustrates the yearly distribution of literature resulting from the search strategy for “University of Jordan” in each database for the period 2019-2023. The literature related to the “University of Jordan” has shown a consistent increase in both PubMed and Scopus over the last five years. In the case of WOS, there was a gradual increase until 2023, when it experienced a decrease compared to the preceding year. Conversely, the literature growth trend in Lens demonstrated a gradual rise from 2019 to 2021, followed by a notable decline to nearly half of the previous document count. Figure 1 shows the annual growth in documents for each database used in this study.

Figure 1. Annual literature growth trends in each database.

Distribution of authors

Among the databases used in this study, the ranking of the top ten most productive authors by number of documents varies, as shown in Supplementary table 2. For example, “Mohammed S. Mubarak” is ranked as the second most productive author in PubMed with 92 publications, while he is ranked fourth in Lens with 64 publications. “Saif Aldeen Alryalat” claims the third spot in PubMed with 57 documents, maintaining the same rank in Lens with 67 documents. Additionally, “Muhammad Alshurideh” takes the lead as the most productive author in both Scopus with 146 documents and Lens with 183 documents. However, WOS does not share any authors within the top ten most productive authors with the other databases.

Distribution of countries

The analysis of countries’ scientific production focused on the authors’ affiliated countries. Only Scopus and WOS were considered for this analysis due to limitations in the VOSviewer software, which doesn’t provide the choice of country analysis for Lens and PubMed. Jordan and the USA emerged as the top two most productive countries in both Scopus and WOS. Jordan had 3,996 documents with 13,615 citations in Scopus, while in WOS it recorded 7,516 documents with 69,563 citations. Despite the overall closeness in the number of articles between Scopus and WOS, substantial differences exist in the document and citation counts for each country. Most shared countries had a lower rank in Scopus compared to WOS, such as Saudi Arabia, Germany, and Italy. However, the UAE and Canada demonstrated a higher rank in Scopus than in WOS, though all shared countries reported higher document and citation numbers in WOS than in Scopus, as indicated in Supplementary table 3.

Distribution of keywords

The analysis of the most relevant keywords, based on their occurrence/frequencies and trending topics, was conducted using all keywords (author and MeSH keywords). Supplementary table 4 shows the top ten most frequent keywords in each database using our search example. In both PubMed and Lens, “Humans” emerged as the most frequent keyword. PubMed and Lens share the same top five keywords with variation in occurrence. Furthermore, PubMed, Scopus, and Lens exhibit almost identical keywords with slight differences in ranks and frequencies. In contrast, WOS shares only two keywords with other databases, namely “Jordan” and “Covid-19”.

Distribution of journals

Supplementary table 5 displays the top ten sources with the most publications as retrieved from each data file, excluding PubMed, as it does not offer this option in VOSviewer. In Scopus, 517 documents were published in the listed top ten sources, while in WOS, 590 documents were published. Lens recorded 493 documents published in the top ten journals. The listed journals varied between databases, except for three journals (Heliyon, PLOS ONE, and Sustainability) that were shared across all three databases, each with different ranks and numbers of documents. Additionally, two journals were shared between Scopus and Lens, namely Dirasat: Human and Social Sciences, and Theory and Practice in Language Studies. Both of these journals held the third position in their respective databases, with almost the same number of documents: 50 and 51, respectively.

Discussion

In this article, a comprehensive analysis of prominent bibliometric databases and software has been conducted, focusing on their characteristics and drawing a comparison among them. Table 1 further compares the characteristics of each database. The investigation begins with disparities in search strategies: Scopus, WOS, and Lens offer more detailed filters for refining search results, including the number of open-access publications and hot papers in a certain field, whereas PubMed has more limited filtering options.

Table 1. Comparing the characteristics of PubMed, Scopus, WOS, and Lens.

| Characteristic | PubMed | Scopus | WOS | Lens |
|---|---|---|---|---|
| Covered disciplines | Life sciences, behavioral sciences, chemical sciences, and bioengineering disciplines | All disciplines | All disciplines | All disciplines |
| Main areas | Life sciences and biomedical disciplines | Social sciences and Arts & Humanities, in addition to Science, Technology & Medicine | Technology, social sciences, arts and humanities | Scholarly research and patent knowledge, policy, laws, regulations, investment, social norms and business data |
| Coverage start year | 1966 | 1970 | 1900 | NA |
| Free/subscription | Free | Subscription | Subscription | Free |
| Ownership | National Institutes of Health | Elsevier | Clarivate | Cambia |
| Open access assessment | Gold open access | Gold open access | Green and gold open access | Gold and green open access |
| Availability of operators | + | +++ | ++ | + |
| Export flexibility | + | +++ | ++ | + |

Regarding the exporting options of the databases, there is notable variability among them. PubMed, for instance, permits the export of only the first 10,000 records. WOS offers different export formats, with plain text or tab-delimited being the most suitable for bibliometric purposes. It allows the export of 1,000 records per export in CSV format, while permitting only 500 records per export in plain text format. Scopus provides different exporting options, but for bibliometric purposes, the RIS and CSV formats are the most relevant.15 It allows the export of up to 20,000 records at a time. Lens can export more than 1,000 and up to 50,000 records when the user is logged in with a subscription to the database. PubMed is the only fully freely accessible database among these four, while WOS, Scopus, and Lens require a subscription to access more features, such as exporting additional records and enhanced search capabilities.
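These per-export limits determine how many export rounds a given result set needs. The sketch below splits the totals retrieved in this study into record ranges using the limits quoted in this paragraph (plain text for WOS, and the no-subscription limit for Lens). The function name `export_batches` is hypothetical, and the limits are plain parameters, so other figures can be substituted if a platform's limits differ or change.

```python
def export_batches(total, limit):
    """Return 1-based inclusive (start, end) record ranges of at most `limit` records."""
    return [
        (start, min(start + limit - 1, total))
        for start in range(1, total + 1, limit)
    ]


# Totals retrieved in this study and per-export limits quoted in the text.
retrieved = {"PubMed": 2739, "Scopus": 7777, "WOS": 7518, "Lens": 4326}
limits = {"PubMed": 10_000, "Scopus": 20_000, "WOS": 500, "Lens": 1_000}

for name, total in retrieved.items():
    batches = export_batches(total, limits[name])
    print(f"{name}: {len(batches)} export(s), last range {batches[-1]}")
```

Exporting in fixed ranges like this also makes it easier to confirm afterwards that the merged files contain every record exactly once.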

Among the articles published annually in the period 2019-2023, notable variability was observed between databases. PubMed yielded the lowest number of articles, which may be attributed to its narrower scope and coverage compared with the other databases.16 In contrast, Scopus and WOS were close in the number of published articles, with Scopus having more articles than WOS in our search example. This discrepancy might be explained by the fact that Scopus offers more extensive citation analysis capabilities than WOS.17 Interestingly, in our search example, Lens yielded a lower number of articles than Scopus and WOS, despite being considered a more comprehensive database in terms of coverage.18

Regarding the distribution of authors between databases, a significant difference was observed in the number of documents in WOS compared to other databases. Additionally, WOS did not share any author within the top ten listed authors with the other databases. This discrepancy in the number of documents in WOS might be attributed to the fact that WOS does not index everything, and as a result, it may not always present the complete picture of an author’s body of work. In the analysis of country distribution, the number of citations per country in WOS was much higher compared to Scopus, even though some studies reported that Scopus has a higher citation count than WOS.19,20 It is interesting to note that PubMed doesn’t offer the choice of country analysis in VOSviewer, while this option is available using CiteSpace.21 For the analysis of organizational distribution, there was a higher number of documents in WOS. This could be due to the fact that WOS reports institutions without specifying the department, making the institute more inclusive in terms of the number of documents.

No significant variability was observed in the keywords used across the four databases, although WOS has fewer shared keywords than the other databases. However, in the analysis of journals, Scopus demonstrated a higher number of documents per journal compared to WOS. This aligns with previous studies reporting that Scopus has broader journal coverage compared to WOS.22 Furthermore, the journal “Dirasat: Human and Social Sciences” ranked as the top most productive journal in Scopus but did not appear in the top ten journals in WOS. This might be explained by the fact that humanities and social sciences are more comprehensively covered in the Scopus database compared to WOS.23

Among the different databases used in this article, PubMed offers the fewest analysis options when using different analysis software, including VOSviewer and CiteSpace. Data from PubMed cannot be analyzed based on bibliographic coupling or citation analysis.24,25 This limitation stems from the fact that PubMed data doesn’t provide cited references, which makes citation-based analytical methods inapplicable.25 On the other hand, when using the Lens database, the only analysis options that cannot be provided using VOSviewer are the organization and country distributions. This limitation arises because Lens data exported in CSV format doesn’t include country or organization information that VOSviewer can analyze. Apart from this, Lens data is compatible with VOSviewer for visualization and exploration. However, it’s worth noting that Lens doesn’t have the option for integration with CiteSpace, limiting its use in conjunction with this specific analysis tool.

Conclusion

Each database has its own set of advantages and drawbacks, making it essential for researchers to choose the one that aligns best with their specific purposes. The value of a bibliographic data source depends on various factors, including the coverage, completeness, and accuracy of the data provided, as well as the format through which the data is made accessible. Notable differences were observed in the coverage of the documents in each database, with Scopus and WOS achieving the closest number of documents. Additionally, variations were noted between databases regarding the exporting options and their upper limits for export. There was variability in the ranks, number of documents, and citations of authors, countries, journals, and institutions across PubMed, Scopus, WOS, and Lens. Keyword distribution, however, exhibited more shared ranks compared to other variables analyzed across databases. PubMed’s limited ability for analysis, due to the absence of citation data, stands out as a noteworthy constraint. VOSviewer offers extensive visualization options and can load and import data from various databases with a significant number of documents in different formats. Lens, among the databases used, is considered comprehensive and competitive with WOS and Scopus, offering various options. Future bibliometric research should delve deeper into comparing Lens with different databases from various perspectives, providing a more detailed understanding of its strengths and limitations in comparison to other databases.

Conflict of interest

I declare that I have no competing interests.

Acknowledgment

None.

Supplementary Material