Here, we explore the history of scientific publishing delays. The findings are discussed in a blog post by Daniel Himmelstein and were featured in Nature News.
Delays are calculated from publisher-deposited PubMed history dates. Only journal articles published between 1960 and 2015 are included. Specifically, two delay types are calculated:
- acceptance delay: the number of days from receipt to acceptance
- publication delay: the number of days from acceptance to online publication
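Both delay types reduce to subtracting one history date from another. A minimal Python sketch follows. The function names and example dates are hypothetical, and the repository's own calculation is done in R (extract-delays.ipynb), which may treat missing or inconsistent dates differently. Which PubMed history status stands in for "online publication" is also left as an assumption here.

```python
from datetime import date


def acceptance_delay(received: date, accepted: date) -> int:
    """Days from receipt to acceptance (hypothetical helper)."""
    return (accepted - received).days


def publication_delay(accepted: date, published_online: date) -> int:
    """Days from acceptance to online publication (hypothetical helper)."""
    return (published_online - accepted).days


# Hypothetical history dates for a single article, not taken from the dataset
received = date(2014, 1, 10)
accepted = date(2014, 3, 21)
online = date(2014, 4, 4)

print(acceptance_delay(received, accepted))  # 70
print(publication_delay(accepted, online))   # 14
```

A negative result would indicate out-of-order history dates, which a real pipeline would likely need to filter.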
To re-execute the analysis, run the following notebooks in this order:
- eutilities.ipynb (python): Use PubMed's E-utilities API to retrieve the list of relevant IDs with ESearch and article summaries with ESummary.
- process-esummary.ipynb (python): Extract history dates from the ESummary XML output.
- extract-delays.ipynb (R): Calculate acceptance and publication delays from the PubMed history dates.
- process-nlm-catalog.ipynb (python): Download and process the NLM Catalog, which contains the journals indexed by PubMed.
- visualize-history.ipynb (R): Plot historical delays and export several TSV summaries of the dataset.
- webapp.ipynb (python): Create JSON files used to initialize the select2 journal selection for the blog post.
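The first two steps (querying ESummary and extracting history dates) can be sketched in Python as below. This assumes the ESummary version 2.0 XML layout for PubMed, in which each DocumentSummary carries a History element of PubMedPubDate entries with PubStatus and Date children. The sample record is illustrative rather than real PubMed output, and esummary_url and extract_history are hypothetical helpers, not functions from the notebooks. The db, id, and version query parameters are standard ESummary parameters; no network request is made here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(pmids):
    """Build an ESummary request URL for a batch of PubMed IDs (hypothetical helper)."""
    params = {"db": "pubmed", "id": ",".join(map(str, pmids)), "version": "2.0"}
    return f"{ESUMMARY_URL}?{urlencode(params)}"


# Assumed shape of one ESummary 2.0 record (abbreviated, illustrative values)
SAMPLE = """<eSummaryResult>
  <DocumentSummarySet>
    <DocumentSummary uid="12345">
      <History>
        <PubMedPubDate>
          <PubStatus>received</PubStatus>
          <Date>2014/01/10 00:00</Date>
        </PubMedPubDate>
        <PubMedPubDate>
          <PubStatus>accepted</PubStatus>
          <Date>2014/03/21 00:00</Date>
        </PubMedPubDate>
      </History>
    </DocumentSummary>
  </DocumentSummarySet>
</eSummaryResult>"""


def extract_history(xml_text):
    """Yield (pubmed_id, status, date_string) rows from ESummary XML (assumed layout)."""
    root = ET.fromstring(xml_text)
    for summary in root.iter("DocumentSummary"):
        pmid = summary.get("uid")
        for pub_date in summary.iterfind("History/PubMedPubDate"):
            yield pmid, pub_date.findtext("PubStatus"), pub_date.findtext("Date")


print(esummary_url([12345, 67890]))
for row in extract_history(SAMPLE):
    print(row)
```

Emitting one row per history date keeps the output in long format, which maps naturally onto a table such as history-dates.tsv.bz2.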
The following data files are generated during execution:
- pubmed-journals.tsv: a dataframe of the NLM Catalog (journals in PubMed)
- history-dates.tsv.bz2: a dataframe with all history dates extracted from the PubMed XML
- delays.tsv.gz: a dataframe of all acceptance and publication delays
- journal-summaries.tsv: a dataframe summarizing delays for each journal
- yearly-summaries.tsv: a dataframe summarizing delays for each year
- yearly-percentiles.tsv: a dataframe of delay percentiles for each year
- slopes.tsv: a dataframe of journal-specific delay slopes (Δ days of delay per year)
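The yearly summaries and journal slopes can be approximated as follows. This is a sketch under stated assumptions: yearly_medians and delay_slope are hypothetical helpers, the journal data are made up, and the slope is an ordinary least-squares fit of delay against publication year, which may not match the exact model used to produce slopes.tsv (the repository computes these summaries in R).

```python
from collections import defaultdict
from statistics import median


def yearly_medians(rows):
    """rows: iterable of (year, delay_days). Return {year: median delay}."""
    by_year = defaultdict(list)
    for year, delay in rows:
        by_year[year].append(delay)
    return {year: median(delays) for year, delays in sorted(by_year.items())}


def delay_slope(rows):
    """Ordinary least-squares slope of delay on year (assumed model), in days per year."""
    rows = list(rows)
    n = len(rows)
    mean_year = sum(year for year, _ in rows) / n
    mean_delay = sum(delay for _, delay in rows) / n
    sxx = sum((year - mean_year) ** 2 for year, _ in rows)
    if sxx == 0:
        raise ValueError("need delays from at least two distinct years")
    sxy = sum((year - mean_year) * (delay - mean_delay) for year, delay in rows)
    return sxy / sxx


# Hypothetical (year, acceptance delay) pairs for one journal
journal_rows = [(2010, 90), (2010, 110), (2011, 100), (2011, 120), (2012, 115), (2012, 125)]
print(yearly_medians(journal_rows))  # {2010: 100.0, 2011: 110.0, 2012: 120.0}
print(delay_slope(journal_rows))     # 10.0
```

A positive slope means the journal's delays grew over time; here, roughly ten extra days per year.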
The following data files, generated by eutilities.ipynb, are not tracked in the repository because of their large size:
- download/esearch_journal-articles_1960-2015.tsv.gz: the list of relevant PubMed IDs
- download/esummary_journal-articles_1960-2015.xml.bz2: the combined XML output from the ESummary API queries
These files, along with several of the other files listed above, are available via figshare.