This is a collection of Python scripts for searching PubMed with Biopython and for working with eponymous terms.
This project serves as a deposit for the code used in the manuscript cited below. In addition to being of interest to those studying medical eponyms, it should be of general use to anyone developing software that searches PubMed / MEDLINE automatically. The script pubmed_search_to_csv.py provides a good example of how to use Biopython's Entrez.esearch and Entrez.efetch to search PubMed and return search results even when they exceed the built-in limits of NCBI's Entrez E-utilities. While the E-utilities API can be used directly, Biopython greatly simplifies the tedious aspects of making HTTP requests (such as throttling, retrying, and error handling) and is highly recommended for this task.
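As a rough illustration of the paging approach described above, the following sketch runs one search on the Entrez history server and then fetches the hits in fixed-size batches. It is not the code from pubmed_search_to_csv.py: the function names `iter_batches` and `fetch_medline_batches`, the default batch size, and the `entrez` parameter (a stand-in that lets the module be swapped out in tests) are assumptions made for this example. It assumes Biopython is installed and that a valid contact email is supplied.

```python
def iter_batches(total, batch_size):
    """Yield (retstart, retmax) pairs that together cover `total` records."""
    for start in range(0, total, batch_size):
        yield start, min(batch_size, total - start)


def fetch_medline_batches(term, email, batch_size=500, entrez=None):
    """Yield MEDLINE-formatted text for every hit, one batch at a time."""
    if entrez is None:
        from Bio import Entrez as entrez  # imported lazily; requires Biopython
    entrez.email = email  # NCBI asks every client to identify itself

    # Run the search once and keep the result set on the history server.
    handle = entrez.esearch(db="pubmed", term=term, usehistory="y", retmax=0)
    search = entrez.read(handle)
    handle.close()

    total = int(search["Count"])
    for start, size in iter_batches(total, batch_size):
        # Each request asks for one window of the stored result set.
        handle = entrez.efetch(
            db="pubmed", rettype="medline", retmode="text",
            retstart=start, retmax=size,
            webenv=search["WebEnv"], query_key=search["QueryKey"],
        )
        yield handle.read()
        handle.close()
```

Each yielded chunk can be written to disk or parsed further. NCBI's limits on how many records a single request or search can return have changed over time, so the batch size should be checked against the current E-utilities documentation.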
In addition to citing this GitHub repository (https://github.com/cornish/pubmed-eponyms), please cite the following paper:
- Toby C. Cornish, Larry J. Kricka, and Jason Y. Park. A Biopython-based method for comprehensively searching for eponyms in Pubmed. MethodsX. 2021; vol 8. doi: 10.1016/j.mex.2021.101264.
GNU General Public License v3; see the full license text in the project.
- Python 3.6 and up
- Biopython
rebase_terms.py
permute_terms.py
pubmed_search_to_csv.py
remove_pmid_dupes.py
pubmed_journals_by_year.py
A diagrammatic representation of data flow indicating the scripts used in the process. Please see individual scripts for usage and details.
An example of the term permutations created by permute_terms.py for terms with zero, one, and two separate names.
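To make the idea of term permutation concrete, here is a minimal sketch that expands a list of names plus a term into possessive and non-possessive forms, several ways of joining multiple names, and reordered names. The exact rules live in permute_terms.py and may differ: the `JOINERS` tuple, the helper names `possessive` and `permute_term`, and the reading of "inversion" as reversing the order of the names are all assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical set of ways to join multiple names; the real script may use others.
JOINERS = ("-", " ", " and ")


def possessive(name):
    """Return a simple English possessive: Crohn -> Crohn's, Jeghers -> Jeghers'."""
    return name + "'" if name.endswith("s") else name + "'s"


def permute_term(names, term):
    """Return search phrases for a term with zero, one, or several names."""
    if not names:
        return [term]  # nothing to permute for a term without a name
    orders = [tuple(names)] if len(names) == 1 else list(permutations(names))
    phrases = []
    for order in orders:
        joiners = JOINERS if len(order) > 1 else ("",)
        for joiner in joiners:
            joined = joiner.join(order)
            for form in (joined, possessive(joined)):
                phrase = f"{form} {term}"
                if phrase not in phrases:  # keep first occurrence, preserve order
                    phrases.append(phrase)
    return phrases
```

Under these assumptions, `permute_term(["Crohn"], "disease")` returns `["Crohn disease", "Crohn's disease"]`, and a two-name eponym such as `["Peutz", "Jeghers"]` expands to twelve phrases (two name orders, three joiners, two forms).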
config.ini
- This is an INI-style configuration file where the scripts will look for Entrez-related credentials including your email and API key
- At the time of writing, an API key for the E-utilities is not required, though it may be in the future; currently, a key permits more requests per second to Entrez
- See NCBI's documentation for more information about API keys for the E-utilities
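For readers unfamiliar with INI files, the sketch below shows one way such credentials could be read with Python's standard configparser module. The section name `entrez` and the keys `email` and `api_key` are hypothetical; check the config.ini shipped with the project for the names the scripts actually expect.

```python
import configparser

# Hypothetical layout; the real config.ini may use other section and key names.
EXAMPLE_INI = """\
[entrez]
email = you@example.org
api_key =
"""


def credentials_from_ini(text, section="entrez"):
    """Return (email, api_key) from INI text; api_key is None when left blank."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    email = parser.get(section, "email")
    api_key = parser.get(section, "api_key", fallback="").strip() or None
    return email, api_key
```

With Biopython, the returned values can then be assigned to `Entrez.email` and, when a key is present, `Entrez.api_key` before any request is made.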
gastrointestinal eponyms.txt
- This is the original list of terms collected from review articles:
- Kanne JP, Rohrmann CA, Lichtenstein JE. Eponyms in radiology of the digestive tract: historical perspectives and imaging appearances. Part I. Pharynx, esophagus, stomach, and intestine. Radiographics. 26(1) (2006) 129-42.
- Kanne JP, Rohrmann CA, Lichtenstein JE. Eponyms in radiology of the digestive tract: historical perspectives and imaging appearances. Part 2. Liver, biliary system, pancreas, peritoneum, and systemic disease. Radiographics. 26(2) (2006) 465-80.
gi_eponyms_split.csv
- This is the original list with terms split into Name(s) and Term fields; the names in a multiple-name eponym should be separated by hyphens to distinguish them from last names with internal spaces (e.g. "Van Slyke")
- Input to rebase_terms.py
data/terms_re-base.csv
- Output of the rebase_terms.py script
- Input to permute_terms.py
- Standardized version of base names including removal of possessives and use of hyphens for multiple names
- Version of the data from the paper
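The standardization step can be pictured with a small sketch that strips possessive endings and normalizes the separator between multiple names to a plain hyphen. This is only an illustration of the idea, not the logic of rebase_terms.py; the function name `rebase_name` and the specific regular expressions are assumptions.

```python
import re

# Matches an ASCII hyphen or a Unicode dash, with optional surrounding spaces.
_DASH = re.compile(r"\s*[\u2010-\u2015-]\s*")
# Matches a trailing possessive: 's or a bare apostrophe (straight or curly).
_POSSESSIVE = re.compile(r"['\u2019]s?$")


def rebase_name(name):
    """Return a standardized base name, e.g. "Crohn's" -> "Crohn"."""
    name = _DASH.sub("-", name.strip())
    parts = [_POSSESSIVE.sub("", part) for part in name.split("-")]
    return "-".join(parts)
```

Spaces inside a single last name are left untouched, so "Van Slyke" stays one name while "Peutz – Jeghers" becomes "Peutz-Jeghers".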
data/terms_permuted.csv
- Output of the permute_terms.py script
- Input to pubmed_search_to_csv.py
- Permutations of terms to include possessives, various forms of joining multiple names, and inversions
- See examples above
- Version of the data from the paper
data/term_results.csv
- Output of the pubmed_search_to_csv.py script
- Summarizes PubMed search results for all terms (including terms with no results)
- One row per term permutation
- Version of the data from the paper
data/pmid_results.csv
- Output of the pubmed_search_to_csv.py script
- Input to remove_pmid_dupes.py
- Input to pubmed_journals_by_year.py
- PubMed search results for all terms with hits
- One row per PMID
- Version of the data from the paper
data/pmid_results - dupes removed.csv
- Output of the remove_pmid_dupes.py script
- Duplicate PMIDs removed within base terms
- Version of the data from the paper
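Because several permutations of one base term can return the same article, the same PMID may appear more than once under that base term. A minimal sketch of removing such duplicates follows, keeping the first row seen for each (base term, PMID) pair. The column names `base_term` and `pmid` are hypothetical and may not match the headers in pmid_results.csv, and the function is an illustration rather than the logic of remove_pmid_dupes.py.

```python
def remove_pmid_dupes(rows, key_fields=("base_term", "pmid")):
    """Return rows with repeated (base_term, pmid) pairs dropped, order preserved."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Rows would typically come from `csv.DictReader`. A PMID that matches two different base terms is kept once under each, since duplicates are only removed within a base term.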
data/journal_counts.csv
- Output of the pubmed_journals_by_year.py script
- A matrix of total publications per year for all journals for which we have hits
- Ranges from the earliest year with hits to the latest year with hits
- Version of the data from the paper
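One way to picture how such a matrix could be assembled is to build one PubMed query per journal and year, covering every year from the earliest hit to the latest, and then record the count returned for each query. The sketch below only generates the queries. The column names `journal` and `year` and the function name `journal_year_queries` are hypothetical, and pubmed_journals_by_year.py may construct its queries differently.

```python
def journal_year_queries(rows):
    """Map (journal, year) to a PubMed query for that journal's output in that year."""
    journals = sorted({row["journal"] for row in rows})
    years = [int(row["year"]) for row in rows]
    span = range(min(years), max(years) + 1)  # earliest to latest year with hits
    return {
        (journal, year): f'"{journal}"[Journal] AND {year}[dp]'
        for journal in journals
        for year in span
    }
```

Each query could then be passed to Entrez.esearch with `retmax=0`, reading only the `Count` field of the result to fill one cell of the matrix.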