
Downloading NIMH fulltext papers from PMC with the goal of finding data statements


nih-fmrif/nimh_fulltexts


nimh_fulltexts

This directory contains the basic building blocks we are using in a set of projects whose ultimate goal is to identify instances of data sharing and data reuse in PubMed Central texts.

The raw data cannot be shared directly, but the data download pipeline is reproduced here. First, you need a few index files:

  • All projects, publications, and link tables listed at Federal ExPORTER (https://federalreporter.nih.gov/FileDownload). These should be stored in their respective directories in this repo.
  • A PMC-to-PMID linking file (ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz; warning: the file is ~500 MB). This should be stored in /data/external/.
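
As a rough illustration of how the linking file can be used once downloaded, the sketch below reads a gzipped CSV and builds a PMID-to-PMCID lookup. It assumes the file has columns named `PMID` and `PMCID`; the function name `load_pmid_to_pmcid` is hypothetical and is not part of this repo.

```python
import csv
import gzip


def load_pmid_to_pmcid(path):
    """Return a dict mapping PMID -> PMCID read from a gzipped CSV file.

    Assumption: the CSV has header columns named "PMID" and "PMCID".
    Rows missing either identifier are skipped.
    """
    mapping = {}
    # Stream the file in text mode so the archive is never fully unpacked in memory.
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            pmid = (row.get("PMID") or "").strip()
            pmcid = (row.get("PMCID") or "").strip()
            if pmid and pmcid:
                mapping[pmid] = pmcid
    return mapping
```

For example, `load_pmid_to_pmcid("data/external/PMC-ids.csv.gz")` would return the lookup for the file described above, with the relative path adjusted to wherever the file was saved.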

After downloading those, run the code in src/data/make_nimh_paper_list.R, followed by notebooks/01.0-TAR-pull_fulltexts.ipynb. At the end of this process, you should have the full text of roughly 58k papers funded by the NIMH.
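
The selection step can be pictured as a chain of joins: projects funded by the NIMH, then the PMIDs linked to those projects, then the PMCIDs for those PMIDs. The sketch below is only an illustration of that idea and is not the logic of make_nimh_paper_list.R, which is not reproduced here. The column names `PROJECT_NUMBER`, `AGENCY`, and `PMID`, and the function name `nimh_pmcids`, are assumptions made for the example.

```python
def nimh_pmcids(projects, links, pmid_to_pmcid, agency="NIMH"):
    """Return the sorted PMCIDs of papers linked to projects of one agency.

    Assumptions (hypothetical column names):
      projects      -- iterable of dicts with "PROJECT_NUMBER" and "AGENCY"
      links         -- iterable of dicts with "PMID" and "PROJECT_NUMBER"
      pmid_to_pmcid -- dict mapping PMID -> PMCID
    """
    # Step 1: keep only the project numbers funded by the requested agency.
    project_ids = {
        p["PROJECT_NUMBER"] for p in projects if p.get("AGENCY") == agency
    }
    # Step 2: collect the PMIDs that the link table ties to those projects.
    pmids = {l["PMID"] for l in links if l["PROJECT_NUMBER"] in project_ids}
    # Step 3: keep only papers that have a PMCID, i.e. are available in PMC.
    return sorted(pmid_to_pmcid[p] for p in pmids if p in pmid_to_pmcid)
```

Papers without a PMCID are dropped in the last step because only texts available in PubMed Central can be pulled.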

You can also see the submitted OHBM abstract (/reports/obhm_abstract.pdf) and the code used to make the figures therein (visualization/ohbm_figs.R).
