We provide a downloader, parser, and database for NIH and NSF grants, built from the data published on their websites. The link for NSF awards data is here and for NIH awards data is here.
Check out the nih and nsf folders, where we provide Bash and
Python scripts to download the data and parse it into CSV files. Also check out
the dedupe folder, which will soon contain scripts to deduplicate and link
NIH and NSF grants together.
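To illustrate the parse-to-CSV step, here is a minimal sketch rather than the scripts in the nsf folder. It assumes each award is an `<Award>` element with child tags such as `AwardID`, `AwardTitle`, and `AwardAmount`; the sample record, the field list, and the function names `parse_awards` and `rows_to_csv` are hypothetical. It uses the standard-library `xml.etree.ElementTree`, whose interface `lxml.etree` largely mirrors.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample record; tag names are assumptions for illustration only.
SAMPLE_XML = """<rootTag>
  <Award>
    <AwardID>1234567</AwardID>
    <AwardTitle>Example Project</AwardTitle>
    <AwardAmount>50000</AwardAmount>
  </Award>
</rootTag>"""

# Hypothetical subset of fields to keep as CSV columns.
FIELDS = ["AwardID", "AwardTitle", "AwardAmount"]


def parse_awards(xml_text):
    """Return one dict per <Award> element; missing tags become empty strings."""
    root = ET.fromstring(xml_text)
    rows = []
    for award in root.iter("Award"):
        rows.append({f: (award.findtext(f) or "").strip() for f in FIELDS})
    return rows


def rows_to_csv(rows):
    """Serialize the parsed rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(parse_awards(SAMPLE_XML)))
```

Keeping parsing and CSV writing in separate functions makes it easy to swap the field list or load the rows into pandas instead.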
First, install awscli using pip (see these instructions).
We now provide the parsed data. You can use awscli to download it as follows:
aws s3 cp s3://grant-dataset/ data/ --recursive --exclude dedupe/ --region us-west-2 # download nih, nsf, and grid data
This contains around 2M grants (1.7 GB) from NIH and 500k grants (700 MB) from NSF.
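As a quick sanity check after the download, the sketch below counts the data rows in every CSV file under a directory. It assumes the files were saved under `data/` as in the command above and that each CSV has a single header row; the layout of the bucket is not documented here, and the function name `count_csv_rows` is hypothetical.

```python
import csv
from pathlib import Path


def count_csv_rows(data_dir):
    """Map each CSV file under data_dir to its number of data rows (header excluded)."""
    base = Path(data_dir)
    counts = {}
    for path in sorted(base.rglob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row, if any
            counts[path.relative_to(base).as_posix()] = sum(1 for _ in reader)
    return counts


if __name__ == "__main__":
    for name, n_rows in count_csv_rows("data").items():
        print(f"{name}: {n_rows} rows")
```

The totals can then be compared against the approximate grant counts quoted above.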
We have pandas and lxml as dependencies, listed in requirements.txt.
You can install the dependencies using pip:
pip install -r requirements.txt