A tool to aggregate institutional publications, powered by the OpenAlex API.
- Clone this repository.
- cd to repository location and run
create_database.sh
- Set config options in config.ini. These include:
  - institutional ROR ID (required)
  - email address sent with OpenAlex requests to get into the polite pool (optional, but recommended)
  - ORCIDs of researchers to search for explicitly (optional; useful when the institutional affiliation search misses an author)
- Run
python main.py
- This automatically ingests new records on subsequent runs.
- Run
python main.py --update_works
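As a rough illustration of how the config options above could feed into OpenAlex requests, the sketch below reads a config and builds request URLs without sending them. The section name, key names, ROR value, and ORCID are placeholders and assumptions, not the tool's actual config.ini layout or request logic. The `institutions.ror` and `author.orcid` filters and the `mailto` parameter come from the public OpenAlex API.

```python
import configparser
from urllib.parse import urlencode

# Hypothetical config.ini contents: the section and key names are assumptions,
# and the ROR and ORCID values are placeholders.
EXAMPLE_CONFIG = """
[openalex]
ror = https://ror.org/0example00
email = you@example.org
orcids = 0000-0002-1825-0097
"""


def build_works_url(filter_expr, email=None):
    """Build an OpenAlex /works URL; adding mailto opts into the polite pool."""
    params = {"filter": filter_expr}
    if email:
        params["mailto"] = email
    return "https://api.openalex.org/works?" + urlencode(params)


config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONFIG)
section = config["openalex"]

ror = section["ror"]                      # required
email = section.get("email")              # optional
orcids = [o.strip() for o in section.get("orcids", fallback="").split(",") if o.strip()]

# One query by institutional affiliation, plus one per explicitly listed ORCID.
institution_url = build_works_url(f"institutions.ror:{ror}", email)
orcid_urls = [build_works_url(f"author.orcid:{o}", email) for o in orcids]

print(institution_url)
for url in orcid_urls:
    print(url)
```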
NOTE: The spreadsheet must be tab-separated. The CAS sheet is here.
- To load all the author data for the first time into the authors database table:
python populate_authors.py --local_sheet_path [your path] --container_name [container name] --load_data
- Where [your path] is the file path of the locally downloaded spreadsheet with author records.
- And [container name] is the name of the Docker container.
- To update the authors table with modified records from the authors spreadsheet:
python populate_authors.py --local_sheet_path [your path] --container_name [container name] --update_data
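Since the author spreadsheet must be tab-separated, it can help to check a downloaded sheet before loading it. The following is a minimal sketch of such a check. The column names and sample row are hypothetical, and populate_authors.py may parse the sheet differently.

```python
import csv
import io

# Hypothetical sheet contents; the real column names are not documented here.
SAMPLE_SHEET = "name\torcid\tdepartment\nAda Example\t0000-0002-1825-0097\tBotany\n"


def read_author_sheet(handle):
    """Parse a tab-separated author sheet into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    # A comma-separated file read with a tab delimiter yields one wide column.
    if reader.fieldnames is None or len(reader.fieldnames) < 2:
        raise ValueError("Expected a tab-separated header row with several columns")
    return list(reader)


rows = read_author_sheet(io.StringIO(SAMPLE_SHEET))
print(rows[0]["department"])
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.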
- Saves a csv of query results.
- Authors are concatenated into one field.
- Most useful for counting total CAS papers and getting institutional overview.
Run python queries.py --from_year [year] --to_year [year]
e.g. python queries.py --from_year 2022 --to_year 2022
- Saves a csv of query results.
- Authors are exploded, so a publication appears once for every author on it.
- This is useful for further filtering and grouping by individual author names, roles, etc.
Run python queries.py --single_authors --from_year [year] --to_year [year]
e.g. python queries.py --single_authors --from_year 2022 --to_year 2022
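The difference between the concatenated and exploded outputs can be pictured with a small sketch. The field names, sample papers, and "; " separator are assumptions for illustration and are not taken from queries.py.

```python
# Hypothetical sample data; field names are illustrative only.
papers = [
    {"title": "Paper A", "authors": ["Ada Example", "Ben Sample"]},
    {"title": "Paper B", "authors": ["Ada Example"]},
]

# Concatenated: one row per paper, all authors joined into a single field.
concatenated = [
    {"title": p["title"], "authors": "; ".join(p["authors"])} for p in papers
]

# Exploded: one row per (paper, author) pair, so multi-author papers repeat.
exploded = [
    {"title": p["title"], "author": a} for p in papers for a in p["authors"]
]

print(len(concatenated), "concatenated rows")
print(len(exploded), "exploded rows")
```

Counting rows in the concatenated form gives the number of papers, while the exploded form is the one to filter or group by author.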
Run python queries.py --single_authors --curators --from_year [year] --to_year [year]
e.g. python queries.py --single_authors --curators --from_year 2022 --to_year 2022
Set --department to one of the following allowed values:
Anthropology
Aquarium
Botany
Center for Biodiversity and Community Science
Center for Comparative Genomics
Center for Exploration and Travel Health
Coral Regeneration Lab
Education
Entomology
Herpetology
Ichthyology
iNaturalist
Invertebrate Zoology and Geology
Microbiology
Ornithology and Mammalogy
Planetarium
Scientific Computing
Run python queries.py --single_authors --department [department] --from_year [year] --to_year [year]
e.g. python queries.py --single_authors --department Botany --from_year 2022 --to_year 2022
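Several department names contain spaces, so they need to reach the script as a single argument. The sketch below shows one way a fixed list of allowed values can be enforced with `argparse`. It is an assumption about how such validation might look, not a copy of the parser in queries.py.

```python
import argparse

# Allowed values as listed above.
DEPARTMENTS = [
    "Anthropology", "Aquarium", "Botany",
    "Center for Biodiversity and Community Science",
    "Center for Comparative Genomics",
    "Center for Exploration and Travel Health",
    "Coral Regeneration Lab", "Education", "Entomology", "Herpetology",
    "Ichthyology", "iNaturalist", "Invertebrate Zoology and Geology",
    "Microbiology", "Ornithology and Mammalogy", "Planetarium",
    "Scientific Computing",
]

# Hypothetical parser mirroring the documented flags.
parser = argparse.ArgumentParser()
parser.add_argument("--single_authors", action="store_true")
parser.add_argument("--department", choices=DEPARTMENTS)
parser.add_argument("--from_year", type=int)
parser.add_argument("--to_year", type=int)

# A multi-word department arrives as one list element, as it would when quoted.
args = parser.parse_args([
    "--single_authors",
    "--department", "Ornithology and Mammalogy",
    "--from_year", "2022",
    "--to_year", "2022",
])
print(args.department, args.from_year, args.to_year)
```

In a shell, that typically means quoting the value, e.g. `--department "Ornithology and Mammalogy"`.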
- Also saves a csv sorted by counts of papers by publisher + journal.
Run python queries.py --single_authors --department [department] --from_year [year] --to_year [year] --journal_info
e.g. python queries.py --single_authors --department Botany --from_year 2022 --to_year 2022 --journal_info
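A count of papers by publisher and journal can be sketched as follows. The column names and sample rows are hypothetical, and whether the tool deduplicates exploded rows before counting is an assumption made here so that each paper is counted once.

```python
from collections import Counter

# Hypothetical exploded rows: work W1 appears twice, once per author.
exploded_rows = [
    {"work_id": "W1", "publisher": "Pub X", "journal": "Journal One", "author": "Ada Example"},
    {"work_id": "W1", "publisher": "Pub X", "journal": "Journal One", "author": "Ben Sample"},
    {"work_id": "W2", "publisher": "Pub X", "journal": "Journal One", "author": "Ada Example"},
    {"work_id": "W3", "publisher": "Pub Y", "journal": "Journal Two", "author": "Ben Sample"},
]

seen = set()
counts = Counter()
for row in exploded_rows:
    # Skip repeated rows for the same work so each paper counts once.
    if row["work_id"] in seen:
        continue
    seen.add(row["work_id"])
    counts[(row["publisher"], row["journal"])] += 1

# Sorted from most to fewest papers.
ranked = counts.most_common()
for (publisher, journal), n in ranked:
    print(publisher, journal, n)
```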
- Also saves a csv sorted by goal counts.
Run python queries.py --single_authors --department [department] --from_year [year] --to_year [year] --sustainable_goals
e.g. python queries.py --single_authors --department Botany --from_year 2022 --to_year 2022 --sustainable_goals
Run python queries.py --from_year [year] --to_year [year] --open_access_info
e.g. python queries.py --from_year 2022 --to_year 2022 --open_access_info
Run python send_emails.py
Please let us know when you encounter bugs or data quality errors with our tool!
You might also search for yourself in OpenAlex's author profiles to check the accuracy of our source data.
You can open data curation requests with them by clicking the "send feedback" exclamation icon. They often take a while to respond, though,
so open an issue with us as well and we can try to correct errors on our end.