Generate a wordcloud using abstracts fetched from pubmed query

marcodallavecchia/wordcloud_pubmed

Create wordcloud from Pubmed abstracts

A neat script that generates a customizable wordcloud from abstracts fetched from Pubmed with a user-defined query.

The idea is to use smart advanced Pubmed queries to obtain a wordcloud showing the most frequent words appearing in abstracts.

Example

See wordcloud_settings.json for the settings used to generate the output below.

Final output

example wordcloud

Used libraries

Author

Marco Dalla Vecchia

Installation

Python users

  1. Install conda
  2. Create the conda environment
$ conda env create -f requirements.yml
  3. Activate the environment
$ conda activate biopython-wordcloud-env
  4. Run the script and follow the instructions
$ python wordcloud_from_input.py

If you want to use a mask image or a json settings file, place them in the same folder as the Python script.

Windows

I was planning to build a single-file executable for Windows, but I don't know how to handle the dependencies yet.

Usage

The script prompts the user for the most important information and settings needed to create the wordcloud.

Settings

The script will ask for the following:

  • Is there a json config file already? If yes, it will generate a wordcloud purely based on those configurations. If not, proceed.
  • Email address → checked for a valid format
  • Query → this is a Pubmed advanced query; use the online tool to build the desired query, then copy/paste it here
  • Background color → this is the color used for the background of the generated wordcloud. Defaults to transparent.
  • Colormap → this is a valid matplotlib colormap used to color the text of the wordcloud. Check the matplotlib colormap webpage and type in the name. Defaults to viridis.
  • Maximum number of fetched publications → this is the max number of papers fetched from Pubmed to create the wordcloud. Defaults to 300.
  • Name of mask file → this is the name of the black and white mask image which can optionally be used to give the wordcloud a custom shape. Defaults to no mask.
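The defaults listed above could be applied along the lines of the following minimal sketch. The key names (`background_color`, `colormap`, `max_papers`, `mask_file`), the regular expression, and the helper functions are assumptions made for illustration; they are not taken from the script or from wordcloud_settings.json.

```python
import re

# Defaults as described in the list above; the key names are hypothetical.
DEFAULTS = {
    "background_color": None,  # None stands in for "transparent"
    "colormap": "viridis",
    "max_papers": 300,
    "mask_file": None,         # None stands in for "no mask"
}

# Loose format check only; it does not prove that the address exists.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_valid_email(address):
    """Return True if the address looks like name@domain.tld."""
    return EMAIL_PATTERN.match(address) is not None


def resolve_settings(answers):
    """Merge user answers over the defaults; empty answers fall back to the default."""
    if not is_valid_email(answers.get("email", "")):
        raise ValueError("email address is not in a valid format")
    settings = dict(DEFAULTS)
    for key, value in answers.items():
        if value not in ("", None):
            settings[key] = value
    # Answers typed at a prompt arrive as strings, so normalize the count.
    settings["max_papers"] = int(settings["max_papers"])
    return settings


settings = resolve_settings(
    {"email": "someone@example.org", "query": "autophagy[Title]", "colormap": ""}
)
print(settings["colormap"], settings["max_papers"])
```

Treating an empty answer as "use the default" mirrors the behavior described in the list, where each optional setting has a fallback value.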

Output

  • papers.txt → contains the DOI info, title, authors and abstract of each paper found on Pubmed. Only the abstract texts are used to create the wordcloud.
  • wordcloud_settings.json → json file containing the settings used in the creation of the last wordcloud. It can be reused.
  • wordcloud.png → final output (png format to allow for transparency)
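Reusing a saved settings file amounts to a JSON round trip. The sketch below shows one way to do it with the standard library; the function names and the example keys are hypothetical, and the actual layout of wordcloud_settings.json may differ.

```python
import json
from pathlib import Path


def save_settings(settings, path):
    """Write the settings dictionary to a JSON file."""
    Path(path).write_text(json.dumps(settings, indent=2), encoding="utf-8")


def load_settings(path):
    """Return the saved settings, or None if the file does not exist."""
    settings_file = Path(path)
    if not settings_file.exists():
        return None
    return json.loads(settings_file.read_text(encoding="utf-8"))
```

A caller could check `load_settings("wordcloud_settings.json")` first and only prompt for each setting when it returns `None`.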

Contributions

This script can easily be adapted to other circumstances or the wordcloud settings can be further controlled by changing the code directly. Feel free to suggest possible improvements!
