The OCO Publications Tool gathers, ingests, organizes, reports, and highlights publications related to the OCO missions. The software can be used for any subject matter, though; it does not have to be limited to a specific project, mission, or topic. However, because it was created for the OCO missions, you may see examples on pages and their forms that are OCO-specific. Feel free to change them as needed.
The OCO Publications Tool was built using Python's Bottle library, with a MySQL database backend. The site employs very basic styling from Bootstrap 5, so you are left to style the site the way you would like.
The data on publications for this tool comes from Google Scholar. You will need to set up alerts for whatever keywords you are interested in, and have them mailed to an inbox capable of downloading the messages as HTML files that can be parsed on your local system. See utils/parse_alerts.py for an example of how you can parse these email messages and ingest them into the database. Alert messages come in as they are detected by Google, so check often for new messages.
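As a rough illustration of the parsing step, here is a minimal sketch that pulls title/link pairs out of a saved alert message. It assumes each publication appears as a link inside an `<h3>` heading, which is an assumption about the alert layout and not necessarily how utils/parse_alerts.py works. The names `AlertLinkParser` and `parse_alert` are hypothetical.

```python
from html.parser import HTMLParser


class AlertLinkParser(HTMLParser):
    """Collect (title, url) pairs from links nested inside <h3> headings."""

    def __init__(self):
        super().__init__()
        self.entries = []    # finished (title, url) pairs
        self._in_h3 = False  # True while inside an <h3> element
        self._href = None    # href of the link currently being read
        self._text = []      # text fragments of that link

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
        elif tag == "a" and self._in_h3:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace left over from markup such as <b>...</b>.
            title = " ".join("".join(self._text).split())
            self.entries.append((title, self._href))
            self._href = None
        elif tag == "h3":
            self._in_h3 = False


def parse_alert(html_text):
    """Return the (title, url) pairs found in one alert message (assumed layout)."""
    parser = AlertLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.entries
```

Open a downloaded message, pass its text to `parse_alert`, and review the result before inserting anything into the database. Adjust the tag checks if your alert messages are laid out differently.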
The website is powered by Python's Bottle library and is started by launching the main.py script. Remember to set debug=False when running in production. Any credentials should be in the config.json file. You may also need to configure the site to run over HTTPS.
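One way to keep the debug flag out of the code is to read the launch settings from configuration. The sketch below is only an illustration: the `host`, `port`, and `debug` keys are hypothetical and are not taken from the config.json shipped with this tool, and the `Bottle()` instance stands in for whatever application object main.py defines.

```python
import json


def run_options(config):
    """Build keyword arguments for bottle.run from a config dict (hypothetical keys)."""
    return {
        "host": config.get("host", "127.0.0.1"),
        "port": int(config.get("port", 8080)),
        # Fall back to the safe production value when the key is absent.
        "debug": bool(config.get("debug", False)),
    }


def main(config_path="config.json"):
    # Imported here so run_options() can be used without Bottle installed.
    from bottle import Bottle, run

    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)

    app = Bottle()  # stand-in for the application object defined in main.py
    run(app, **run_options(config))
```

With this layout, a production config simply omits `debug` (or sets it to false), and a development config turns it on.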
The front end of the tool includes the following pages:
- Homepage: The main page that lists all available publications in the database, giving a total count at the top of the page. It shows how many publications were published in each year, in addition to publications that are in press or in review. You can also filter by year.
- Review: This page shows what new publications have been ingested into the database from the Google Scholar Alerts. You can approve or reject entries here. If 'approved,' the publication is added to the database and immediately displayed on the website. If 'rejected,' the publication is removed from the database. You can also add comments about why the publication was approved or rejected, if desired. All data for this page is stored in the newPublications table in the database, which is meant to hold only this temporary data.
- Add: Here you can manually add information on a publication if it was not detected by a Google Scholar Alert. You can import a publication using its DOI and verify all the data it populates (data pulled from CrossRef), or enter it yourself.
- Update: This page shows all existing entries in the database. If you wish to update an entry, click on the 'UPDATE CITATION' link and you can edit its information. Note that there is no 'delete' function through the website. Deletions must be manually done through the database.
- Highlights: This section of the site is used to select any publications you think should be highlighted, and you can assign team members to create and submit highlight slides.
- Pending Highlights: All newly entered publications appear on this pending page. If you think a publication is worth highlighting, assign it a rank and a team member to review it. That team member should receive an email with their assignment.
- In-Progress Highlights: This page shows all highlights that were assigned to team members. Here, team members can submit their highlight slides, which are uploaded to the slides directory in the webroot. Slides should be in .pdf or .pptx format.
- Complete Highlights: List of all completed highlights, with links to their slides.
- Graveyard Highlights: List of publications that were not highlighted. If you decide that one of these is actually worth highlighting, you can move it back to the In-Progress page.
- Reports: This page has links to API endpoints that generate CSV reports.
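The DOI import on the Add page relies on CrossRef metadata. A minimal sketch of that lookup follows, using the public CrossRef REST API (`https://api.crossref.org/works/<DOI>`). The function names and the choice of output fields are hypothetical; the tool's own form may use different ones.

```python
import json
import urllib.parse
import urllib.request


def fetch_crossref_message(doi):
    """Fetch the CrossRef metadata record for a DOI (requires network access)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


def summarize_crossref(message):
    """Flatten a CrossRef 'message' object into fields for a citation form."""
    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in message.get("author", [])
    ]
    date_parts = message.get("issued", {}).get("date-parts", [[None]])
    return {
        # CrossRef returns titles and journal names as lists of strings.
        "title": (message.get("title") or [""])[0],
        "journal": (message.get("container-title") or [""])[0],
        "year": date_parts[0][0] if date_parts and date_parts[0] else None,
        "authors": authors,
        "doi": message.get("DOI", ""),
    }
```

Calling `summarize_crossref(fetch_crossref_message(doi))` gives a dictionary you can use to pre-fill the form. The values should still be checked by a person, since CrossRef records are sometimes incomplete.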
As noted, the database backend is powered by MySQL. The database schema used for the site can be found in db.sql. It will generate a database with empty tables.
- The reports/ folder contains an example report that could be generated by the site.
- The slides/ folder contains a blank slide. This is the folder you would want to upload any highlight slides in.
- The utils/ folder includes an example email parser and a function that alerts assignees that they have new highlights to review.
- The config.json file is an example of what credentials are needed.
- The db.sql file includes the structure of the database.
Note that there are places in various scripts that call for a URL or email addresses. You will have to update these to your website's URL and a list of folks you want to email.
Adding a Publication to the System
flowchart TD;
A[Google Alert of new publications];
A --> B[System adds to Review page];
B --> C[SUPERUSER Approves or Rejects based on relevance];
C --> D[Rejected: Publication removed from system];
C --> E[Approved: CURATOR notified];
E --> F[CURATOR adds publication to website] --> |optional| G[USER updates a publication];
Highlights
flowchart TD;
A[Added publication becomes a pending highlight];
A --> B[SUPERUSER assigns highlight to user or rejects as a highlight] --> C[Rejected: Highlight sent to Graveyard];
B --> D[Assigned: Highlight moved to In-Progress];
D --> E[USER reviews highlight] --> C;
E --> F[Accepted: Highlight slide uploaded by USER];
F --> G[Highlight moved to Complete];
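The highlight workflow above can be read as a small state machine. The sketch below is one way to express it in Python; the state and action names are hypothetical and do not necessarily match what is stored in the database. The move from Graveyard back to In-Progress reflects the option described for the Graveyard Highlights page.

```python
from enum import Enum


class HighlightState(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in-progress"
    COMPLETE = "complete"
    GRAVEYARD = "graveyard"


# Allowed moves, keyed by (current state, action). Action names are hypothetical.
TRANSITIONS = {
    (HighlightState.PENDING, "assign"): HighlightState.IN_PROGRESS,
    (HighlightState.PENDING, "reject"): HighlightState.GRAVEYARD,
    (HighlightState.IN_PROGRESS, "reject"): HighlightState.GRAVEYARD,
    (HighlightState.IN_PROGRESS, "upload_slide"): HighlightState.COMPLETE,
    (HighlightState.GRAVEYARD, "revive"): HighlightState.IN_PROGRESS,
}


def advance(state, action):
    """Return the next state, or raise ValueError if the move is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"cannot {action!r} a highlight that is {state.value}"
        ) from None
```

Keeping the allowed moves in one table makes it easy to check a request before updating a highlight's status.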
Roles
Roles are defined in the database:
- USERS are responsible for reviewing and adding highlights
- SUPERUSERS are responsible for approving/rejecting new publications and assigning/rejecting new highlights
- CURATORS are responsible for adding newly approved publications to the website
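If you want to enforce these responsibilities in code, a simple lookup is one option. The following is only a sketch: the role and action names are hypothetical labels for the responsibilities listed above, and in the tool itself roles are read from the database.

```python
# Hypothetical action names for the responsibilities of each role.
ROLE_ACTIONS = {
    "USER": {"review_highlight", "add_highlight"},
    "SUPERUSER": {
        "approve_publication",
        "reject_publication",
        "assign_highlight",
        "reject_highlight",
    },
    "CURATOR": {"add_publication"},
}


def is_allowed(role, action):
    """Return True if the given role may perform the given action."""
    return action in ROLE_ACTIONS.get(role, set())
```

A route handler could call `is_allowed` with the logged-in user's role before making any change.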
Author names are a particular problem since multiple people can share the same name. We attempt to store each unique author name we encounter under its own authors.authorID in the database. We then store a string of these IDs in publications.fullAuthors, separated by the '#' character. There are probably better ways to do this, such as storing one long string of the authors and doing away with the authors table entirely, though you would still want to make sure all author names entered for a publication conform to some standard so that all your citations are consistent. Another way would be to use ORCIDs.
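To make the '#'-separated format concrete, here is a minimal sketch of packing and unpacking such a string. It assumes author IDs are integers, and the in-memory `known_authors` dictionary is a hypothetical stand-in for the authors table. None of these helper names come from the tool itself.

```python
def pack_author_ids(author_ids):
    """Join author IDs into a '#'-separated string, preserving author order."""
    return "#".join(str(author_id) for author_id in author_ids)


def unpack_author_ids(full_authors):
    """Split a '#'-separated string back into a list of integer IDs."""
    return [int(part) for part in full_authors.split("#") if part]


def resolve_author(name, known_authors):
    """Return the ID for a name, assigning the next free ID to unseen names."""
    key = " ".join(name.split())  # collapse stray whitespace before comparing
    if key not in known_authors:
        known_authors[key] = max(known_authors.values(), default=0) + 1
    return known_authors[key]
```

Note that matching on the name alone means two different people with the same name receive the same ID, which is the limitation described above.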
You may need to remove the mysqlclient requirement from the requirements file and install it separately through pip or conda.