Algorithm to analyze an existing collection of scientific articles that are of interest to a research group. Based on this analysis, a machine learning model is built to score the value of newly published articles. A random forest classifier and an LSTM model with word embeddings give the best results, but the models' performance is limited by the quality of the data.
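The notebook's actual features and preprocessing are not reproduced here. The following is a minimal sketch of the random forest scoring idea, assuming scikit-learn is used, that each article is represented by its abstract as TF-IDF features, and that articles already in the collection are labelled 1 (of interest) or 0. The toy texts, labels and the helpers `build_scorer` and `score_articles` are hypothetical. The "value" of a new article is taken to be the predicted probability of class 1, which is also an assumption.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: abstracts labelled 1 (of interest to the group) or 0.
train_texts = [
    "graphene transistor fabrication and electronic transport",
    "electronic transport in two-dimensional graphene devices",
    "deep learning for image classification benchmarks",
    "medieval manuscript preservation techniques",
    "spin transport measurements in graphene heterostructures",
    "survey of consumer behaviour in online retail",
]
train_labels = [1, 1, 0, 0, 1, 0]


def build_scorer(n_estimators=200, random_state=0):
    """Hypothetical helper: TF-IDF text features followed by a random forest."""
    return make_pipeline(
        TfidfVectorizer(stop_words="english"),
        RandomForestClassifier(n_estimators=n_estimators, random_state=random_state),
    )


def score_articles(model, texts):
    """Hypothetical helper: probability of class 1 for each text.

    The classes are sorted as [0, 1], so column 1 holds the "of interest" class.
    """
    return model.predict_proba(texts)[:, 1]


model = build_scorer()
model.fit(train_texts, train_labels)

# Score and rank newly published (here: invented) articles.
new_articles = [
    "quantum transport in bilayer graphene",
    "pricing strategies in online retail",
]
scores = score_articles(model, new_articles)
for text, score in sorted(zip(new_articles, scores), key=lambda p: -p[1]):
    print(f"{score:.2f}  {text}")
```

Sorting by this probability turns the classifier into a ranking of new articles; with real data, noisy or inconsistent labels in the existing collection directly limit how meaningful these scores are.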
The Jupyter notebook contains the entire code. As the data are not public, I only show the top rows of the database.
The requirements.txt file can be used to build a working conda environment.