This repository explores recommendation using a language model for the NBDT journal. We create our embeddings with a fine-tuned version of arazd/MIReAD trained on journal classification. You can find the model on Hugging Face at biodatlab/MIReAD-Neuro.
All notebooks were written in Colab. The code assumes that you are running in a GPU environment. If you do not have access to a GPU, change .cuda() or 'cuda' to .cpu() or 'cpu' as necessary.
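One way to avoid editing each call by hand is to pick the device once and reuse it. The sketch below is an assumption about how the notebooks could be adapted, not code taken from them. `pick_device` is a hypothetical helper name; it falls back to 'cpu' when PyTorch is not installed or no GPU is visible.

```python
def pick_device() -> str:
    """Return 'cuda' when a GPU is usable through PyTorch, otherwise 'cpu'."""
    try:
        import torch  # imported lazily so the helper also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Choose the device once and reuse the string wherever 'cuda' is hard-coded.
device = pick_device()
print(device)
```

With this in place, calls such as model.to(device) can stand in for .cuda() or .cpu(), so the same notebook should run in both environments.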
Indexes created by different models for use with LangChain are available on the Hugging Face Space. Download the folder you need into your working directory.
Use the notebook build_abstract_database.ipynb and follow the instructions. You will need to create an account on Pinecone for this. The free tier allows the creation of one index. After creating the account, you can find your API key and ENV code in the API Keys section of your organization page. Enter them in the notebook at -
PINECONE_API_KEY = ""
PINECONE_ENV = ""
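If you would rather not paste credentials into a notebook cell, one option is to read them from environment variables. This is a minimal sketch under that assumption; `read_pinecone_settings` is a hypothetical helper, and the variable names simply mirror the two notebook variables.

```python
import os


def read_pinecone_settings(env=None):
    """Return (api_key, environment) from a mapping, defaulting to os.environ.

    Raises ValueError when either value is missing or empty, so a blank
    setting is caught before any request is made.
    """
    env = os.environ if env is None else env
    api_key = env.get("PINECONE_API_KEY", "")
    pinecone_env = env.get("PINECONE_ENV", "")
    missing = [
        name
        for name, value in (
            ("PINECONE_API_KEY", api_key),
            ("PINECONE_ENV", pinecone_env),
        )
        if not value
    ]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return api_key, pinecone_env


# Demonstration with placeholder values; in a notebook, call the helper with
# no arguments to read the real values from the process environment.
PINECONE_API_KEY, PINECONE_ENV = read_pinecone_settings(
    {"PINECONE_API_KEY": "demo-key", "PINECONE_ENV": "demo-env"}
)
```

The explicit mapping is only there to keep the example runnable without real credentials.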
Use the notebook fetch_recommendations.ipynb and follow the instructions. You will need an existing index on Pinecone for this. Your API key and ENV code can be found in the API Keys section of your organization page. Enter them in the notebook at -
PINECONE_API_KEY = ""
PINECONE_ENV = ""
The name of your index must go in at -
index_name = 'reviewer-assignment' # Replace with your index name
index = pinecone.GRPCIndex(index_name)
Use the notebook build_vecstore.ipynb and follow the instructions. The notebook provides an 'index' folder with files named 'index.faiss' and 'index.pkl', which can be loaded at inference time.
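Before running inference, it can help to confirm that the folder holds both files. The check below is a small sketch using only the standard library; `missing_index_files` is a hypothetical helper, and the default folder name 'index' is taken from the description above.

```python
from pathlib import Path

# File names the 'index' folder is expected to contain.
REQUIRED_FILES = ("index.faiss", "index.pkl")


def missing_index_files(folder="index"):
    """Return the required file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


# An empty list means both files were found in the working directory.
print(missing_index_files("index"))
```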
Use the notebook inference.ipynb and follow the instructions.
Use the notebook finetune_model_normal.ipynb and follow the instructions. You will need a dataset of papers with the title, abstract and journal of each. Load it into the notebook at -
data = pd.read_csv('your_data.csv')
data.info()
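A quick header check can catch a mislabeled file before fine-tuning starts. This sketch assumes the CSV header uses the column names 'title', 'abstract' and 'journal'; the notebook may expect different names, so adjust `REQUIRED_COLUMNS` to match. `missing_columns` is a hypothetical helper that uses only the standard library.

```python
import csv

# Assumed header names; change them if the notebook expects others.
REQUIRED_COLUMNS = {"title", "abstract", "journal"}


def missing_columns(csv_path, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])  # empty file -> empty header
    # Compare case-insensitively and ignore surrounding whitespace.
    present = {name.strip().lower() for name in header}
    return sorted(set(required) - present)
```

Calling missing_columns('your_data.csv') returns an empty list when all three columns are present.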
Use the notebook finetune_model_contr.ipynb and follow the instructions. You will need a dataset of papers with the title, abstract and journal of each. Load it into the notebook at -
data = pd.read_csv('your_data.csv')
data.info()