Automatically tagging MEDLINE abstracts with OBO ontologies
After acquiring a UMLS licence, download the full UMLS release (e.g., umls-2022AA-full.zip) and place it in the 'umls' folder.
Run the following commands in the console one by one.
unzip umls-2022AA-full.zip
mkdir META
mkdir NET
unzip 2022AA-full/2022aa-1-meta.nlm
unzip 2022AA-full/2022aa-2-meta.nlm
unzip 2022AA-full/2022aa-otherks.nlm
gunzip 2022AA/META/MRCONSO.RRF.aa.gz
gunzip 2022AA/META/MRCONSO.RRF.ab.gz
gunzip 2022AA/META/MRCONSO.RRF.ac.gz
cat 2022AA/META/MRCONSO.RRF.aa 2022AA/META/MRCONSO.RRF.ab 2022AA/META/MRCONSO.RRF.ac > META/MRCONSO.RRF
gunzip 2022AA/META/MRDEF.RRF.gz
mv 2022AA/META/MRDEF.RRF META/
gunzip 2022AA/META/MRSTY.RRF.gz
mv 2022AA/META/MRSTY.RRF META/
mv 2022AA/NET/SRDEF NET/
mv 2022AA/NET/SRSTRE1 NET/
gunzip 2022AA/META/MRXNS_ENG.RRF.aa.gz
gunzip 2022AA/META/MRXNS_ENG.RRF.ab.gz
cat 2022AA/META/MRXNS_ENG.RRF.aa 2022AA/META/MRXNS_ENG.RRF.ab > META/MRXNS_ENG.RRF
gunzip 2022AA/META/MRXNW_ENG.RRF.aa.gz
gunzip 2022AA/META/MRXNW_ENG.RRF.ab.gz
gunzip 2022AA/META/MRXNW_ENG.RRF.ac.gz
cat 2022AA/META/MRXNW_ENG.RRF.aa 2022AA/META/MRXNW_ENG.RRF.ab 2022AA/META/MRXNW_ENG.RRF.ac > META/MRXNW_ENG.RRF
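The gunzip/cat steps above for the split files (MRCONSO, MRXNS_ENG, MRXNW_ENG) can also be sketched in Python. This is only an illustrative alternative to the shell commands, not part of the repository; the helper name `merge_gz_parts` is hypothetical. It assumes the parts are passed in the order they should be concatenated (.aa, .ab, .ac).

```python
import gzip
import shutil
from pathlib import Path


def merge_gz_parts(parts, dest):
    """Decompress each gzip part in the given order and append it to dest.

    Hypothetical helper: mirrors `gunzip part...` followed by `cat part... > dest`,
    except that the original .gz files are left in place.
    """
    dest = Path(dest)
    # Create the target folder (e.g. META/) if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    with open(dest, "wb") as out:
        for part in parts:
            with gzip.open(part, "rb") as src:
                # Stream the decompressed bytes so large RRF files are never
                # held in memory in full.
                shutil.copyfileobj(src, out)
    return dest
```

For example, `merge_gz_parts(sorted(Path("2022AA/META").glob("MRCONSO.RRF.a?.gz")), "META/MRCONSO.RRF")` would produce the same file as the three gunzip commands plus the cat command for MRCONSO. Sorting the paths keeps the parts in .aa, .ab, .ac order.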
$ conda create -n medobo python=3.6
$ conda activate medobo
(medobo)$ pip install -r requirements.txt
Download the OBO ontologies as a folder and place it in the root of the project.
(medobo)$ python dataset.py
Or download the preprocessed data from the Switch drive. For replication purposes, do not generate a new dataset; download the official splits and contents from the Switch drive instead.
(medobo)$ python chi_sqaure.py
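The contents of chi_sqaure.py are not shown here, so the following is not its implementation. As background, and assuming the script scores features with the standard chi-square test of independence between a term and a label, the statistic for one term/label pair can be sketched as below. The function name and argument names are hypothetical.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/label contingency table.

    n11: documents that contain the term and carry the label
    n10: documents that contain the term but not the label
    n01: documents that carry the label but not the term
    n00: documents with neither
    """
    n = n11 + n10 + n01 + n00
    # Product of the four marginal totals (label, term, no-label, no-term).
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        # A marginal is empty (e.g. the term never occurs): no evidence of
        # dependence can be computed, so score the pair as 0.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom
```

A higher score means the term's presence depends more strongly on the label. Under this assumption, a feature selector would keep the top-scoring terms, which is presumably what the `<num_of_features>` argument of main_DL.py limits.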
Download the BioASQ embeddings, unzip them, and place them in the 'Resources' folder.
(medobo)$ python main_NB.py <num_of_training_data>
(medobo)$ python main_NB.py 100000
(medobo)$ python main_DL.py <num_of_training_data> <num_of_features>
(medobo)$ python main_DL.py 100000 50000