Source code repo for the analysis and experiments found in the paper: Summarisation of Electronic Health Records with Clinical Concept Guidance
./dataset_prep: scripts and notebooks for all M3 dataset preparation, including running MedCAT (the pre-trained NER+L model) and aligning its outputs to the raw input text.
./extractive_approach: the extractive approach, including model training and run scripts.
Our compute for CogStack data requires pre-built containers. I've included the required Dockerfiles in the repo. To rebuild the images, use:
$ docker build -f Dockerfile.builder -t tsearle/summ_exp-base:latest .
This builds the base image. Then use:
$ docker build . -t tsearle/summ_exp:latest
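The two builds must run in order, since the second image is presumably built on top of tsearle/summ_exp-base:latest. Below is a minimal sketch of a helper that runs both builds from the repository root and stops at the first failure. The file name build_images.sh and the function name build_images are hypothetical and are not part of the repo.

```shell
# build_images.sh (hypothetical helper, not part of the repo).
# Builds the base image first, then the experiment image.
# Returns non-zero as soon as either docker build fails, so the second
# build never runs against a missing or stale base image.
build_images() {
    docker build -f Dockerfile.builder -t tsearle/summ_exp-base:latest . || return 1
    docker build . -t tsearle/summ_exp:latest || return 1
}
```

To use it, run `source build_images.sh && build_images` from the repository root.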
Once both builds have finished, run the container on all available GPUs with:
$ bash run_container.sh
If you just want to test the container on a CPU-only machine (e.g. a laptop), use:
$ bash run_container_cpu.sh
These run scripts mount two directories: ./mimic_summ_data and ./cg_summ_data/.
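If the run scripts use `docker run -v` (not confirmed here), Docker will create any missing host directory as root-owned. Creating both data directories yourself before launching avoids this. Below is a minimal sketch that does so and then picks the GPU or CPU run script. It assumes it is invoked from the repository root and treats a successful `nvidia-smi -L` as a reasonable signal that GPUs are available. The function names has_gpu and launch are hypothetical.

```shell
# has_gpu (hypothetical): succeeds if nvidia-smi is installed and can list GPUs.
has_gpu() {
    command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1
}

# launch (hypothetical): create the two mounted data directories if needed,
# then start the GPU run script when a GPU is detected, else the CPU one.
launch() {
    mkdir -p ./mimic_summ_data ./cg_summ_data || return 1
    if has_gpu; then
        bash run_container.sh
    else
        bash run_container_cpu.sh
    fi
}
```

The GPU check is kept in its own function so that it can be swapped for a site-specific test without touching the launch logic.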
You can use the pre-built container again here. Open the guidance_experiment_cfg/<mim3 or cg>/<.json> file and edit ds_path to point to your huggingface
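Editing ds_path by hand works fine, but scripted runs may prefer a helper. Below is a minimal sketch that rewrites the key in place using Python's standard json module. It assumes ds_path is a top-level key in the config file, since the file layout is not documented here. The function name set_ds_path is hypothetical.

```shell
# set_ds_path (hypothetical helper): set the top-level "ds_path" key of a
# guidance experiment JSON config to a new dataset location.
# Usage: set_ds_path <config.json> <dataset path>
set_ds_path() {
    python3 - "$1" "$2" <<'EOF'
import json
import sys

cfg_file, ds_path = sys.argv[1], sys.argv[2]
with open(cfg_file) as f:
    cfg = json.load(f)
cfg["ds_path"] = ds_path  # assumes ds_path lives at the top level
with open(cfg_file, "w") as f:
    json.dump(cfg, f, indent=2)
EOF
}
```

For example: `set_ds_path guidance_experiment_cfg/<mim3 or cg>/<.json> /path/to/your/dataset`, with the placeholders replaced by your own config file and dataset path. Note that rewriting the file normalises its indentation to two spaces.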
Hugging Face, Meta and NVIDIA libraries enable this research.
MedCAT: docs, downloads and more on our clinical NER+L framework here.
Get in touch with the CogStack team here: contact@cogstack.org
@ARTICLE{Searle2022-bg,
  title         = "Summarisation of Electronic Health Records with Clinical Concept Guidance",
  author        = "Searle, Thomas and Ibrahim, Zina and Teo, James and Dobson, Richard",
  month         = nov,
  year          = 2022,
  archivePrefix = "arXiv",
  eprint        = "2211.07126",
  primaryClass  = "cs.CL",
  arxivid       = "2211.07126"
}