A novel family of medical large language models with 13B and 70B parameters that achieves state-of-the-art performance on various medical tasks
Updated Jan 15, 2025 - Python
Cross-type Biomedical Named Entity Recognition with Deep Multi-task Learning (Bioinformatics'19)
Bioformer: an efficient BERT model for biomedical text mining
[EMNLP 2024] This is the code for our paper "BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers".
This repository contains the code used for distillation and fine-tuning of the compact biomedical transformers introduced in the paper "On The Effectiveness of Compact Biomedical Transformers".
Systematic evaluation of hallucination risks in Large Language Models (GPT-4, Claude 3, Gemini Pro) for clinical proteomics and mass spectrometry interpretation. Production-ready detection framework with comprehensive benchmarks.
Graph-based RAG system for biomedical nutrigenetic knowledge discovery. Enables natural language queries on gene-nutrient interactions, supports personalized nutrition counseling, and runs 100% locally with Ollama LLMs and SBERT embeddings.
BERT-for-BioNLP-OST2019-AGAC-Task2
AGAC-BioNL-OST2009-Task1 BERT+CRF
Implements relation extraction for biomedical texts using Hard Negative Mining to improve accuracy in identifying complex entity relationships. Includes code for data processing, training, and evaluation with BioC-format datasets.
Cancer-Alterome is a comprehensive and curated dataset that focuses on the investigation of regulatory events caused by gene alteration in the context of cancer.
MSR Cambridge Internship Summer 2023
RAG pipeline for medical question-answering. Fuses lexical and dense retrieval (MedCPT, Contriever, Specter + FAISS) with OpenAI, Gemini, and HuggingFace LLMs. Supports iterative multi-round reasoning, strict typing, structured observability, and a clean layered architecture
Core LLM for M.A.R.S. (Model Assisted Review System). Utilizes fine-tuned Llama 3.2 3B to automate biomedical SLR screening with 92.2% accuracy.
BioGemma — Google Gemma 3 1B fine-tuned on medical/biomedical corpus for clinical NLP tasks
MedQA-NLI is a comprehensive medical reasoning dataset comprising 42,889 instances designed for training and evaluating models on natural language inference (NLI) tasks in biomedical domains.
Clinical trial document intelligence pipelines using medallion architecture. Classification (87 categories) + NER (8 entity types) on Databricks.