LLM4Chemistry

This repository collects papers on Large Language Models for Chemistry.

😎 You are welcome to recommend missing papers by opening an issue or submitting a pull request.

Contents

Fine-tuning LLM for Chemistry

  • 2022.05 Foundation Models of Scientific Knowledge for Chemistry: Opportunities, Challenges and Lessons Learned. ACL Workshop
  • 2022.11 Galactica: A large language model for science. arXiv
  • 2022.11 Is GPT-3 all you need for machine learning for chemistry? NeurIPS2022 Workshop
  • 2023.08 Fine-tuning GPT-3 for machine learning electronic and functional properties of organic molecules. Chemical Science
  • 2023.08 HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science. EMNLP2023
  • 2023.10 MatChat: A Large Language Model and Application Service Platform for Materials Science. Chinese Physics B
  • 2024.01 ChemDFM: Dialogue Foundation Model for Chemistry. arXiv
  • 2024.01 Structured information extraction from scientific text with large language models. Nature Communications
  • 2024.02 Leveraging large language models for predictive chemistry. Nature Machine Intelligence
  • 2024.03 SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning. arXiv
  • 2024.03 Domain-Agnostic Molecular Generation with Chemical Feedback. ICLR2024
  • 2024.04 ChemLLM: A Chemical Large Language Model. arXiv
  • 2024.04 BatGPT-Chem: A Foundation Large Model For Chemical Engineering. ChemRxiv
  • 2024.04 Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models. ICLR2024
  • 2024.04 LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset. arXiv
  • 2024.05 nach0: Multimodal Natural and Chemical Languages Foundation Model. Chemical Science
  • 2024.06 Fine-tuning large language models for chemical text mining. Chemical Science
  • 2024.06 MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction. arXiv
  • 2024.06 SynAsk: Unleashing the Power of Large Language Models in Organic Synthesis. arXiv
  • 2024.06 PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes. arXiv
  • 2024.09 SciDFM: A Large Language Model with Mixture-of-Experts for Science. arXiv

Multi-Modal Chemistry LLM

  • 2023.03 Uni-Mol: A Universal 3D Molecular Representation Learning Framework. ICLR2023
  • 2023.05 DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs. arXiv
  • 2023.06 MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter. EMNLP2023
  • 2023.06 MolFM: A Multimodal Molecular Foundation Model. arXiv
  • 2023.08 BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine. arXiv
  • 2023.09 3D-MoLM: Towards 3D Molecule-Text Interpretation in Language Models. ICLR2024
  • 2023.11 InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery. arXiv
  • 2023.12 MoleculeGPT: Instruction Following Large Language Models for Molecular Property Prediction. NeurIPS Workshop
  • 2024.01 MolTC: Towards Molecular Relational Modeling In Language Models. ACL2024
  • 2024.01 ReactXT: Understanding Molecular “Reaction-ship” via Reaction-Contextualized Molecule-Text Pretraining. ACL2024
  • 2024.03 GIT-Mol: A multi-modal large language model for molecular science with graph, image, and text. arXiv
  • 2024.06 HIGHT: Hierarchical Graph Tokenization for Graph-Language Alignment. arXiv
  • 2024.06 3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization. arXiv
  • 2024.06 MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension. arXiv
  • 2024.07 MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations. Bioinformatics
  • 2024.08 UniMoT: Unified Molecule-Text Language Model with Discrete Token Representation. arXiv
  • 2024.08 ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area. arXiv
  • 2024.09 ChemDFM-X: Towards Large Multimodal Model for Chemistry. arXiv

LLM as a Chemistry Agent

  • 2023.09 Generative Retrieval-Augmented Ontologic Graph and Multiagent Strategies for Interpretive Large Language Model-Based Materials Design. ACS Engineering Au
  • 2023.10 Large language models for chemistry robotics. Autonomous Robots
  • 2023.10 Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design. EMNLP2023
  • 2023.11 Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis. arXiv
  • 2023.12 Autonomous chemical research with large language models. Nature
  • 2024.01 Structured Chemistry Reasoning with Large Language Models. ICML2024
  • 2024.01 ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback. ICML2024
  • 2024.02 An Autonomous Large Language Model Agent for Chemical Literature Data Mining. arXiv
  • 2024.03 From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecule Discovery. AAAI2024
  • 2024.03 DRAK: Unlocking Molecular Insights with Domain-Specific Retrieval-Augmented Knowledge in LLMs. arXiv
  • 2024.04 Integrating Chemistry Knowledge in Large Language Models via Prompt Engineering. arXiv
  • 2024.04 Large Language Models are In-Context Molecule Learners. arXiv
  • 2024.04 A Self-feedback Knowledge Elicitation Approach for Chemical Reaction Predictions. arXiv
  • 2024.04 Large Language Models Open New Way of AI-Assisted Molecule Design for Chemists. ChemRxiv
  • 2024.05 Augmenting large language models with chemistry tools. Nature Machine Intelligence
  • 2024.05 ChatMOF: an artificial intelligence system for predicting and generating metal-organic frameworks using large language models. Nature Communications
  • 2024.06 LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and Distillation. arXiv

LLM Chemistry Benchmark

  • 2017.09 Crowdsourcing multiple choice science questions. ACL Workshop
  • 2020.09 ChemistryQA: A Complex Question Answering Dataset from Chemistry. OpenReview
  • 2023.01 Assessment of chemistry knowledge in large language models that generate code. Digital Discovery
  • 2023.03 Do Large Language Models Understand Chemistry? A Conversation with ChatGPT. Journal of Chemical Information and Modeling
  • 2023.06 Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective. TKDE
  • 2023.07 Can Large Language Models Empower Molecular Property Prediction? arXiv
  • 2023.10 ReLM: Leveraging Language Models for Enhanced Chemical Reaction Prediction. arXiv
  • 2023.10 GPT-MolBERTa: GPT Molecular Features Language Model for molecular property prediction. arXiv
  • 2023.12 What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks. NeurIPS2023
  • 2023.12 SciMT-Safety: Control Risk for Potential Misuse of Artificial Intelligence in Science. arXiv
  • 2024.01 SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research. AAAI2024
  • 2024.01 SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis. arXiv
  • 2024.02 Scientific Language Modeling: A Quantitative Review of Large Language Models in Molecular Science. arXiv
  • 2024.02 Building a Dataset for Language+Molecules. arXiv
  • 2024.02 SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models. arXiv
  • 2024.03 Benchmarking Large Language Models for Molecule Prediction Tasks. arXiv
  • 2024.03 MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension. arXiv
  • 2024.04 Are large language models superhuman chemists? arXiv
  • 2024.06 SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models. arXiv
  • 2024.07 ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering. arXiv
  • 2024.09 VisScience: An Extensive Benchmark for Evaluating K12 Educational Multi-modal Scientific Reasoning. arXiv
  • 2024.09 ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models. arXiv

Related Works

  • 2023.04 A Systematic Survey of Chemical Pre-trained Models. IJCAI2023
  • 2023.09 Large Language Models in Molecular Discovery. NeurIPS2023 Workshop
  • 2024.01 Scientific Large Language Models: A Survey on Biological & Chemical Domains. arXiv
  • 2024.01 From Words to Molecules: A Survey of Large Language Models in Chemistry. IJCAI2024
  • 2024.03 Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecule. arXiv
  • 2024.03 Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey. arXiv
  • 2024.06 A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery. arXiv
  • 2024.07 A Review of Large Language Models and Autonomous Agents in Chemistry. arXiv