Skip to content

Latest commit

 

History

History
72 lines (66 loc) · 21.1 KB

README.md

File metadata and controls

72 lines (66 loc) · 21.1 KB

LLM-Survey

The Large Language Models Survey repository is a comprehensive compendium dedicated to the exploration and understanding of Large Language Models (LLMs). It houses an assortment of resources including research papers, blog posts, tutorials, code examples, and more to provide an in-depth look at the progression, methodologies, and applications of LLMs. This repo is an invaluable resource for AI researchers, data scientists, or enthusiasts interested in the advancements and inner workings of LLMs. We encourage contributions from the wider community to promote collaborative learning and continue pushing the boundaries of LLM research.

Timeline of LLMs

evolutionv1 1

List of LLMs

Language Model Release Date Checkpoints Paper/Blog Params (B) Context Length Licence Try it
T5 2019/10 T5 & Flan-T5, Flan-T5-xxl (HF) Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer 0.06 - 11 512 Apache 2.0 T5-Large
UL2 2022/10 UL2 & Flan-UL2, Flan-UL2 (HF) UL2 20B: An Open Source Unified Language Learner 20 512, 2048 Apache 2.0
Cohere 2022/06 Checkpoint  Code 54 4096 Model Website
Cerebras-GPT 2023/03 Cerebras-GPT Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models (Paper) 0.111 - 13 2048 Apache 2.0 Cerebras-GPT-1.3B
Open Assistant (Pythia family) 2023/03 OA-Pythia-12B-SFT-8, OA-Pythia-12B-SFT-4, OA-Pythia-12B-SFT-1 Democratizing Large Language Model Alignment 12 2048 Apache 2.0 Pythia-2.8B
Pythia 2023/04 pythia 70M - 12B Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling 0.07 - 12 2048 Apache 2.0
Dolly 2023/04 dolly-v2-12b Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM 3, 7, 12 2048 MIT
DLite 2023/05 dlite-v2-1_5b Announcing DLite V2: Lightweight, Open LLMs That Can Run Anywhere 0.124 - 1.5 1024 Apache 2.0 DLite-v2-1.5B
RWKV 2021/08 RWKV, ChatRWKV The RWKV Language Model (and my LM tricks) 0.1 - 14 infinity (RNN) Apache 2.0
GPT-J-6B 2023/06 GPT-J-6B, GPT4All-J GPT-J-6B: 6B JAX-Based Transformer 6 2048 Apache 2.0
GPT-NeoX-20B 2022/04 GPT-NEOX-20B GPT-NeoX-20B: An Open-Source Autoregressive Language Model 20 2048 Apache 2.0
Bloom 2022/11 Bloom BLOOM: A 176B-Parameter Open-Access Multilingual Language Model 176 2048 OpenRAIL-M v1
StableLM-Alpha 2023/04 StableLM-Alpha Stability AI Launches the First of its StableLM Suite of Language Models 3 - 65 4096 CC BY-SA-4.0
FastChat-T5 2023/04 fastchat-t5-3b-v1.0 We are excited to release FastChat-T5: our compact and commercial-friendly chatbot! 3 512 Apache 2.0
h2oGPT 2023/05 h2oGPT Building the World’s Best Open-Source Large Language Model: H2O.ai’s Journey 12 - 20 256 - 2048 Apache 2.0
MPT-7B 2023/05 MPT-7B, MPT-7B-Instruct Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs 7 84k (ALiBi) Apache 2.0, CC BY-SA-3.0
PanGU-Σ 2023/3 PanGU Model 1085 - Model Page
RedPajama-INCITE 2023/05 RedPajama-INCITE Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models 3 - 7 2048 Apache 2.0 RedPajama-INCITE-Instruct-3B-v1
OpenLLaMA 2023/05 open_llama_3b, open_llama_7b, open_llama_13b OpenLLaMA: An Open Reproduction of LLaMA 3, 7 2048 Apache 2.0 OpenLLaMA-7B-Preview_200bt
Falcon 2023/05 Falcon-180B, Falcon-40B, Falcon-7B The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only 180, 40, 7 2048 Apache 2.0
MPT-30B 2023/06 MPT-30B, MPT-30B-instruct MPT-30B: Raising the bar for open-source foundation models 30 8192 Apache 2.0, CC BY-SA-3.0 MPT 30B inference code using CPU
LLaMA 2 2023/06 LLaMA 2 Weights  Llama 2: Open Foundation and Fine-Tuned Chat Models 7 - 70 4096 Custom Free if you have under 700M users and you cannot use LLaMA outputs to train other LLMs besides LLaMA and its derivatives HuggingChat
OpenLM 2023/09 OpenLM 1B, OpenLM 7B  Open LM: a minimal but performative language modeling (LM) repository 1, 7 2048 MIT
Mistral 7B 2023/09 Mistral-7B-v0.1, Mistral-7B-Instruct-v0.1 Mistral 7B 7 4096-16K with Sliding Windows Apache 2.0 Mistral Transformer
OpenHermes 2023/09 OpenHermes-7B, OpenHermes-13B Nous Research 7, 13 4096 MIT OpenHermes-V2 Finetuned on Mistral 7B
SOLAR 2023/12 Solar-10.7B Upstage 10.7 4096 apache-2.0
phi-2 2023/12 phi-2 2.7B Microsoft 2.7 2048 MIT
SantaCoder 2023/01 santacoder SantaCoder: don't reach for the stars! 1.1 2048 OpenRAIL-M v1 SantaCoder
StarCoder 2023/05 starcoder StarCoder: A State-of-the-Art LLM for Code, StarCoder: May the source be with you! 1.1-15 8192 OpenRAIL-M v1
StarChat Alpha 2023/05 starchat-alpha Creating a Coding Assistant with StarCoder 16 8192 OpenRAIL-M v1
Replit Code 2023/05 replit-code-v1-3b Training a SOTA Code LLM in 1 week and Quantifying the Vibes — with Reza Shabani of Replit 2.7 infinity? (ALiBi) CC BY-SA-4.0 Replit-Code-v1-3B
CodeGen2 2023/04 codegen2 1B-16B CodeGen2: Lessons for Training LLMs on Programming and Natural Languages 1 - 16 2048 Apache 2.0
CodeT5+ 2023/05 CodeT5+ CodeT5+: Open Code Large Language Models for Code Understanding and Generation 0.22 - 16 512 BSD-3-Clause Codet5+-6B
XGen-7B 2023/06 XGen-7B-8K-Base Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length 7 8192 Apache 2.0
CodeGen2.5 2023/07 CodeGen2.5-7B-multi CodeGen2.5: Small, but mighty 7 2048 Apache 2.0
DeciCoder-1B 2023/08 DeciCoder-1B Introducing DeciCoder: The New Gold Standard in Efficient and Accurate Code Generation 1.1 2048 Apache 2.0 DeciCoder Demo
Code Llama 2023 Inference Code for CodeLlama models  Code Llama: Open Foundation Models for Code 7 - 34 4096 Model HuggingChat
Sparrow 2022/09 Inference Code  Code 70 4096 Model Webpage
Mistral 2023/09 Inference Code  Code 7 8000 Model Webpage
Koala 2023/04 Inference Code  Code 13 4096 Model Webpage
PaLM 2 2024 N/A Google AI 540 N/A N/A N/A
Tongyi Qianwen 2024 N/A Alibaba Cloud N/A N/A N/A N/A
Cohere Command 2024 N/A Cohere 6 - 52 N/A N/A N/A
Vicuna 33B 2024 N/A Meta AI 33 N/A N/A N/A
Guanaco-65B 2024 N/A Meta AI 65 N/A N/A N/A
Amazon Q 2024 N/A AWS N/A N/A N/A N/A
Falcon 180B 2024 Falcon-180B Technology Innovation Institute 180 N/A Apache 2.0 N/A
YI 34B 2024 N/A 01 AI 34 Up to 32K N/A N/A
Mixtral 8x7B 2023 Mixtral 8X 7B Mistral AI 46.7 (12.9 per token) N/A Apache 2.0 N/A

If you find our survey useful for your research, please cite the following paper:

@article{hadi2023survey,
  title={A survey on large language models: Applications, challenges, limitations, and practical usage},
  author={Hadi, Muhammad Usman and Qureshi, Rizwan and Shah, Abbas and Irfan, Muhammad and Zafar, Anas and Shaikh, Muhammad Bilal and Akhtar, Naveed and Wu, Jia and Mirjalili, Seyedali and others},
  journal={Authorea Preprints},
  year={2023},
  publisher={Authorea}
}