Category | Component | Owner | Closed source or proprietary | OSS license | Commercial use | Model size (B) | Release date | Code/paper | Stars | Description |
Multi-Modal | ImageBind | Meta | License | No | Github | 5.9k | ImageBind: One Embedding Space To Bind Them All | |||
Image | DeepFloyd IF | stability.ai | License, Model license | Github | 6.4k | Text-to-image model with a high degree of photorealism and language understanding | |||
Image | Stable Diffusion Version 2 | stability.ai | MIT, unknown | Github | 23.5k | High-Resolution Image Synthesis with Latent Diffusion Models | ||||
Image | DALL-E | OpenAI | Modified MIT | Yes | Github | 10.3k | PyTorch package for the discrete VAE used for DALL·E. | |||
Image | DALL·E 2 | OpenAI | Yes | product | ||||||
Image | DALLE2-pytorch | lucidrains | MIT | Yes | Github | 9.7k | Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch | |||
Speech | Whisper | OpenAI | MIT | Yes | Github | 37.7k | Robust Speech Recognition via Large-Scale Weak Supervision | |||
Speech | MMS | Meta | Yes | paper | ||||||
Code model | Codex | OpenAI | Yes | 12 | July 2021 | blog | Paper | |||
Code model | AlphaCode | 41 | Feb 2022 | Competition-Level Code Generation with AlphaCode | ||||||
Code model | StarCoder | BigCode | No | Apache | 15 | May 2023 | Github | 4.8k | Language model (LM) trained on source code and natural language text |
Code model | CodeGen | Salesforce | No | ? | Github | 3.6k | model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex. | |||
Code model | Replit Code | replit | 3 | May 2023 | replit-code-v1-3b model is a 2.7B LLM trained on 20 languages from the Stack Dedup v1.2 dataset. | |||||
Code model | CodeGen2 | Salesforce | BSD | Yes | 1, 3, 7, 16 | May 2023 | Github | Code models for program synthesis. | ||
Code model | CodeT5 and CodeT5+ | Salesforce | BSD | Yes | 16 | May 2023 | CodeT5 | CodeT5 and CodeT5+ models for Code Understanding and Generation from Salesforce Research. | ||
language model | GPT | June 2018 | GPT | Improving Language Understanding by Generative Pre-Training | ||||||
language model | BERT | Oct 2018 | BERT | Bidirectional Encoder Representations from Transformers | ||||||
language model | RoBERTa | 0.125 - 0.355 | July 2019 | RoBERTa | A Robustly Optimized BERT Pretraining Approach | |||||
language model | GPT-2 | 1.5 | Nov 2019 | GPT-2 | Language Models are Unsupervised Multitask Learners | |||||
language model | T5 | 0.06 - 11 | Oct 2019 | Flan-T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | |||||
language model | XLNet | Jun 2019 | XLNet | Generalized Autoregressive Pretraining for Language Understanding | |||||
language model | ALBERT | 0.235 | Sep 2019 | ALBERT | A Lite BERT for Self-supervised Learning of Language Representations | |||||
language model | CTRL | 1.63 | Sep 2019 | CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation | |||||
language model | GPT-3 | Azure | Yes | 175 | May 2020 | Paper | Language Models are Few-Shot Learners | |||
language model | GShard | 600 | Jun 2020 | Paper | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding | |||||
language model | BART | Jul 2020 | BART | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | ||||||
language model | mT5 | 13 | Oct 2020 | mT5 | mT5: A massively multilingual pre-trained text-to-text transformer | |||||
language model | PanGu-α | 13 | April 2021 | PanGu-α | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation | |||||
language model | CPM-2 | 198 | Jun 2021 | CPM | CPM-2: Large-scale Cost-effective Pre-trained Language Models | |||||
language model | GPT-J 6B | EleutherAI | No | Yes | 6 | June 2021 | GPT-J-6B | A 6 billion parameter, autoregressive text generation model trained on The Pile. | ||
language model | ERNIE 3.0 | Baidu | Yes | 10 | July 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | ||||
language model | Jurassic-1 | 178 | Aug 2021 | Jurassic-1: Technical Details and Evaluation | ||||||
language model | ERNIE 3.0 Titan | 260 | Dec 2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | |||||
language model | HyperCLOVA | 82 | Sep 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers | ||||||
language model | FLAN | 137 | Oct 2021 | Paper | Finetuned Language Models Are Zero-Shot Learners | |||||
language model | GPT-3.5 | Azure | Yes | |||||||
language model | GPT-4 | Azure | Yes | March 2023 | ||||||
language model | T0 | 11 | Oct 2021 | T0 | Multitask Prompted Training Enables Zero-Shot Task Generalization | |||||
language model | Yuan 1.0 | 245 | Oct 2021 | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning | ||||||
language model | WebGPT | 175 | Dec 2021 | WebGPT: Browser-assisted question-answering with human feedback | ||||||
language model | Gopher | 280 | Dec 2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ||||||
language model | GLaM | 1200 | Dec 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | ||||||
language model | LaMDA | Bard | Yes | 137 | Jan 2022 | Paper | LaMDA: Language Models for Dialog Applications | |||
language model | MT-NLG | 530 | Jan 2022 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | ||||||
language model | InstructGPT | 175 | Mar 2022 | Training language models to follow instructions with human feedback | ||||||
language model | Chinchilla | 70 | Mar 2022 | Shows that for a compute budget, the best performances are not achieved by the largest models but by smaller models trained on more data. | ||||||
language model | GPT-NeoX-20B | 20 | April 2022 | GPT-NeoX-20B | GPT-NeoX-20B: An Open-Source Autoregressive Language Model | |||||
language model | Tk-Instruct | 11 | April 2022 | Tk-Instruct-11B | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks | |||||
language model | PaLM | Yes | 540 | April 2022 | PaLM: Scaling Language Modeling with Pathways | |||||
language model | OPT | Meta | No | Yes | 175 | May 2022 | OPT-13B, OPT-66B, Paper | OPT: Open Pre-trained Transformer Language Models | ||
language model | OPT-IML | 30, 175 | Dec 2022 | OPT-IML | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | |||||
language model | GLM-130B | 130 | Oct 2022 | GLM-130B | GLM-130B: An Open Bilingual Pre-trained Model | |||||
language model | AlexaTM | 20 | Aug 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | ||||||
language model | Flan-T5 | 11 | Oct 2022 | Flan-T5-xxl | Scaling Instruction-Finetuned Language Models | |||||
language model | Sparrow | 70 | Sep 2022 | Improving alignment of dialogue agents via targeted human judgements | ||||||
language model | UL2 | 20 | Oct 2022 | UL2, Flan-UL2 | UL2: Unifying Language Learning Paradigms | |||||
language model | U-PaLM | 540 | Oct 2022 | Transcending Scaling Laws with 0.1% Extra Compute | ||||||
language model | BLOOM | BigScience | No | Yes | 176 | Nov 2022 | BLOOM, Paper | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | ||
language model | mT0 | 13 | Nov 2022 | mT0-xxl | Crosslingual Generalization through Multitask Finetuning | |||||
language model | Galactica | 0.125 - 120 | Nov 2022 | Galactica | Galactica: A Large Language Model for Science | |||||
language model | ChatGPT | Nov 2022 | A model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. | |||||||
language model | LLaMA | Meta | No | No | 7, 13, 33, 65 | Feb 2023 | Paper, LLaMA | LLaMA: Open and Efficient Foundation Language Models | ||
language model | PanGu-Σ | Yes | 1085 | March 2023 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing | |||||
language model | BloombergGPT | 50 | March 2023 | BloombergGPT: A Large Language Model for Finance | ||||||
language model | Cerebras-GPT | Cerebras | No | Yes | 0.111 - 13 | March 2023 | HF | Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster | ||
language model | oasst-sft-1-pythia-12b | LAION-AI | No | Yes | 12 | March 2023 | HF | OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. | ||
language model | Pythia | EleutherAI | No | Yes | 0.070 - 12 | March 2023 | Pythia, Paper | A suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. | ||
language model | StableLM | Stability AI | No | No | 3, 7 | April 2023 | Github | Stability AI's StableLM series of language models | |||
language model | Dolly 2.0 | Databricks | No | Yes | 3, 7, 12 | April 2023 | Dolly | An instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use. | ||
language model | DLite | 0.124 - 1.5 | May 2023 | HF | Lightweight instruction-following models which exhibit ChatGPT-like interactivity. | |||||
language model | MPT-7B | MosaicML | No | Apache 2 | Yes | 7 | May 5, 2023 | blog | A GPT-style model, and the first in the MosaicML Foundation Series of models. | |
language model | h2oGPT | 12 | May 2023 | HF | h2oGPT is a large language model (LLM) fine-tuning framework and chatbot UI with document(s) question-answer capabilities. | ||||||
language model | LIMA | 65 | May 2023 | A 65B-parameter LLaMA language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. | ||||||
language model | RedPajama-INCITE | 3, 7 | May 2023 | HF | A family of models including base, instruction-tuned & chat models. | |||||
language model | Gorilla | 7 | May 2023 | Gorilla | Gorilla: Large Language Model Connected with Massive APIs | |||||
language model | Med-PaLM 2 | May 2023 | Towards Expert-Level Medical Question Answering with Large Language Models | |||||||
language model | PaLM 2 | May 2023 | A language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. | ||||||||
language model | Falcon LLM | 7, 40 | May 2023 | 7B, 40B | Foundational large language model (LLM) with 40 billion parameters trained on one trillion tokens. | |||||
language model | Claude | Anthropic | Yes | |||||||
language model | GPT-Neo | EleutherAI | No | Yes | ||||||
language model | GPT-NeoX | EleutherAI | No | Yes | 20 | Feb 2022 | Paper | |||
language model | FastChat-T5-3B | LMSYS | No | Apache | Yes | April 2023 | ||||
language model | OpenLLaMA | openlm-research | No | Yes | ||||||
language model | OpenChatKit | Together | No | Yes | ||||||
language model | YaLM | Yandex | No | Yes | 100 | June 2022 | Github | |||
language model | ChatGLM-6B | Tsinghua | No | ChatGLM-6B | No | 6 | March 2023 | Github | |||
language model | Alpaca | Stanford | No | No | ||||||
language model | Vicuna | No | No | 13 | March 2023 | Blog | ||||
language model | StableVicuna | No | No | |||||||
language model | RWKV-4-Raven-7B | BlinkDL | No | No | ||||||
language model | Alpaca-LoRA | tloen | No | No | ||||||
language model | Koala | BAIR | No | No | 13 | April 2023 | Blog |
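
Most of the open-licensed checkpoints listed above (Flan-T5, GPT-J, Pythia, MPT-7B, Falcon, Dolly 2.0, etc.) are distributed through the Hugging Face Hub and can be tried locally with the `transformers` library. A minimal sketch, assuming `transformers` and `torch` are installed; the `google/flan-t5-small` checkpoint is used purely as an illustrative pick:

```python
# Minimal sketch: load one of the open checkpoints from the table above
# via Hugging Face transformers. "google/flan-t5-small" is only an
# illustrative choice; other open seq2seq checkpoints load the same way.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Instruction-style prompt, since Flan-T5 is instruction-finetuned.
inputs = tokenizer(
    "Translate English to German: The table lists many language models.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Decoder-only entries (GPT-J, Pythia, MPT, Falcon, etc.) load analogously through `AutoModelForCausalLM` instead of `AutoModelForSeq2SeqLM`.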