Awesome Small Language Models

A curated list of awesome resources, tools, and projects related to small language models. This list focuses on modern, efficient language models designed for various applications, from research to production deployment.

Table of Contents

  • Notable Small Language Models
  • Frameworks and Tools
  • Fine-tuning Techniques
  • Fine-tuning Guide
  • Hardware Requirements
  • Inference Optimization
  • Applications and Use Cases
  • Research Papers and Articles
  • Tutorials and Guides
  • Community Projects
  • Contributing
  • License

Notable Small Language Models

  • Alpaca - A fine-tuned version of LLaMA, optimized for instruction following
  • Vicuna - An open-source chatbot trained by fine-tuning LLaMA
  • FLAN-T5 Small - A smaller version of the FLAN-T5 model
  • DistilGPT2 - A distilled version of GPT-2
  • BERT-Mini - A smaller BERT model with 4 layers

Frameworks and Tools

  • Hugging Face Transformers - State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0 (see the usage sketch after this list)
  • PEFT - Parameter-Efficient Fine-Tuning methods for Hugging Face models
  • PeriFlow - A serving engine for deploying large language models
  • bitsandbytes - 8-bit CUDA functions for PyTorch
  • TensorFlow Lite - A set of tools to help developers run TensorFlow models on mobile, embedded, and IoT devices
  • ONNX Runtime - Cross-platform, high performance ML inferencing and training accelerator
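
Most of the models above can be loaded in a few lines with Hugging Face Transformers. A minimal text-generation sketch, assuming the publicly hosted distilgpt2 checkpoint (matching the DistilGPT2 entry above):

```python
# Minimal generation sketch with Hugging Face Transformers.
# The "distilgpt2" checkpoint name is an assumption; any small causal LM works.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

inputs = tokenizer("Small language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```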

Fine-tuning Techniques

  • LoRA (Low-Rank Adaptation): Efficient fine-tuning method that significantly reduces the number of trainable parameters (see the PEFT sketch after this list)
  • QLoRA: Quantized Low-Rank Adaptation for even more efficient fine-tuning
  • P-tuning v2: Prompt tuning method for adapting pre-trained language models
  • Adapter Tuning: Adding small trainable modules to frozen pre-trained models
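
A minimal LoRA sketch using the PEFT library, assuming a DistilGPT2 base model; the rank, scaling, and target-module choices below are illustrative defaults, not tuned values:

```python
# Sketch: attach a LoRA adapter to a small causal LM with PEFT.
# Hyperparameters (r, lora_alpha, target_modules) are illustrative only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("distilgpt2")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Only the LoRA matrices are updated during training, which is what keeps the memory and checkpoint footprint small.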

Fine-tuning Guide

  1. Choose a base model (e.g., FLAN-T5 Small, DistilGPT2)
  2. Prepare your dataset for the specific task
  3. Select a fine-tuning technique (e.g., LoRA, QLoRA)
  4. Use Hugging Face's Transformers and PEFT libraries for implementation (an end-to-end sketch follows this list)
  5. Train on your data, monitoring for overfitting
  6. Evaluate the fine-tuned model on a test set
  7. Optimize for inference (quantization, pruning, etc.)
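
A compact end-to-end sketch of steps 1-6, assuming a DistilGPT2 base model, LoRA via PEFT, and a plain-text dataset split into hypothetical train.txt and test.txt files; all hyperparameters are placeholders:

```python
# Sketch: LoRA fine-tuning of DistilGPT2 on a toy text dataset with the Trainer API.
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Step 1: base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Step 3: efficient fine-tuning technique (LoRA)
model = get_peft_model(model, LoraConfig(
    task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, target_modules=["c_attn"]))

# Step 2: task-specific dataset (hypothetical local text files)
dataset = load_dataset("text", data_files={"train": "train.txt", "test": "test.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

# Steps 4-5: train, watching the loss for signs of overfitting
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-distilgpt2",
                           num_train_epochs=1, per_device_train_batch_size=8),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Step 6: evaluate on the held-out split
print(trainer.evaluate())
```

Step 7 (inference optimization) is covered in the Inference Optimization section below.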

Hardware Requirements

RAM requirements vary based on model size and fine-tuning technique:

  • Small models (e.g., BERT-Mini, DistilGPT2): 4-8 GB RAM
  • Medium models (e.g., FLAN-T5 Small): 8-16 GB RAM
  • Larger models with efficient fine-tuning (e.g., Alpaca with LoRA): 16-32 GB RAM

For training, GPU memory requirements are typically higher. Using techniques like LoRA or QLoRA can significantly reduce memory needs.
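
One way memory needs drop is by loading the base model in 4-bit precision, as QLoRA does. A sketch using the bitsandbytes integration in Transformers (requires a CUDA GPU with bitsandbytes installed; the model name is illustrative):

```python
# Sketch: QLoRA-style 4-bit loading to reduce GPU memory for fine-tuning.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as used by QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "distilgpt2",
    quantization_config=bnb_config,
    device_map="auto",                      # place weights on available devices
)
print(model.get_memory_footprint())         # rough loaded size in bytes
```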

Inference Optimization

  • Quantization: Reducing model precision (e.g., INT8, FP16); see the INT8 sketch after this list
  • Pruning: Removing unnecessary weights
  • Knowledge Distillation: Training a smaller model to mimic a larger one
  • Caching: Storing intermediate results for faster inference
  • Frameworks for optimization: ONNX Runtime, TensorFlow Lite, and bitsandbytes (see Frameworks and Tools above)
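
A minimal post-training INT8 sketch using PyTorch dynamic quantization; the DistilBERT checkpoint is an assumption, and models built from nn.Linear layers benefit most:

```python
# Sketch: dynamic quantization of Linear layers to INT8 for CPU inference.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
model_int8 = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8  # quantize all nn.Linear layers
)
# model_int8 is a drop-in replacement for the FP32 model at inference time.
```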

Applications and Use Cases

  • On-device natural language processing
  • Chatbots and conversational AI
  • Text summarization and generation
  • Sentiment analysis (see the pipeline sketch after this list)
  • Named Entity Recognition (NER)
  • Question Answering systems
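
For example, a small model can handle sentiment analysis through the Transformers pipeline API; the checkpoint name below is an assumption, and any fine-tuned classifier can be substituted:

```python
# Sketch: sentiment analysis with a small fine-tuned classifier via pipeline().
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Small language models are surprisingly capable."))
# -> a list like [{'label': 'POSITIVE', 'score': ...}]
```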

Research Papers and Articles

Tutorials and Guides

Community Projects

  • [Add your awesome community projects here!]

Contributing

Your contributions are always welcome! Please read the contribution guidelines first.

License

This awesome list is released under the MIT License.
