To run:
- python3 scripts/download_models.py -m 370m --bits 32 -md models/370m_32bit.bin
- make fast
- ./build/mamba models/370m_32bit.bin -n 20 -i "Customer Support should" -t 0.0
Command-line arguments control inference options such as the quantization level, debugging verbosity, and input prompt; a sketch of the argument parsing is shown below.
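A minimal sketch of how the parsing might look, assuming POSIX getopt. The -n/-i/-t flags mirror the example invocation above; -q (quantization level) and -v (verbosity) are hypothetical flags added for the options mentioned here, not confirmed parts of the CLI.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <unistd.h>

struct Options {
    std::string model_path;    // positional: path to the model weights
    int n_tokens = 20;         // -n: number of tokens to generate
    std::string prompt;        // -i: input prompt
    float temperature = 1.0f;  // -t: sampling temperature (0.0 = greedy)
    int quant_bits = 32;       // -q: quantization level (hypothetical flag)
    int verbosity = 0;         // -v: debugging verbosity (hypothetical flag)
};

Options parse_args(int argc, char** argv) {
    if (argc < 2) {
        std::fprintf(stderr,
            "usage: %s model.bin [-n tokens] [-i prompt] [-t temp] [-q bits] [-v]\n",
            argv[0]);
        std::exit(1);
    }
    Options opt;
    opt.model_path = argv[1];  // model path comes first, as in the example run
    int c;
    // Skip the positional argument so getopt only sees the flags after it.
    while ((c = getopt(argc - 1, argv + 1, "n:i:t:q:v")) != -1) {
        switch (c) {
            case 'n': opt.n_tokens = std::atoi(optarg); break;
            case 'i': opt.prompt = optarg; break;
            case 't': opt.temperature = std::strtof(optarg, nullptr); break;
            case 'q': opt.quant_bits = std::atoi(optarg); break;
            case 'v': ++opt.verbosity; break;
            default:  std::exit(1);
        }
    }
    return opt;
}
```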
The download script (scripts/download_models.py) fetches the model configurations that are useful for testing, including their tokenizers.
Model configuration is handled through model_config.yaml, covering options such as the sampling temperature (text diversity), the amount of text to generate, and the batch size. The file may define multiple selectable configurations, chosen via a command-line argument; a hypothetical layout is sketched below.
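A sketch of what model_config.yaml could look like. The key names and the named-profile structure are assumptions for illustration, not the project's actual schema.

```yaml
default: &base
  temperature: 1.0   # sampling temperature (text diversity)
  max_tokens: 256    # how much text to generate
  batch_size: 1

greedy:
  <<: *base
  temperature: 0.0   # deterministic output, as in the example run above

bulk:
  <<: *base
  max_tokens: 64
  batch_size: 8
```

A profile could then be picked at launch with something like `--config greedy` (the flag name is likewise hypothetical).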
Roadmap:
- Initial C++ implementation
- C++ memory optimization
- Quantization (see the sketch after this list)
- Speculative decoding (see the sketch after this list)
- Flash memory offloading (see the sketch after this list)
  - neuron activation data
  - hot and cold neuron prediction
  - actually load in only part of the model
- Matrix multiplication optimization and overall optimization (see the sketch after this list)
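For the quantization milestone, a common starting point is absmax int8 quantization applied only to the linear-layer weights, per the discussion in state-spaces/mamba#133. The sketch below is illustrative; the per-tensor granularity (often per-channel in practice) and the storage format are assumptions, not the project's final scheme.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

struct QuantizedTensor {
    std::vector<int8_t> q;  // quantized values in [-127, 127]
    float scale;            // dequantize with: w ≈ q * scale
};

// Symmetric absmax quantization: scale so the largest |weight| maps to 127.
QuantizedTensor quantize_absmax(const std::vector<float>& w) {
    float absmax = 0.0f;
    for (float x : w) absmax = std::fmax(absmax, std::fabs(x));
    QuantizedTensor t;
    t.scale = (absmax > 0.0f) ? absmax / 127.0f : 1.0f;
    t.q.reserve(w.size());
    for (float x : w)
        t.q.push_back(static_cast<int8_t>(std::lround(x / t.scale)));
    return t;
}

std::vector<float> dequantize(const QuantizedTensor& t) {
    std::vector<float> w(t.q.size());
    for (std::size_t i = 0; i < w.size(); ++i) w[i] = t.q[i] * t.scale;
    return w;
}
```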
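For speculative decoding (arXiv 2211.17192), the greedy (temperature 0) case is the easiest to sketch: a small draft model proposes K tokens and the target model keeps the longest prefix it agrees with, plus one token of its own. The Model type is a stand-in for the real small/large forward passes, and the verification here is sequential for clarity; a real implementation scores all K+1 positions in one batched target pass, which is where the speedup comes from.

```cpp
#include <functional>
#include <vector>

// context -> greedily chosen next token
using Model = std::function<int(const std::vector<int>&)>;

std::vector<int> speculative_decode(const Model& draft, const Model& target,
                                    std::vector<int> ctx, int n_new, int K) {
    int produced = 0;
    while (produced < n_new) {
        // 1) The cheap draft model proposes K tokens autoregressively.
        std::vector<int> proposal;
        std::vector<int> dctx = ctx;
        for (int i = 0; i < K; ++i) {
            int t = draft(dctx);
            proposal.push_back(t);
            dctx.push_back(t);
        }
        // 2) The target model verifies: accept while the draft matches,
        //    then emit the target's own token at the first mismatch
        //    (or a bonus token if all K proposals were accepted).
        for (int i = 0; i <= K && produced < n_new; ++i) {
            int want = target(ctx);  // target's greedy choice
            ctx.push_back(want);
            ++produced;
            if (i == K || proposal[i] != want) break;
        }
    }
    return ctx;
}
```

At temperature 0 this reproduces exactly what the target model alone would generate; it is only faster when the draft agrees with the target often.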
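For the flash-memory milestone, the core idea from "LLM in a flash" (arXiv 2312.11514) is to keep frequently activated ("hot") neurons' weight rows resident in RAM and fetch rarely activated ("cold") rows from flash only when needed. The class below is a rough sketch under that assumption: the interface, the stubbed flash read, and the absence of eviction are all placeholders.

```cpp
#include <unordered_map>
#include <vector>

class RowStore {
public:
    RowStore(int n_rows, int dim) : counts_(n_rows, 0), dim_(dim) {}

    // Fetch the weight row for a neuron predicted to activate.
    const std::vector<float>& row(int i) {
        ++counts_[i];  // track activation frequency (the hot/cold signal)
        auto it = ram_.find(i);
        if (it != ram_.end()) return it->second;    // hot: already in RAM
        std::vector<float> r = load_from_flash(i);  // cold: fetch on demand
        return ram_.emplace(i, std::move(r)).first->second;
    }

private:
    std::vector<float> load_from_flash(int i) {
        (void)i;
        // Stand-in for an mmap/pread of row i from the weights file; a real
        // version would also evict rows whose counts fall below a threshold.
        return std::vector<float>(dim_, 0.0f);
    }
    std::unordered_map<int, std::vector<float>> ram_;  // resident (hot) rows
    std::vector<long> counts_;                         // per-neuron stats
    int dim_;
};
```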
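For the matrix-multiplication milestone, loop tiling is the standard first optimization: work on cache-sized blocks so the operands stay resident while a block of C is computed. The tile size below is a guess to be tuned per machine; the caller must zero-initialize C, and SIMD/threading would layer on top of this.

```cpp
#include <algorithm>

// C[MxN] += A[MxK] * B[KxN], row-major, blocked into TxT tiles.
void matmul_tiled(const float* A, const float* B, float* C,
                  int M, int N, int K, int T = 64) {
    for (int i0 = 0; i0 < M; i0 += T)
        for (int k0 = 0; k0 < K; k0 += T)
            for (int j0 = 0; j0 < N; j0 += T)
                // Multiply one tile; ikj order keeps the inner loop
                // streaming over contiguous rows of B and C.
                for (int i = i0; i < std::min(i0 + T, M); ++i)
                    for (int k = k0; k < std::min(k0 + T, K); ++k) {
                        float a = A[i * K + k];
                        for (int j = j0; j < std::min(j0 + T, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```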
References:
- Implementation of some optimization techniques: https://github.com/MDK8888/GPTFast/tree/master
- Mamba LLM (mamba-chat): https://github.com/redotvideo/mamba-chat
- ReLU Strikes Back (activation sparsity): https://arxiv.org/abs/2310.04564
- LLM in a flash (inference with limited memory): https://arxiv.org/abs/2312.11514
- https://arxiv.org/abs/2402.11131
- Fast Inference from Transformers via Speculative Decoding: https://arxiv.org/abs/2211.17192
- The Era of 1-bit LLMs: https://arxiv.org/abs/2402.17764
- state-spaces/mamba#133 (quantize only the nn.Linear layers)
- Hugging Face Transformers quantization docs: https://huggingface.co/docs/transformers/v4.33.0/en/main_classes/quantization
- Neural Networks Quantization (Lei Mao): https://leimao.github.io/article/Neural-Networks-Quantization/