Quantized LLaMas 🎛️ 🦙🦙🦙 🎛️ (Classifier edition)

Wield the impressive ability to accurately spit up to 10 feet away

Quantized LLaMas is a tool for interacting with the LLaMa language models.
It provides a user-friendly interface for generating responses based on system prompts, user inputs, and assistant guides.
The application has a UI built with Gradio.

Dependencies

To run the project, first make sure you have installed:

- Python 3.6 or newer
- llama-cpp-python (follow installation instructions on https://github.com/abetlen/llama-cpp-python)
- gradio

If you are on a Mac, the following additional dependencies are required:
- Rust: brew install rust
- Xcode: xcode-select --install

GGML models used: https://huggingface.co/TheBloke (Tom Jobbins the GOAT)
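
The Python-side requirements listed above can be checked before launching. The snippet below is only a sketch and not part of the project: the function name missing_requirements is made up for illustration, and it assumes llama-cpp-python is imported as llama_cpp and gradio as gradio.

```python
import importlib.util
import sys

# Assumed import names for the packages listed above:
# llama-cpp-python -> "llama_cpp", gradio -> "gradio".
REQUIRED_MODULES = ["llama_cpp", "gradio"]
MIN_PYTHON = (3, 6)


def missing_requirements(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a list of human-readable problems; empty if nothing is missing."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python {}.{} or newer is required".format(*min_python))
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("missing module: " + name)
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```

An empty result means the interpreter version and both modules were found; it does not verify that llama-cpp-python was built correctly for your machine.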

⚠️ Disclaimer ⚠️

You are responsible for anything you do with the models, just as you are responsible for anything you do with any dangerous object such as a knife, gun, lighter, or car.

Publishing anything the models generate is the same as publishing it yourself.

You are responsible for the content you publish, and you cannot blame the model any more than you can blame the knife, gun, lighter, or car for what you do with it.

To run the app, execute the command python main.py to start the Gradio web application.
It prints URLs resembling the following examples:

Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://50c0be910a804096fe.gradio.live

To use the app, open the local URL in your web browser.
You may also use the public URL to reach it from devices outside your network.

Features

Token Counter

The Token Counter displays the count of tokens used in the conversation.
It provides information about the number of tokens used by the system, user, assistant guide, and assistant.
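
As a rough illustration of per-field counting, here is a minimal sketch. It is not the code from main.py: count_tokens and whitespace_tokenize are hypothetical names, and the whitespace tokenizer is only a stand-in so the example runs without a model. With llama-cpp-python, one would presumably pass something like lambda s: llm.tokenize(s.encode("utf-8")) instead, where llm is a loaded Llama instance (note that its tokenize method may add a beginning-of-sequence token by default).

```python
from typing import Callable, Dict, List


def count_tokens(fields: Dict[str, str],
                 tokenize: Callable[[str], List]) -> Dict[str, int]:
    """Count tokens per field and add a 'total' entry."""
    counts = {name: len(tokenize(text)) if text else 0
              for name, text in fields.items()}
    counts["total"] = sum(counts.values())
    return counts


def whitespace_tokenize(text: str) -> List[str]:
    # Stand-in tokenizer (assumption): real models split text differently.
    return text.split()


example = count_tokens(
    {
        "system": "You are a classifier.",
        "user": "Is this spam?",
        "assistant_guide": "",
        "assistant": "No",
    },
    whitespace_tokenize,
)
print(example)
```

Passing the tokenizer in as a function keeps the counting logic independent of whichever model is currently selected.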

Settings

Model: Choose the LLaMa model to use from the available options.
Temperature: Adjust the sampling temperature, which controls the randomness of the generated responses.
Max Tokens: Set the maximum number of tokens in the generated response.
Stop Sequence: Specify a sequence at which response generation stops.
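
To show how these settings might reach the model, here is a hedged sketch that turns them into keyword arguments. The function name completion_kwargs, the default values, and the validation rules are assumptions made for illustration; the keyword names temperature, max_tokens, and stop match the completion call of llama-cpp-python's Llama class.

```python
from typing import Any, Dict, Optional


def completion_kwargs(temperature: float = 0.7,
                      max_tokens: int = 256,
                      stop_sequence: Optional[str] = None) -> Dict[str, Any]:
    """Validate the UI settings and shape them as keyword arguments."""
    # Validation limits below are choices of this sketch, not of the library.
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    kwargs = {"temperature": temperature, "max_tokens": max_tokens}  # type: Dict[str, Any]
    if stop_sequence:
        # The stop argument is passed as a list of strings.
        kwargs["stop"] = [stop_sequence]
    return kwargs


print(completion_kwargs(temperature=0.2, max_tokens=64, stop_sequence="###"))
```

With a loaded model, the result could then be passed along as llm(prompt, **completion_kwargs(...)). A lower temperature gives more repeatable output, which tends to suit classification-style prompts.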

Inference

In the Inference section, you can provide inputs for the system, user, and assistant guide.
The app generates a response from these inputs and displays it in the Assistant section.
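
One plausible way to combine the three inputs into a single prompt is sketched below. This is an assumption, not the format main.py necessarily uses: build_prompt is a hypothetical helper, the SYSTEM/USER/ASSISTANT labels are illustrative, and the assistant guide is treated here as the opening words of the assistant's reply that the model then continues.

```python
def build_prompt(system: str, user: str, assistant_guide: str = "") -> str:
    """Join the three fields into one prompt string."""
    parts = []
    if system.strip():
        parts.append("SYSTEM: " + system.strip())
    parts.append("USER: " + user.strip())
    guide = assistant_guide.strip()
    # The guide (if any) follows the assistant label so the model continues it.
    parts.append("ASSISTANT:" + (" " + guide if guide else ""))
    return "\n".join(parts)


prompt = build_prompt(
    system="Classify the sentiment of the user's message.",
    user="I love this product",
    assistant_guide="Sentiment:",
)
print(prompt)
```

Steering the reply with a short guide such as "Sentiment:" is a common way to nudge a model toward a one-word classification answer.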

Refresh Token Count

The Refresh Token Count button updates the token count based on the inputs provided in the System, User, and Assistant Guide fields.

Models

Tested with the following GGML models:

Llama-2-13b
Llama-2-13b-chat
Wizard-Vicuna-30B-Uncensored
Llama-30b
Guanaco-33B
Alpaca-Lora-30B
Lazarus-30B

Run

The Run button generates a response based on the provided inputs in the System, User, and Assistant Guide fields.
The generated response will be displayed in the Assistant section.

UI

The app should resemble this design, with slight modifications to the UI in subsequent versions.

RoySadaka/QuantizedLlaMas
