aitg ("ai text generator")
this project lets you easily run many transformers models locally, on the command line or as an http server
don't care about the details and just want to use GPT-2/GPT-3 super fast? this section is for you
grab a model (gpt-neo 125m, 171 MB):
wget https://github.com/xdrie/aitextgen_host/releases/download/v1.5.2/PT_GPTNEO125_ATG.7z -O /tmp/PT_GPTNEO125_ATG.7z
7z x /tmp/PT_GPTNEO125_ATG.7z -o/tmp
run the container, cli:
docker run -it --rm -v /tmp/PT_GPTNEO125_ATG:/app/model xdrie/aitg:v1.6.0 aitg.cli
in the command line, press Ctrl+D (or whatever your eof key is) to submit a prompt.
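since the cli reads the prompt from stdin until eof, you can also pipe a prompt in non-interactively (a sketch; it assumes the containerized cli accepts piped stdin, and drops -t since there's no tty):
echo 'The quick brown fox' | docker run -i --rm -v /tmp/PT_GPTNEO125_ATG:/app/model xdrie/aitg:v1.6.0 aitg.cli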
models:
- text generation (gpt)
- sequence to sequence generation (t5)
- code generation (sfcodegen)
- text summarization (bart-cnn, longformer-encoder-decoder)
- text classification (bart-mnli, minilm2-nli)
- text embedding (mpnet-paraphrase)
- question answering (minilm-squad2, roberta-squad2)
deployment:
- all of these models are available via http rest api
- optimized model execution with huggingface transformers
- gpu acceleration support
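a minimal sketch of gpu acceleration under docker, assuming the image ships a cuda-enabled pytorch build and the host has the nvidia container toolkit installed:
docker run -it --rm --gpus all -v /tmp/PT_GPTNEO125_ATG:/app/model xdrie/aitg:v1.6.0 aitg.cli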
first, you need to install all the dependencies to run the gpt2 host (using the aitextgen library with the pytorch backend):
enter src/
poetry install
next, you need a pytorch model.
if you don't have your own yet, you can use a sample model:
point your host to a model and run; usually this is a model directory containing config.json and pytorch_model.bin. to load a model directly from the huggingface hub, prefix its name with @, like this: @EleutherAI/gpt-neo-2.7B
cli:
MODEL=/path/to/your_model poetry run aitg_cli
once it's done loading, it will ask you for a prompt. when you're done typing, press Ctrl+D (sometimes twice) to send an EOF, and the model will generate text.
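to run the cli against a hub model instead of a local directory, the same @ prefix should work in MODEL (a sketch; the 125M checkpoint is just a lighter stand-in for the 2.7B model above):
MODEL='@EleutherAI/gpt-neo-125M' poetry run aitg_cli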
Usage: aitg_cli [OPTIONS]
Options:
--temp FLOAT [default: 0.9]
--max-length INTEGER [default: 256]
--min-length INTEGER [default: 0]
--seed INTEGER
--top-p FLOAT [default: 0.9]
--top-k INTEGER [default: 0]
--repetition-penalty FLOAT [default: 1.0]
--length-penalty FLOAT [default: 1.0]
--no-repeat-ngram-size INTEGER [default: 0]
--optimize / --no-optimize [default: True]
--install-completion [bash|zsh|fish|powershell|pwsh]
Install completion for the specified shell.
--show-completion [bash|zsh|fish|powershell|pwsh]
Show completion for the specified shell, to copy it or customize the installation.
--help Show this message and exit.
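for example, a mostly-deterministic run combining a few of the options above (the flags are from the usage text; the values are just illustrative):
MODEL=/path/to/your_model poetry run aitg_cli --temp 0.7 --top-k 40 --max-length 128 --seed 42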
run the server with:
MODEL=/path/to/your_model poetry run aitg_srv gpt
then GET /gen_gpt.json with a JSON request body like the following:
{
  "key": "secret",
  "prompt": "The quick brown",
  "temp": 0.9,
  "max_length": 256,
  "min_length": 0,
  "seed": null,
  "top_p": 0.9,
  "top_k": 0,
  "repetition_penalty": 1.0,
  "length_penalty": 1.0,
  "no_repeat_ngram_size": 0
}
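for example, with curl (a sketch: it assumes the server listens on port 6000, the container port used in the podman example below, and that the endpoint also accepts POST, as the bloom query at the end of this document does):
curl -s -X POST http://localhost:6000/gen_gpt.json -H 'Content-Type: application/json' -d '{ "key": "secret", "prompt": "The quick brown", "temp": 0.9, "max_length": 256 }'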
that's the short version. see the server docs for more details.
obviously, gpt-2/3 got the spotlight and were the original main focus of aitg, but an array of other models and model types is now supported.
more documentation on other models coming soon.
see instructions for running in Docker.
model:
podman run -it --rm -v $(pwd):/stf docker.io/xdrie/aitg:v3.6.0 aitg.model 'AutoModelForCausalLM' '@bigscience/bloomz-560m' /stf/PT_BLOOMZ_560M
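the same pattern should extend to other model types by swapping in the matching transformers auto class and hub id; a hypothetical example (the class and model below are assumptions, not from the original docs):
podman run -it --rm -v $(pwd):/stf docker.io/xdrie/aitg:v3.6.0 aitg.model 'AutoModelForSeq2SeqLM' '@facebook/bart-large-cnn' /stf/PT_BART_CNN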
server:
podman run -it --rm -v ~/Downloads/PT_BLOOMZ_560M:/app/model -p 24401:6000 xdrie/aitg:v3.6.0 aitg.srv bloom
query:
(export M='Explain in 1 sentence what is backpropagation in neural networks. Explanation:' && printf '{ "prompt": "%s", "answer_only": true }' "$M" | http POST 'http://localhost:24401/gen_bloom.json') | jq -r '.texts[0]'
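the same query with plain curl instead of httpie (a sketch):
curl -s -X POST 'http://localhost:24401/gen_bloom.json' -H 'Content-Type: application/json' -d '{ "prompt": "Explain in 1 sentence what is backpropagation in neural networks. Explanation:", "answer_only": true }' | jq -r '.texts[0]'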