Official Python CPU inference for GPT4All language models based on llama.cpp and ggml
```bash
pip install pygpt4all
```
First, you will need to download the model weights; you can find and download all the supported models here.
Once the weights are downloaded, you can instantiate the models as follows:
- GPT4All model

```python
from pygpt4all import GPT4All

model = GPT4All('path/to/ggml-gpt4all-l13b-snoozy.bin')
```
- GPT4All-J model

```python
from pygpt4all import GPT4All_J

model = GPT4All_J('path/to/ggml-gpt4all-j-v1.3-groovy.bin')
```
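The two wrappers are instantiated the same way, so if you switch between model families you can pick the class from the weights file name. The `load_model` helper below is a hypothetical convenience sketch, not part of pygpt4all; it assumes only the two documented constructors shown above:

```python
from pathlib import Path

from pygpt4all import GPT4All, GPT4All_J

def load_model(model_path: str):
    # Hypothetical helper: infer the wrapper class from the file name.
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"model weights not found: {path}")
    # GPT4All-J weights are conventionally named ggml-gpt4all-j-*.bin.
    if 'gpt4all-j' in path.name:
        return GPT4All_J(str(path))
    return GPT4All(str(path))

model = load_model('path/to/ggml-gpt4all-l13b-snoozy.bin')
```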
The `generate` function is used to generate new tokens from the prompt given as input:
```python
for token in model.generate("Tell me a joke?\n"):
    print(token, end='', flush=True)
```
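Since `generate` yields tokens one at a time, you can also accumulate them if you need the full completion as a string. A minimal sketch using only the generator shown above:

```python
# Stream tokens to the terminal while accumulating the full completion.
tokens = []
for token in model.generate("Tell me a joke?\n"):
    print(token, end='', flush=True)
    tokens.append(token)
response = ''.join(tokens)
```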
You can set up an interactive dialogue by simply keeping the `model` variable alive:
```python
while True:
    try:
        prompt = input("You: ")
        if prompt == '':
            continue
        print("AI: ", end='')
        for token in model.generate(prompt):
            print(token, end='', flush=True)
        print()
    except KeyboardInterrupt:
        break
```
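The same loop can be extended to keep a transcript of the exchange. This variant is purely illustrative and relies on nothing beyond the generator API used above:

```python
# Dialogue loop that also records (prompt, answer) pairs.
transcript = []
while True:
    try:
        prompt = input("You: ")
        if prompt == '':
            continue
        answer = []
        print("AI: ", end='')
        for token in model.generate(prompt):
            print(token, end='', flush=True)
            answer.append(token)
        print()
        transcript.append((prompt, ''.join(answer)))
    except KeyboardInterrupt:
        break
```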
You can check the API reference documentation for more details.
This project is licensed under the MIT License.