An easy-to-use, high-performance(?) backend for serving LLMs and other AI models, built on FastAPI.
```shell
pip install fastmindapi
```
```shell
# in Shell
fastmindapi-server --port 8000
```

```python
# in Python
import fastmindapi as FM

server = FM.Server()
server.run()
```
```shell
curl http://IP:PORT/docs#/
```
```python
import fastmindapi as FM

client = FM.Client(IP="x.x.x.x", PORT=xxx)  # 127.0.0.1:8000 by default

client.add_model_info_list(model_info_list)
client.load_model(model_name)
client.generate(model_name, generation_request)
```
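The payload shapes below are illustrative guesses, not the library's documented schema; the authoritative request models are listed on the running server's `/docs` page.

```python
# Hypothetical example values for the client calls above.
# Every field name here is an assumption -- check the server's /docs
# page (the FastAPI interactive schema) for the real request models.
model_info_list = [
    {
        "model_name": "gemma-2b",              # name referenced by later calls
        "model_type": "TransformersCausalLM",  # one of the supported backends
        "model_path": "/path/to/gemma-2b",     # local checkpoint directory
    }
]
model_name = model_info_list[0]["model_name"]

generation_request = {
    "input_text": "Hello! Briefly introduce yourself.",
    "max_new_tokens": 64,
}
```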
🪧 We primarily maintain the backend server; the client is provided for reference only, since the main usage is sending HTTP requests directly. (We may release FM-GUI in the future.)
- ✅ Transformers
  - `TransformersCausalLM` (`AutoModelForCausalLM`)
  - `PeftCausalLM` (`PeftModelForCausalLM`)
- ✅ llama.cpp
  - `LlamacppLM` (`Llama`)
- ...
- Function Calling (extra tools in Python)
- Retrieval
- Agent
- ...
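Since the "extra tools" for Function Calling are plain Python functions, a tool could look like the sketch below; the function itself is hypothetical, and how it gets registered with the server is not shown here — consult the project documentation for the actual registration API.

```python
# A plain Python function of the kind the Function Calling feature can
# expose as a tool. Name, signature, and behavior are illustrative only;
# registering it with the FastMindAPI server is a separate step.
def get_weather(city: str) -> str:
    """Return a canned weather report for `city` (illustrative only)."""
    return f"The weather in {city} is sunny."

print(get_weather("Paris"))  # -> The weather in Paris is sunny.
```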
- Load models at coding time or at runtime
- Add any APIs you want