Release v0.1.2 · c0sogi/llama-api

This release improves usability and refactors parts of the code. The main changes are:

Automatic Model Downloader: Previously, the model_path attribute in model_definitions.py had to be the actual filename of a model. It can now be the name of a HuggingFace repository instead, in which case the specified model is downloaded automatically when it is needed. For instance, if you set model_path to TheBloke/NewHope-GPTQ, the necessary files are downloaded into models/gptq/thebloke_newhope_gptq. GGML models are handled the same way.
Simpler Log Message: Log messages for the Completions, Chat Completions, and Embeddings endpoints are now more concise. Each one reports elapsed time, token usage, and tokens generated per second.
Improved Responsiveness for Job Cancellation: The Event object in SyncManager now sends an interrupt signal to worker processes. The worker checks the is_interrupted property at the lowest level it can reach and tries to cancel the operation there.
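
As an illustration of the download directory naming in the Automatic Model Downloader item, here is a minimal sketch. The helper local_model_dir is hypothetical and is not part of llama-api; its rule (lowercase the repository name and replace every run of non-alphanumeric characters with an underscore) is only inferred from the single TheBloke/NewHope-GPTQ example above, so the project's actual logic may differ.

```python
import re
from pathlib import Path


def local_model_dir(repo_id: str, backend: str, root: str = "models") -> Path:
    """Hypothetical helper: guess the local folder for a HuggingFace repo.

    Assumption: the folder name is the repo id in lowercase, with each run
    of non-alphanumeric characters collapsed into a single underscore.
    """
    folder = re.sub(r"[^a-z0-9]+", "_", repo_id.lower()).strip("_")
    return Path(root) / backend / folder


# Reproduces the example given in the release notes.
example_dir = local_model_dir("TheBloke/NewHope-GPTQ", "gptq")
print(example_dir.as_posix())
```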
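
The cancellation pattern in the Job Cancellation item can be sketched roughly as follows. This is not the project's code: the InterruptibleJob class and its generate method are hypothetical, and a threading.Event stands in for the Event obtained from a SyncManager, which exposes the same set() and is_set() methods. The point is only that the flag is checked inside the innermost loop, so a cancel request takes effect before the next token is produced.

```python
import threading
from typing import Iterable, Iterator, List


class InterruptibleJob:
    """Hypothetical worker-side wrapper around a shared interrupt event."""

    def __init__(self, interrupt_event) -> None:
        # Any object with is_set() works here, e.g. threading.Event or the
        # Event proxy returned by multiprocessing.Manager().Event().
        self._interrupt_event = interrupt_event

    @property
    def is_interrupted(self) -> bool:
        return self._interrupt_event.is_set()

    def generate(self, tokens: Iterable[str]) -> Iterator[str]:
        for token in tokens:
            # Check at the lowest level: once per token, before yielding it.
            if self.is_interrupted:
                return
            yield token


event = threading.Event()
job = InterruptibleJob(event)

received: List[str] = []
for token in job.generate(["a", "b", "c", "d"]):
    received.append(token)
    if len(received) == 2:
        event.set()  # simulate the client cancelling the request

print(received)
```

Because the check runs before each token is yielded, generation stops as soon as the event is set, rather than after the whole completion finishes.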
