v0.1.2
This release includes several usability enhancements and some code refactoring. The main changes are:
- Automatic Model Downloader: Previously, the `model_path` attribute in `model_definitions.py` had to be the actual filename of a model. It now accepts the name of a HuggingFace repository instead, and the specified model is downloaded automatically when needed. For instance, if you define `TheBloke/NewHope-GPTQ` as the `model_path`, the necessary files will be downloaded into `models/gptq/thebloke_newhope_gptq`. This works similarly for GGML.
- Simpler Log Message: Log messages for the Completions, Chat Completions, and Embeddings endpoints are now more concise. They show only the essentials: elapsed time, token usage, and tokens generated per second.
- Improved Responsiveness for Job Cancellation: The `Event` object in `SyncManager` now sends an interrupt signal to worker processes. The `is_interrupted` property is checked at the lowest level accessible, and the operation is cancelled there where possible.
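
To illustrate the directory naming used by the Automatic Model Downloader, here is a minimal sketch of how a repository name might map to a local folder. The helper `local_model_dir` and its normalization rule (lowercase, with `/` and `-` replaced by `_`) are assumptions inferred from the single `TheBloke/NewHope-GPTQ` example above; the project's actual logic may differ.

```python
from pathlib import Path


def local_model_dir(repo_id: str, model_format: str, root: str = "models") -> Path:
    """Guess the download directory for a HuggingFace repository name.

    Hypothetical helper: the rule below is inferred from the one example
    in these release notes and is not confirmed by the project.
    """
    # Lowercase the repo name and turn "/" and "-" into "_".
    folder = repo_id.lower().replace("/", "_").replace("-", "_")
    # Group downloads by model format, e.g. "gptq".
    return Path(root) / model_format / folder


example_dir = local_model_dir("TheBloke/NewHope-GPTQ", "gptq")
print(example_dir.as_posix())
```

Under these assumptions, the example from this release resolves to `models/gptq/thebloke_newhope_gptq`.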
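
The job cancellation change can be pictured as cooperative interruption: a shared event is checked inside the innermost loop, so work stops soon after the signal arrives. The sketch below is illustrative only. `InterruptibleJob` and `generate` are hypothetical names; only `Event`, `SyncManager`, and `is_interrupted` come from these notes. It uses `threading.Event` to stay self-contained; an event created with `multiprocessing.Manager().Event()` exposes the same `is_set()` and `set()` methods.

```python
import threading
from typing import Iterable, List, Protocol


class EventLike(Protocol):
    """Anything exposing is_set(), such as threading.Event or a manager Event proxy."""

    def is_set(self) -> bool: ...


class InterruptibleJob:
    """Hypothetical worker-side wrapper around a shared interrupt event."""

    def __init__(self, event: EventLike) -> None:
        self._event = event

    @property
    def is_interrupted(self) -> bool:
        # True once the controlling side has called event.set().
        return self._event.is_set()

    def generate(self, tokens: Iterable[str]) -> List[str]:
        produced: List[str] = []
        for token in tokens:
            # Check before handling each token so cancellation is noticed quickly.
            if self.is_interrupted:
                break
            produced.append(token)
        return produced


event = threading.Event()
job = InterruptibleJob(event)
before_cancel = job.generate(["a", "b", "c"])  # event not set: all tokens kept
event.set()                                    # simulate a cancellation request
after_cancel = job.generate(["a", "b", "c"])   # event set: loop exits immediately
```

Checking the flag per token rather than per request is what keeps cancellation responsive: the loop gives up at the next step instead of finishing the whole generation.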