Releases: friendliai/friendli-client

Release v1.5.7 🚀

24 Jan 03:10
  • Update API proto to support custom positional embeddings.
  • Update endpoint path.

Release v1.5.6 🚀

17 Oct 04:57
  • Update package dependencies that have security warnings.
  • Hotfix for the friendli endpoint list command.
  • Update styling for the AWAKING and UPDATING endpoint statuses.
  • Fix the description of the friendli api command: the --model option is required for inference calls.

Release v1.5.5 🚀

15 Oct 07:47
  • Support soft prompts in the gRPC client.
  • Resolve dependency issues.
  • Update GraphQL schemas to handle resources of Friendli Dedicated Endpoints.

Release v1.5.4 🚀

30 Aug 07:37
  • Remove the text-to-image API from the CLI commands.
  • Support cancelling gRPC streams (see the sketch below).
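
A minimal sketch of early cancellation, reusing the gRPC setup from the v1.4.1 notes below; treating stream.close() as the call that cancels the in-flight gRPC stream is an assumption based on this release note:

from friendli import Friendli

# Assumes a Friendli Engine gRPC server is listening locally.
client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

stream = client.completions.create(
    prompt="Write a long story about a robot.",
    stream=True,
    top_k=1,
)

# Consume a few chunks, then cancel the rest of the generation.
for i, chunk in enumerate(stream):
    print(chunk.text, end="", flush=True)
    if i == 9:
        break

stream.close()  # assumed to cancel the in-flight stream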

Release v1.5.3 🚀

23 Aug 07:03
  • Support closing streams explicitly and using them as context managers (see the sketch below).
  • Add API E2E tests.
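
A minimal sketch of the new stream handling, assuming a serverless endpoint (the model name is a placeholder); per this release, the object returned with stream=True can be closed explicitly or used as a context manager:

from friendli import Friendli

client = Friendli()

# The stream is closed automatically when the with block exits.
with client.completions.create(
    model="meta-llama-3.1-8b-instruct",
    prompt="Explain what gRPC is.",
    stream=True,
) as stream:
    for chunk in stream:
        print(chunk.text, end="", flush=True)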

Release v1.5.2 🚀

14 Aug 07:52
  • Hotfix: Automatically close the streaming response at the end of the stream.

Release v1.5.1 🚀

14 Aug 02:50

You can now send API requests to Friendli Dedicated Endpoints:

from friendli import Friendli

client = Friendli(use_dedicated_endpoint=True)
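# Pass the dedicated endpoint ID as the model argument.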
chat = client.chat.completions.create(
    model="{endpoint_id}",
    messages=[
        {
            "role": "user",
            "content": "Give three tips for staying healthy.",
        }
    ]
)

If you want to send a request to a specific adapter of a Multi-LoRA endpoint, provide "{endpoint_id}:{adapter_route}" as the model argument:

from friendli import Friendli

client = Friendli(use_dedicated_endpoint=True)
chat = client.chat.completions.create(
    model="{endpoint_id}:{adapter_route}",
    messages=[
        {
            "role": "user",
            "content": "Give three tips for staying healthy.",
        }
    ]
)

Release v1.5.0 🚀

06 Aug 01:42
  • Deprecate model conversion and quantization. Use friendli-model-optimizer instead to quantize your models.
  • Increase the default HTTP timeout.

Release v1.4.2 🚀

21 Jul 07:27
  • Tool Calling API: Added a new API to support tool calling (see the sketch after this list).
  • Phi3 INT8 Support: Implemented support for Phi3 INT8.
  • Snowflake Arctic FP8 Quantizer: Introduced a new quantizer for Snowflake Arctic FP8.
  • Llama INT8 Quantization: Added support for INT8 quantization for Llama and refactored the quantizer to use only safetensors.
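
A hedged sketch of the new Tool Calling API, assuming it follows the OpenAI-compatible tools schema of the chat completions API; the model name and the get_weather tool are placeholders:

from friendli import Friendli

client = Friendli()

# A single illustrative tool definition in the OpenAI-compatible schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

chat = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "What is the weather in Seoul?"}],
    tools=tools,
)

# If the model decides to call a tool, the call appears here.
print(chat.choices[0].message.tool_calls)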

Release v1.4.1 🚀

19 Jun 06:13

Patch Version Update

This patch version introduces explicit resource management to prevent unexpected resource leaks.
By default, the library closes the underlying HTTP and gRPC connections when the client is garbage-collected. However, you can now close the Friendli or AsyncFriendli client manually with the .close() method, or use it as a context manager to ensure proper closure when exiting a with block.

Usage examples

import asyncio
from friendli import AsyncFriendli

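# Async gRPC client pointed at a locally served Friendli Engine.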
client = AsyncFriendli(base_url="0.0.0.0:8000", use_grpc=True)

async def run():
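    # The context manager closes the client (and its gRPC channel) on exit.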
    async with client:
        stream = await client.completions.create(
            prompt="Explain what gRPC is. Also give me a Python code snippet of gRPC client.",
            stream=True,
            top_k=1,
        )

        async for chunk in stream:
            print(chunk.text, end="", flush=True)

asyncio.run(run())
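
For the synchronous client, a minimal sketch of explicit closure with the .close() method; the model name is a placeholder:

from friendli import Friendli

client = Friendli()
try:
    chat = client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(chat.choices[0].message.content)
finally:
    client.close()  # explicitly release the underlying HTTP connection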