Releases: friendliai/friendli-client

Release v1.5.7 🚀

24 Jan 03:10
  • Update API proto to support custom positional embeddings.
  • Update endpoint path.

Release v1.5.6 🚀

17 Oct 04:57
  • Update package dependencies that have security warnings.
  • Hotfix for the friendli endpoint list command.
  • Update styling for the AWAKING and UPDATING endpoint statuses.
  • Fix the description of the friendli api command: the --model option is required for inference calls.

Release v1.5.5 🚀

15 Oct 07:47
  • Support soft prompts in the gRPC client.
  • Resolve dependency issues.
  • Update GraphQL schemas to handle resources of Friendli Dedicated Endpoints.

Release v1.5.4 🚀

30 Aug 07:37
  • Remove the text-to-image API from the CLI commands.
  • Support cancelling gRPC streams (see the sketch below).
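
A minimal sketch of early cancellation, reusing the gRPC setup from the v1.4.1 notes below; treating stream.close() as the call that cancels the in-flight gRPC stream is an assumption based on this release note:

from friendli import Friendli

# Assumes a Friendli Engine gRPC server is listening locally.
client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

stream = client.completions.create(
    prompt="Write a long story about a robot.",
    stream=True,
    top_k=1,
)

# Consume a few chunks, then cancel the rest of the generation.
for i, chunk in enumerate(stream):
    print(chunk.text, end="", flush=True)
    if i == 9:
        break

stream.close()  # assumed to cancel the in-flight stream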

Release v1.5.3 🚀

23 Aug 07:03
  • Support closing streams explicitly and using them as context managers (see the sketch below).
  • Add API E2E tests.
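
A minimal sketch of the new stream handling, assuming a serverless endpoint (the model name is a placeholder); per this release, the object returned with stream=True can be closed explicitly or used as a context manager:

from friendli import Friendli

client = Friendli()

# The stream is closed automatically when the with block exits.
with client.completions.create(
    model="meta-llama-3.1-8b-instruct",
    prompt="Explain what gRPC is.",
    stream=True,
) as stream:
    for chunk in stream:
        print(chunk.text, end="", flush=True)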

Release v1.5.2 🚀

14 Aug 07:52
  • Hotfix: Automatically close the streaming response at the end of the stream.

Release v1.5.1 🚀

14 Aug 02:50

You can now send API requests to Friendli Dedicated Endpoints:

from friendli import Friendli

client = Friendli(use_dedicated_endpoint=True)
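# Pass the dedicated endpoint ID as the model argument.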
chat = client.chat.completions.create(
    model="{endpoint_id}",
    messages=[
        {
            "role": "user",
            "content": "Give three tips for staying healthy.",
        }
    ]
)

If you want to send a request to a specific adapter of a Multi-LoRA endpoint, provide "{endpoint_id}:{adapter_route}" as the model argument:

from friendli import Friendli

client = Friendli(use_dedicated_endpoint=True)
chat = client.chat.completions.create(
    model="{endpoint_id}:{adapter_route}",
    messages=[
        {
            "role": "user",
            "content": "Give three tips for staying healthy.",
        }
    ]
)

Release v1.5.0 🚀

06 Aug 01:42
  • Deprecate model conversion and quantization. Use friendli-model-optimizer instead to quantize your models.
  • Increase the default HTTP timeout.

Release v1.4.2 🚀

21 Jul 07:27
  • Tool Calling API: Added a new API to support tool calling (see the sketch after this list).
  • Phi3 INT8 Support: Implemented support for Phi3 INT8.
  • Snowflake Arctic FP8 Quantizer: Introduced a new quantizer for Snowflake Arctic FP8.
  • Llama INT8 Quantization: Added support for INT8 quantization for Llama and refactored the quantizer to use only safetensors.
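
A hedged sketch of the new Tool Calling API, assuming it follows the OpenAI-compatible tools schema of the chat completions API; the model name and the get_weather tool are placeholders:

from friendli import Friendli

client = Friendli()

# A single illustrative tool definition in the OpenAI-compatible schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

chat = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "What is the weather in Seoul?"}],
    tools=tools,
)

# If the model decides to call a tool, the call appears here.
print(chat.choices[0].message.tool_calls)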

Release v1.4.1 🚀

19 Jun 06:13

Patch Version Update

This patch version introduces explicit resource management to prevent unexpected resource leaks.
By default, the library closes the underlying HTTP and gRPC connections when the client is garbage-collected. However, you can now close the Friendli or AsyncFriendli client manually with the .close() method, or use it as a context manager to ensure proper closure when exiting a with block.

Usage examples

import asyncio
from friendli import AsyncFriendli

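# Async gRPC client pointed at a locally served Friendli Engine.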
client = AsyncFriendli(base_url="0.0.0.0:8000", use_grpc=True)

async def run():
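    # The context manager closes the client (and its gRPC channel) on exit.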
    async with client:
        stream = await client.completions.create(
            prompt="Explain what gRPC is. Also give me a Python code snippet of gRPC client.",
            stream=True,
            top_k=1,
        )

        async for chunk in stream:
            print(chunk.text, end="", flush=True)

asyncio.run(run())
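
For the synchronous client, a minimal sketch of explicit closure with the .close() method; the model name is a placeholder:

from friendli import Friendli

client = Friendli()
try:
    chat = client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(chat.choices[0].message.content)
finally:
    client.close()  # explicitly release the underlying HTTP connection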