
Commit a23de47

docs: update apple silicon docs (#436)
Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
1 parent bc55b53 · commit a23de47

File tree: 3 files changed, +47 -4 lines changed


README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -119,7 +119,7 @@ If it doesn't include a specific model, you can always [create your own images](
 ### Apple Silicon (experimental)
 
 > [!NOTE]
-> To enable GPU acceleration on Apple Silicon, please see [Podman Desktop documentation](https://podman-desktop.io/docs/podman/gpu).
+> To enable GPU acceleration on Apple Silicon, please see [Podman Desktop documentation](https://podman-desktop.io/docs/podman/gpu). For more information, please see [GPU Acceleration](https://sozercan.github.io/aikit/docs/gpu).
 >
 > Apple Silicon is an _experimental_ runtime and it may change in the future. This runtime is specific to Apple Silicon only, and it will not work as expected on other architectures, including Intel Macs.
 >
````

website/docs/create-images.md

Lines changed: 7 additions & 1 deletion
````diff
@@ -76,10 +76,16 @@ The `model` build argument is the model URL to download and use. You can use any
 
 #### `runtime`
 
-The `runtime` build argument adds the applicable runtimes to the image. By default, aikit will automatically choose the most optimized CPU runtime. You can use `cuda` to include NVIDIA CUDA runtime libraries. For example:
+The `runtime` build argument adds the applicable runtimes to the image. By default, aikit will automatically choose the most optimized CPU runtime.
+
+You can use `cuda` to include NVIDIA CUDA runtime libraries. For example:
 
 `--build-arg="runtime=cuda"`.
 
+or `applesilicon` to include Apple Silicon runtime libraries. For example:
+
+`--build-arg="runtime=applesilicon"`.
+
 ### Multi-Platform Support
 
 AIKit supports AMD64 and ARM64 multi-platform images. To build a multi-platform image, you can simply add `--platform linux/amd64,linux/arm64` to the build command. For example:
````
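To make these build arguments concrete, a full invocation might look like the sketch below. It assumes a model definition in `aikitfile.yaml` and an image tag of `my-model`, both of which are placeholder names:

```bash
# CUDA runtime: build an image that bundles NVIDIA CUDA runtime libraries
docker buildx build . -t my-model -f aikitfile.yaml \
  --build-arg="runtime=cuda"

# Apple Silicon runtime: target ARM64, since this runtime is Apple Silicon-only
docker buildx build . -t my-model -f aikitfile.yaml \
  --build-arg="runtime=applesilicon" \
  --platform linux/arm64
```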

website/docs/gpu.md

Lines changed: 39 additions & 2 deletions
````diff
@@ -3,7 +3,7 @@ title: GPU Acceleration
 ---
 
 :::note
-At this time, only NVIDIA GPU acceleration is supported. Please open an issue if you'd like to see support for other GPU vendors.
+At this time, only NVIDIA GPU acceleration is supported, with experimental support for Apple Silicon. Please open an issue if you'd like to see support for other GPU vendors.
 :::
 
 ## NVIDIA
````
````diff
@@ -57,6 +57,43 @@ If GPU acceleration is working, you'll see output that is similar to following i
 5:32AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:43735): stderr llm_load_tensors: VRAM used: 5869 MB
 ```
 
-## Demo
+### Demo
 
 https://www.youtube.com/watch?v=yFh_Zfk34PE
+
+## Apple Silicon (experimental)
+
+:::note
+Apple Silicon is an experimental runtime and it may change in the future. This runtime is specific to Apple Silicon only, and it will not work as expected on other architectures, including Intel Macs.
+:::
+
+AIKit supports Apple Silicon GPU acceleration with Podman Desktop for Mac with [`libkrun`](https://github.com/containers/libkrun). Please see [Podman Desktop documentation](https://podman-desktop.io/docs/podman/gpu) on how to enable GPU support.
+
+To get started with Apple Silicon GPU-accelerated inferencing, make sure to set the following in your `aikitfile` and build your model.
+
+```yaml
+runtime: applesilicon # use Apple Silicon runtime
+```
+
+Please note that only the default `llama.cpp` backend with `gguf` models is supported for Apple Silicon.
+
+After building the model, you can run it with:
+
+```bash
+# for pre-made models, replace "my-model" with the image name
+podman run --rm --device /dev/dri -p 8080:8080 my-model
+```
+
+If GPU acceleration is working, you'll see output similar to the following in the debug logs:
+
+```bash
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr ggml_vulkan: Found 1 Vulkan devices:
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr Vulkan0: Virtio-GPU Venus (Apple M1 Max) (venus) | uma: 1 | fp16: 1 | warp size: 32
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr llama_load_model_from_file: using device Vulkan0 (Virtio-GPU Venus (Apple M1 Max)) - 65536 MiB free
+...
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr llm_load_tensors: offloading 32 repeating layers to GPU
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr llm_load_tensors: offloading output layer to GPU
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr llm_load_tensors: offloaded 33/33 layers to GPU
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr llm_load_tensors: CPU_Mapped model buffer size = 52.84 MiB
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr llm_load_tensors: Vulkan0 model buffer size = 2228.82 MiB
+```
````
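For orientation, here is a minimal sketch of where the `runtime` field sits in a complete `aikitfile`. The `apiVersion`, model name, and source URL are illustrative placeholders and should be checked against the create-images documentation:

```yaml
#syntax=ghcr.io/sozercan/aikit:latest
apiVersion: v1alpha1
runtime: applesilicon # use the experimental Apple Silicon runtime
models:
  # placeholder model; Apple Silicon supports only the llama.cpp backend with gguf models
  - name: my-model
    source: https://example.com/path/to/model.gguf
```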
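Once the container is running, a quick end-to-end check is to call the OpenAI-compatible API that AIKit serves on port 8080. The model name in the request body is a placeholder for whatever model your image was built with:

```bash
# send one chat completion request to the locally running model
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'
```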
