
Commit a23de47

docs: update apple silicon docs (#436)
Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
1 parent bc55b53 · commit a23de47

File tree: 3 files changed, +47 -4 lines changed


README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -119,7 +119,7 @@ If it doesn't include a specific model, you can always [create your own images](
 ### Apple Silicon (experimental)
 
 > [!NOTE]
-> To enable GPU acceleration on Apple Silicon, please see [Podman Desktop documentation](https://podman-desktop.io/docs/podman/gpu).
+> To enable GPU acceleration on Apple Silicon, please see [Podman Desktop documentation](https://podman-desktop.io/docs/podman/gpu). For more information, please see [GPU Acceleration](https://sozercan.github.io/aikit/docs/gpu).
 >
 > Apple Silicon is an _experimental_ runtime and it may change in the future. This runtime is specific to Apple Silicon only, and it will not work as expected on other architectures, including Intel Macs.
 >
````

website/docs/create-images.md

Lines changed: 7 additions & 1 deletion
````diff
@@ -76,10 +76,16 @@ The `model` build argument is the model URL to download and use. You can use any
 
 #### `runtime`
 
-The `runtime` build argument adds the applicable runtimes to the image. By default, aikit will automatically choose the most optimized CPU runtime. You can use `cuda` to include NVIDIA CUDA runtime libraries. For example:
+The `runtime` build argument adds the applicable runtimes to the image. By default, aikit will automatically choose the most optimized CPU runtime.
+
+You can use `cuda` to include NVIDIA CUDA runtime libraries. For example:
 
 `--build-arg="runtime=cuda"`.
 
+or `applesilicon` to include Apple Silicon runtime libraries. For example:
+
+`--build-arg="runtime=applesilicon"`.
+
 ### Multi-Platform Support
 
 AIKit supports AMD64 and ARM64 multi-platform images. To build a multi-platform image, you can simply add `--platform linux/amd64,linux/arm64` to the build command. For example:
````
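To make these build arguments concrete, a full invocation might look like the sketch below. It assumes a model definition in `aikitfile.yaml` and an image tag of `my-model`, both of which are placeholder names:

```bash
# CUDA runtime: build an image that bundles NVIDIA CUDA runtime libraries
docker buildx build . -t my-model -f aikitfile.yaml \
  --build-arg="runtime=cuda"

# Apple Silicon runtime: target ARM64, since this runtime is Apple Silicon-only
docker buildx build . -t my-model -f aikitfile.yaml \
  --build-arg="runtime=applesilicon" \
  --platform linux/arm64
```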

website/docs/gpu.md

Lines changed: 39 additions & 2 deletions
````diff
@@ -3,7 +3,7 @@ title: GPU Acceleration
 ---
 
 :::note
-At this time, only NVIDIA GPU acceleration is supported. Please open an issue if you'd like to see support for other GPU vendors.
+At this time, only NVIDIA GPU acceleration is supported, with experimental support for Apple Silicon. Please open an issue if you'd like to see support for other GPU vendors.
 :::
 
 ## NVIDIA
````
````diff
@@ -57,6 +57,43 @@ If GPU acceleration is working, you'll see output that is similar to following i
 5:32AM DBG GRPC(llama-2-7b-chat.Q4_K_M.gguf-127.0.0.1:43735): stderr llm_load_tensors: VRAM used: 5869 MB
 ```
 
-## Demo
+### Demo
 
 https://www.youtube.com/watch?v=yFh_Zfk34PE
+
+## Apple Silicon (experimental)
+
+:::note
+Apple Silicon is an experimental runtime and it may change in the future. This runtime is specific to Apple Silicon only, and it will not work as expected on other architectures, including Intel Macs.
+:::
+
+AIKit supports Apple Silicon GPU acceleration with Podman Desktop for Mac with [`libkrun`](https://github.com/containers/libkrun). Please see [Podman Desktop documentation](https://podman-desktop.io/docs/podman/gpu) on how to enable GPU support.
+
+To get started with Apple Silicon GPU-accelerated inferencing, make sure to set the following in your `aikitfile` and build your model.
+
+```yaml
+runtime: applesilicon # use Apple Silicon runtime
+```
+
+Please note that only the default `llama.cpp` backend with `gguf` models is supported for Apple Silicon.
+
+After building the model, you can run it with:
+
+```bash
+# for pre-made models, replace "my-model" with the image name
+podman run --rm --device /dev/dri -p 8080:8080 my-model
+```
+
+If GPU acceleration is working, you'll see output similar to the following in the debug logs:
+
+```bash
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr ggml_vulkan: Found 1 Vulkan devices:
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr Vulkan0: Virtio-GPU Venus (Apple M1 Max) (venus) | uma: 1 | fp16: 1 | warp size: 32
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr llama_load_model_from_file: using device Vulkan0 (Virtio-GPU Venus (Apple M1 Max)) - 65536 MiB free
+...
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr llm_load_tensors: offloading 32 repeating layers to GPU
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr llm_load_tensors: offloading output layer to GPU
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr llm_load_tensors: offloaded 33/33 layers to GPU
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr llm_load_tensors: CPU_Mapped model buffer size = 52.84 MiB
+6:16AM DBG GRPC(phi-3.5-3.8b-instruct-127.0.0.1:39883): stderr llm_load_tensors: Vulkan0 model buffer size = 2228.82 MiB
+```
````
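For orientation, here is a minimal sketch of where the `runtime` field sits in a complete `aikitfile`. The `apiVersion`, model name, and source URL are illustrative placeholders and should be checked against the create-images documentation:

```yaml
#syntax=ghcr.io/sozercan/aikit:latest
apiVersion: v1alpha1
runtime: applesilicon # use the experimental Apple Silicon runtime
models:
  # placeholder model; Apple Silicon supports only the llama.cpp backend with gguf models
  - name: my-model
    source: https://example.com/path/to/model.gguf
```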
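Once the container is running, a quick end-to-end check is to call the OpenAI-compatible API that AIKit serves on port 8080. The model name in the request body is a placeholder for whatever model your image was built with:

```bash
# send one chat completion request to the locally running model
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'
```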
