/*
Package metal
is a library
for running computational tasks (GPGPU)
on [Apple silicon]
through Apple's [Metal API].
# Metal (Apple API)
Apple's Metal API
is a unified framework
for performing various types of tasks
on Apple silicon GPUs.
It offers low-level, direct, detailed access
to the hardware (hence, "metal")
for fast and efficient processing.
The processing centers around pipelines,
which consist of
a function to run
and an arbitrary number of arguments and buffers.
The metal function is parsed
into a series of operations,
and the arguments and buffers of data
are streamed through it
in SIMD groups.
(For more details on SIMD groups
and best practices
for writing metal functions using them,
see Apple's documentation [on threads and threadgroups].)
# metal (Go package)
This library
leverages Apple's Metal API
to run computational processes
in a distributed, parallel manner.
First,
a metal function is parsed,
added to a pipeline,
and cached.
This happens once
for every metal function.
Then,
any number of metal buffers
are created.
A metal buffer
is an array
of arbitrary length
that references items
of an arbitrary type.
The actual type
is defined in the metal function's definition.
Finally,
the metal function
is run
with the metal buffers
and any static arguments.
This streams
the arguments and the data in the buffers
through the computational operation(s)
as sequenced in the metal function.
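The three steps above can be pictured with a CPU-only analogy.
The sketch below does not use this package or the Metal API;
the names kernelFunc and dispatch are hypothetical stand-ins,
and goroutines merely play the role of GPU threads.
It assumes the "function" is defined once,
the "buffers" are plain Go slices,
and the run step calls the function once per index.

```go
package main

import (
	"fmt"
	"sync"
)

// kernelFunc is a hypothetical CPU stand-in for a metal function. It is
// called once per index, much as a GPU thread handles one grid position.
type kernelFunc func(index int)

// dispatch calls kernel for every index in [0, n), splitting the range
// across at most `workers` goroutines. This only mimics the GPU execution
// model on the CPU; it is not how this library runs code.
func dispatch(n, workers int, kernel kernelFunc) {
	if workers < 1 {
		workers = 1
	}
	var wg sync.WaitGroup
	chunk := (n + workers - 1) / workers // ceiling division
	for start := 0; start < n; start += chunk {
		end := start + chunk
		if end > n {
			end = n
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				kernel(i)
			}
		}(start, end)
	}
	wg.Wait()
}

func main() {
	// Step 2: the "buffers" are ordinary slices in this analogy.
	a := []float32{1, 2, 3, 4}
	b := []float32{10, 20, 30, 40}
	result := make([]float32, len(a))

	// Steps 1 and 3: define the per-index operation once, then run it
	// over the buffers.
	dispatch(len(a), 2, func(i int) { result[i] = a[i] + b[i] })
	fmt.Println(result) // [11 22 33 44]
}
```

Each index writes to its own element of result,
so the goroutines need no locking,
which mirrors how a kernel thread usually writes only to its own position.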
# Types
This is the mapping
of Go types
to Metal types:

	| Go      | Metal  |
	| ------- | ------ |
	| float32 | float  |
	| N/A     | half   |
	| int32   | int    |
	| int16   | short  |
	| uint32  | uint   |
	| uint16  | ushort |
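The mapping above can be expressed in Go
as a type constraint plus a lookup.
This is a minimal sketch, not part of this package's API:
the names BufferType and metalTypeName are hypothetical,
and the constraint lists only the Go types from the table
(half is left out because the table gives it no Go counterpart).

```go
package main

import "fmt"

// BufferType is a hypothetical constraint listing the Go types that the
// table above maps to a Metal type.
type BufferType interface {
	float32 | int32 | int16 | uint32 | uint16
}

// metalTypeName returns the Metal type name that the table pairs with T.
func metalTypeName[T BufferType]() string {
	var zero T
	switch any(zero).(type) {
	case float32:
		return "float"
	case int32:
		return "int"
	case int16:
		return "short"
	case uint32:
		return "uint"
	case uint16:
		return "ushort"
	}
	return "" // unreachable for types admitted by BufferType
}

func main() {
	fmt.Println(metalTypeName[float32](), metalTypeName[int16]()) // float short
}
```

With a constraint like this,
a mismatched element type such as float64
is rejected at compile time rather than at run time.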
# Limitations
- This library
technically supports only Apple GPUs
that allow non-uniform threadgroup sizes.
A table of GPUs and their feature sets can be found on [page 4 here].
Most Apple GPUs support this feature.
The library has not been tested on GPUs that lack it.
- This library
currently
does not support non-standard buffers,
such as buffers
that are only accessible to the GPU.
All buffers
are currently created
with the same access and performance settings.
- This library
is intended specifically
for running computations (as opposed to renderings).
This means that metal functions
must be kernel functions,
i.e. prefixed with "kernel"
and returning "void".
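To illustrate the last limitation,
here is what a kernel function might look like
when held as a Go string.
The MSL source is an illustrative example, not taken from this library,
and looksLikeKernel is a hypothetical, rough pre-check
that only inspects the first two words of the source.

```go
package main

import (
	"fmt"
	"strings"
)

// addSource is an illustrative MSL kernel function: it is prefixed with
// "kernel", returns "void", and adds two buffers element by element.
const addSource = `
kernel void add(device const float* a,
                device const float* b,
                device float* result,
                uint index [[thread_position_in_grid]]) {
    result[index] = a[index] + b[index];
}
`

// looksLikeKernel is a hypothetical heuristic. It reports whether the
// source begins with the words "kernel" and "void". It does not parse MSL
// and would be fooled by leading comments or attributes.
func looksLikeKernel(source string) bool {
	fields := strings.Fields(source)
	return len(fields) >= 2 && fields[0] == "kernel" && fields[1] == "void"
}

func main() {
	fmt.Println(looksLikeKernel(addSource))                           // true
	fmt.Println(looksLikeKernel("vertex float4 position() { ... }")) // false
}
```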
# Resources
These are some helpful resources
for understanding this process better
and for using the Metal API efficiently:
- https://adrianhesketh.com/2022/03/31/use-m1-gpu-with-go/
- https://developer.apple.com/documentation/metal/performing_calculations_on_a_gpu
Details on the language specifications
for writing metal functions
can be found in the [MSL Specification].
[Apple silicon]: https://en.wikipedia.org/wiki/Apple_silicon
[Metal API]: https://developer.apple.com/metal/
[on threads and threadgroups]: https://developer.apple.com/documentation/metal/compute_passes/creating_threads_and_threadgroups#2928931
[page 4 here]: https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf
[MSL Specification]: https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf
*/
package metal