Commit 2a2e9df: Address #13: Add explainer for user-selectable size
Raymond Toy committed Apr 5, 2021 (1 parent: b9ee3a5)
Showing 1 changed file with 150 additions and 0 deletions.

# User-Selectable Render Size
Raymond Toy

## Background
Historically, WebAudio has always rendered the graph in chunks of 128 frames,
called a [render
quantum](https://webaudio.github.io/web-audio-api/#render-quantum) in the
specification. This was probably a trade-off between function-call overhead and
latency. A smaller size would reduce latency, but the function-call overhead
would increase. With a larger size, the overhead is reduced, but the latency
increases because any change takes 128 frames before reaching the output. In
addition, macOS probably processed 128 frames at a time anyway.

## Issues
This has worked well over time, especially on desktop, but is particularly bad
on Android, where 128 may not fit well with Android's audio processing. The
main problem is illustrated very well by an example from Paul Adenot in a
[comment on issue #13](https://github.com/WebAudio/web-audio-api-v2/issues/13#issuecomment-572469654),
reproduced below.

The example is an Android phone that has a native processing buffer size of
192 frames. Since WebAudio renders 128 frames at a time, we get the following
behavior:

| iteration | number of frames to render | number of 128-frame buffers to render | leftover frames |
|-----------|----------------------------|---------------------------------------|-----------------|
| 0         | 192                        | 2                                     | 64              |
| 1         | 192                        | 1                                     | 0               |
| 2         | 192                        | 2                                     | 64              |
| 3         | 192                        | 1                                     | 0               |
| 4         | 192                        | 2                                     | 64              |
| 5         | 192                        | 1                                     | 0               |

At a sample rate of 48 kHz, 128 frames corresponds to 2.666 ms. The net result
is that the **peak** CPU usage is twice as high as might be expected since,
every other callback, you need to render the graph twice instead of once. The
maximum complexity of the graph is unexpectedly limited because of this.

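
The mismatch is easy to reproduce. The following sketch (not from the spec,
just an illustration) simulates a device callback pulling 192 frames from an
engine that renders 128-frame quanta, and reprints the table above:

```ts
// Simulate a 192-frame device callback fed by 128-frame render quanta.
const callbackSize = 192; // native buffer size from the example
const quantum = 128;      // WebAudio render quantum

let leftover = 0;
for (let iteration = 0; iteration < 6; iteration++) {
  let quanta = 0;           // quanta rendered inside this callback
  let available = leftover; // frames carried over from the last callback
  while (available < callbackSize) {
    available += quantum;   // render one more quantum
    quanta++;
  }
  leftover = available - callbackSize;
  console.log(`iteration ${iteration}: quanta=${quanta}, leftover=${leftover}`);
}
// Prints quanta=2, 1, 2, 1, ...: every other callback does double the
// rendering work, so peak CPU usage is twice the average.
```
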
However, if WebAudio rendered 192 frames at a time, the CPU usage would remain
constant, and more complex graphs could be rendered because the peak CPU usage
would be the same as the average. This does increase latency a bit, but since
Android is already using a size of 192, there is no actual additional latency.

Finally, some applications do not have such low latency requirements, and may
also want AudioWorklets to process larger blocks to reduce function-call
overhead. In this case, allowing render sizes of 1024 or 2048 could be
appropriate.

## API
To allow a user-selectable render size, we propose the following API:

```idl
// New enum
enum AudioContextRenderSizeCategory {
  "default",
  "hardware"
};

dictionary AudioContextOptions {
  (AudioContextLatencyCategory or double) latencyHint = "interactive";
  float sampleRate;
  // New addition
  (AudioContextRenderSizeCategory or unsigned long) renderSizeHint = "default";
};

dictionary OfflineAudioContextOptions {
  unsigned long numberOfChannels = 1;
  required unsigned long length;
  required float sampleRate;
  // New addition
  (AudioContextRenderSizeCategory or unsigned long) renderSizeHint = "default";
};

partial interface BaseAudioContext {
  readonly attribute unsigned long renderSize;
};
```

This API assumes that we'll update `AudioContextOptions` and
`OfflineAudioContextOptions` with a new member, `renderSizeHint`, instead of
defining a new dictionary.

The "default" category means WebAudio will use its default render size of 128,
as it currently does.

When a numeric value is given for `renderSizeHint`, the UA is allowed to modify
the requested size to any appropriate size in a UA-specific way.

In any case, the actual render size used by the UA is reported by the
`renderSize` attribute of the `BaseAudioContext`.
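
For illustration, here is a sketch of how the proposed API might be used from
script. Nothing here is shipping; `renderSizeHint` and `renderSize` are the
proposed members above, so current DOM typings do not yet include them:

```ts
// Sketch only: assumes the proposed `renderSizeHint` option and
// `renderSize` attribute exist, which no UA currently implements.
const hwCtx = new AudioContext({ renderSizeHint: "hardware" });
console.log(hwCtx.renderSize); // e.g. 192 on the Android device above

// An explicit numeric request; the UA may adjust it to a supported size.
const bigCtx = new AudioContext({ renderSizeHint: 2048 });
console.log(bigCtx.renderSize); // the size the UA actually chose
```
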
### "hardware" Category
#### AudioContext
For an `AudioContext`, the "hardware" category means WebAudio will ask the
system for the appropriate render size for the output device. The UA is allowed
to choose the "hardware" value as appropriate; it does not necessarily reflect
what the OS may say the hardware size is.

#### OfflineAudioContext
For an `OfflineAudioContext`, there's no concept of "hardware", so "hardware"
is equivalent to "default". Allowing an `OfflineAudioContext` to have a
selectable size enables testing of the render size.

We explicitly do not support selecting a render size when using the
[3-arg constructor](https://webaudio.github.io/web-audio-api/#dom-offlineaudiocontext-offlineaudiocontext-numberofchannels-length-samplerate)
for the `OfflineAudioContext`.

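
A minimal sketch of what such a test might look like, again assuming the
proposed dictionary member:

```ts
// Sketch only: selecting a render size for testing. The 3-arg form,
// new OfflineAudioContext(1, 48000, 48000), cannot pass a hint.
const offline = new OfflineAudioContext({
  numberOfChannels: 1,
  length: 48000,
  sampleRate: 48000,
  renderSizeHint: 192, // "hardware" here would just mean "default"
});
console.log(offline.renderSize); // the size the UA actually chose
```
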
## Requirements
### Supported Sizes
All UAs must support a `renderSize` that is a power of two between 32 and 2048,
inclusive.

It is highly recommended that sizes that are not a power of two also be
supported. This is particularly important on Android, where sizes of 96, 144,
192, and 240 are quite common. The problem isn't limited to Android, though:
Windows generally wants 10 ms buffers, so we want sizes of 441 or 480 for 44.1
kHz and 48 kHz, respectively.

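
The 10 ms figures follow directly from the sample rates; a quick check:

```ts
// Frames needed for a 10 ms buffer at a given sample rate.
const framesPer10ms = (sampleRate: number): number =>
  Math.round(sampleRate * 0.010);

console.log(framesPer10ms(44100)); // 441
console.log(framesPer10ms(48000)); // 480
```
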
### ScriptProcessorNode
The [construction of a `ScriptProcessorNode`](https://webaudio.github.io/web-audio-api/#dom-baseaudiocontext-createscriptprocessor)
requires a
[`bufferSize`](https://webaudio.github.io/web-audio-api/#dom-baseaudiocontext-createscriptprocessor-buffersize-numberofinputchannels-numberofoutputchannels-buffersize)
argument that must be a power of two. This is fine with appropriate buffering,
but perhaps it would be better if the buffer sizes were defined to be a power
of two times the render size. So, while the currently allowed sizes are 0,
`128*2`, `128*2^2`, `128*2^3`, `128*2^4`, `128*2^5`, `128*2^6`, and `128*2^7`,
we may want to specify the sizes as 0, `r*2`, `r*2^2`, `r*2^3`, `r*2^4`,
`r*2^5`, `r*2^6`, and `r*2^7`, where `r` is the `renderSize`. A sketch of this
rule follows.
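
Here, `allowedBufferSizes` is a hypothetical helper, used only to make the
proposed rule concrete:

```ts
// Hypothetical helper: bufferSize must be 0 (let the UA pick) or
// r * 2^k for k = 1..7, where r is the render size.
function allowedBufferSizes(renderSize: number): number[] {
  const sizes = [0];
  for (let k = 1; k <= 7; k++) {
    sizes.push(renderSize * 2 ** k);
  }
  return sizes;
}

console.log(allowedBufferSizes(128)); // [0, 256, 512, ..., 16384], today's values
console.log(allowedBufferSizes(192)); // [0, 384, 768, ..., 24576]
```
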
## Implementation Issues
Conceptually this change is relatively simple, but some nodes may have
additional complexities. It is up to the UA to handle these appropriately.

### AnalyserNode Implementation
The `AnalyserNode` currently specifies powers of two both for the size of the
returned time-domain data and for the size of the frequency-domain data. This
is probably OK.

### ConvolverNode Implementation
For efficiency, the `ConvolverNode` is often implemented using FFTs. Typically,
only power-of-two FFTs have been used because the render size was 128. To
support user-selectable sizes, either more complex algorithms are needed to
buffer the data appropriately, or more general FFTs are required to support
sizes that are not a power of two. It is at the discretion of the UA to
implement this appropriately.

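
One way to keep power-of-two FFTs with a non-power-of-two render size is the
buffering option just mentioned: accumulate quanta into fixed FFT blocks. The
class below is purely illustrative (it is not how any UA implements this, and
the extra buffering adds up to one block of latency):

```ts
// Illustrative only: accumulate render quanta of arbitrary size and
// emit fixed power-of-two blocks suitable for an FFT-based convolver.
class FftBlocker {
  private pending: number[] = [];
  constructor(private readonly fftBlockSize: number) {}

  push(quantum: Float32Array, onBlock: (block: Float32Array) => void): void {
    this.pending.push(...quantum); // append this quantum's samples
    while (this.pending.length >= this.fftBlockSize) {
      onBlock(Float32Array.from(this.pending.splice(0, this.fftBlockSize)));
    }
  }
}

// With a 192-frame render size and 256-point FFT blocks:
const blocker = new FftBlocker(256);
const emit = (block: Float32Array) => console.log(block.length);
blocker.push(new Float32Array(192), emit); // nothing yet (192 < 256)
blocker.push(new Float32Array(192), emit); // 256 (128 samples remain pending)
```
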
### ScriptProcessorNode Implementation
We've already proposed a change for the `ScriptProcessorNode` in the
Requirements section above.