diff --git a/index.bs b/index.bs
index dda54718..a28cd2e4 100644
--- a/index.bs
+++ b/index.bs
@@ -453,7 +453,7 @@ class OBUHeader() {
     leb128() num_samples_to_trim_at_end;
     leb128() num_samples_to_trim_at_start;
   }
-  if (obu_extension_flag == 1) {
+  if (obu_extension_flag) {
     leb128() extension_header_size;
     unsigned int (8 x extension_header_size) extension_header_bytes;
   }
@@ -508,7 +508,7 @@ This flag SHALL be set to 0 for this version of the specification. An OBU parser
 
 NOTE: A future version of the specification may use this flag to specify an extension header field by setting [=obu_extension_flag=] = 1 and setting the size of the extended header to [=extension_header_size=].
 
-obu_size indicates the size in bytes of the OBU immediately following the obu_size field of the OBU. An OBU MAY have extra bytes after consuming all the bytes per the OBU syntax definition. Parsers compliant with this version of the specification SHOULD ignore the extra bytes.
+obu_size indicates the size in bytes of the OBU immediately following the [=obu_size=] field. If the [=obu_trimming_status_flag=] and/or [=obu_extension_flag=] fields are set to 1, [=obu_size=] SHALL include the sizes of the additional fields. The [=obu_size=] MAY be greater than the size needed to represent the OBU syntax defined in this version of the specification, for example, to represent new syntax defined in a future version of the specification. Parsers compliant with this version of the specification SHOULD ignore these bytes.
 
 num_samples_to_trim_at_end indicates the number of samples that need to be trimmed from the end of the samples in this [=Audio Frame OBU=].
 
@@ -516,7 +516,7 @@ NOTE: A future version of the specification may use this flag to specify an exte
 
 extension_header_size indicates the size in bytes of the extension header immediately following this field.
 
-extension_header_bytes indicates the byte representations of the syntaxes of the extension header.
+extension_header_bytes indicates the byte representations of the syntaxes of the extension header. Parsers compliant with this version of the specification SHOULD ignore these bytes.
 
 ## Reserved OBU Syntax and Semantics ## {#obu-reserved}
 
@@ -608,7 +608,7 @@ NOTE: ipcm should not be confused with lpcm, which is
 
 num_samples_per_frame indicates the frame length, in samples, of the [=audio_frame=] provided in the audio_frame_obu. It SHALL NOT be set to zero. If the [=decoder_config=] structure for a given codec specifies a value for the frame length, the two values SHALL be equal.
 
-audio_roll_distance indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to set the decoder in a state that will produce the perfect decoded audio signal. It SHALL always be a negative value or zero. For some audio codecs, even if an audio frame can be decoded independently, the decoded signal after decoding only that frame may not represent a perfect, decoded audio signal, even ignoring compression artifacts. This can be due to overlap transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate.
+audio_roll_distance indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to set the decoder in a state that will produce the correct decoded audio signal. It SHALL always be a negative value or zero. For some audio codecs, even if an audio frame can be decoded independently, the decoded signal after decoding only that frame may not represent a correct, decoded audio signal, even ignoring compression artifacts. This can be due to overlap transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate.
 
 - It SHALL be set to \(-R\) when [=codec_id=] is set to Opus, where \[R = \left\lceil{\frac{3840}{\text{num_samples_per_frame}}}\right\rceil.\]
 - It SHALL be set to -1 when [=codec_id=] is set to mp4a.
@@ -2265,7 +2265,8 @@ Recon gain is REQUIRED only for [=num_layers=] > 1 and when [=codec_id=] is set
  - \(\text{MA_gain}(k) = \frac{2}{N + 1} \times \frac{\text{recon_gain}(k)}{255} + \left( 1 - \frac{2}{N + 1} \right) \times \text{MA_gain}(k - 1)\), where \(\text{MA_gain}(0) = 1\).
  - \(\text{e_window}[0:\text{olen}] = \text{hanning}[\text{olen}:]\), \(\text{e_window}[\text{olen}:\text{flen}] = 0\).
  - \(\text{s_window}[0:\text{olen}] = \text{hanning}[:\text{olen}]\), \(\text{s_window}[\text{olen}:\text{flen}] = 1\).
- - Where \(\text{hanning} = \text{np.hanning}(2 \times \text{olen})\), \(\text{flen}\) is the frame size and \(\text{olen}\) is the overlap size.
+ - \(\text{hanning}(n) = 0.5 - 0.5 \cos \left( \frac{2 \pi n}{2 \times \text{olen} - 1} \right) \), \(0 \le n \le (2 \times \text{olen} - 1)\).
+ - Where \(\text{flen}\) is the frame size and \(\text{olen}\) is the overlap size.
  - The value \(N = 7\) is RECOMMENDED.
 
 The figure below shows the smoothing scheme of [=recon_gain=].
@@ -2667,7 +2668,7 @@ All syntax elements conform to the [=Syntactic Description Language=] specified
 
 leb128() indicates the type of an unsigned integer. To encode the following unsigned integer syntaxName, it first represents the integer in binary with an N-bit representation, where N is a multiple of 7. Then break the integer up into groups of 7 bits. Output one encoded byte for each 7 bits group, from least significant to most significant group. Each byte will have the group in its 7 least significant bits. Set the most significant bit on each byte except the last byte.
 
- syntaxName is an unsigned integer which is encoded by leb128(). Its size is limited to 32 bits.
+ syntaxName is an unsigned integer which is encoded by leb128(). The size of the unsigned integer to be encoded is limited to 32 bits. In other words, the value returned from the leb128() parsing process is less than or equal to \(2^{32} - 1\).
 
 NOTE: There are multiple ways of encoding the same value depending on how many leading zero bits are encoded. There is no requirement that this syntax descriptor uses the most compressed representation. This can be useful for encoder implementations by allowing a fixed amount of space to be filled in later when the value becomes known.
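As an illustration of the leb128() descriptor touched by the last hunk, here is a minimal Python sketch of the encoding and parsing steps that the context line describes. It is not part of the specification or of this patch. The function names `leb128_encode` and `leb128_decode`, the `min_bytes` padding parameter, and the 8-byte cap on the encoded length are all assumptions made for the example.

```python
from typing import Tuple

MAX_VALUE = 0xFFFFFFFF  # 2^32 - 1, the upper bound stated in the added text


def leb128_encode(value: int, min_bytes: int = 1) -> bytes:
    """Encode an unsigned integer of at most 32 bits as LEB128.

    min_bytes is a hypothetical knob that pads the output with extra
    zero groups, giving the fixed-width, non-minimal form the NOTE allows.
    """
    if not 0 <= value <= MAX_VALUE:
        raise ValueError("value does not fit in 32 bits")
    out = bytearray()
    while True:
        group = value & 0x7F          # next 7 bits, least significant first
        value >>= 7
        more = value != 0 or len(out) + 1 < min_bytes
        # Set the most significant bit on every byte except the last one.
        out.append(group | (0x80 if more else 0x00))
        if not more:
            return bytes(out)


def leb128_decode(data: bytes, max_bytes: int = 8) -> Tuple[int, int]:
    """Parse one LEB128 value; return (value, number of bytes consumed).

    max_bytes = 8 is an assumed cap on the encoded length, not a figure
    taken from this diff.
    """
    value = 0
    for i, byte in enumerate(data[:max_bytes]):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:           # a clear top bit marks the last byte
            if value > MAX_VALUE:
                raise ValueError("decoded value exceeds 2^32 - 1")
            return value, i + 1
    raise ValueError("unterminated or over-long leb128 field")
```

For example, `leb128_encode(128)` yields the two bytes `0x80 0x01`, while `leb128_encode(1, min_bytes=3)` yields `0x81 0x80 0x00`, a padded encoding that still parses back to 1. That padded form is the fixed-width use the NOTE describes, where space is reserved and filled in once the value is known.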