From 7369c65014e60870b5e058c220e78339785365f1 Mon Sep 17 00:00:00 2001
From: Felicia Lim
Date: Wed, 30 Aug 2023 14:42:55 -0700
Subject: [PATCH 1/3] Minor clarifications

- obu_size includes the variable section of the OBU header
- the extra bytes in the header to ignore refers to the extension bytes
- audio_roll_distance is related to obtaining the correct, not perfect, audio signal after decoding
- clarify that the leb128 size limit of 32 bits is on the value after leb128 parsing
- replace np.hanning reference with the explicit formula used by np.hanning
---
 index.bs | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/index.bs b/index.bs
index 4c5806bb..c37a5286 100644
--- a/index.bs
+++ b/index.bs
@@ -451,7 +451,7 @@ class OBUHeader() {
     leb128() num_samples_to_trim_at_end;
     leb128() num_samples_to_trim_at_start;
   }
-  if (obu_extension_flag == 1) {
+  if (obu_extension_flag) {
     leb128() extension_header_size;
     unsigned int (8 x extension_header_size) extension_header_bytes;
   }
@@ -506,7 +506,7 @@ This flag SHALL be set to 0 for this version of the specification. An OBU parser
 NOTE: A future version of the specification may use this flag to specify an extension header field by setting [=obu_extension_flag=] = 1 and setting the size of the extended header to [=extension_header_size=].
-obu_size indicates the size in bytes of the OBU immediately following the obu_size field of the OBU. An OBU MAY have extra bytes after consuming all the bytes per the OBU syntax definition. Parsers compliant with this version of the specification SHOULD ignore the extra bytes.
+obu_size indicates the size in bytes of the OBU immediately following the [=obu_size=] field. If the [=obu_trimming_status_flag=] and/or [=obu_extension_flag=] fields are set to 1, [=obu_size=] SHALL include the sizes of the additional fields.
 num_samples_to_trim_at_end indicates the number of samples that need to be trimmed from the end of the samples in this [=Audio Frame OBU=].
@@ -514,7 +514,7 @@ NOTE: A future version of the specification may use this flag to specify an exte
 extension_header_size indicates the size in bytes of the extension header immediately following this field.
-extension_header_bytes indicates the byte representations of the syntaxes of the extension header.
+extension_header_bytes indicates the byte representations of the syntaxes of the extension header. Parsers compliant with this version of the specification SHOULD ignore these bytes.
 ## Reserved OBU Syntax and Semantics ## {#obu-reserved}
@@ -602,7 +602,7 @@ NOTE: ipcm should not be confused with lpcm, which is
 num_samples_per_frame indicates the frame length, in samples, of the [=audio_frame()=] provided in the audio_frame_obu. It SHALL NOT be set to zero. If the [=decoder_config=] structure for a given codec specifies a value for the frame length, the two values SHALL be equal.
-audio_roll_distance indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to set the decoder in a state that will produce the perfect decoded audio signal. It SHALL always be a negative value or zero. For some audio codecs, even if an audio frame can be decoded independently, the decoded signal after decoding only that frame may not represent a perfect, decoded audio signal, even ignoring compression artifacts. This can be due to overlap transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate.
+audio_roll_distance indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to set the decoder in a state that will produce the correct decoded audio signal. It SHALL always be a negative value or zero. For some audio codecs, even if an audio frame can be decoded independently, the decoded signal after decoding only that frame may not represent a correct, decoded audio signal, even ignoring compression artifacts. This can be due to overlap transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate.
 - It SHALL be set to \(-R\) when [=codec_id=] is set to Opus, where \[R = \left\lceil{\frac{3840}{\text{num_samples_per_frame}}}\right\rceil.\]
 - It SHALL be set to -1 when [=codec_id=] is set to mp4a.
@@ -2201,7 +2201,8 @@ Recon gain is REQUIRED only for [=num_layers=] > 1 and when [=codec_id=] is set
   - \(\text{MA_gain}(k) = \frac{2}{N + 1} \times \frac{\text{recon_gain}(k)}{255} + \left( 1 - \frac{2}{N + 1} \right) \times \text{MA_gain}(k - 1)\), where \(\text{MA_gain}(0) = 1\).
   - \(\text{e_window}[0:\text{olen}] = \text{hanning}[\text{olen}:]\), \(\text{e_window}[\text{olen}:\text{flen}] = 0\).
   - \(\text{s_window}[0:\text{olen}] = \text{hanning}[:\text{olen}]\), \(\text{s_window}[\text{olen}:\text{flen}] = 1\).
-  - Where \(\text{hanning} = \text{np.hanning}(2 \times \text{olen})\), \(\text{flen}\) is the frame size and \(\text{olen}\) is the overlap size.
+  - \(\text{hanning}(n) = 0.5 - 0.5 \cos \left( \frac{2 \pi n}{2 \times \text{olen} - 1} \right) \), \(0 \le n \le (2 \times \text{olen} - 1)\).
+  - Where \(\text{flen}\) is the frame size and \(\text{olen}\) is the overlap size.
   - The value \(N = 7\) is RECOMMENDED.
 The figure below shows the smoothing scheme of [=recon_gain=].
@@ -2795,7 +2796,7 @@ All syntax elements conform to the [=Syntactic Description Language=] specified
 leb128() indicates the type of an unsigned integer. To encode the following unsigned integer syntaxName, it first represents the integer in binary with an N-bit representation, where N is a multiple of 7. Then break the integer up into groups of 7 bits. Output one encoded byte for each 7 bits group, from least significant to most significant group. Each byte will have the group in its 7 least significant bits. Set the most significant bit on each byte except the last byte.
-  syntaxName is an unsigned integer which is encoded by leb128(). Its size is limited to 32 bits.
+  syntaxName is an unsigned integer which is encoded by leb128(). Its size is limited to 32 bits. In other words, the value returned from the leb128() parsing process is less than or equal to \(1 << 32) - 1\).
 NOTE: There are multiple ways of encoding the same value depending on how many leading zero bits are encoded. There is no requirement that this syntax descriptor uses the most compressed representation. This can be useful for encoder implementations by allowing a fixed amount of space to be filled in later when the value becomes known.

From d286b0976518f8ee8bb9ccb1d0867dfab8263333 Mon Sep 17 00:00:00 2001
From: Felicia Lim
Date: Wed, 30 Aug 2023 17:16:15 -0700
Subject: [PATCH 2/3] Update leb128 clarification to avoid undefined bitwise operator

---
 index.bs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/index.bs b/index.bs
index c37a5286..f9aa76ab 100644
--- a/index.bs
+++ b/index.bs
@@ -2796,7 +2796,7 @@ All syntax elements conform to the [=Syntactic Description Language=] specified
 leb128() indicates the type of an unsigned integer. To encode the following unsigned integer syntaxName, it first represents the integer in binary with an N-bit representation, where N is a multiple of 7. Then break the integer up into groups of 7 bits. Output one encoded byte for each 7 bits group, from least significant to most significant group. Each byte will have the group in its 7 least significant bits. Set the most significant bit on each byte except the last byte.
-  syntaxName is an unsigned integer which is encoded by leb128(). Its size is limited to 32 bits. In other words, the value returned from the leb128() parsing process is less than or equal to \(1 << 32) - 1\).
+  syntaxName is an unsigned integer which is encoded by leb128(). The size of the unsigned integer to be encoded is limited to 32 bits. In other words, the value returned from the leb128() parsing process is less than or equal to \(2^{32} - 1\).
 NOTE: There are multiple ways of encoding the same value depending on how many leading zero bits are encoded. There is no requirement that this syntax descriptor uses the most compressed representation. This can be useful for encoder implementations by allowing a fixed amount of space to be filled in later when the value becomes known.

From 02d4de85b535162a0f2c427aa47c8b2eb5166649 Mon Sep 17 00:00:00 2001
From: Felicia Lim
Date: Wed, 30 Aug 2023 22:39:13 -0700
Subject: [PATCH 3/3] Update obu_size extra bytes

---
 index.bs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/index.bs b/index.bs
index f9aa76ab..0c967c13 100644
--- a/index.bs
+++ b/index.bs
@@ -506,7 +506,7 @@ This flag SHALL be set to 0 for this version of the specification. An OBU parser
 NOTE: A future version of the specification may use this flag to specify an extension header field by setting [=obu_extension_flag=] = 1 and setting the size of the extended header to [=extension_header_size=].
-obu_size indicates the size in bytes of the OBU immediately following the [=obu_size=] field. If the [=obu_trimming_status_flag=] and/or [=obu_extension_flag=] fields are set to 1, [=obu_size=] SHALL include the sizes of the additional fields.
+obu_size indicates the size in bytes of the OBU immediately following the [=obu_size=] field. If the [=obu_trimming_status_flag=] and/or [=obu_extension_flag=] fields are set to 1, [=obu_size=] SHALL include the sizes of the additional fields. The [=obu_size=] MAY be greater than the size needed to represent the OBU syntax defined in this version of the specification, for example, to represent new syntax defined in a future version of the specification. Parsers compliant with this version of the specification SHOULD ignore these bytes.
 num_samples_to_trim_at_end indicates the number of samples that need to be trimmed from the end of the samples in this [=Audio Frame OBU=].
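
Reviewer note: the three formulas this series clarifies (the leb128() value limit of \(2^{32} - 1\), the explicit hanning window, and the Opus audio_roll_distance) can be sanity-checked with a short Python sketch. This is not part of the patch or the specification. The function names `encode_leb128`, `decode_leb128`, `hanning`, and `opus_roll_distance` are hypothetical helpers chosen for illustration, and the decoder assumes only what the patched text states (7-bit groups, least significant group first, continuation bit set on every byte except the last). It does not cap the number of encoded bytes.

```python
import math
from typing import List, Tuple

# Upper bound on the parsed value, as stated in the clarified leb128() text.
MAX_LEB128_VALUE = 2**32 - 1


def encode_leb128(value: int) -> bytes:
    """Encode an unsigned integer with the fewest 7-bit groups (illustrative)."""
    if not 0 <= value <= MAX_LEB128_VALUE:
        raise ValueError("value must be an unsigned integer of at most 32 bits")
    out = bytearray()
    while True:
        group = value & 0x7F          # next least significant 7-bit group
        value >>= 7
        if value:
            out.append(group | 0x80)  # more groups follow: set the MSB
        else:
            out.append(group)         # last byte: MSB stays clear
            return bytes(out)


def decode_leb128(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes_consumed); padded encodings are accepted."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            if value > MAX_LEB128_VALUE:
                raise ValueError("parsed value exceeds 2^32 - 1")
            return value, i + 1
    raise ValueError("input ended before the final leb128 byte")


def hanning(olen: int) -> List[float]:
    """hanning(n) = 0.5 - 0.5 cos(2 pi n / (2 olen - 1)), 0 <= n <= 2 olen - 1."""
    denom = 2 * olen - 1
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / denom) for n in range(2 * olen)]


def opus_roll_distance(num_samples_per_frame: int) -> int:
    """audio_roll_distance for Opus: -R with R = ceil(3840 / num_samples_per_frame)."""
    return -math.ceil(3840 / num_samples_per_frame)
```

For example, `decode_leb128(b"\x80\x00")` yields the value 0 from a two-byte encoding, which matches the NOTE in the patched text that the most compressed representation is not required.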