Commit

Merge pull request #748 from felicialim/minor-clarifications
Minor clarifications
felicialim authored Aug 31, 2023
2 parents c352f44 + 02d4de8 commit 3ab123f
Showing 1 changed file with 7 additions and 6 deletions.
13 changes: 7 additions & 6 deletions index.bs
@@ -453,7 +453,7 @@ class OBUHeader() {
leb128() num_samples_to_trim_at_end;
leb128() num_samples_to_trim_at_start;
}
if (obu_extension_flag == 1) {
if (obu_extension_flag) {
leb128() extension_header_size;
unsigned int (8 x extension_header_size) extension_header_bytes;
}
@@ -508,15 +508,15 @@ This flag SHALL be set to 0 for this version of the specification. An OBU parser

NOTE: A future version of the specification may use this flag to specify an extension header field by setting [=obu_extension_flag=] = 1 and setting the size of the extended header to [=extension_header_size=].

<dfn noexport>obu_size</dfn> indicates the size in bytes of the OBU immediately following the obu_size field of the OBU. An OBU MAY have extra bytes after consuming all the bytes per the OBU syntax definition. Parsers compliant with this version of the specification SHOULD ignore the extra bytes.
<dfn noexport>obu_size</dfn> indicates the size in bytes of the OBU immediately following the [=obu_size=] field. If the [=obu_trimming_status_flag=] and/or [=obu_extension_flag=] fields are set to 1, [=obu_size=] SHALL include the sizes of the additional fields. The [=obu_size=] MAY be greater than the size needed to represent the OBU syntax defined in this version of the specification, for example, to represent new syntax defined in a future version of the specification. Parsers compliant with this version of the specification SHOULD ignore these bytes.

<dfn noexport>num_samples_to_trim_at_end</dfn> indicates the number of samples that need to be trimmed from the end of the samples in this [=Audio Frame OBU=].

<dfn noexport>num_samples_to_trim_at_start</dfn> indicates the number of samples that need to be trimmed from the start of the samples in this [=Audio Frame OBU=].

<dfn noexport>extension_header_size</dfn> indicates the size in bytes of the extension header immediately following this field.

<dfn noexport>extension_header_bytes</dfn> indicates the byte representations of the syntaxes of the extension header.
<dfn noexport>extension_header_bytes</dfn> indicates the byte representations of the syntaxes of the extension header. Parsers compliant with this version of the specification SHOULD ignore these bytes.
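The semantics above can be illustrated with a minimal Python sketch of how a parser might walk the optional header fields. It assumes the caller has already read the flags and [=obu_size=], and that `pos` points at the first byte after the [=obu_size=] field. The function names, the `payload_size` key, and the error handling are hypothetical and not part of the specification.

```python
from typing import Dict, Tuple


def read_leb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one leb128() value starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated leb128 value")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # most significant bit clear: last byte
            return value, pos
        shift += 7


def parse_optional_header_fields(
    buf: bytes,
    pos: int,
    obu_size: int,
    obu_trimming_status_flag: bool,
    obu_extension_flag: bool,
) -> Tuple[Dict[str, int], int]:
    """Parse the optional fields that obu_size also accounts for.

    Hypothetical helper: returns the parsed fields and the offset at which
    the rest of the OBU starts.
    """
    end = pos + obu_size  # obu_size counts every byte after the obu_size field
    if end > len(buf):
        raise ValueError("obu_size exceeds the available data")
    fields: Dict[str, int] = {}
    if obu_trimming_status_flag:
        fields["num_samples_to_trim_at_end"], pos = read_leb128(buf, pos)
        fields["num_samples_to_trim_at_start"], pos = read_leb128(buf, pos)
    if obu_extension_flag:
        size, pos = read_leb128(buf, pos)
        fields["extension_header_size"] = size
        pos += size  # extension_header_bytes are ignored by this sketch
    if pos > end:
        raise ValueError("optional fields overrun obu_size")
    fields["payload_size"] = end - pos  # bytes left for the rest of the OBU
    return fields, pos
```

Any bytes that remain after the known syntax has been consumed, up to `pos + obu_size`, would simply be skipped by a parser following the SHOULD above.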

## Reserved OBU Syntax and Semantics ## {#obu-reserved}

@@ -608,7 +608,7 @@ NOTE: <code>ipcm</code> should not be confused with <code>lpcm</code>, which is

<dfn noexport>num_samples_per_frame</dfn> indicates the frame length, in samples, of the [=audio_frame=] provided in the audio_frame_obu. It SHALL NOT be set to zero. If the [=decoder_config=] structure for a given codec specifies a value for the frame length, the two values SHALL be equal.

<dfn noexport>audio_roll_distance</dfn> indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to set the decoder in a state that will produce the perfect decoded audio signal. It SHALL always be a negative value or zero. For some audio codecs, even if an audio frame can be decoded independently, the decoded signal after decoding only that frame may not represent a perfect, decoded audio signal, even ignoring compression artifacts. This can be due to overlap transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate.
<dfn noexport>audio_roll_distance</dfn> indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to set the decoder in a state that will produce the correct decoded audio signal. It SHALL always be a negative value or zero. For some audio codecs, even if an audio frame can be decoded independently, the decoded signal after decoding only that frame may not represent a correct, decoded audio signal, even ignoring compression artifacts. This can be due to overlap transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate.
- It SHALL be set to \(-R\) when [=codec_id=] is set to <code>Opus</code>, where
\[R = \left\lceil{\frac{3840}{\text{num_samples_per_frame}}}\right\rceil.\]
- It SHALL be set to -1 when [=codec_id=] is set to <code>mp4a</code>.
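
As a worked example of the <code>Opus</code> rule above, the following Python sketch computes \(-R\) with integer arithmetic. The function name is hypothetical, and the sample values in the usage note are illustrative frame lengths only.

```python
def opus_audio_roll_distance(num_samples_per_frame: int) -> int:
    """Return -R, where R = ceil(3840 / num_samples_per_frame)."""
    if num_samples_per_frame <= 0:
        # num_samples_per_frame SHALL NOT be set to zero.
        raise ValueError("num_samples_per_frame must be positive")
    # Integer ceiling division avoids floating-point rounding.
    r = (3840 + num_samples_per_frame - 1) // num_samples_per_frame
    return -r
```

For instance, a frame length of 960 samples gives \(R = 4\) and a roll distance of -4, while 2880 samples gives \(R = 2\) and a roll distance of -2.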
@@ -2265,7 +2265,8 @@ Recon gain is REQUIRED only for [=num_layers=] > 1 and when [=codec_id=] is set
- \(\text{MA_gain}(k) = \frac{2}{N + 1} \times \frac{\text{recon_gain}(k)}{255} + \left( 1 - \frac{2}{N + 1} \right) \times \text{MA_gain}(k - 1)\), where \(\text{MA_gain}(0) = 1\).
- \(\text{e_window}[0:\text{olen}] = \text{hanning}[\text{olen}:]\), \(\text{e_window}[\text{olen}:\text{flen}] = 0\).
- \(\text{s_window}[0:\text{olen}] = \text{hanning}[:\text{olen}]\), \(\text{s_window}[\text{olen}:\text{flen}] = 1\).
- Where \(\text{hanning} = \text{np.hanning}(2 \times \text{olen})\), \(\text{flen}\) is the frame size and \(\text{olen}\) is the overlap size.
- \(\text{hanning}(n) = 0.5 - 0.5 \cos \left( \frac{2 \pi n}{2 \times \text{olen} - 1} \right) \), \(0 \le n \le (2 \times \text{olen} - 1)\).
- Where \(\text{flen}\) is the frame size and \(\text{olen}\) is the overlap size.
- The value \(N = 7\) is RECOMMENDED.
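
The window and moving-average definitions in the list above can be sketched in plain Python as follows. This is an illustration only: the function names are hypothetical, `recon_gains` is assumed to hold the values for frames \(k = 1, 2, \ldots\), and the explicit cosine formula is used in place of a library window function.

```python
import math
from typing import List, Tuple


def hanning(olen: int) -> List[float]:
    """hanning(n) = 0.5 - 0.5 * cos(2*pi*n / (2*olen - 1)), 0 <= n <= 2*olen - 1."""
    size = 2 * olen
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (size - 1)) for n in range(size)]


def smoothing_windows(flen: int, olen: int) -> Tuple[List[float], List[float]]:
    """Build e_window and s_window for frame size flen and overlap size olen."""
    h = hanning(olen)
    e_window = h[olen:] + [0.0] * (flen - olen)  # falling half, then zeros
    s_window = h[:olen] + [1.0] * (flen - olen)  # rising half, then ones
    return e_window, s_window


def ma_gain(recon_gains: List[int], n: int = 7) -> List[float]:
    """Return MA_gain(k) for k = 1..len(recon_gains), starting from MA_gain(0) = 1."""
    alpha = 2.0 / (n + 1)
    prev = 1.0  # MA_gain(0)
    out = []
    for gain in recon_gains:
        prev = alpha * gain / 255.0 + (1.0 - alpha) * prev
        out.append(prev)
    return out
```

With the RECOMMENDED \(N = 7\), the weight on the current frame is \(2 / (N + 1) = 0.25\), so a single frame with a [=recon_gain=] of 0 lowers the smoothed gain from 1 to 0.75.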

The figure below shows the smoothing scheme of [=recon_gain=].
@@ -2667,7 +2668,7 @@ All syntax elements conform to the [=Syntactic Description Language=] specified

<b>leb128()</b> indicates the type of an unsigned integer. To encode the following unsigned integer <b>syntaxName</b>, it first represents the integer in binary with an N-bit representation, where N is a multiple of 7. Then break the integer up into groups of 7 bits. Output one encoded byte for each 7 bits group, from least significant to most significant group. Each byte will have the group in its 7 least significant bits. Set the most significant bit on each byte except the last byte.

<b>syntaxName</b> is an unsigned integer which is encoded by <b>leb128()</b>. Its size is limited to 32 bits.
<b>syntaxName</b> is an unsigned integer which is encoded by <b>leb128()</b>. The size of the unsigned integer to be encoded is limited to 32 bits. In other words, the value returned from the <b>leb128()</b> parsing process is less than or equal to \(2^{32} - 1\).

NOTE: There are multiple ways of encoding the same value depending on how many leading zero bits are encoded. There is no requirement that this syntax descriptor uses the most compressed representation. This can be useful for encoder implementations by allowing a fixed amount of space to be filled in later when the value becomes known.
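
The encoding steps and the 32-bit limit described above can be sketched in Python as shown below. The function names and the `min_bytes` padding parameter are hypothetical; `min_bytes` only illustrates the non-minimal encodings mentioned in the NOTE, and the sketch does not enforce any maximum encoded length.

```python
from typing import Tuple

MAX_LEB128_VALUE = 2**32 - 1  # the decoded value is limited to 32 bits


def leb128_encode(value: int, min_bytes: int = 1) -> bytes:
    """Encode value as leb128(), padding with leading zero groups to min_bytes."""
    if not 0 <= value <= MAX_LEB128_VALUE:
        raise ValueError("value does not fit in 32 bits")
    if min_bytes < 1:
        raise ValueError("min_bytes must be at least 1")
    out = bytearray()
    while True:
        group = value & 0x7F  # next 7-bit group, least significant first
        value >>= 7
        more = value != 0 or len(out) + 1 < min_bytes
        out.append(group | (0x80 if more else 0x00))  # MSB set on all but the last byte
        if not more:
            return bytes(out)


def leb128_decode(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one leb128() value at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated leb128 value")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    if value > MAX_LEB128_VALUE:
        raise ValueError("decoded value exceeds 32 bits")
    return value, pos
```

For example, the value 300 encodes to the two bytes `0xAC 0x02`, and the value 1 padded to three bytes encodes to `0x81 0x80 0x00`; both decode back to the original value.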
