Merge pull request #748 from felicialim/minor-clarifications

Minor clarifications
AOMediaCodec · Aug 31, 2023 · 3ab123f
2 parents c352f44 + 02d4de8
Showing 1 changed file with 7 additions and 6 deletions.
diff --git a/index.bs b/index.bs
@@ -453,7 +453,7 @@ class OBUHeader() {
     leb128() num_samples_to_trim_at_end;
     leb128() num_samples_to_trim_at_start;
   }
-  if (obu_extension_flag == 1) {
+  if (obu_extension_flag) {
     leb128() extension_header_size;
     unsigned int (8 x extension_header_size) extension_header_bytes;
   }
@@ -508,15 +508,15 @@ This flag SHALL be set to 0 for this version of the specification. An OBU parser
 
 NOTE: A future version of the specification may use this flag to specify an extension header field by setting [=obu_extension_flag=] = 1 and setting the size of the extended header to [=extension_header_size=].
 
-<dfn noexport>obu_size</dfn> indicates the size in bytes of the OBU immediately following the obu_size field of the OBU. An OBU MAY have extra bytes after consuming all the bytes per the OBU syntax definition. Parsers compliant with this version of the specification SHOULD ignore the extra bytes.
+<dfn noexport>obu_size</dfn> indicates the size in bytes of the OBU immediately following the [=obu_size=] field. If the [=obu_trimming_status_flag=] and/or [=obu_extension_flag=] fields are set to 1, [=obu_size=] SHALL include the sizes of the additional fields. The [=obu_size=] MAY be greater than the size needed to represent the OBU syntax defined in this version of the specification, for example, to represent new syntax defined in a future version of the specification. Parsers compliant with this version of the specification SHOULD ignore these bytes.
 
 <dfn noexport>num_samples_to_trim_at_end</dfn> indicates the number of samples that need to be trimmed from the end of the samples in this [=Audio Frame OBU=].
 
 <dfn noexport>num_samples_to_trim_at_start</dfn> indicates the number of samples that need to be trimmed from the start of the samples in this [=Audio Frame OBU=].
 
 <dfn noexport>extension_header_size</dfn> indicates the size in bytes of the extension header immediately following this field.
 
-<dfn noexport>extension_header_bytes</dfn> indicates the byte representations of the syntaxes of the extension header.
+<dfn noexport>extension_header_bytes</dfn> indicates the byte representations of the syntaxes of the extension header. Parsers compliant with this version of the specification SHOULD ignore these bytes.
 
 ## Reserved OBU Syntax and Semantics ## {#obu-reserved}
 
@@ -608,7 +608,7 @@ NOTE: <code>ipcm</code> should not be confused with <code>lpcm</code>, which is
 
 <dfn noexport>num_samples_per_frame</dfn> indicates the frame length, in samples, of the [=audio_frame=] provided in the audio_frame_obu. It SHALL NOT be set to zero. If the [=decoder_config=] structure for a given codec specifies a value for the frame length, the two values SHALL be equal.
 
-<dfn noexport>audio_roll_distance</dfn> indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to set the decoder in a state that will produce the perfect decoded audio signal. It SHALL always be a negative value or zero. For some audio codecs, even if an audio frame can be decoded independently, the decoded signal after decoding only that frame may not represent a perfect, decoded audio signal, even ignoring compression artifacts. This can be due to overlap transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate. 
+<dfn noexport>audio_roll_distance</dfn> indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to set the decoder in a state that will produce the correct decoded audio signal. It SHALL always be a negative value or zero. For some audio codecs, even if an audio frame can be decoded independently, the decoded signal after decoding only that frame may not represent a correct, decoded audio signal, even ignoring compression artifacts. This can be due to overlap transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate. 
 - It SHALL be set to \(-R\) when [=codec_id=] is set to <code>Opus</code>, where
 	\[R = \left\lceil{\frac{3840}{\text{num_samples_per_frame}}}\right\rceil.\]
 - It SHALL be set to -1 when [=codec_id=] is set to <code>mp4a</code>.
@@ -2265,7 +2265,8 @@ Recon gain is REQUIRED only for [=num_layers=] > 1 and when [=codec_id=] is set
 	- \(\text{MA_gain}(k) = \frac{2}{N + 1} \times \frac{\text{recon_gain}(k)}{255} + \left( 1 - \frac{2}{N + 1} \right) \times \text{MA_gain}(k - 1)\), where \(\text{MA_gain}(0) = 1\).
 	- \(\text{e_window}[0:\text{olen}] = \text{hanning}[\text{olen}:]\), \(\text{e_window}[\text{olen}:\text{flen}] = 0\).
 	- \(\text{s_window}[0:\text{olen}] = \text{hanning}[:\text{olen}]\), \(\text{s_window}[\text{olen}:\text{flen}] = 1\).
-	- Where \(\text{hanning} = \text{np.hanning}(2 \times \text{olen})\), \(\text{flen}\) is the frame size and \(\text{olen}\) is the overlap size.
+	- \(\text{hanning}(n) = 0.5 - 0.5 \cos \left( \frac{2 \pi n}{2 \times \text{olen} - 1}  \right) \), \(0 \le n \le (2 \times \text{olen} - 1)\).
+	- Where \(\text{flen}\) is the frame size and \(\text{olen}\) is the overlap size.
 	- The value \(N = 7\) is RECOMMENDED.
 
 The figure below shows the smoothing scheme of [=recon_gain=].
@@ -2667,7 +2668,7 @@ All syntax elements conform to the [=Syntactic Description Language=] specified
 
  <b>leb128()</b> indicates the type of an unsigned integer. To encode the following unsigned integer <b>syntaxName</b>, it first represents the integer in binary with an N-bit representation, where N is a multiple of 7. Then break the integer up into groups of 7 bits. Output one encoded byte for each 7 bits group, from least significant to most significant group. Each byte will have the group in its 7 least significant bits. Set the most significant bit on each byte except the last byte.
 
- <b>syntaxName</b> is an unsigned integer which is encoded by <b>leb128()</b>. Its size is limited to 32 bits.
+ <b>syntaxName</b> is an unsigned integer which is encoded by <b>leb128()</b>. The size of the unsigned integer to be encoded is limited to 32 bits. In other words, the value returned from the <b>leb128()</b> parsing process is less than or equal to \(2^{32} - 1\).
 
  NOTE: There are multiple ways of encoding the same value depending on how many leading zero bits are encoded. There is no requirement that this syntax descriptor uses the most compressed representation. This can be useful for encoder implementations by allowing a fixed amount of space to be filled in later when the value becomes known.
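
For reviewers who want to check the clarified leb128() wording against concrete bytes, the following is a minimal Python sketch of the encoding and parsing described in the hunk above. It is not taken from the specification or from any reference implementation. The function names, the `min_bytes` padding parameter (illustrating the NOTE about non-minimal encodings), and the `max_bytes = 8` read limit are assumptions made for illustration only.

```python
MAX_VALUE = (1 << 32) - 1  # decoded value is limited to 2^32 - 1 per the text above


def leb128_encode(value, min_bytes=1):
    """Encode an unsigned integer, least significant 7-bit group first.

    min_bytes (hypothetical parameter) pads with leading-zero groups, which
    the NOTE allows: the encoding need not be the most compressed one.
    """
    if not 0 <= value <= MAX_VALUE:
        raise ValueError("value does not fit in 32 bits")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        # Another byte follows if bits remain or padding is still required.
        more = value != 0 or len(out) + 1 < min_bytes
        # The most significant bit is set on every byte except the last.
        out.append(group | (0x80 if more else 0x00))
        if not more:
            return bytes(out)


def leb128_decode(data, offset=0, max_bytes=8):
    """Return (value, bytes_consumed). max_bytes is an assumed read limit."""
    value = 0
    for i in range(max_bytes):
        if offset + i >= len(data):
            raise ValueError("truncated leb128 field")
        byte = data[offset + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:  # a clear top bit marks the last byte
            if value > MAX_VALUE:
                raise ValueError("decoded value exceeds 2^32 - 1")
            return value, i + 1
    raise ValueError("leb128 field longer than max_bytes")
```

In this sketch, `leb128_encode(5)` and `leb128_encode(5, min_bytes=3)` produce different byte strings that both decode to 5, which is the fixed-size placeholder use case the NOTE mentions.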