From 7369c65014e60870b5e058c220e78339785365f1 Mon Sep 17 00:00:00 2001
From: Felicia Lim <flim@google.com>
Date: Wed, 30 Aug 2023 14:42:55 -0700
Subject: [PATCH 1/3] Minor clarifications

- obu_size includes the optional trimming and extension fields of the OBU
  header
- the header bytes that parsers should ignore are the extension header bytes
- audio_roll_distance relates to obtaining the correct, rather than perfect,
  audio signal after decoding
- clarify that the 32-bit size limit on leb128 applies to the value after
  leb128 parsing
- replace the np.hanning reference with the explicit formula np.hanning uses
---
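The audio_roll_distance rules quoted as context in the hunk at line 602 can be sanity-checked with a minimal Python sketch. It covers only the Opus and mp4a rules shown in this patch; the function name and the representation of codec_id as a Python string are assumptions for illustration, and other codec_id values are rejected because their rules are not part of this excerpt.

```python
def audio_roll_distance(codec_id: str, num_samples_per_frame: int) -> int:
    """Hypothetical helper: required audio_roll_distance for Opus and mp4a."""
    if num_samples_per_frame <= 0:
        # num_samples_per_frame SHALL NOT be zero (and is a sample count).
        raise ValueError("num_samples_per_frame must be positive")
    if codec_id == "Opus":
        # R = ceil(3840 / num_samples_per_frame), using integer ceiling division
        # so no floating-point rounding is involved.
        r = -(-3840 // num_samples_per_frame)
        return -r
    if codec_id == "mp4a":
        return -1
    # Rules for other codec_id values are not covered by this sketch.
    raise ValueError(f"no rule for codec_id {codec_id!r} in this sketch")
```

For example, Opus with 960-sample frames gives R = 3840 / 960 = 4, so audio_roll_distance is -4; with 2880-sample frames, R = ceil(1.33...) = 2.
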
 index.bs | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)
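The recon gain smoothing bullets in the hunk at line 2201 can be checked with a short, dependency-free Python sketch. It implements the explicit hanning(n) formula that replaces np.hanning, builds e_window and s_window from it, and computes MA_gain with the RECOMMENDED N = 7. The function names are hypothetical, recon_gain(k) is assumed to be supplied per frame as an integer in 0..255, and the sketch does not show how the windows and gains are later applied to the signal.

```python
import math
from typing import List, Sequence, Tuple


def hanning(length: int) -> List[float]:
    """hanning(n) = 0.5 - 0.5*cos(2*pi*n / (length - 1)), 0 <= n <= length - 1.

    With length = 2 * olen this is the formula added by the patch; it matches
    numpy.hanning(length) for length >= 2.
    """
    if length < 2:
        raise ValueError("length must be at least 2")
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def recon_gain_windows(flen: int, olen: int) -> Tuple[List[float], List[float]]:
    """Return (e_window, s_window), each flen samples long."""
    if not 1 <= olen <= flen:
        raise ValueError("require 1 <= olen <= flen")
    h = hanning(2 * olen)
    # e_window[0:olen] = hanning[olen:], e_window[olen:flen] = 0
    e_window = h[olen:] + [0.0] * (flen - olen)
    # s_window[0:olen] = hanning[:olen], s_window[olen:flen] = 1
    s_window = h[:olen] + [1.0] * (flen - olen)
    return e_window, s_window


def ma_gains(recon_gain: Sequence[int], n: int = 7) -> List[float]:
    """MA_gain(k) for k = 1..len(recon_gain), starting from MA_gain(0) = 1."""
    alpha = 2.0 / (n + 1)
    prev = 1.0  # MA_gain(0)
    out = []
    for g in recon_gain:  # g is recon_gain(k), assumed to be in 0..255
        prev = alpha * (g / 255.0) + (1.0 - alpha) * prev
        out.append(prev)
    return out
```

For example, with olen = 2 and flen = 4, hanning(4) is approximately [0, 0.75, 0.75, 0], giving e_window of about [0.75, 0, 0, 0] and s_window of about [0, 0.75, 1, 1]. With N = 7 the smoothing factor 2 / (N + 1) is 0.25, so recon_gain values [255, 0] give MA_gain values [1.0, 0.75].
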
diff --git a/index.bs b/index.bs
index 4c5806bb..c37a5286 100644
--- a/index.bs
+++ b/index.bs
@@ -451,7 +451,7 @@ class OBUHeader() {
     leb128() num_samples_to_trim_at_end;
     leb128() num_samples_to_trim_at_start;
   }
-  if (obu_extension_flag == 1) {
+  if (obu_extension_flag) {
     leb128() extension_header_size;
     unsigned int (8 x extension_header_size) extension_header_bytes;
   }
@@ -506,7 +506,7 @@ This flag SHALL be set to 0 for this version of the specification. An OBU parser
 
 NOTE: A future version of the specification may use this flag to specify an extension header field by setting [=obu_extension_flag=] = 1 and setting the size of the extended header to [=extension_header_size=].
 
-<dfn noexport>obu_size</dfn> indicates the size in bytes of the OBU immediately following the obu_size field of the OBU. An OBU MAY have extra bytes after consuming all the bytes per the OBU syntax definition. Parsers compliant with this version of the specification SHOULD ignore the extra bytes.
+<dfn noexport>obu_size</dfn> indicates the size in bytes of the OBU immediately following the [=obu_size=] field. If the [=obu_trimming_status_flag=] and/or [=obu_extension_flag=] fields are set to 1, [=obu_size=] SHALL include the sizes of the additional fields.
 	
 <dfn noexport>num_samples_to_trim_at_end</dfn> indicates the number of samples that need to be trimmed from the end of the samples in this [=Audio Frame OBU=].
 
@@ -514,7 +514,7 @@ NOTE: A future version of the specification may use this flag to specify an exte
 
 <dfn noexport>extension_header_size</dfn> indicates the size in bytes of the extension header immediately following this field.
 
-<dfn noexport>extension_header_bytes</dfn> indicates the byte representations of the syntaxes of the extension header.
+<dfn noexport>extension_header_bytes</dfn> indicates the byte representations of the syntaxes of the extension header. Parsers compliant with this version of the specification SHOULD ignore these bytes.
 
 ## Reserved OBU Syntax and Semantics ## {#obu-reserved}
 
@@ -602,7 +602,7 @@ NOTE: <code>ipcm</code> should not be confused with <code>lpcm</code>, which is
 
 <dfn noexport>num_samples_per_frame</dfn> indicates the frame length, in samples, of the [=audio_frame()=] provided in the audio_frame_obu. It SHALL NOT be set to zero. If the [=decoder_config=] structure for a given codec specifies a value for the frame length, the two values SHALL be equal.
 
-<dfn noexport>audio_roll_distance</dfn> indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to set the decoder in a state that will produce the perfect decoded audio signal. It SHALL always be a negative value or zero. For some audio codecs, even if an audio frame can be decoded independently, the decoded signal after decoding only that frame may not represent a perfect, decoded audio signal, even ignoring compression artifacts. This can be due to overlap transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate. 
+<dfn noexport>audio_roll_distance</dfn> indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to put the decoder into a state that will produce the correct decoded audio signal. It SHALL always be a negative value or zero. For some audio codecs, even if an audio frame can be decoded independently, the signal obtained by decoding only that frame may not be the correct decoded audio signal, even ignoring compression artifacts. This can be due to overlapped transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate.
 - It SHALL be set to \(-R\) when [=codec_id=] is set to <code>Opus</code>, where
 	\[R = \left\lceil{\frac{3840}{\text{num_samples_per_frame}}}\right\rceil.\]
 - It SHALL be set to -1 when [=codec_id=] is set to <code>mp4a</code>.
@@ -2201,7 +2201,8 @@ Recon gain is REQUIRED only for [=num_layers=] > 1 and when [=codec_id=] is set
 	- \(\text{MA_gain}(k) = \frac{2}{N + 1} \times \frac{\text{recon_gain}(k)}{255} + \left( 1 - \frac{2}{N + 1} \right) \times \text{MA_gain}(k - 1)\), where \(\text{MA_gain}(0) = 1\).
 	- \(\text{e_window}[0:\text{olen}] = \text{hanning}[\text{olen}:]\), \(\text{e_window}[\text{olen}:\text{flen}] = 0\).
 	- \(\text{s_window}[0:\text{olen}] = \text{hanning}[:\text{olen}]\), \(\text{s_window}[\text{olen}:\text{flen}] = 1\).
-	- Where \(\text{hanning} = \text{np.hanning}(2 \times \text{olen})\), \(\text{flen}\) is the frame size and \(\text{olen}\) is the overlap size.
+	- \(\text{hanning}(n) = 0.5 - 0.5 \cos \left( \frac{2 \pi n}{2 \times \text{olen} - 1} \right) \), for \(0 \le n \le 2 \times \text{olen} - 1\).
+	- Where \(\text{flen}\) is the frame size and \(\text{olen}\) is the overlap size.
 	- The value \(N = 7\) is RECOMMENDED.
 
 The figure below shows the smoothing scheme of [=recon_gain=].
@@ -2795,7 +2796,7 @@ All syntax elements conform to the [=Syntactic Description Language=] specified
  
  <b>leb128()</b> indicates the type of an unsigned integer. To encode the following unsigned integer <b>syntaxName</b>, it first represents the integer in binary with an N-bit representation, where N is a multiple of 7. Then break the integer up into groups of 7 bits. Output one encoded byte for each 7 bits group, from least significant to most significant group. Each byte will have the group in its 7 least significant bits. Set the most significant bit on each byte except the last byte.
 
- <b>syntaxName</b> is an unsigned integer which is encoded by <b>leb128()</b>. Its size is limited to 32 bits.
+ <b>syntaxName</b> is an unsigned integer which is encoded by <b>leb128()</b>. Its size is limited to 32 bits. In other words, the value returned from the leb128() parsing process is less than or equal to \(1 << 32) - 1\).
 
  NOTE: There are multiple ways of encoding the same value depending on how many leading zero bits are encoded. There is no requirement that this syntax descriptor uses the most compressed representation. This can be useful for encoder implementations by allowing a fixed amount of space to be filled in later when the value becomes known.
   

From d286b0976518f8ee8bb9ccb1d0867dfab8263333 Mon Sep 17 00:00:00 2001
From: Felicia Lim <flim@google.com>
Date: Wed, 30 Aug 2023 17:16:15 -0700
Subject: [PATCH 2/3] Update leb128 clarification to avoid undefined bitwise
 operator

---
 index.bs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
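The leb128() description and the \(2^{32} - 1\) bound can be illustrated with a small Python sketch of an encoder and decoder. The optional width parameter shows the non-minimal (zero-padded) encodings permitted by the NOTE, for example reserving a fixed number of bytes to fill in later. The function names are hypothetical, and the eight-byte cap on the decoder is an assumption borrowed from the AV1 leb128() definition; it is not stated in this patch.

```python
from typing import Optional, Tuple

MAX_LEB128_VALUE = (1 << 32) - 1  # 2^32 - 1, the bound stated in the patch
MAX_LEB128_BYTES = 8  # assumption: byte cap taken from AV1's leb128()


def leb128_encode(value: int, width: Optional[int] = None) -> bytes:
    """Encode value as leb128; width forces a fixed, possibly padded, byte count."""
    if not 0 <= value <= MAX_LEB128_VALUE:
        raise ValueError("value must be in 0..2**32 - 1")
    if width is None:
        # Fewest 7-bit groups that can hold the value (at least one byte).
        width = max(1, (value.bit_length() + 6) // 7)
    if not 1 <= width <= MAX_LEB128_BYTES or value >> (7 * width):
        raise ValueError("width cannot represent value")
    out = bytearray()
    for i in range(width):
        group = (value >> (7 * i)) & 0x7F  # least significant group first
        if i < width - 1:
            group |= 0x80  # continuation bit on every byte except the last
        out.append(group)
    return bytes(out)


def leb128_decode(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, number_of_bytes_consumed) for the leb128 at data[pos:]."""
    value = 0
    for i in range(MAX_LEB128_BYTES):
        if pos + i >= len(data):
            raise ValueError("truncated leb128")
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            # The 32-bit limit applies to the parsed value, not the byte count.
            if value > MAX_LEB128_VALUE:
                raise ValueError("decoded value exceeds 2**32 - 1")
            return value, i + 1
    raise ValueError("leb128 longer than MAX_LEB128_BYTES")
```

For example, leb128_encode(128) yields b'\x80\x01', while leb128_encode(5, width=3) yields the padded b'\x85\x80\x00', which decodes back to 5. The five-byte sequence b'\x80\x80\x80\x80\x10' is rejected because it parses to \(2^{32}\).
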

diff --git a/index.bs b/index.bs
index c37a5286..f9aa76ab 100644
--- a/index.bs
+++ b/index.bs
@@ -2796,7 +2796,7 @@ All syntax elements conform to the [=Syntactic Description Language=] specified
  
  <b>leb128()</b> indicates the type of an unsigned integer. To encode the following unsigned integer <b>syntaxName</b>, it first represents the integer in binary with an N-bit representation, where N is a multiple of 7. Then break the integer up into groups of 7 bits. Output one encoded byte for each 7 bits group, from least significant to most significant group. Each byte will have the group in its 7 least significant bits. Set the most significant bit on each byte except the last byte.
 
- <b>syntaxName</b> is an unsigned integer which is encoded by <b>leb128()</b>. Its size is limited to 32 bits. In other words, the value returned from the leb128() parsing process is less than or equal to \(1 << 32) - 1\).
+ <b>syntaxName</b> is an unsigned integer which is encoded by <b>leb128()</b>. The size of the unsigned integer to be encoded is limited to 32 bits. In other words, the value returned from the <b>leb128()</b> parsing process is less than or equal to \(2^{32} - 1\).
 
  NOTE: There are multiple ways of encoding the same value depending on how many leading zero bits are encoded. There is no requirement that this syntax descriptor uses the most compressed representation. This can be useful for encoder implementations by allowing a fixed amount of space to be filled in later when the value becomes known.
   

From 02d4de85b535162a0f2c427aa47c8b2eb5166649 Mon Sep 17 00:00:00 2001
From: Felicia Lim <flim@google.com>
Date: Wed, 30 Aug 2023 22:39:13 -0700
Subject: [PATCH 3/3] Update obu_size extra bytes

---
 index.bs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
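The obu_size wording in this patch, together with the extension-byte sentence from patch 1/3, can be illustrated with a Python sketch of the fields that follow the first OBU header byte. It assumes the caller has already extracted obu_trimming_status_flag and obu_extension_flag from the header, since that bit layout is not shown here; the function and type names are hypothetical, and the eight-byte leb128 cap is an assumption borrowed from AV1. obu_size is measured from the byte after the obu_size field, so it covers the optional trimming and extension fields; extension_header_bytes are skipped uninterpreted, and the next OBU is located from obu_size.

```python
from typing import NamedTuple, Tuple

MAX_LEB128_BYTES = 8  # assumption: byte cap taken from AV1's leb128()


def read_leb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one leb128 value at data[pos:]; return (value, position after it)."""
    value = 0
    for i in range(MAX_LEB128_BYTES):
        if pos >= len(data):
            raise ValueError("truncated leb128")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            if value > (1 << 32) - 1:
                raise ValueError("leb128 value exceeds 2**32 - 1")
            return value, pos
    raise ValueError("leb128 too long")


class ObuFields(NamedTuple):
    num_samples_to_trim_at_end: int
    num_samples_to_trim_at_start: int
    payload: bytes      # remaining bytes for the OBU-type-specific syntax
    next_obu_pos: int   # first byte after this OBU, derived from obu_size


def parse_obu_size_and_optional_fields(
    data: bytes, pos: int, obu_trimming_status_flag: bool, obu_extension_flag: bool
) -> ObuFields:
    """Hypothetical helper; pos points at the obu_size field."""
    obu_size, pos = read_leb128(data, pos)
    end = pos + obu_size  # obu_size counts every byte after the obu_size field
    if end > len(data):
        raise ValueError("obu_size runs past the end of the buffer")
    trim_end = trim_start = 0
    if obu_trimming_status_flag:
        trim_end, pos = read_leb128(data, pos)
        trim_start, pos = read_leb128(data, pos)
    if obu_extension_flag:
        extension_header_size, pos = read_leb128(data, pos)
        pos += extension_header_size  # extension_header_bytes: skipped, not parsed
    if pos > end:
        raise ValueError("optional header fields exceed obu_size")
    return ObuFields(trim_end, trim_start, data[pos:end], end)
```

A parser for this version would decode the syntax it knows from payload and then continue at next_obu_pos, so any bytes in payload beyond that syntax (for example, fields added by a future version) are ignored rather than treated as an error.
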

diff --git a/index.bs b/index.bs
index f9aa76ab..0c967c13 100644
--- a/index.bs
+++ b/index.bs
@@ -506,7 +506,7 @@ This flag SHALL be set to 0 for this version of the specification. An OBU parser
 
 NOTE: A future version of the specification may use this flag to specify an extension header field by setting [=obu_extension_flag=] = 1 and setting the size of the extended header to [=extension_header_size=].
 
-<dfn noexport>obu_size</dfn> indicates the size in bytes of the OBU immediately following the [=obu_size=] field. If the [=obu_trimming_status_flag=] and/or [=obu_extension_flag=] fields are set to 1, [=obu_size=] SHALL include the sizes of the additional fields.
+<dfn noexport>obu_size</dfn> indicates the size in bytes of the OBU immediately following the [=obu_size=] field. If the [=obu_trimming_status_flag=] and/or [=obu_extension_flag=] fields are set to 1, [=obu_size=] SHALL include the sizes of the additional fields. [=obu_size=] MAY be greater than the size needed to represent the OBU syntax defined in this version of the specification, for example, to accommodate new syntax defined in a future version of the specification. Parsers compliant with this version of the specification SHOULD ignore any such additional bytes.
 	
 <dfn noexport>num_samples_to_trim_at_end</dfn> indicates the number of samples that need to be trimmed from the end of the samples in this [=Audio Frame OBU=].