From a42dfdf52e873485288f3728153a0ba868f26e62 Mon Sep 17 00:00:00 2001 From: Felicia Lim Date: Tue, 15 Aug 2023 12:40:55 -0700 Subject: [PATCH 01/11] 3.1: Switch to `ClassName element_name` style. Addresses issue #654 for Section 3.1. --- index.bs | 68 ++++++++++++++++++++++++++++---------------------------- 1 file changed, 34 insertions(+), 34 deletions(-) diff --git a/index.bs b/index.bs index e377d06c..6bc3dc02 100644 --- a/index.bs +++ b/index.bs @@ -402,34 +402,34 @@ This section specifies the OBU syntax elements and their semantics. ## Immersive Audio OBU Syntax and Semantics ## {#immersiveaudio-obu} -OBUs are structured with an obu_header() and an OBU payload. +OBUs are structured with an obu_header and an OBU payload. -obu_header() and all OBU payloads including reserved_obu() are byte aligned. +The obu_header and all OBU payloads including reserved_obu are byte aligned. Syntax ``` -class ia_open_bitstream_unit() { - obu_header(); +class IaOpenBitstreamUnit() { + ObuHeader obu_header; if (obu_type == OBU_IA_Sequence_Header) - ia_sequence_header_obu(); + IaSequenceHeaderObu ia_sequence_header_obu; else if (obu_type == OBU_IA_Codec_Config) - codec_config_obu(); + CodecConfigObu codec_config_obu; else if (obu_type == OBU_IA_Audio_Element) - audio_element_obu(); + AudioElementObu audio_element_obu; else if (obu_type == OBU_IA_Mix_Presentation) - mix_presentation_obu(); + MixPresentationObu mix_presentation_obu; else if (obu_type == OBU_IA_Parameter_Block) - parameter_block_obu(); + ParameterBlockObu parameter_block_obu; else if (obu_type == OBU_IA_Temporal_Delimiter) - temporal_delimiter_obu(); + TemporalDelimiterObu temporal_delimiter_obu; else if (obu_type == OBU_IA_Audio_Frame) - audio_frame_obu(true); + AudioFrameObu audio_frame_obu(true); else if (obu_type >= 6 and <= 23) - audio_frame_obu(false); + AudioFrameObu audio_frame_obu(false); else if (obu_type >=24 and <= 30) - reserved_obu(); + ReservedObu reserved_obu; } ``` @@ -443,7 +443,7 @@ If the syntax 
element [=obu_type=] is equal to OBU_IA_Sequence_Header, an ordere Syntax ``` -class obu_header() { +class ObuHeader() { unsigned int (5) obu_type; unsigned int (1) obu_redundant_copy; unsigned int (1) obu_trimming_status_flag; @@ -526,7 +526,7 @@ Reserved OBUs SHOULD be ignored by parsers compliant with this version of the sp Syntax ``` -class reserved_obu() { +class ReservedObu() { } ``` @@ -544,7 +544,7 @@ This OBU MAY be placed frequently within one single [=IA Sequence=] for an appli Syntax ``` -class ia_sequence_header_obu() { +class IaSequenceHeaderObu() { unsigned int (32) ia_code; unsigned int (8) primary_profile; unsigned int (8) additional_profile; @@ -575,7 +575,7 @@ This section specifies the OBU payload of OBU_IA_Codec_Config. Syntax ``` -class codec_config_obu() { +class CodecConfigObu() { leb128() codec_config_id; codec_config(); } @@ -602,7 +602,7 @@ Parsers compliant with this version of the specification SHOULD ignore [=Codec C NOTE: 'ipcm' should not be confused with lpcm, which is another 4CC to identify codecs in other container formats (e.g., QuickTime). -num_samples_per_frame indicates the frame length, in samples, of the [=audio_frame()=] provided in the audio_frame_obu(). It SHALL NOT be set to zero. If the [=decoder_config()=] structure for a given codec specifies a value for the frame length, the two values SHALL be equal. +num_samples_per_frame indicates the frame length, in samples, of the [=audio_frame()=] provided in the audio_frame_obu. It SHALL NOT be set to zero. If the [=decoder_config()=] structure for a given codec specifies a value for the frame length, the two values SHALL be equal. audio_roll_distance indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to set the encoder in a state that will produce the perfect decoded audio signal. It SHALL always be a negative value or zero. 
For some audio codecs, even if an audio frame can be decoded independently, the decoded signal after decoding only that frame may not represent a perfect, decoded audio signal, even ignoring compression artifacts. This can be due to overlap transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate. - It SHALL be set to -R when [=codec_id=] is set to 'Opus', where R is ceil(3840 / [=num_samples_per_frame=]). @@ -619,7 +619,7 @@ This section specifies the OBU payload of OBU_IA_Audio_Element. Syntax ``` -class audio_element_obu() { +class AudioElementObu() { leb128() audio_element_id; unsigned int (3) audio_element_type; unsigned int (5) reserved; @@ -689,11 +689,11 @@ audio_element_type: The type of audio representation. 2~7 : Reserved -codec_config_id indicates the identifier for the codec configuration which this [=Audio Element=] refers to. Parsers compliant with this version of the specification SHOULD ignore [=Audio Element OBU=]s with a [=codec_config_id=] identifying an unknown [=codec_id=]. +codec_config_id indicates the identifier for the codec configuration which this [=Audio Element=] refers to. Parsers compliant with this version of the specification SHOULD ignore [=Audio Element OBU=]s with a [=codec_config_id=] identifying an unknown [=codec_id=]. num_substreams specifies the number of [=Audio Substream=]s that are used to reconstruct this [=Audio Element=]. It SHALL NOT be set to 0. -audio_substream_id indicates the identifier for an [=Audio Substream=] which this [=Audio Element=] refers to. +audio_substream_id indicates the identifier for an [=Audio Substream=] which this [=Audio Element=] refers to. 
Let a particular [=ChannelGroup=]'s [=Audio Substream=]s be indexed as [c, n_c], where a [=ChannelGroup=] generation rule is described in [[#iamfgeneration-scalablechannelaudio-channelgroupgenerationrule]] and - [=c=] = [1, ..., C] is the [=ChannelGroup=] index and C is the number of [=ChannelGroup=]s. @@ -1058,7 +1058,7 @@ A scene-based [=Audio Element=] has only one [=ChannelGroup=], which includes al This section specifies the OBU payload of OBU_IA_Mix_Presentation. -The metadata in mix_presentation_obu() specifies how to render, process and mix one or more [=Audio Element=]s, with details provided in [[#processing-mixpresentation]]. +The metadata in mix_presentation_obu specifies how to render, process and mix one or more [=Audio Element=]s, with details provided in [[#processing-mixpresentation]]. An [=IA Sequence=] MAY have one or more [=Mix Presentation=]s specified. The IA parser SHALL select the appropriate [=Mix Presentation=] to process according to the rules specified in [[#processing-mixpresentation-selection]]. @@ -1066,7 +1066,7 @@ A [=Mix Presentation=] MAY contain one or more sub-mixes. Common use cases MAY s Syntax ``` -class mix_presentation_obu() { +class MixPresentationObu() { leb128() mix_presentation_id; leb128() count_label; for (i = 0; i < count_label; i++) { @@ -1113,7 +1113,7 @@ class mix_presentation_obu() { num_audio_elements specifies the number of [=Audio Element=]s that are used in this [=Mix Presentation=] to generate the final output audio signal for playback. It SHALL NOT be set to 0. -audio_element_id indicates the identifier for an [=Audio Element=] which this [=Mix Presentation=] refers to. +audio_element_id indicates the identifier for an [=Audio Element=] which this [=Mix Presentation=] refers to. mix_presentation_element_annotations() provides informational metadata that the playback system MAY use to display information to the user. It is not used in the rendering or mixing process to generate the final output audio signal. 
@@ -1377,7 +1377,7 @@ The metadata specified in this OBU defines the parameter values for an algorithm Syntax ``` -class parameter_block_obu() { +class ParameterBlockObu() { leb128() parameter_id; (param_definition_type, param_definition_mode, duration, num_subblocks, constant_subblock_duration, subblock_duration) = get_param_definition(parameter_id); @@ -1426,13 +1426,13 @@ If [=param_definition_mode=] = 0, this function additionally gets the following When it gets an unknown [=param_definition_type=], parsers compliant with this version of the specification SHOULD ignore the [=Parameter Block OBU=]. -duration specifies the duration for which this parameter block is valid and applicable. It SHALL NOT be set to 0. +duration specifies the duration for which this parameter block is valid and applicable. It SHALL NOT be set to 0. -constant_subblock_duration specifies the duration of each subblock, in the case where all subblocks except the last subblock have equal durations. If all subblocks except the last subblock do not have equal durations, the value of constant_subblock_duration SHALL be set to 0. +constant_subblock_duration specifies the duration of each subblock, in the case where all subblocks except the last subblock have equal durations. If all subblocks except the last subblock do not have equal durations, the value of constant_subblock_duration SHALL be set to 0. -num_subblocks specifies the number of different sets of parameter values specified in this parameter block, where each set describes a different subblock of the timeline, contiguously. When [=constant_subblock_duration=] != 0, [=num_subblocks=] is implicitly calculated as [=num_subblocks=] = ceil([=duration=] / [=constant_subblock_duration=]). +num_subblocks specifies the number of different sets of parameter values specified in this parameter block, where each set describes a different subblock of the timeline, contiguously. 
When [=constant_subblock_duration=] != 0, [=num_subblocks=] is implicitly calculated as [=num_subblocks=] = ceil([=duration=] / [=constant_subblock_duration=]). -subblock_duration specifies the duration for the given subblock. It SHALL NOT be set to 0. +subblock_duration specifies the duration for the given subblock. It SHALL NOT be set to 0. The values of [=duration=], [=constant_subblock_duration=], and [=subblock_duration=] SHALL be expressed as the number of ticks at the [=parameter_rate=] specified in the corresponding parameter definition. @@ -1572,7 +1572,7 @@ This section specifies the OBU payloads of OBU_IA_Audio_Frame and OBU_IA_Audio_F Syntax ``` -class audio_frame_obu(audio_substream_id_in_bitstream) { +class AudioFrameObu(audio_substream_id_in_bitstream) { if (audio_substream_id_in_bitstream) { leb128() explicit_audio_substream_id; } @@ -1602,7 +1602,7 @@ This section specifies the OBU payload of OBU_IA_Temporal_Delimiter. Syntax ``` -class temporal_delimiter_obu() { +class TemporalDelimiterObu() { } ``` @@ -2280,11 +2280,11 @@ Finally, the output mix gain SHALL be applied using the value specified in [=out ## Animated Parameters ## {#processing-animated-params} -This section describes how a set of parameter values is animated over a subblock in a parameter_block_obu() and applied to the corresponding audio samples, using the information provided in AnimatedParameterData(). +This section describes how a set of parameter values is animated over a subblock in a parameter_block_obu and applied to the corresponding audio samples, using the information provided in AnimatedParameterData(). If [=animation_type=] is equal to STEP, the parameter value provided by [=start_point_value=] SHOULD be applied to all time steps in the subblock. -If [=animation_type=] is equal to LINEAR or BEZIER, the information provided in AnimatedParameterData() describes how the set of parameter values is animated as a Bezier curve. 
Let T be the [=subblock_duration=] defined in the parameter_block_obu() and P0, P1 and P2 be 2D coordinates defined as +If [=animation_type=] is equal to LINEAR or BEZIER, the information provided in AnimatedParameterData() describes how the set of parameter values is animated as a Bezier curve. Let T be the [=subblock_duration=] defined in the parameter_block_obu and P0, P1 and P2 be 2D coordinates defined as ``` P0 = (t0, start_point_value), @@ -2687,7 +2687,7 @@ The pow() function returns the value of x to the power of y. ## Annex A: ID Linking Scheme (Informative) ## {#Annex_A} -The figure below shows the linking scheme among IDs in the obu_header or OBU payload. +The figure below shows the linking scheme among IDs in the obu_header or OBU payload.
ID Linking Scheme
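Reviewer aid for PATCH 01/11 (not part of the patch): a minimal Python sketch of a few rules that the hunks above quote. It assumes the `ObuHeader` fields are packed most-significant-bit first within the first header byte, which the quoted text does not state. All function names are hypothetical. Only the numeric `obu_type` ranges spelled out in the syntax (6 to 23 and 24 to 30) are mapped, because the values of the named `OBU_IA_*` constants do not appear in these hunks.

```python
from typing import Dict, Optional


def split_obu_header_first_byte(first_byte: int) -> Dict[str, int]:
    """Split one byte into the three ObuHeader fields shown in the hunk.

    Assumption: fields are packed MSB-first, in declaration order.
    """
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("first_byte must fit in 8 bits")
    return {
        "obu_type": first_byte >> 3,                       # unsigned int (5)
        "obu_redundant_copy": (first_byte >> 2) & 0x1,     # unsigned int (1)
        "obu_trimming_status_flag": (first_byte >> 1) & 0x1,  # unsigned int (1)
        # The lowest bit is not listed in the quoted hunk, so it is left unnamed.
    }


def payload_class_for(obu_type: int) -> Optional[str]:
    """Map the numeric obu_type ranges from IaOpenBitstreamUnit to class names."""
    if 6 <= obu_type <= 23:
        return "AudioFrameObu(false)"
    if 24 <= obu_type <= 30:
        return "ReservedObu"
    return None  # named OBU_IA_* constants: values not given in these hunks


def ceil_div(numerator: int, denominator: int) -> int:
    """Integer ceil(numerator / denominator) for positive integers."""
    if numerator < 0 or denominator <= 0:
        raise ValueError("expected numerator >= 0 and denominator > 0")
    return (numerator + denominator - 1) // denominator


def opus_audio_roll_distance(num_samples_per_frame: int) -> int:
    """audio_roll_distance = -R, where R = ceil(3840 / num_samples_per_frame)."""
    return -ceil_div(3840, num_samples_per_frame)


def implicit_num_subblocks(duration: int, constant_subblock_duration: int) -> int:
    """num_subblocks = ceil(duration / constant_subblock_duration), nonzero case."""
    return ceil_div(duration, constant_subblock_duration)
```

For instance, under the MSB-first assumption the byte `0b00110010` splits into `obu_type` 6 with the trimming flag set, and `opus_audio_roll_distance(960)` evaluates to -4.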
From d32796083a1a8a661d5f08ef69458e0399adaecf Mon Sep 17 00:00:00 2001
From: Felicia Lim
Date: Tue, 15 Aug 2023 15:16:12 -0700
Subject: [PATCH 02/11] Fix #655: swap trim signaling order in semantics

---
 index.bs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/index.bs b/index.bs
index e377d06c..9a4e7399 100644
--- a/index.bs
+++ b/index.bs
@@ -511,10 +511,10 @@ NOTE: A future version of the specification may use this flag to specify an exte
 
 obu_size indicates the size in bytes of the OBU immediately following the obu_size field of the OBU. An OBU MAY have extra bytes after consuming all the bytes per the OBU syntax definition. Parsers compliant with this version of the specification SHOULD ignore the extra bytes.
 
-num_samples_to_trim_at_start indicates the number of samples that need to be trimmed from the start of the samples in this [=Audio Frame OBU=].
-
 num_samples_to_trim_at_end indicates the number of samples that need to be trimmed from the end of the samples in this [=Audio Frame OBU=].
 
+num_samples_to_trim_at_start indicates the number of samples that need to be trimmed from the start of the samples in this [=Audio Frame OBU=].
+
 extension_header_size indicates the size in bytes of the extension header immediately following this field.
 
 extension_header_bytes indicates the byte representations of the syntaxes of the extension header.

From ef7f163a1fb9805c1c3a8e6230861f9d9406e9a6 Mon Sep 17 00:00:00 2001
From: sunghee-hwang <97494915+sunghee-hwang@users.noreply.github.com>
Date: Wed, 16 Aug 2023 17:31:22 +0900
Subject: [PATCH 03/11] Fix #667, 3D audio signal

---
 index.bs | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/index.bs b/index.bs
index e377d06c..50d2ff65 100644
--- a/index.bs
+++ b/index.bs
@@ -325,7 +325,7 @@ Based on the model, this specification defines the immersive audio model and for
IAMF Architecture
-For a given input 3D audio, +For a given input [=3D audio signal=], - A Pre-Processor generates [=ChannelGroup=](s), [=Descriptors=] and [=Parameter Substream=](s). - A Codec Enc generates coded [=Audio Substream=](s). - An OBU Packetizer generates an [=IA Sequence=] from the coded [=Audio Substream=](s) and [=Descriptors=] and [=Parameter Substream=](s). @@ -356,12 +356,12 @@ The metadata in the [=Descriptors=] and [=IA Data=] are packetized into individu - IA Sequence Header OBU indicates the start of a full [=IA Sequence=] description and contains information related to profiles. - Codec Config OBU provides information to set up a decoder for a coded [=Audio Substream=]. - Audio Element OBU provides information to combine one or more [=Audio Substream=]s to reconstruct an [=Audio Element=]. -- Mix Presentation OBU provides information to render and mix one or more [=Audio Element=]s to generate the final 3D audio output. +- Mix Presentation OBU provides information to render and mix one or more [=Audio Element=]s to generate the final [=Immersive Audio=] output. - Multiple [=Mix Presentation=]s can be defined as alternatives to each other within the same [=IA Sequence=]. Furthermore, the choice of which [=Mix Presentation=] to use at playback is left to the user. For example, multi-language support is implemented by defining different [=Mix Presentation=]s, where the first mix describes the use of the [=Audio Element=] with English dialogue, and the second mix describes the use of the [=Audio Element=] with French dialogue. #### IA Data #### {#iadata} -IA Data contains the time-varying data that is required in the generation of the final 3D audio output. +IA Data contains the time-varying data that is required in the generation of the final [=Immersive Audio=] output. - Audio Frame OBU provides the coded audio frame for an [=Audio Substream=]. Each frame has an implied start timestamp and an explicitly defined duration. 
A coded [=Audio Substream=] is represented as a sequence of [=Audio Frame OBU=]s with the same identifier, in time order. - Parameter Block OBU provides the parameter values in a block for a [=Parameter Substream=]. Each block has an implied start timestamp and an explicitly defined duration. A time-varying [=Parameter Substream=] is represented as a sequence of parameter values in [=Parameter Block OBU=]s with the same identifier, in time order. @@ -948,11 +948,11 @@ Ltb: Left Top Back, Rtb: Right Top Back, LFE: Low-Frequency Effects NOTE: The Ltr and Rtr of 5.1.4ch down-mixed from 7.1.4ch is within the range of Ltb and Rtb of 7.1.4ch, in terms of their positions according to [[!ITU2051-3]]. -For a given input audio with [=audio_element_type=] = CHANNEL_BASED, if the input audio has height channels (e.g., 7.1.4ch or 5.1.2ch), it is RECOMMENDED to use channel layouts with height channels (i.e., higher than or equal to 3.1.2ch) for all [=loudspeaker_layouts=]. +For a given input [=3D audio signal=] with [=audio_element_type=] = CHANNEL_BASED, if the input [=3D audio signal=] has height channels (e.g., 7.1.4ch or 5.1.2ch), it is RECOMMENDED to use channel layouts with height channels (i.e., higher than or equal to 3.1.2ch) for all [=loudspeaker_layouts=]. - Examples for RECOMMENDED list of channel layouts: 3.1.2ch/5.1.2ch, 3.1.2ch/5.1.2ch/7.1.4ch, 5.1.2ch/7.1.4ch, etc. - Examples for NOT RECOMMENDED list of channel layouts: 2ch/3.1.2ch/5.1.2ch, 2ch/3.1.2ch/5.1.2ch/7.1.4ch, 2ch/5.1.2ch/7.1.4ch, 2ch/7.1.4ch, etc. -NOTE: This specification allows down-mixing mechanisms (e.g., as specified in [[#iamfgeneration-scalablechannelaudio-downmixmechanism]]) to drop the height channel if the output layout has no height channels. An example is down-mixing from 7.1.4ch to Mono, Stereo, 5.1ch or 7.1ch. Therefore, given an input audio with height channels, an encoder may generate a set of scalable audio channel groups with layouts that do not have height channels. 
+NOTE: This specification allows down-mixing mechanisms (e.g., as specified in [[#iamfgeneration-scalablechannelaudio-downmixmechanism]]) to drop the height channel if the output layout has no height channels. An example is down-mixing from 7.1.4ch to Mono, Stereo, 5.1ch or 7.1ch. Therefore, given an input [=3D audio signal=] with height channels, an encoder may generate a set of scalable audio channel groups with layouts that do not have height channels. output_gain_is_present_flag indicates if the output_gain information fields for the [=ChannelGroup=] are present. - 0: No output_gain information fields for the [=ChannelGroup=] are present. @@ -1691,7 +1691,7 @@ class decoder_config(ipcm) { sample_size complies with [=PCM_sample_size=] specified in [[!MP4-PCM]]. In other words, it SHALL take a value from the set {16, 24, 32}. -sample_rate indicates the sample rate of the input audio in Hz. It SHALL take a value from the set {44.1k, 16k, 32k, 48k, 96k}. +sample_rate indicates the sample rate of the input [=3D audio signal=] in Hz. It SHALL take a value from the set {44.1k, 16k, 32k, 48k, 96k}. The format of [=audio_frame()=] is only one single mono or stereo PCM audio frame. - If [=audio_frame()=] contains a stereo PCM audio frame, the ith audio sample of the left channel is followed by the ith audio sample of the right channel, and then the (i+1)th audio sample of the left channel is followed by the (i+1)th audio sample of the right channel, where i = 1, 2, ..., [=num_samples_per_frame=] - 1. @@ -2406,7 +2406,7 @@ In the matrices above, p1 = 0.707. Implementations MAY use a limiter defined in # IAMF Generation Process (Informative) # {#iamfgeneration} -This section provides a guideline for encoding an [=IA Sequence=] that conforms to [[#obu-syntax]], given a set of input audio and user inputs. +This section provides a guideline for encoding an [=IA Sequence=] that conforms to [[#obu-syntax]], given a set of input [=3D audio signal=] and user inputs. 
The RECOMMENDED input audio formats for IA encoding are as follows: - Ambisonics audio: a full-order Ambisonics signal with ACN channel ordering and SN3D normalization @@ -2424,7 +2424,7 @@ Example user inputs include: The figure below shows an example architecture for an IA encoder that generates an [=IA Sequence=] with one [=Audio Element=]. The IA encoder is composed of the Pre-processor, Codec encoder, and OBU packetizer modules. -- Pre-processor outputs one or more [=ChannelGroup=]s, [=Descriptors=] and optional [=Parameter Substream=]s based on the input audio and user inputs. +- Pre-processor outputs one or more [=ChannelGroup=]s, [=Descriptors=] and optional [=Parameter Substream=]s based on the input [=3D audio signal=] and user inputs. - It outputs one single [=ChannelGroup=] for a scene-based [=Audio Element=]. - It outputs one or more [=ChannelGroup=]s for a channel-based [=Audio Element=]. - It outputs [=Descriptors=] which are composed of one [=IA Sequence Header OBU=], one [=Codec Config OBU=], one [=Audio Element OBU=], and one or more [=Mix Presentation OBU=]s. @@ -2727,16 +2727,16 @@ The figure below shows a block diagram for the down-mix parameter and loudness m
IA Down-mix Parameter and Loudness
-For a given channel-based input audio (e.g., 7.1.4ch) and a given list of channel layouts based on the input audio, -- Down-mix parameter generator SHALL generate 5 down-mix parameters (α(k), β(k), γ(k), δ(k) and w(k), where k is the frame index) by analyzing the input audio and referring to [[AI-CAD-Mixing]]. +For a given channel-based input [=3D audio signal=] (e.g., 7.1.4ch) and a given list of channel layouts based on the input [=3D audio signal=], +- Down-mix parameter generator SHALL generate 5 down-mix parameters (α(k), β(k), γ(k), δ(k) and w(k), where k is the frame index) by analyzing the input [=3D audio signal=] and referring to [[AI-CAD-Mixing]]. - It is composed of an Audio Scene Classification module and a Height Energy Quantification module as depicted in Figure 11-2. - - Audio Scene Classification module generates 4 parameters (α(k), β(k), γ(k), δ(k)) by classifying audio scenes of the input audio in three modes. + - Audio Scene Classification module generates 4 parameters (α(k), β(k), γ(k), δ(k)) by classifying audio scenes of the input [=3D audio signal=] in three modes. - Default scene: Neither Dialog nor Effect - Dialog scene: Center-channel oriented and clear dialog/voice sounds - Effect scene: Directional and spatially moving sounds. - - The Height Energy Quantification module generates a surround-to-height mixing parameter (w(k)) which is decided according to the relative energy difference between the top and surround channels of the input audio. + - The Height Energy Quantification module generates a surround-to-height mixing parameter (w(k)) which is decided according to the relative energy difference between the top and surround channels of the input [=3D audio signal=]. - If the energy of top channels is bigger than that of surround ones, then w_idx_offset(k) is set to 1. Otherwise, it is set to -1. And, w(k) is calculated based on w_idx_offset(k) and conforms to [[#processing-scalablechannelaudio]]. 
-- Down-mixer generates [=down-mixed audio=] from the input audio according to the list of channel layouts and the down-mix parameters, and outputs [=down-mixed audio=] for each channel layout to the Loudness module. +- Down-mixer generates [=down-mixed audio=] from the input [=3D audio signal=] according to the list of channel layouts and the down-mix parameters, and outputs [=down-mixed audio=] for each channel layout to the Loudness module. - It is not depicted in the figure but Down-mixer further generates [=dmixp_mode=] and [=recon_gain=] for each frame to be passed to the OBU packetizer. - Loudness module measures the loudness level ([=LKFS=]) of each [=down-mixed audio=] based on [[ITU1770-4]], and passes them to OBU packetizer. @@ -2744,9 +2744,9 @@ For a given channel-based input audio (e.g., 7.1.4ch) and a given list of channe This section specifies the down-mixing mechanism to generate down-mixed audio for scalable channel audio. -For a given channel-based input audio that conforms to [=loudspeaker_layout=], the surround and top channels (if any) are separately down-mixed and especially step by step until to get a target channels. +For a given channel-based input [=3D audio signal=] that conforms to [=loudspeaker_layout=], the surround and top channels (if any) are separately down-mixed and especially step by step until to get a target channels. -Implementers MAY use another method to get the [=down-mixed audio=] from the given input audio, but the [=down-mixed audio=] SHALL comply with that by this section. +Implementers MAY use another method to get the [=down-mixed audio=] from the given input [=3D audio signal=], but the [=down-mixed audio=] SHALL comply with that by this section. Therefore, a down-mixer based on the down-mix mechanism is a combination of the following surround down-mixer(s) and top down-mixer(s) as depicted in the figure below. 
- Surround down-mixers @@ -2771,7 +2771,7 @@ For example, to get [=down-mixed audio=] 3.1.2ch from 7.1.4ch: This section describes the generation rule for channel layouts for scalable channel audio. -For a given channel layout (CL #n) of channel-based input audio, any list of CLs ({CL #i: i = 1, 2, ..., n}) for scalable channel audio SHALL conform with the following rules: +For a given channel layout (CL #n) of channel-based input [=3D audio signal=], any list of CLs ({CL #i: i = 1, 2, ..., n}) for scalable channel audio SHALL conform with the following rules: - Si ≤ Si+1 and Wi ≤ Wi+1 and Ti ≤ Ti+1 except Si = Si+1, Wi = Wi+1 and Ti = Ti+1 for i = n-1, n-2, …, 1. Where the ith channel layout CL #i = Si.Wi.Ti. - CL #i is one of [=loudspeaker_layout=]s supported in this version of the specification. @@ -2826,9 +2826,9 @@ Recon_Gain for D_Rtb4: This section describes the generation rule for [=ChannelGroup=]. -For a given channel-based input audio and the list of CLs ({CL #i: i = 1, 2, ..., n}), the CG Generation module outputs the transformed audio (i.e., ChannelGroups) which SHALL conform to the following rules: -- It consists of C number of channels and is structured to n number of [=ChannelGroup=]s, where C is the number of channels for the input audio. -- [=ChannelGroup=] #1 (as called BCG): This [=ChannelGroup=] is the [=down-mixed audio=] itself for CL #1 generated from the input audio. It contains a C1 number of channels. +For a given channel-based input [=3D audio signal=] and the list of CLs ({CL #i: i = 1, 2, ..., n}), the CG Generation module outputs the transformed audio (i.e., ChannelGroups) which SHALL conform to the following rules: +- It consists of C number of channels and is structured to n number of [=ChannelGroup=]s, where C is the number of channels for the input [=3D audio signal=]. +- [=ChannelGroup=] #1 (as called BCG): This [=ChannelGroup=] is the [=down-mixed audio=] itself for CL #1 generated from the input [=3D audio signal=]. 
It contains a C1 number of channels. - [=ChannelGroup=] #i (as called DCG, i = 2, 3, …, n): This [=ChannelGroup=] contains (Ci – Ci-1) number of channels. (Ci – Ci-1) channel(s) consists of as follows: - (Si – Si-1) surround channel(s) if Si > Si-1 . When S_set = { x | Si-1 < x ≤ Si and x is an integer}, - If 2 is an element of S_set, the L2 channel is contained in this CG #i. From 48353655e7b392c74807d223367ba7d818980390 Mon Sep 17 00:00:00 2001 From: Felicia Lim Date: Wed, 16 Aug 2023 10:45:22 -0700 Subject: [PATCH 04/11] Fix #657: clarify that trimming is optional --- index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/index.bs b/index.bs index e377d06c..652b7ea3 100644 --- a/index.bs +++ b/index.bs @@ -488,7 +488,7 @@ It SHALL always be set to 0 for the following [=obu_type=] values: If a decoder encounters an OBU with [=obu_redundant_copy=] = 1, and it has also received the previous non-redundant OBU, it MAY ignore the redundant OBU. If the decoder has not received the previous non-redundant OBU, it SHALL treat the redundant copy as a non-redundant OBU and process the OBU accordingly. -obu_trimming_status_flag indicates whether this OBU has audio samples to be trimmed. It SHALL be set only when [=obu_type=] is set to OBU_IA_Audio_Frame or OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID17. +obu_trimming_status_flag indicates whether this OBU has audio samples to be trimmed. It SHALL be set to 0 or 1 if the [=obu_type=] is set to OBU_IA_Audio_Frame or OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID17. Otherwise, it SHALL be set to 0. For a given coded [=Audio Substream=], - If an [=Audio Frame OBU=] has its [=num_samples_to_trim_at_start=] field set to a non-zero value N, the decoder SHALL discard the first N audio samples. 
From 328e483f78d23a8431b1e769178b2e0227e22339 Mon Sep 17 00:00:00 2001 From: Felicia Lim Date: Wed, 16 Aug 2023 11:40:14 -0700 Subject: [PATCH 05/11] Fix #662: improve terminology --- index.bs | 167 ++++++++++++++++++++++++++++--------------------------- 1 file changed, 84 insertions(+), 83 deletions(-) diff --git a/index.bs b/index.bs index e377d06c..8cd618fb 100644 --- a/index.bs +++ b/index.bs @@ -259,9 +259,7 @@ url: https://www.iso.org/standard/77752.html#; spec: MP4-PCM; type: property; # Introduction # {#introduction} -This specification defines an immersive audio model and formats (IAMF) to provide an immersive audio experience to end-users. -- The term Immersive Audio (IA) means the combination of [=3D audio signal=]s recreating a sound experience close to that of a natural environment. -- The term 3D audio signal means a representation of sound that incorporates additional information beyond traditional stereo or surround sound formats such as Ambisonics (Scene-based), Object-based audio and Channel-based audio (e.g., 3.1.2ch or 7.1.4ch). +This specification defines an immersive audio model and formats (IAMF) to provide an [=Immersive Audio=] experience to end-users. IAMF is used to provide [=Immersive Audio=] content for presentation on a wide range of devices in both streaming and offline applications. These applications include internet audio streaming, multicasting/broadcasting services, file download, gaming, communication, virtual and augmented reality, and others. In these applications, audio may be played back on a wide range of devices, e.g., headphones, mobile phones, tablets, TVs, sound bars, home theater systems, and big screens. @@ -272,21 +270,21 @@ Here are some typical IAMF use cases and examples of how to instantiate the mode Example 1: UC1 with [=3D audio signal=] = 3.1.2ch. 
- Audio Substream: The left (L) and right (R) channels are coded as one audio stream, the left top front (Ltf) and right top front (Rtf) channels as one audio stream, the Center channel as one audio stream, and the low-frequency effects (LFE) channel as one audio stream. -- Audio Element (3.1.2ch): Consists of 4 Audio Substreams which are grouped into one [=ChannelGroup=]. +- Audio Element (3.1.2ch): Consists of 4 Audio Substreams which are grouped into one [=Channel Group=]. - Mix Presentation: Provides rendering algorithms for rendering the Audio Element to popular loudspeaker layouts and headphones, and the loudness information of the [=3D audio signal=]. Example 2: UC2 with two [=3D audio signal=]s = 5.1.2ch and Stereo. - Audio Substream: The L and R channels are coded as one audio stream, the left surround (Ls) and right surround (Rs) channels as one audio stream, the Ltf and Rtf channels as one audio stream, the Center channel as one audio stream, and the LFE channel as one audio stream. -- Audio Element 1 (5.1.2ch): Consists of 5 Audio Substreams which are grouped into one [=ChannelGroup=]. -- Audio Element 2 (Stereo): Consists of 1 Audio Substream which is grouped into one [=ChannelGroup=]. +- Audio Element 1 (5.1.2ch): Consists of 5 Audio Substreams which are grouped into one [=Channel Group=]. +- Audio Element 2 (Stereo): Consists of 1 Audio Substream which is grouped into one [=Channel Group=]. - Parameter Substream 1-1: Contains mixing parameter values that are applied to Audio Element 1 by considering the home environment. - Parameter Substream 1-2: Contains mixing parameter values that are applied to Audio Element 2 by considering the home environment. - Mix Presentation: Provides rendering algorithms for rendering Audio Elements 1 & 2 to popular loudspeaker layouts, mixing information based on Parameter Substreams 1-1 & 1-2, and loudness information of the [=Rendered Mix Presentation=]. 
Example 3: UC3 with two [=3D audio signal=]s = first order Ambisonics (FOA) and Non-diegetic Stereo. - Audio Substream: The L and R channels are coded as one audio stream and each channel of the FOA signal as one audio stream. -- Audio Element 1 (FOA): Consists of 4 Audio Substreams which are grouped into one [=ChannelGroup=]. -- Audio Element 2 (Non-diegetic Stereo): Consists of 1 Audio Substream which is grouped into one [=ChannelGroup=]. +- Audio Element 1 (FOA): Consists of 4 Audio Substreams which are grouped into one [=Channel Group=]. +- Audio Element 2 (Non-diegetic Stereo): Consists of 1 Audio Substream which is grouped into one [=Channel Group=]. - Parameter Substream 1-1: Contains mixing parameter values that are applied to Audio Element 1 by considering the mobile environment. - Parameter Substream 1-2: Contains mixing parameter values that are applied to Audio Element 2 by considering the mobile environment. - Mix Presentation: Provides rendering algorithms for rendering Audio Elements 1 & 2 to popular loudspeaker layouts and headphones, mixing information based on Parameter Substreams 1-1 & 1-2, and loudness information of the [=Rendered Mix Presentation=]. @@ -294,6 +292,8 @@ Example 3: UC3 with two [=3D audio signal=]s = first order Ambisonics (FOA) and # Immersive Audio Model # {#iamodel} +## Model Overview ## {#model-overview} + This specification defines a model for representing [=Immersive Audio=] contents based on [=Audio Substream=]s contributing to [=Audio Element=]s meant to be rendered and mixed to form one or more presentations as depicted in the figure below.
@@ -301,20 +301,21 @@ This specification defines a model for representing [=Immersive Audio=] contents The model comprises a number of coded [=Audio Substream=]s and the metadata that describes how to decode, render and mix the [=Audio Substream=]s for playback. The model itself is codec-agnostic; any supported audio codec may be used to code the [=Audio Substream=]s. -The model includes one or more [=Audio Element=]s, each of which consists of one or more [=Audio Substream=]s. The [=Audio Substream=]s that make up an [=Audio Element=] are grouped into one or more [=ChannelGroup=]s. The model further includes [=Mix Presentation=]s and [=Parameter Substream=]s. +The model includes one or more [=Audio Element=]s, each of which consists of one or more [=Audio Substream=]s. The [=Audio Substream=]s that make up an [=Audio Element=] are grouped into one or more [=Channel Group=]s. The model further includes [=Mix Presentation=]s and [=Parameter Substream=]s. + +The term 3D audio signal means a representation of sound that incorporates additional information beyond traditional stereo or surround sound formats such as Ambisonics (Scene-based), Object-based audio and Channel-based audio (e.g., 3.1.2ch or 7.1.4ch). -## Terminology ## {#terminology} +The term Immersive Audio (IA) means the combination of [=3D audio signal=]s recreating a sound experience close to that of a natural environment. The term Audio Substream means a sequence of audio samples, which may be encoded with any compatible audio codec. -The term Audio Element means a [=3D audio signal=], and is constructed from one or more [=Audio Substream=]s and the metadata describing them. The [=Audio Substream=]s associated with one [=Audio Element=] use the same audio codec. 
+The term Channel Group means a set of [=Audio Substream=](s) which is(are) able to provide a spatial resolution of audio contents by itself or which is(are) able to provide an enhanced spatial resolution of audio contents by combining with the preceding [=Channel Group=]s. -The term ChannelGroup means a set of [=Audio Substream=](s) which is(are) able to provide a spatial resolution of audio contents by itself or which is(are) able to provide an enhanced spatial resolution of audio contents by combining with the preceding [=ChannelGroup=]s. +The term Audio Element means a [=3D audio signal=], and is constructed from one or more [=Audio Substream=]s (grouped into one or more [=Channel Groups=]) and the metadata describing them. The [=Audio Substream=]s associated with one [=Audio Element=] use the same audio codec. -The term Parameter Substream means a sequence of parameter values that are associated with the algorithms used for reconstructing, rendering, and mixing. It is applied to its associated [=Audio Element=] or [=Mix Presentation=]. -- [=Parameter Substream=]s may change their values over time and may further be animated; for example, any changes in values may be smoothed over some time duration. As such, they may be viewed as a 1D signal with different metadata specified for different time durations. +The term Mix Presentation means a series of processes to present [=Immersive Audio=] contents to end-users by using [=Audio Element=](s). It contains metadata that describes how the [=Audio Element=](s) is(are) rendered and mixed together for playback through physical loudspeakers or headphones, as well as loudness information. -The term Mix Presentation means a series of processes to present [=Immersive Audio=] contents to end-users by using [=Audio Element=](s). It contains metadata that describes how the [=Audio Element=](s) is(are) rendered and mixed together for playback through physical loudspeakers or headphones, and loudness information. 
+The term Parameter Substream means a sequence of parameter values that are associated with the algorithms used for reconstructing, rendering, and mixing. It is applied to its associated [=Audio Element=] or [=Mix Presentation=]. [=Parameter Substream=]s may change their values over time and may further be animated; for example, any changes in values may be smoothed over some time duration. As such, they may be viewed as a 1D signal with different metadata specified for different time durations. The term Rendered Mix Presentation means a [=3D audio signal=] after the [=Audio Element=](s) defined in a [=Mix Presentation=] is(are) rendered and mixed together for playback through physical loudspeakers or headphones. @@ -326,15 +327,15 @@ Based on the model, this specification defines the immersive audio model and for
IAMF Architecture
For a given input 3D audio, -- A Pre-Processor generates [=ChannelGroup=](s), [=Descriptors=] and [=Parameter Substream=](s). +- A Pre-Processor generates [=Channel Group=](s), [=Descriptors=] and [=Parameter Substream=](s). - A Codec Enc generates coded [=Audio Substream=](s). - An OBU Packetizer generates an [=IA Sequence=] from the coded [=Audio Substream=](s) and [=Descriptors=] and [=Parameter Substream=](s). - A File Packager (ISOBMFF Encapsulation) generates an IAMF File by encapsulating the [=IA Sequence=] into [[!ISOBMFF]] track(s). - A File Parser (ISOBMFF Parser) reconstructs the [=IA Sequence=] by decapsulating the IAMF File. - An OBU Parser outputs the coded [=Audio Substream=](s) and the [=Parameter Substream=](s). -- A Codec Dec outputs decoded [=ChannelGroup=](s) after decoding of the coded [=Audio Substream=](s). -- A Post-Processor outputs an [=Immersive Audio=] by using the [=ChannelGroup=](s), the [=Descriptors=] and the [=Parameter Substream=](s). -- Pre-Processor, [=ChannelGroup=](s), Codec Enc and OBU Packetizer are defined in [[#iamfgeneration]]. +- A Codec Dec outputs decoded [=Channel Group=](s) after decoding of the coded [=Audio Substream=](s). +- A Post-Processor outputs an [=Immersive Audio=] by using the [=Channel Group=](s), the [=Descriptors=] and the [=Parameter Substream=](s). +- Pre-Processor, [=Channel Group=](s), Codec Enc and OBU Packetizer are defined in [[#iamfgeneration]]. - [=IA Sequence=] is defined in [[#iasequence]]. - ISOBMFF Encapsulation, IAMF file (ISOBMFF file), and ISOBMFF Parser are defined in [[#isobmff]]. - OBU Parser, Codec Dec, and Post-Processor are defined in [[#processing]]. @@ -695,11 +696,11 @@ audio_element_type: The type of audio representation. audio_substream_id indicates the identifier for an [=Audio Substream=] which this [=Audio Element=] refers to. 
-Let a particular [=ChannelGroup=]'s [=Audio Substream=]s be indexed as [c, n_c], where a [=ChannelGroup=] generation rule is described in [[#iamfgeneration-scalablechannelaudio-channelgroupgenerationrule]] and -- [=c=] = [1, ..., C] is the [=ChannelGroup=] index and C is the number of [=ChannelGroup=]s. -- [=n_c=] = [1, ..., N_c] is the [=Audio Substream=] index in the c-th [=ChannelGroup=] and N_c is the number of [=Audio Substream=]s in the c-th [=ChannelGroup=]. +Let a particular [=Channel Group=]'s [=Audio Substream=]s be indexed as [c, n_c], where a [=Channel Group=] generation rule is described in [[#iamfgeneration-scalablechannelaudio-channelgroupgenerationrule]] and +- [=c=] = [1, ..., C] is the [=Channel Group=] index and C is the number of [=Channel Group=]s. +- [=n_c=] = [1, ..., N_c] is the [=Audio Substream=] index in the c-th [=Channel Group=] and N_c is the number of [=Audio Substream=]s in the c-th [=Channel Group=]. -Then, the i-th [=audio_substream_id=] maps to a [=ChannelGroup=]'s [=Audio Substream=]s as follows, where i is the index of the array: +Then, the i-th [=audio_substream_id=] maps to a [=Channel Group=]'s [=Audio Substream=]s as follows, where i is the index of the array: ``` [ @@ -710,7 +711,7 @@ Then, the i-th [=audio_substream_id=] maps to a [=ChannelGroup=]'s [=Audio Subst ] ``` -The order of the [=Audio Substream=]s in each [=ChannelGroup=] (i.e., the semantics of n_c) is specified in [[#syntax-scalable-channel-layout-config]]. +The order of the [=Audio Substream=]s in each [=Channel Group=] (i.e., the semantics of n_c) is specified in [[#syntax-scalable-channel-layout-config]]. num_parameters specifies the number of [=Parameter Substream=]s that are used by the algorithms specified in this [=Audio Element=]. 
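Reviewer sketch (not part of the patch): the hunk above indexes a Channel Group's Audio Substreams as [c, n_c] and maps the i-th audio_substream_id onto that index, but the array itself is elided from the patch context. The Python below is a minimal illustration only. It assumes the substreams are listed group by group (all of group 1, then all of group 2, and so on); that ordering is an assumption, and `substream_index_map` and `group_sizes` are hypothetical names that do not appear in the specification.

```python
from typing import List, Tuple


def substream_index_map(group_sizes: List[int]) -> List[Tuple[int, int]]:
    """Return the (c, n_c) pair for each array index i, in order.

    group_sizes[c - 1] is N_c, the number of Audio Substreams in the
    c-th Channel Group. Both c and n_c are 1-based, as in the text.
    ASSUMPTION: substreams are listed group by group; the patch context
    does not show the actual array.
    """
    if any(n < 1 for n in group_sizes):
        raise ValueError("each Channel Group needs at least one substream")
    return [
        (c, n_c)
        for c, size in enumerate(group_sizes, start=1)
        for n_c in range(1, size + 1)
    ]


# Hypothetical example: C = 2 Channel Groups with N_1 = 2 and N_2 = 3.
mapping = substream_index_map([2, 3])
for i, (c, n_c) in enumerate(mapping):
    print(f"audio_substream_id[{i}] -> [c={c}, n_c={n_c}]")
```

Under that assumption, the position of an audio_substream_id in the array is enough to recover both the Channel Group it belongs to and its place inside that group.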
@@ -887,10 +888,10 @@ class channel_audio_layer_config(i) { } ``` -When an [=Audio Element=] is composed of G(r) number of [=Audio Substream=]s, its scalable channel audio representation is layered into [=num_layers=] = r number of [=ChannelGroup=]s. +When an [=Audio Element=] is composed of G(r) number of [=Audio Substream=]s, its scalable channel audio representation is layered into [=num_layers=] = r number of [=Channel Group=]s. -- The order of the [=ChannelGroup=]s in each [=Temporal Unit=] SHALL be same as the order of channel_audio_layer_config()s in scalable_channel_layout_config(). -- The q-th [=ChannelGroup=] consists of G(q) - G(q-1) number of [=Audio Substream=]s, where q = 1, 2, ..., r and G(0) = 0. +- The order of the [=Channel Group=]s in each [=Temporal Unit=] SHALL be same as the order of channel_audio_layer_config()s in scalable_channel_layout_config(). +- The q-th [=Channel Group=] consists of G(q) - G(q-1) number of [=Audio Substream=]s, where q = 1, 2, ..., r and G(0) = 0. - Let the term "Audio Frames" mean the set of all [=Audio Frame OBU=]s (for this [=Audio Element=]) that have the same start timestamp. All Audio Frames in an [=IA Sequence=] SHALL have the same number of [=Audio Frame OBU=]s. - [=Parameter Block OBU=]s MAY be associated with Audio Frames. @@ -898,20 +899,20 @@ When an [=Audio Element=] is composed of G(r) number of [=Audio Substream=]s, it
Immersive Audio Sequence with scalable channel audio (before OBU packing). See [[#standalone]] for related details on OBU ordering within an IA Sequence.
-Each [=ChannelGroup=] (or scalable audio channel layer) is associated with a different [=loudspeaker_layout=]. The IA decoder SHALL select one of the layers according to the following rules, in order: +Each [=Channel Group=] (or scalable audio channel layer) is associated with a different [=loudspeaker_layout=]. The IA decoder SHALL select one of the layers according to the following rules, in order: - The IA decoder SHOULD first attempt to select the layer with a [=loudspeaker_layout=] that matches the physical playback layout. - If there is no match, the IA decoder SHOULD select the layer with the closest [=loudspeaker_layout=] to the physical layout and then apply up- or down-mixing appropriately, after decoding and reconstruction of the channel audio. Sections [[#iamfgeneration-scalablechannelaudio-downmixmechanism]] and [[#processing-downmixmatrix]] provide examples of dynamic and static down-mixing matrices for some common layouts that MAY be used. Semantics -num_layers indicates the number of [=ChannelGroup=]s for scalable channel audio. It SHALL NOT be set to zero and its maximum value SHALL be 6. +num_layers indicates the number of [=Channel Group=]s for scalable channel audio. It SHALL NOT be set to zero and its maximum value SHALL be 6. - If [=loudspeaker_layout=] is set to Binaural, this field SHALL be set to 1. -channel_audio_layer_config() provides the i-th [=ChannelGroup=]'s configuration, where i is the layer index provided as input argument to this class. +channel_audio_layer_config() provides the i-th [=Channel Group=]'s configuration, where i is the layer index provided as input argument to this class. -loudspeaker_layout indicates the channel layout to be reconstructed from the precedent [=ChannelGroup=]s and current [=ChannelGroup=]. When a reserved value for [=loudspeaker_layout=] is used, parsers compliant with this version of the specification SHOULD skip the [=channel_audio_layer_config()=] for that layer and all subsequent ones, if any. 
+loudspeaker_layout indicates the channel layout to be reconstructed from the precedent [=Channel Group=]s and current [=Channel Group=]. When a reserved value for [=loudspeaker_layout=] is used, parsers compliant with this version of the specification SHOULD skip the [=channel_audio_layer_config()=] for that layer and all subsequent ones, if any. In this version of the specification, [=loudspeaker_layout=] indicates one of the 10 channel layouts listed below, where @@ -954,23 +955,23 @@ For a given input audio with [=audio_element_type=] = CHANNEL_BASED, if the inpu NOTE: This specification allows down-mixing mechanisms (e.g., as specified in [[#iamfgeneration-scalablechannelaudio-downmixmechanism]]) to drop the height channel if the output layout has no height channels. An example is down-mixing from 7.1.4ch to Mono, Stereo, 5.1ch or 7.1ch. Therefore, given an input audio with height channels, an encoder may generate a set of scalable audio channel groups with layouts that do not have height channels. -output_gain_is_present_flag indicates if the output_gain information fields for the [=ChannelGroup=] are present. -- 0: No output_gain information fields for the [=ChannelGroup=] are present. -- 1: output_gain information fields for the [=ChannelGroup=] are present. In this case, [=output_gain_flags=] and [=output_gain=] fields are present. +output_gain_is_present_flag indicates if the output_gain information fields for the [=Channel Group=] are present. +- 0: No output_gain information fields for the [=Channel Group=] are present. +- 1: output_gain information fields for the [=Channel Group=] are present. In this case, [=output_gain_flags=] and [=output_gain=] fields are present. -recon_gain_is_present_flag indicates if the recon_gain information fields for the [=ChannelGroup=] are present in [=recon_gain_info_parameter_data()=]. -- 0: No recon_gain information fields for the [=ChannelGroup=] are present in [=recon_gain_info_parameter_data()=]. 
-- 1: recon_gain information fields for the [=ChannelGroup=] are present in [=recon_gain_info_parameter_data()=]. In this case, the [=recon_gain_flags=] and [=recon_gain=] fields are present. +recon_gain_is_present_flag indicates if the recon_gain information fields for the [=Channel Group=] are present in [=recon_gain_info_parameter_data()=]. +- 0: No recon_gain information fields for the [=Channel Group=] are present in [=recon_gain_info_parameter_data()=]. +- 1: recon_gain information fields for the [=Channel Group=] are present in [=recon_gain_info_parameter_data()=]. In this case, the [=recon_gain_flags=] and [=recon_gain=] fields are present. substream_count specifies the number of [=Audio Substream=]s. The sum of all [=substream_count=]s in this OBU SHALL be the same as [=num_substreams=] in this OBU. It SHALL NOT be set to 0. coupled_substream_count specifies the number of referenced [=Audio Substream=]s, each of which is coded as coupled stereo channels. -Each pair of coupled stereo channels in the same [=ChannelGroup=] SHALL be coded in stereo mode to generate one single coded [=Audio Substream=] and each of the non-coupled channels in the same [=ChannelGroup=] SHALL be coded in mono mode to generate one single coded [=Audio Substream=]. +Each pair of coupled stereo channels in the same [=Channel Group=] SHALL be coded in stereo mode to generate one single coded [=Audio Substream=] and each of the non-coupled channels in the same [=Channel Group=] SHALL be coded in mono mode to generate one single coded [=Audio Substream=]. - Coupled stereo channels: L/R, Ls/Rs, Lss/Rss, Lrs/Rrs, Ltf/Rtf, Ltb/Rtb - Non-coupled channels: C, LFE, L -The order of the [=Audio Substream=]s in each [=ChannelGroup=] SHALL be as follows: +The order of the [=Audio Substream=]s in each [=Channel Group=] SHALL be as follows: - Coupled substreams come first and are followed by non-coupled substreams. 
- Coupled substreams for surround channels come first and are followed by the coupled substreams for top channels. - Coupled substreams for front channels come first and are followed by the coupled substreams for the side, rear and back channels. @@ -1051,7 +1052,7 @@ If ambisonics_mode is equal to PROJECTION, this indicates that the Ambisonics ch demixing_matrix complies with the "Demixing Matrix" field for [=ChannelMappingFamily=] = 3 in [[!RFC8486]] except that the byte order of each of the matrix coefficients is converted to big-endian. -A scene-based [=Audio Element=] has only one [=ChannelGroup=], which includes all [=Audio Substream=]s that it refers to. The order of the [=Audio Substream=]s in the [=ChannelGroup=] SHALL conform to [[RFC8486]]. +A scene-based [=Audio Element=] has only one [=Channel Group=], which includes all [=Audio Substream=]s that it refers to. The order of the [=Audio Substream=]s in the [=Channel Group=] SHALL conform to [[RFC8486]]. ## Mix Presentation OBU Syntax and Semantics ## {#obu-mixpresentation} @@ -2041,12 +2042,12 @@ The figure below shows the decoding and reconstruction flowchart.
Scalable Channel Audio Decoding and Reconstruction Flowchart
For a given loudspeaker layout (i.e., CL #i) among the list of [=loudspeaker_layout=] in [=scalable_channel_layout_config()=], -- The OBU Parser SHALL output the [=Audio Substream=]s for [=ChannelGroup=] #1 to [=ChannelGroup=] #i and pass them to the Codec Decoder, along with [=decoder_config()=]. +- The OBU Parser SHALL output the [=Audio Substream=]s for [=Channel Group=] #1 to [=Channel Group=] #i and pass them to the Codec Decoder, along with [=decoder_config()=]. - The Codec Decoder SHALL output the decoded PCM channels. - For non-scalable audio (i.e., i = [=num_layers=] = 1), its order SHALL be converted to the loudspeaker location order for CL #1. - For scalable audio (i.e., i > 1), the output channels SHALL have the same order as the originally transmitted order of the coded channels. - For scalable audio (i.e., i > 1), the decoded PCM channels are further processed as: - - When [=output_gain_is_present_flag=](j) for [=ChannelGroup=] #j (j = 1, 2, …, i-1) is set to 1, the Gain module SHALL apply [=output_gain=](j) to all audio samples of the mixed channels in [=ChannelGroup=] #j indicated by [=output_gain_flag=](j). + - When [=output_gain_is_present_flag=](j) for [=Channel Group=] #j (j = 1, 2, …, i-1) is set to 1, the Gain module SHALL apply [=output_gain=](j) to all audio samples of the mixed channels in [=Channel Group=] #j indicated by [=output_gain_flag=](j). - The De-Mixer SHALL output de-mixed PCM channels for CL #i generated through de-mixing of the mixed channels from the Gain module by using non-mixed channels and demixing parameters for each frame. - The Recon_Gain module SHALL output smoothed PCM channels by applying [=recon_gain=] to each frame of the de-mixed channels. - The order for the Non-mixed channels and Smoothed channels SHALL be converted to the loudspeaker location order for CL #i after going through the necessary modules such as Gain, De-Mixer, Recon_Gain, etc. 
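Reviewer sketch (not part of the patch): the list above says that, for a target layout CL #i, the OBU Parser outputs Channel Group #1 through Channel Group #i, and that the Gain module only concerns groups j = 1, ..., i-1 whose output_gain_is_present_flag is 1. The Python below is a rough planning helper under those statements. The dictionary layout, the function name `plan_reconstruction`, and the example substream counts are all hypothetical; the gain, de-mixing, and recon-gain arithmetic is deliberately left out because it is defined elsewhere in the specification.

```python
from typing import Dict, List


def plan_reconstruction(layers: List[Dict[str, int]], target_layer: int) -> Dict:
    """Plan decoding for CL #target_layer (1-based).

    Each entry of `layers` is a hypothetical dict with the keys
    'substream_count' and 'output_gain_is_present_flag', one per
    Channel Group, in channel_audio_layer_config() order.
    """
    if not 1 <= target_layer <= len(layers):
        raise ValueError("target_layer out of range")

    # Channel Groups #1..#i are handed to the Codec Decoder.
    groups = list(range(1, target_layer + 1))

    # Cumulative number of Audio Substreams over groups 1..i.
    substreams_to_decode = sum(layers[g - 1]["substream_count"] for g in groups)

    # The Gain module considers groups j = 1..i-1 whose flag is set to 1.
    gain_groups = [
        j
        for j in range(1, target_layer)
        if layers[j - 1]["output_gain_is_present_flag"] == 1
    ]

    return {
        "groups": groups,
        "substreams_to_decode": substreams_to_decode,
        "gain_groups": gain_groups,
        # i > 1 is the "scalable audio" branch in the list above.
        "scalable": target_layer > 1,
    }


# Hypothetical three-layer Audio Element (substream counts are made up).
example_layers = [
    {"substream_count": 1, "output_gain_is_present_flag": 1},
    {"substream_count": 4, "output_gain_is_present_flag": 0},
    {"substream_count": 2, "output_gain_is_present_flag": 0},
]
print(plan_reconstruction(example_layers, target_layer=2))
```

Keeping the plan separate from the signal processing makes it easy to check which groups are decoded for a given layout before any samples are touched.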
@@ -2055,7 +2056,7 @@ The following sections, [[#processing-scalablechannelaudio-gain]], [[#processing ### Gain ### {#processing-scalablechannelaudio-gain} -The Gain module is the mirror process of the Attenuation module (described in [[#iamfgeneration-scalablechannelaudio]]). It recovers the reduced sample values using [=output_gain=](i) when its [=output_gain_is_present_flag=](i) for [=ChannelGroup=] #i is set to 1. When its [=output_gain_is_present_flag=](i) is set to 0, then this module SHALL be bypassed for [=ChannelGroup=] #i. The value of [=output_gain=](i) for [=ChannelGroup=] #i SHALL be applied to all samples of the mixed channels in [=ChannelGroup=] #i, where a mixed channel means the channel created by mixing multiple channels of an input channel audio when generating [=down-mixed audio=] from the input channel audio (i.e., the channel audio for CL #n). +The Gain module is the mirror process of the Attenuation module (described in [[#iamfgeneration-scalablechannelaudio]]). It recovers the reduced sample values using [=output_gain=](i) when its [=output_gain_is_present_flag=](i) for [=Channel Group=] #i is set to 1. When its [=output_gain_is_present_flag=](i) is set to 0, then this module SHALL be bypassed for [=Channel Group=] #i. The value of [=output_gain=](i) for [=Channel Group=] #i SHALL be applied to all samples of the mixed channels in [=Channel Group=] #i, where a mixed channel means the channel created by mixing multiple channels of an input channel audio when generating [=down-mixed audio=] from the input channel audio (i.e., the channel audio for CL #n). To apply the gain, an implementation SHALL use the following: @@ -2424,15 +2425,15 @@ Example user inputs include: The figure below shows an example architecture for an IA encoder that generates an [=IA Sequence=] with one [=Audio Element=]. The IA encoder is composed of the Pre-processor, Codec encoder, and OBU packetizer modules. 
-- Pre-processor outputs one or more [=ChannelGroup=]s, [=Descriptors=] and optional [=Parameter Substream=]s based on the input audio and user inputs. - - It outputs one single [=ChannelGroup=] for a scene-based [=Audio Element=]. - - It outputs one or more [=ChannelGroup=]s for a channel-based [=Audio Element=]. +- Pre-processor outputs one or more [=Channel Group=]s, [=Descriptors=] and optional [=Parameter Substream=]s based on the input audio and user inputs. + - It outputs one single [=Channel Group=] for a scene-based [=Audio Element=]. + - It outputs one or more [=Channel Group=]s for a channel-based [=Audio Element=]. - It outputs [=Descriptors=] which are composed of one [=IA Sequence Header OBU=], one [=Codec Config OBU=], one [=Audio Element OBU=], and one or more [=Mix Presentation OBU=]s. - It may output [=Parameter Substream=]s - For a channel-based [=Audio Element=] with [=num_layers=] = 1, it may output a [=Parameter Substream=] with demixing info. - For a channel-based [=Audio Element=] with [=num_layers=] > 1, it outputs [=Parameter Substream=]s with demixing info and recon gain info. - It may further output [=Parameter Substream=]s with mixing gain. -- Codec encoder generates one or more [=Audio Substream=]s from each [=ChannelGroup=] based on [=Codec Config OBU=]. +- Codec encoder generates one or more [=Audio Substream=]s from each [=Channel Group=] based on [=Codec Config OBU=]. - OBU packetizer packetizes [=Descriptors=], [=Parameter Substream=]s and [=Audio Substream=]s into OBUs, and outputs an [=IA Sequence=]. - Temporal unit generator generates a [=Temporal Unit=] for each frame from [=Audio Frame OBU=]s and [=Parameter Block OBU=]s (if present). @@ -2443,15 +2444,15 @@ The IA encoder is composed of the Pre-processor, Codec encoder, and OBU packetiz For Ambisonics encoding: -- The Pre-Processor outputs one [=ChannelGroup=] and one set of [=Descriptors=]. It is composed of only the Meta Generator. 
+- The Pre-Processor outputs one [=Channel Group=] and one set of [=Descriptors=]. It is composed of only the Meta Generator. - The Meta Generator generates [=Descriptors=] based on the Ambisonics mode and the number of channels. - [=ambisonics_mode=] is set as follows: - 0 if [=ChannelMappingFamily=] = 2, as specified in [[RFC8486]]. - 1 if [=ChannelMappingFamily=] = 3, as speciifed in [[RFC8486]]. - [=ambisonics_config()=] is set as follows: - [=output_channel_count=] is set to the number of Ambisonics channels, e.g., 4, 9, or 16. - - [=channel_mapping=] for [=ambisonics_mode=] = 0 is assigned based on the order of the [=Audio Substream=]s in the [=ChannelGroup=]. - - [=demixing_matrix=] for [=ambisonics_mode=] = 1 is assigned based on the order of the [=Audio Substream=]s in the [=ChannelGroup=]. + - [=channel_mapping=] for [=ambisonics_mode=] = 0 is assigned based on the order of the [=Audio Substream=]s in the [=Channel Group=]. + - [=demixing_matrix=] for [=ambisonics_mode=] = 1 is assigned based on the order of the [=Audio Substream=]s in the [=Channel Group=]. - Codec Enc. outputs [=substream_count=] number of [=Audio Substream=]s. - The i-th [=Temporal Unit=] is composed of the [=Audio Frame OBU=]s for the i-th frame. - It may have an immediately preceding [=Temporal Delimiter OBU=]. @@ -2460,42 +2461,42 @@ For Ambisonics encoding: For Scalable Channel Audio encoding: -- The Pre-processor outputs N [=ChannelGroup=]s ([=num_layers=] = N), [=Descriptors=] and [=Parameter Substream=]s. It is composed of a Down-mix parameter generator, Down-mixer, Loudness, ChannelGroup generator, Attenuation, and Meta generator. +- The Pre-processor outputs N [=Channel Group=]s ([=num_layers=] = N), [=Descriptors=] and [=Parameter Substream=]s. It is composed of a Down-mix parameter generator, Down-mixer, Loudness, Channel Group generator, Attenuation, and Meta generator. 
- For non-scalable channel audio (i.e., [=num_layers=] = 1): - [=Parameter Substream=] for recon gain is not generated. - [=Parameter Substream=] for demixing info may be generated by implementers who assume it to be recommended for dynamic downmixing on the decoder side. - - Down-mixer, ChannelGroup generator, and Attenuation modules are not needed. + - Down-mixer, Channel Group generator, and Attenuation modules are not needed. - Down-mix parameter generator generates 5 down-mix parameters (α(k), β(k), γ(k), δ(k) and w(k)) by analyzing the input channel audio. - Down-mixer generates [=down-mixed audio=]s according to the list of channel layouts and the down-mix parameters. - Loudness module outputs the loudness level ([=LKFS=]) of each [=down-mixed audio=] based on [[ITU1770-4]]. - - ChannelGroup generator transforms the input channel audio to N [=ChannelGroup=]s for scalable channel audio with [=num_layers=] = N by using the down-mix parameters and the list of channel layouts. - - The Attenuation module applies a gain to the transformed [=ChannelGroup=]s to prevent clipping. + - Channel Group generator transforms the input channel audio to N [=Channel Group=]s for scalable channel audio with [=num_layers=] = N by using the down-mix parameters and the list of channel layouts. + - The Attenuation module applies a gain to the transformed [=Channel Group=]s to prevent clipping. - Meta generator generates [=Descriptors=] and [=Parameter Substream=]s. - [=Descriptors=] are set as follows: - [=num_layers=] is set to N (i.e., the number of channel layouts). - [=channel_audio_layer_config()=] is set as follows: - - [=loudspeaker_layout=] is set to the ith list of channel layouts for the ith [=ChannelGroup=]. - - [=output_gain_is_present_flag=] is set to 1 for the ith [=ChannelGroup=] if attenuation is applied to the mixed channels of the ith [=ChannelGroup=]. Otherwise, it is set to 0 for the ith [=ChannelGroup=]. 
- - [=recon_gain_is_present_flag=] is set to 1 for the ith [=ChannelGroup=] if the preceding [=ChannelGroup=]s has one or more mixed channels from the [=down-mixed audio=] for the ith channel layout. Otherwise, it is set to 0 for the ith [=ChannelGroup=]. Especially, when [=num_layers=] = 1, this flag is set to 0. + - [=loudspeaker_layout=] is set to the ith list of channel layouts for the ith [=Channel Group=]. + - [=output_gain_is_present_flag=] is set to 1 for the ith [=Channel Group=] if attenuation is applied to the mixed channels of the ith [=Channel Group=]. Otherwise, it is set to 0 for the ith [=Channel Group=]. + - [=recon_gain_is_present_flag=] is set to 1 for the ith [=Channel Group=] if the preceding [=Channel Group=]s has one or more mixed channels from the [=down-mixed audio=] for the ith channel layout. Otherwise, it is set to 0 for the ith [=Channel Group=]. Especially, when [=num_layers=] = 1, this flag is set to 0. - This flag is set to 0 for lossless codecs including LPCM. - - [=substream_count=] is set to the number of [=Audio Substream=]s in the ith [=ChannelGroup=]. - - [=coupled_substream_count=] is set to the number of coupled substreams among the [=Audio Substream=]s that make up the ith [=ChannelGroup=]. - - Each bit of [=output_gain_flags=] is set to 1 for the ith [=ChannelGroup=] if attenuation is applied to the relevant channel of the ith [=ChannelGroup=]. Otherwise, it is set to 0 for the ith [=ChannelGroup=]. + - [=substream_count=] is set to the number of [=Audio Substream=]s in the ith [=Channel Group=]. + - [=coupled_substream_count=] is set to the number of coupled substreams among the [=Audio Substream=]s that make up the ith [=Channel Group=]. + - Each bit of [=output_gain_flags=] is set to 1 for the ith [=Channel Group=] if attenuation is applied to the relevant channel of the ith [=Channel Group=]. Otherwise, it is set to 0 for the ith [=Channel Group=]. 
- [=output_gain=] is set to the gain (i.e., the inverse of attenuation gain) which is applied to the channels which are indicated by [=output_gain_flags=]. - - [=Parameter Substream=]s can be composed of one for demixing info and the other for recon gain. When [=recon_gain_is_present_flag=] = 0 for all [=ChannelGroup=]s, no [=Parameter Block OBU=]s for recon gain info are present in [=IA Sequence=]. + - [=Parameter Substream=]s can be composed of one for demixing info and the other for recon gain. When [=recon_gain_is_present_flag=] = 0 for all [=Channel Group=]s, no [=Parameter Block OBU=]s for recon gain info are present in [=IA Sequence=]. - [=dmixp_mode=] of [=demixing_info_parameter_data()=] for the kth frame is set to indicate (α(k), β(k), γ(k), δ(k)) and w_idx_offset(k), where w_idx_offset(k) = 1 or -1. - [=recon_gain_flags=] of [=recon_gain_info_parameter_data()=] is set to indicate the de-mixed channels which need to apply [=recon_gain=] among the output channels after demixing for the ith channel layout. - - [=recon_gain=] is set to the gain value to be applied to the channel which is indicated by [=recon_gain_flags=] for the ith [=ChannelGroup=]. + - [=recon_gain=] is set to the gain value to be applied to the channel which is indicated by [=recon_gain_flags=] for the ith [=Channel Group=]. - [=Temporal Unit=] for the kth frame is composed of zero or more [=Parameter Block OBU=]s and followed by the [=Audio Frame OBU=]s for the kth frames. - It may have the immediately preceding [=Temporal Delimiter OBU=]. - - [=ChannelGroup=]s in a [=Temporal Unit=] are placed in order. In other words, the [=ChannelGroup=] for the first channel layout comes first, followed by the [=ChannelGroup=] for the second channel layout, followed by the [=ChannelGroup=] for the third channel layout, and so on. + - [=Channel Group=]s in a [=Temporal Unit=] are placed in order. 
In other words, the [=Channel Group=] for the first channel layout comes first, followed by the [=Channel Group=] for the second channel layout, followed by the [=Channel Group=] for the third channel layout, and so on. The figure below shows the IA encoding flowchart for Scalable Channel Audio. - For a given input channel audio and a given list of channel layouts for scalability, PCMs for the input channel audio are passed to the CG Generation module. - CG Generation module generates the transformed audio according to the CG generation rule based on the list of CLs and the down-mix parameters. - - The transformed audio is structured as [=ChannelGroup=]s. + - The transformed audio is structured as [=Channel Group=]s. - Non-mixed channels of the transformed audio (i.e., the original channels of the input channel audio) are directly input to the Codec encoder, but the mixed channels may be input first to the Attenuation module and then to the Codec encoder. -- The Attenuation module reduces all sample values of the mixed channels in the same [=ChannelGroup=] at a uniform rate ([=output_gain=]). +- The Attenuation module reduces all sample values of the mixed channels in the same [=Channel Group=] at a uniform rate ([=output_gain=]). - A range of 0 dB to -6 dB is recommended for attenuation. (i.e., a range of 0 dB to 6 dB for [=output_gain=]) - Codec Enc. generates the coded [=Audio Substream=]s from PCMs and passes the coded [=Audio Substream=]s and one single [=decoder_config()=] to OBU Packetizer. - OBU packetizer generates [=Descriptors=] which consists of one [=IA Sequence Header OBU=], one [=Codec Config OBU=], one [=Audio Element OBU=] and one or more [=Mix Presentation OBU=]. 
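Reviewer sketch (not part of the patch): the flowchart description above says the Attenuation module reduces all samples of the mixed channels in one Channel Group at a uniform rate, recommends 0 dB to -6 dB of attenuation, and treats output_gain as the inverse (0 dB to 6 dB). The Python below illustrates that relationship with the usual amplitude conversion, factor = 10^(dB/20). The function names and the float-dB representation are assumptions made for illustration; how output_gain is actually encoded in the bitstream is not covered here.

```python
from typing import List, Tuple

# Recommended attenuation range from the text, in dB (assumed inclusive).
RECOMMENDED_ATTENUATION_DB = (-6.0, 0.0)


def db_to_linear(gain_db: float) -> float:
    """Convert an amplitude gain in dB to a linear scale factor."""
    return 10.0 ** (gain_db / 20.0)


def in_recommended_range(attenuation_db: float) -> bool:
    """True if the attenuation lies within the recommended 0 dB to -6 dB."""
    low, high = RECOMMENDED_ATTENUATION_DB
    return low <= attenuation_db <= high


def attenuate_mixed_channels(
    mixed_channels: List[List[float]], attenuation_db: float
) -> Tuple[List[List[float]], float]:
    """Scale every sample of every mixed channel by the same factor.

    Returns the attenuated channels and the matching output gain in dB,
    which is simply the negated attenuation.
    """
    factor = db_to_linear(attenuation_db)
    attenuated = [[sample * factor for sample in ch] for ch in mixed_channels]
    output_gain_db = -attenuation_db
    return attenuated, output_gain_db


# Hypothetical mixed channels of one Channel Group (two short channels).
channels = [[1.0, -0.5, 0.25], [0.8, 0.8, -0.8]]
attenuated, output_gain_db = attenuate_mixed_channels(channels, -6.0)

# Applying the output gain on the decoder side undoes the attenuation.
restored = [[s * db_to_linear(output_gain_db) for s in ch] for ch in attenuated]
print(output_gain_db, in_recommended_range(-6.0))
```

Because the same factor is used for every mixed channel in the group, a single output gain value per Channel Group is enough to restore the original levels, up to codec loss.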
@@ -2798,38 +2799,38 @@ If 10*log10(level Ok / maxL^2) is less than the first threshold value (-80dB is If 10*log10(level Ok / level Mk ) is less than the second threshold value (-6dB is RECOMMENDED), Recon_Gain (k, i) is set to the value which makes level Ok = Recon_Gain (k, i)^2 * level Dk. Otherwise, Recon_Gain (k, i) = 1. Actual value (i.e., [=recon_gain=]) to be delivered is floor(255*Recon_Gain). For example, if we assume CL #i = 7.1.4ch and CL #i-1 = 5.1.2ch, then de-mixed channels are D_Lrs7, D_Rrs7, D_Ltb4 and D_Rtb4. -- D_Lrs7 and D_Rrs7 are de-mixed from Ls5 and Rs5 in the (i-1)th [=ChannelGroup=] by using Lss7 and Rss7 in the ith [=ChannelGroup=] and its relevant demixing parameters (i.e., α(k) and β(k)) , respectively. -- D_Ltb4 and D_Rtb4 are de-mixed from Ltf2 and Rtf2 in the (i-1)th [=ChannelGroup=] by using Ltf4 and Rtf4 in the ith [=ChannelGroup=] and its relevant demixing parameter (i.e., γ(k)), respectively. +- D_Lrs7 and D_Rrs7 are de-mixed from Ls5 and Rs5 in the (i-1)th [=Channel Group=] by using Lss7 and Rss7 in the ith [=Channel Group=] and its relevant demixing parameters (i.e., α(k) and β(k)) , respectively. +- D_Ltb4 and D_Rtb4 are de-mixed from Ltf2 and Rtf2 in the (i-1)th [=Channel Group=] by using Ltf4 and Rtf4 in the ith [=Channel Group=] and its relevant demixing parameter (i.e., γ(k)), respectively. Recon_Gain for D_Lrs7: -- Level Ok is the signal power for the frame #k of Lrs7 in the ith [=ChannelGroup=]. -- Level Mk is the signal power for the frame #k of Ls5 in the (i-1)th [=ChannelGroup=]. +- Level Ok is the signal power for the frame #k of Lrs7 in the ith [=Channel Group=]. +- Level Mk is the signal power for the frame #k of Ls5 in the (i-1)th [=Channel Group=]. - Level Dk is the signal power for the frame #k of D_Lrs7. Recon_Gain for D_Rrs7: -- Level Ok is the signal power for the frame #k of Rrs7 in the ith [=ChannelGroup=]. -- Level Mk is the signal power for the frame #k of Rs5 in the (i-1)th [=ChannelGroup=]. 
+- Level Ok is the signal power for the frame #k of Rrs7 in the ith [=Channel Group=]. +- Level Mk is the signal power for the frame #k of Rs5 in the (i-1)th [=Channel Group=]. - Level Dk is the signal power for the frame #k of D_Rrs7. Recon_Gain for D_Ltb4: -- Level Ok is the signal power for the frame #k of Ltf4 in the ith [=ChannelGroup=]. -- Level Mk is the signal power for the frame #k of Ltf2 in the (i-1)th [=ChannelGroup=]. +- Level Ok is the signal power for the frame #k of Ltf4 in the ith [=Channel Group=]. +- Level Mk is the signal power for the frame #k of Ltf2 in the (i-1)th [=Channel Group=]. - Level Dk is the signal power for the frame #k of D_Ltb4. Recon_Gain for D_Rtb4: -- Level Ok is the signal power for the frame #k of Rtf4 in the ith [=ChannelGroup=]. -- Level Mk is the signal power for the frame #k of Rtf2 in the (i-1)th [=ChannelGroup=]. +- Level Ok is the signal power for the frame #k of Rtf4 in the ith [=Channel Group=]. +- Level Mk is the signal power for the frame #k of Rtf2 in the (i-1)th [=Channel Group=]. - Level Dk is the signal power for the frame #k of D_Rtb4. -### Annex B-5: ChannelGroup Generation Rule ### {#iamfgeneration-scalablechannelaudio-channelgroupgenerationrule} +### Annex B-5: Channel Group Generation Rule ### {#iamfgeneration-scalablechannelaudio-channelgroupgenerationrule} -This section describes the generation rule for [=ChannelGroup=]. +This section describes the generation rule for [=Channel Group=]. -For a given channel-based input audio and the list of CLs ({CL #i: i = 1, 2, ..., n}), the CG Generation module outputs the transformed audio (i.e., ChannelGroups) which SHALL conform to the following rules: -- It consists of C number of channels and is structured to n number of [=ChannelGroup=]s, where C is the number of channels for the input audio. -- [=ChannelGroup=] #1 (as called BCG): This [=ChannelGroup=] is the [=down-mixed audio=] itself for CL #1 generated from the input audio. 
It contains a C1 number of channels. -- [=ChannelGroup=] #i (as called DCG, i = 2, 3, …, n): This [=ChannelGroup=] contains (Ci – Ci-1) number of channels. (Ci – Ci-1) channel(s) consists of as follows: +For a given channel-based input audio and the list of CLs ({CL #i: i = 1, 2, ..., n}), the CG Generation module outputs the transformed audio (i.e., Channel Groups) which SHALL conform to the following rules: +- It consists of C number of channels and is structured to n number of [=Channel Group=]s, where C is the number of channels for the input audio. +- [=Channel Group=] #1 (as called BCG): This [=Channel Group=] is the [=down-mixed audio=] itself for CL #1 generated from the input audio. It contains a C1 number of channels. +- [=Channel Group=] #i (as called DCG, i = 2, 3, …, n): This [=Channel Group=] contains (Ci – Ci-1) number of channels. (Ci – Ci-1) channel(s) consists of as follows: - (Si – Si-1) surround channel(s) if Si > Si-1 . When S_set = { x | Si-1 < x ≤ Si and x is an integer}, - If 2 is an element of S_set, the L2 channel is contained in this CG #i. - If 3 is an element of S_set, the Center channel is contained in this CG #i. @@ -2837,8 +2838,8 @@ For a given channel-based input audio and the list of CLs ({CL #i: i = 1, 2, ... - If 7 is an element of S_set, the Lss7 and Rss7 channels are contained in this CG #i. - The LFE channel if Wi > Wi-1. - (Ti – Ti-1) top channels if Ti > Ti-1 . - - If Ti-1 = 0, the top channels of the [=down-mixed audio=] for CL #i are contained in this [=ChannelGroup=] #i. - - If Ti-1 = 2, the Ltf and Rtf channels of the [=down-mixed audio=] for CL #i are contained in this [=ChannelGroup=] #i. + - If Ti-1 = 0, the top channels of the [=down-mixed audio=] for CL #i are contained in this [=Channel Group=] #i. + - If Ti-1 = 2, the Ltf and Rtf channels of the [=down-mixed audio=] for CL #i are contained in this [=Channel Group=] #i. 
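The channel counts implied by the generation rule above can be sketched as follows. This is an illustration only: it assumes each channel layout is written as a hypothetical (S, W, T) tuple of surround, LFE and top channel counts, and that Ci = Si + Wi + Ti. It does not decide which named channels go into each group.

```python
def channel_count(layout):
    """Total channels of a layout given as an (S, W, T) tuple (assumed form)."""
    s, w, t = layout
    return s + w + t


def channel_group_breakdown(layouts):
    """Per Channel Group, return (surround, lfe, top) channel counts.

    Group #1 carries all channels of CL #1; group #i (i >= 2) carries
    only the channels added relative to CL #i-1.
    """
    breakdown = []
    prev = (0, 0, 0)
    for layout in layouts:
        deltas = tuple(cur - old for cur, old in zip(layout, prev))
        if min(deltas) < 0 or sum(deltas) == 0:
            raise ValueError("each layout must add channels to the previous one")
        breakdown.append(deltas)
        prev = layout
    return breakdown


def channel_group_sizes(layouts):
    """Number of channels in each Channel Group, i.e. Ci - Ci-1."""
    return [sum(deltas) for deltas in channel_group_breakdown(layouts)]


# The 2ch / 3.1.2ch / 5.1.2ch / 7.1.4ch example with 4 CGs.
EXAMPLE_LAYOUTS = [(2, 0, 0), (3, 1, 2), (5, 1, 2), (7, 1, 4)]
print(channel_group_sizes(EXAMPLE_LAYOUTS))  # [2, 4, 2, 4]
```

The group sizes sum to 12, the channel count of the 7.1.4ch input, which matches the rule that the transformed audio consists of C channels in total.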
The figure below shows one example of a transformation matrix with 4 CGs (2ch/3.1.2ch/5.1.2ch/7.1.4ch). From 37b372ee33972f33974e1b8d0a2ae304c4623f18 Mon Sep 17 00:00:00 2001 From: sunghee-hwang <97494915+sunghee-hwang@users.noreply.github.com> Date: Thu, 17 Aug 2023 15:32:53 +0900 Subject: [PATCH 06/11] Fix #675 --- index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/index.bs b/index.bs index e377d06c..3cb9099a 100644 --- a/index.bs +++ b/index.bs @@ -539,7 +539,7 @@ This OBU is used to indicate the start of an [=IA Sequence=]. So, the first OBU NOTE: When an [=IA Sequence=] is stored in a file, the [=IA Sequence Header OBU=] can be used to identify that the file contains an [=IA Sequence=]. -This OBU MAY be placed frequently within one single [=IA Sequence=] for an application such as broadcasting or multicasting. In that case, all [=IA Sequence Header OBU=]s except the first one SHALL be marked as redundant (i.e., [=obu_redundant_copy=] = 1). +This OBU MAY be placed frequently within one single [=IA Sequence=] for an application such as broadcasting or multicasting. In that case, all [=IA Sequence Header OBU=]s except the first one SHALL be marked as redundant (i.e., [=obu_redundant_copy=] = 1). So, if a decoder encounters a non-redundant [=IA Sequence Header OBU=] (i.e., [=obu_redundant_copy=] = 0), and it has also received the previous [=IA Sequence Header OBU=], the non-redundant [=IA Sequence Header OBU=] indicates the start of a new [=IA Sequence=]. 
Syntax From 3f6ff86f4cd1f9a855b46a391b9de46ddbdc1dfa Mon Sep 17 00:00:00 2001 From: sunghee-hwang <97494915+sunghee-hwang@users.noreply.github.com> Date: Thu, 17 Aug 2023 16:13:49 +0900 Subject: [PATCH 07/11] Fix #678 --- index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/index.bs b/index.bs index e377d06c..751ddf6e 100644 --- a/index.bs +++ b/index.bs @@ -521,7 +521,7 @@ NOTE: A future version of the specification may use this flag to specify an exte ## Reserved OBU Syntax and Semantics ## {#obu-reserved} -Reserved OBUs SHOULD be ignored by parsers compliant with this version of the specification. Future versions of the specification MAY define semantics for these reserved OBUs that would only be supported by parsers compliant with these future versions. +Reserved OBUs SHOULD be ignored by parsers compliant with this version of the specification. Future versions of the specification MAY define syntax and semantics for these reserved OBUs that would only be supported by parsers compliant with these future versions. 
Syntax From c0c4caf1b2670d9a4c4926f8f30f04b6ac93dee2 Mon Sep 17 00:00:00 2001 From: sunghee-hwang <97494915+sunghee-hwang@users.noreply.github.com> Date: Thu, 17 Aug 2023 16:50:48 +0900 Subject: [PATCH 08/11] IaOpenBitstreamUnit -> IAOpenBitstreamUnit --- index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/index.bs b/index.bs index 6bc3dc02..fd013bc7 100644 --- a/index.bs +++ b/index.bs @@ -409,7 +409,7 @@ The obu_header and all OBU payloads including reserved_obuSyntax ``` -class IaOpenBitstreamUnit() { +class IAOpenBitstreamUnit() { ObuHeader obu_header; if (obu_type == OBU_IA_Sequence_Header) From 4d6a525384c5b092798417edc1bafee7daa75126 Mon Sep 17 00:00:00 2001 From: sunghee-hwang <97494915+sunghee-hwang@users.noreply.github.com> Date: Thu, 17 Aug 2023 17:11:42 +0900 Subject: [PATCH 09/11] Fix #680, either 0xF806 or 0xFC06 --- index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/index.bs b/index.bs index 5a2e21cb..e80c26ef 100644 --- a/index.bs +++ b/index.bs @@ -555,7 +555,7 @@ class IaSequenceHeaderObu() { ia_code is a ‘four-character code’ (4CC), iamf. -NOTE: When IA OBUs are delivered over a protocol that does not provide explicit [=IA Sequence=] boundaries, a parser may locate the [=IA Sequence=] start by searching for the code iamf preceded by specific OBU header values. For example, by assuming that [=obu_extension_flag=] is set to 0 and because [=obu_trimming_status_flag=] is set to 0 for an [=IA Sequence Header OBU=], the OBU header can be 0xF806 or 0xFC06. +NOTE: When IA OBUs are delivered over a protocol that does not provide explicit [=IA Sequence=] boundaries, a parser may locate the [=IA Sequence=] start by searching for the code iamf preceded by specific OBU header values. For example, by assuming that [=obu_extension_flag=] is set to 0 and because [=obu_trimming_status_flag=] is set to 0 for an [=IA Sequence Header OBU=], the OBU header can be either 0xF806 or 0xFC06. 
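The two header values in the NOTE above can be reproduced with a short sketch. It rests on assumptions that are not spelled out in this patch: OBU_IA_Sequence_Header is taken to be obu_type 31, the three 1-bit flags follow the 5-bit obu_type in the first byte, and a leb128-coded obu_size follows, counting the 6 payload bytes (a 32-bit ia_code plus two 8-bit profile fields). The function names are hypothetical.

```python
OBU_IA_SEQUENCE_HEADER = 31  # assumed numeric value of the obu_type


def leb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def sequence_header_prefix(redundant_copy: bool) -> bytes:
    """First two bytes of an IA Sequence Header OBU under the assumptions above."""
    trimming_status_flag = 0  # stated as 0 for an IA Sequence Header OBU
    extension_flag = 0        # assumed to be 0, as in the NOTE
    first_byte = (
        (OBU_IA_SEQUENCE_HEADER << 3)
        | (int(redundant_copy) << 2)
        | (trimming_status_flag << 1)
        | extension_flag
    )
    payload_size = 4 + 1 + 1  # ia_code + primary_profile + additional_profile
    return bytes([first_byte]) + leb128_encode(payload_size)


def find_sequence_start(data: bytes) -> int:
    """Return the first offset where a header prefix is followed by 'iamf', or -1."""
    candidates = [sequence_header_prefix(r) + b"iamf" for r in (False, True)]
    hits = [i for i in (data.find(c) for c in candidates) if i >= 0]
    return min(hits) if hits else -1
```

Under these assumptions `sequence_header_prefix(False).hex()` is `f806` and `sequence_header_prefix(True).hex()` is `fc06`, matching the two values named in the NOTE.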
primary_profile indicates the primary profile that this [=IA Sequence=] complies with. Parsers compliant with this version of the specification SHOULD discard the [=IA Sequence=] if they do not support the value indicated here. From b33962e75866b10f3f9b212563f5433a3cf40fae Mon Sep 17 00:00:00 2001 From: sunghee-hwang <97494915+sunghee-hwang@users.noreply.github.com> Date: Thu, 17 Aug 2023 17:17:37 +0900 Subject: [PATCH 10/11] Fix #681, fix typo (encoder -> decoder) --- index.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/index.bs b/index.bs index 5a2e21cb..bc1e3d5b 100644 --- a/index.bs +++ b/index.bs @@ -604,7 +604,7 @@ NOTE: 'ipcm' should not be confused with lpcm, which is another 4CC num_samples_per_frame indicates the frame length, in samples, of the [=audio_frame()=] provided in the audio_frame_obu. It SHALL NOT be set to zero. If the [=decoder_config()=] structure for a given codec specifies a value for the frame length, the two values SHALL be equal. -audio_roll_distance indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to set the encoder in a state that will produce the perfect decoded audio signal. It SHALL always be a negative value or zero. For some audio codecs, even if an audio frame can be decoded independently, the decoded signal after decoding only that frame may not represent a perfect, decoded audio signal, even ignoring compression artifacts. This can be due to overlap transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate. +audio_roll_distance indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to set the decoder in a state that will produce the perfect decoded audio signal. It SHALL always be a negative value or zero. 
For some audio codecs, even if an audio frame can be decoded independently, the decoded signal after decoding only that frame may not represent a perfect, decoded audio signal, even ignoring compression artifacts. This can be due to overlap transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate. - It SHALL be set to -R when [=codec_id=] is set to 'Opus', where R is ceil(3840 / [=num_samples_per_frame=]). - It SHALL be set to -1 when [=codec_id=] is set to 'mp4a'. - It SHALL be set to 0 when [=codec_id=] is set to 'fLaC' or 'ipcm'. From f18bd47a27c0b5f746d61351a8f588dbb0bcf326 Mon Sep 17 00:00:00 2001 From: Felicia Lim Date: Wed, 16 Aug 2023 17:12:25 -0700 Subject: [PATCH 11/11] Use scoped concepts for duplicated dfns Autolinks now point to the correct definition if a syntax with the same name is defined in more than one OBU (e.g. IDs). Fix #682 --- index.bs | 146 +++++++++++++++++++++++++++---------------------------- 1 file changed, 73 insertions(+), 73 deletions(-) diff --git a/index.bs b/index.bs index 5a2e21cb..e7865d18 100644 --- a/index.bs +++ b/index.bs @@ -500,8 +500,8 @@ NOTE: Because of coding dependency, discarding a sample can sometimes mean decod NOTE: This means that if one of the values is set to the number of samples in the [=Audio Frame OBU=] (i.e., [=num_samples_per_frame=]), the other value is set to 0. -- When [=num_samples_to_trim_at_start=] is non-zero, all [=Audio Frame OBU=]s with the same [=audio_substream_id=], and preceding this OBU back until the [=Codec Config OBU=] defining this [=Audio Substream=], SHALL have their [=num_samples_to_trim_at_start=] field equal to the number of samples in the corresponding [=Audio Frame OBU=] (i.e., [=num_samples_per_frame=]). 
-- When [=num_samples_to_trim_at_end=] is non-zero in an [=Audio Frame OBU=], there SHALL be no subsequent [=Audio Frame OBU=] with the same [=audio_substream_id=] until a non-redundant [=Codec Config OBU=] defining an [=Audio Substream=] with the same [=audio_substream_id=]. +- When [=num_samples_to_trim_at_start=] is non-zero, all [=Audio Frame OBU=]s with the same [=audio_substream/audio_substream_id=], and preceding this OBU back until the [=Codec Config OBU=] defining this [=Audio Substream=], SHALL have their [=num_samples_to_trim_at_start=] field equal to the number of samples in the corresponding [=Audio Frame OBU=] (i.e., [=num_samples_per_frame=]). +- When [=num_samples_to_trim_at_end=] is non-zero in an [=Audio Frame OBU=], there SHALL be no subsequent [=Audio Frame OBU=] with the same [=audio_substream/audio_substream_id=] until a non-redundant [=Codec Config OBU=] defining an [=Audio Substream=] with the same [=audio_substream/audio_substream_id=]. obu_extension_flag indicates whether the [=extension_header_size=] field is present. If it is set to 0, the [=extension_header_size=] field SHALL NOT be present. Otherwise, the [=extension_header_size=] field SHALL be present. @@ -590,7 +590,7 @@ class codec_config() { Semantics -codec_config_id defines an identifier for a codec configuration. Within an [=IA Sequence=], there SHALL be one unique [=codec_config_id=] per codec. There SHALL be exactly one [=Codec Config OBU=] with a given identifier in a set of [=Descriptors=]. [=Audio Element=]s use this identifier to indicate that its corresponding [=Audio Substream=]s are coded with this codec configuration. +codec_config_id defines an identifier for a codec configuration. Within an [=IA Sequence=], there SHALL be one unique [=codec_config_obu/codec_config_id=] per codec. There SHALL be exactly one [=Codec Config OBU=] with a given identifier in a set of [=Descriptors=]. 
[=Audio Element=]s use this identifier to indicate that its corresponding [=Audio Substream=]s are coded with this codec configuration. codec_id indicates a ‘four-character code’ (4CC) to identify the codec used to generate the coded [=Audio Substream=]s. For this version of the specification, it SHALL be set to one of the four [=codec_id=] values defined below: - 'Opus': All coded [=Audio Substream=]s referred to by all [=Audio Element=]s with this codec configuration SHALL comply with the [[!RFC6716]] specification and the [=decoder_config()=] structure SHALL comply with the constraints given in [[#opus-specific]]. @@ -678,7 +678,7 @@ class ReconGainParamDefinition() extends ParamDefinition() { Semantics -audio_element_id defines an identifier for an [=Audio Element=]. Within an [=IA Sequence=], there SHALL be one unique [=audio_element_id=] per [=Audio Element=]. There SHALL be exactly one [=Audio Element OBU=] with a given identifier in a set of [=Descriptors=]. [=Mix Presentation=]s refer to a particular [=Audio Element=] using this identifier. +audio_element_id defines an identifier for an [=Audio Element=]. Within an [=IA Sequence=], there SHALL be one unique [=audio_element_obu/audio_element_id=] per [=Audio Element=]. There SHALL be exactly one [=Audio Element OBU=] with a given identifier in a set of [=Descriptors=]. [=Mix Presentation=]s refer to a particular [=Audio Element=] using this identifier. audio_element_type specifies the audio representation of this [=Audio Element=], which is constructed from one or more [=Audio Substream=]s. Parsers compliant with this version of the specification SHOULD ignore [=Audio Element OBU=]s with a reserved [=audio_element_type=]. @@ -689,17 +689,17 @@ audio_element_type: The type of audio representation. 2~7 : Reserved -codec_config_id indicates the identifier for the codec configuration which this [=Audio Element=] refers to. 
Parsers compliant with this version of the specification SHOULD ignore [=Audio Element OBU=]s with a [=codec_config_id=] identifying an unknown [=codec_id=]. +codec_config_id indicates the identifier for the codec configuration which this [=Audio Element=] refers to. Parsers compliant with this version of the specification SHOULD ignore [=Audio Element OBU=]s with a [=audio_element_obu/codec_config_id=] identifying an unknown [=codec_id=]. num_substreams specifies the number of [=Audio Substream=]s that are used to reconstruct this [=Audio Element=]. It SHALL NOT be set to 0. -audio_substream_id indicates the identifier for an [=Audio Substream=] which this [=Audio Element=] refers to. +audio_substream_id indicates the identifier for an [=Audio Substream=] which this [=Audio Element=] refers to. Let a particular [=ChannelGroup=]'s [=Audio Substream=]s be indexed as [c, n_c], where a [=ChannelGroup=] generation rule is described in [[#iamfgeneration-scalablechannelaudio-channelgroupgenerationrule]] and - [=c=] = [1, ..., C] is the [=ChannelGroup=] index and C is the number of [=ChannelGroup=]s. - [=n_c=] = [1, ..., N_c] is the [=Audio Substream=] index in the c-th [=ChannelGroup=] and N_c is the number of [=Audio Substream=]s in the c-th [=ChannelGroup=]. -Then, the i-th [=audio_substream_id=] maps to a [=ChannelGroup=]'s [=Audio Substream=]s as follows, where i is the index of the array: +Then, the i-th [=audio_element_obu/audio_substream_id=] maps to a [=ChannelGroup=]'s [=Audio Substream=]s as follows, where i is the index of the array: ``` [ @@ -752,9 +752,9 @@ In this parameter definition, - [=parameter_rate=] SHALL be set to the sample rate of this [=Audio Element=]. - [=param_definition_mode=] SHALL be set to 0. -- [=duration=] SHALL be the same as [=num_samples_per_frame=] of this [=Audio Element=]. -- [=num_subblocks=] SHALL be set to 1. -- [=constant_subblock_duration=] SHALL be the same as [=duration=]. 
+- [=ParamDefinition/duration=] SHALL be the same as [=num_samples_per_frame=] of this [=Audio Element=]. +- [=ParamDefinition/num_subblocks=] SHALL be set to 1. +- [=ParamDefinition/constant_subblock_duration=] SHALL be the same as [=ParamDefinition/duration=]. recon_gain_info provides the parameter definition for the gain value, which is used to reconstruct a scalable channel audio representation. The parameter definition is provided by ReconGainParamDefinition() and the corresponding parameter data to be provided in parameter blocks is specified in [=recon_gain_info_parameter_data()=]. @@ -762,9 +762,9 @@ In this parameter definition, - [=parameter_rate=] SHALL be set to the sample rate of this [=Audio Element=]. - [=param_definition_mode=] SHALL be set to 0. -- [=duration=] SHALL be the same as [=num_samples_per_frame=] of this [=Audio Element=]. -- [=num_subblocks=] SHALL be set to 1. -- [=constant_subblock_duration=] SHALL be same as [=duration=]. +- [=ParamDefinition/duration=] SHALL be the same as [=num_samples_per_frame=] of this [=Audio Element=]. +- [=ParamDefinition/num_subblocks=] SHALL be set to 1. +- [=ParamDefinition/constant_subblock_duration=] SHALL be same as [=ParamDefinition/duration=]. param_definition_size indicates the size in bytes of [=param_definition_bytes=]. @@ -779,7 +779,7 @@ In this parameter definition, audio_element_config_bytes represents reserved bytes for future use when new [=audio_element_type=] values are defined. Parsers compliant with this version of the specification SHOULD ignore these bytes. -default_demixing_info_parameter_data() provides the default demixing parameter data to apply to all audio samples when there are no [=Parameter Block OBU=]s (with the same [=parameter_id=] defined in this DemixingParamDefinition()) provided. 
+default_demixing_info_parameter_data() provides the default demixing parameter data to apply to all audio samples when there are no [=Parameter Block OBU=]s (with the same [=ParamDefinition/parameter_id=] defined in this DemixingParamDefinition()) provided. - In this class, [=w_idx_offset=] in [=demixing_info_parameter_data()=] SHALL be ignored. - Instead, [=default_w=] directly indicates the weight value [=w(k)=]. @@ -802,7 +802,7 @@ The mapping of [=default_w=] to [=w(k)=] SHOULD be as follows: 11 ~ 15 : reserved -A default recon gain value of 0 dB is implied when there are no [=Parameter Block OBU=]s (with the same [=parameter_id=] defined in this ReconGainParamDefinition()) provided. +A default recon gain value of 0 dB is implied when there are no [=Parameter Block OBU=]s (with the same [=ParamDefinition/parameter_id=] defined in this ReconGainParamDefinition()) provided. ### Parameter Definition Syntax and Semantics ### {#parameter-definition} @@ -831,31 +831,31 @@ abstract class ParamDefinition() { Semantics -parameter_id indicates the identifier for the [=Parameter Substream=] which this parameter definition refers to. There SHALL be one unique [=parameter_id=] per [=Parameter Substream=]. +parameter_id indicates the identifier for the [=Parameter Substream=] which this parameter definition refers to. There SHALL be one unique [=ParamDefinition/parameter_id=] per [=Parameter Substream=]. parameter_rate specifies the rate used by this [=Parameter Substream=], expressed as ticks per second. Time-related fields associated with this [=Parameter Substream=], such as durations, SHALL be expressed in the number of ticks. - The rate SHALL be a value such that (the rate * [=num_samples_per_frame=]) / (the sample rate of [=Audio Element=]) is a non-zero integer. 
-param_definition_mode indicates whether this parameter definition specifies the [=duration=], [=num_subblocks=], [=constant_subblock_duration=] and [=subblock_duration=] fields for the parameter blocks with the same [=parameter_id=]. +param_definition_mode indicates whether this parameter definition specifies the [=ParamDefinition/duration=], [=ParamDefinition/num_subblocks=], [=ParamDefinition/constant_subblock_duration=] and [=ParamDefinition/subblock_duration=] fields for the parameter blocks with the same [=parameter_block_obu/parameter_id=]. -- When this field is set to 0, all of the [=duration=], [=num_subblocks=], [=constant_subblock_duration=], and [=subblock_duration=] fields SHALL be specified in this parameter definition. None of the parameter blocks with the same [=parameter_id=] SHALL specify these same fields. +- When this field is set to 0, all of the [=ParamDefinition/duration=], [=ParamDefinition/num_subblocks=], [=ParamDefinition/constant_subblock_duration=], and [=ParamDefinition/subblock_duration=] fields SHALL be specified in this parameter definition. None of the parameter blocks with the same [=parameter_block_obu/parameter_id=] SHALL specify these same fields. -- When this field is set to 1, none of the [=duration=], [=num_subblocks=], [=constant_subblock_duration=], and [=subblock_duration=] fields SHALL be specified in this parameter definition. Instead, each parameter block with the same [=parameter_id=] SHALL specify these same fields. +- When this field is set to 1, none of the [=ParamDefinition/duration=], [=ParamDefinition/num_subblocks=], [=ParamDefinition/constant_subblock_duration=], and [=ParamDefinition/subblock_duration=] fields SHALL be specified in this parameter definition. Instead, each parameter block with the same [=parameter_block_obu/parameter_id=] SHALL specify these same fields. -duration specifies the duration for which each parameter block with the same [=parameter_id=] is valid and applicable. 
It SHALL NOT be set to 0. +duration specifies the duration for which each parameter block with the same [=parameter_block_obu/parameter_id=] is valid and applicable. It SHALL NOT be set to 0. -constant_subblock_duration specifies the duration of each subblock, in the case where all subblocks except the last subblock have equal durations. If all subblocks except the last subblock do not have equal durations, the value of constant_subblock_duration SHALL be set to 0. +constant_subblock_duration specifies the duration of each subblock, in the case where all subblocks except the last subblock have equal durations. If all subblocks except the last subblock do not have equal durations, the value of [=ParamDefinition/constant_subblock_duration=] SHALL be set to 0. -Let D = the value of [=duration=], NS = the value of [=num_subblocks=], CSD = the value of [=constant_subblock_duration=] and SD = the value of [=subblock_duration=]. -- When [=CSD=] != 0, [=num_subblocks=] is implicitly calculated as [=NS=] = ceil([=D=] / [=CSD=]). +Let D = the value of [=ParamDefinition/duration=], NS = the value of [=ParamDefinition/num_subblocks=], CSD = the value of [=ParamDefinition/constant_subblock_duration=] and SD = the value of [=ParamDefinition/subblock_duration=]. +- When [=CSD=] != 0, [=ParamDefinition/num_subblocks=] is implicitly calculated as [=NS=] = ceil([=D=] / [=CSD=]). - If [=NS=] * [=CSD=] > [=D=], the actual duration of the last subblock SHALL be [=D=] - ([=NS=] - 1) * [=CSD=]. - When [=CSD=] = 0, the summation of all [=SD=]s in this parameter block SHALL be equal to [=D=]. -num_subblocks specifies the number of different sets of parameter values specified in each parameter block with the same [=parameter_id=], where each set describes a different subblock of the timeline, contiguously. 
+num_subblocks specifies the number of different sets of parameter values specified in each parameter block with the same [=parameter_block_obu/parameter_id=], where each set describes a different subblock of the timeline, contiguously. -subblock_duration specifies the duration for the given subblock. It SHALL NOT be set to 0. +subblock_duration specifies the duration for the given subblock. It SHALL NOT be set to 0. -The values for [=duration=], [=constant_subblock_duration=], and [=subblock_duration=] SHALL be expressed as the number of ticks at the [=parameter_rate=] specified in the corresponding parameter definition. +The values for [=ParamDefinition/duration=], [=ParamDefinition/constant_subblock_duration=], and [=ParamDefinition/subblock_duration=] SHALL be expressed as the number of ticks at the [=parameter_rate=] specified in the corresponding parameter definition. ### Scalable Channel Layout Config Syntax and Semantics ### {#syntax-scalable-channel-layout-config} @@ -1113,7 +1113,7 @@ class MixPresentationObu() { num_audio_elements specifies the number of [=Audio Element=]s that are used in this [=Mix Presentation=] to generate the final output audio signal for playback. It SHALL NOT be set to 0. -audio_element_id indicates the identifier for an [=Audio Element=] which this [=Mix Presentation=] refers to. +audio_element_id indicates the identifier for an [=Audio Element=] which this [=Mix Presentation=] refers to. mix_presentation_element_annotations() provides informational metadata that the playback system MAY use to display information to the user. It is not used in the rendering or mixing process to generate the final output audio signal. @@ -1220,9 +1220,9 @@ class MixGainParamDefinition() extends ParamDefinition() { Semantics -mix_gain provides the parameter definition for the gain value that is applied to all channels of the rendered [=Audio Element=] signal. 
The parameter definition is provided by MixGainParamDefinition() and the corresponding parameter data to be provided in parameter blocks with the same [=parameter_id=] is specified in [=mix_gain_parameter_data()=]. +mix_gain provides the parameter definition for the gain value that is applied to all channels of the rendered [=Audio Element=] signal. The parameter definition is provided by MixGainParamDefinition() and the corresponding parameter data to be provided in parameter blocks with the same [=parameter_block_obu/parameter_id=] is specified in [=mix_gain_parameter_data()=]. -default_mix_gain specifies the default mix gain value to apply when there are no mix gain parameter blocks with the same [=parameter_id=] provided. This value is expressed in dB and SHALL be applied to all channels in the rendered [=Audio Element=]. It is stored as a 16-bit, signed, two's complement fixed-point value with 8 fractional bits (i.e., Q7.8 in [[!Q-Format]]). +default_mix_gain specifies the default mix gain value to apply when there are no mix gain parameter blocks with the same [=parameter_block_obu/parameter_id=] provided. This value is expressed in dB and SHALL be applied to all channels in the rendered [=Audio Element=]. It is stored as a 16-bit, signed, two's complement fixed-point value with 8 fractional bits (i.e., Q7.8 in [[!Q-Format]]). ### Output Mix Config Syntax and Semantics ### {#obu-mixpresentation-outputmix} @@ -1239,7 +1239,7 @@ class output_mix_config() { Semantics -output_mix_gain provides the parameter definition for the gain value that is applied to all channels of the mixed audio signal. The parameter definition is provided by MixGainParamDefinition() and the corresponding parameter data to be provided in parameter blocks with the same [=parameter_id=] is specified in [=mix_gain_parameter_data()=]. +output_mix_gain provides the parameter definition for the gain value that is applied to all channels of the mixed audio signal. 
The parameter definition is provided by MixGainParamDefinition() and the corresponding parameter data to be provided in parameter blocks with the same [=parameter_block_obu/parameter_id=] is specified in [=mix_gain_parameter_data()=]. ### Layout Syntax and Semantics ### {#syntax-layout} @@ -1418,23 +1418,23 @@ class ParameterBlockObu() { Semantics -parameter_id indicates the identifier for a [=Parameter Substream=] which this [=Parameter Block OBU=] refers to. If no [=Audio Element OBU=]s or [=Mix Presentation OBU=]s refer to this [=parameter_id=], parsers compliant with this version of the specification SHOULD ignore [=Parameter Block OBU=]s with this identifier. +parameter_id indicates the identifier for a [=Parameter Substream=] which this [=Parameter Block OBU=] refers to. If no [=Audio Element OBU=]s or [=Mix Presentation OBU=]s refer to this [=parameter_block_obu/parameter_id=], parsers compliant with this version of the specification SHOULD ignore [=Parameter Block OBU=]s with this identifier. -get_param_definition() is a run-time function to get the [=param_definition_type=] and [=param_definition_mode=] from the [=Audio Element OBU=] or [=Mix Presentation OBU=] that references this [=parameter_id=]. +get_param_definition() is a run-time function to get the [=param_definition_type=] and [=param_definition_mode=] from the [=Audio Element OBU=] or [=Mix Presentation OBU=] that references this [=parameter_block_obu/parameter_id=]. -If [=param_definition_mode=] = 0, this function additionally gets the following fields from the same [=Audio Element OBU=]: [=duration=], [=num_subblocks=], [=constant_subblock_duration=], and [=subblock_duration=]. +If [=param_definition_mode=] = 0, this function additionally gets the following fields from the same [=Audio Element OBU=] or [=Mix Presentation OBU=]: [=ParamDefinition/duration=], [=ParamDefinition/num_subblocks=], [=ParamDefinition/constant_subblock_duration=], and [=ParamDefinition/subblock_duration=]. 
When it gets an unknown [=param_definition_type=], parsers compliant with this version of the specification SHOULD ignore the [=Parameter Block OBU=]. -duration specifies the duration for which this parameter block is valid and applicable. It SHALL NOT be set to 0. +duration specifies the duration for which this parameter block is valid and applicable. It SHALL NOT be set to 0. -constant_subblock_duration specifies the duration of each subblock, in the case where all subblocks except the last subblock have equal durations. If all subblocks except the last subblock do not have equal durations, the value of constant_subblock_duration SHALL be set to 0. +constant_subblock_duration specifies the duration of each subblock, in the case where all subblocks except the last subblock have equal durations. If all subblocks except the last subblock do not have equal durations, the value of [=parameter_block_obu/constant_subblock_duration=] SHALL be set to 0. -num_subblocks specifies the number of different sets of parameter values specified in this parameter block, where each set describes a different subblock of the timeline, contiguously. When [=constant_subblock_duration=] != 0, [=num_subblocks=] is implicitly calculated as [=num_subblocks=] = ceil([=duration=] / [=constant_subblock_duration=]). +num_subblocks specifies the number of different sets of parameter values specified in this parameter block, where each set describes a different subblock of the timeline, contiguously. When [=parameter_block_obu/constant_subblock_duration=] != 0, [=parameter_block_obu/num_subblocks=] is implicitly calculated as [=parameter_block_obu/num_subblocks=] = ceil([=parameter_block_obu/duration=] / [=parameter_block_obu/constant_subblock_duration=]). -subblock_duration specifies the duration for the given subblock. It SHALL NOT be set to 0. +subblock_duration specifies the duration for the given subblock. It SHALL NOT be set to 0. 
-The values of [=duration=], [=constant_subblock_duration=], and [=subblock_duration=] SHALL be expressed as the number of ticks at the [=parameter_rate=] specified in the corresponding parameter definition. +The values of [=parameter_block_obu/duration=], [=parameter_block_obu/constant_subblock_duration=], and [=parameter_block_obu/subblock_duration=] SHALL be expressed as the number of ticks at the [=parameter_rate=] specified in the corresponding parameter definition. parameter_data_size indicates the size in bytes of [=parameter_data_bytes=]. @@ -1567,7 +1567,7 @@ class recon_gain_info_parameter_data() { This section specifies the OBU payloads of OBU_IA_Audio_Frame and OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID17. -audio_substream_id is an identifier for an [=Audio Substream=] associated with this audio frame. Within an [=IA Sequence=], there SHALL be one unique [=audio_substream_id=] per [=Audio Substream=]. There SHALL be exactly one [=Audio Element OBU=] with a given [=audio_substream_id=] in a set of [=Descriptors=]. +audio_substream_id defines an identifier for an [=Audio Substream=] associated with this audio frame. Within an [=IA Sequence=], there SHALL be one unique [=audio_substream/audio_substream_id=] per [=Audio Substream=]. There SHALL be exactly one [=Audio Element OBU=] with a given [=audio_element_obu/audio_substream_id=] in a set of [=Descriptors=]. Syntax @@ -1582,14 +1582,14 @@ class AudioFrameObu(audio_substream_id_in_bitstream) { Semantics -The variable audio_substream_id_in_bitstream does not exist in an [=IA Sequence=]. It is an indicator of whether this OBU payload includes an explicit [=audio_substream_id=] and its value is based on the [=obu_type=], as follows: +The variable audio_substream_id_in_bitstream does not exist in an [=IA Sequence=]. 
It is an indicator of whether this OBU payload includes an explicit [=audio_substream/audio_substream_id=] and its value is based on the [=obu_type=], as follows: - true for [=obu_type=] = OBU_IA_Audio_Frame. - false for [=obu_type=] = OBU_IA_Audio_Frame_ID0, OBU_IA_Audio_Frame_ID1, ..., or OBU_IA_Audio_Frame_ID17. -explicit_audio_substream_id defines the [=audio_substream_id=] of this frame. The value SHALL be greater than 17. When this field is not present, [=audio_substream_id=] is implicit and is defined as a value from 0 to 17 for OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID17, respectively. +explicit_audio_substream_id indicates the [=audio_substream/audio_substream_id=] of this frame. The value SHALL be greater than 17. When this field is not present, [=audio_substream/audio_substream_id=] is implicit and is defined as a value from 0 to 17 for OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID17, respectively. -NOTE: The first 18 [=Audio Substream=]s in an [=IA Sequence=] MAY use the OBU types OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID17, which have predefined [=audio_substream_id=]s associated with them. This reduces bitrate by avoiding the extra [=explicit_audio_substream_id=] field in the bitstream. +NOTE: The first 18 [=Audio Substream=]s in an [=IA Sequence=] MAY use the OBU types OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID17, which have predefined [=audio_substream/audio_substream_id=]s associated with them. This reduces bitrate by avoiding the extra [=explicit_audio_substream_id=] field in the bitstream. coded_frame_size is the size of [=audio_frame()=] in bytes. 
@@ -1825,7 +1825,7 @@ NOTE: In a typical case, the OBUs in the first [=Descriptors=] of an [=IA Sequen A file conformant to this specification satisfies the following: - It SHALL conform to the normative requirements of [[!ISOBMFF]] -- It SHALL have the iamf brand among the compatible brands array of the FileTypeBox +- It SHALL have the iamf brand among the compatible brands array of the FileTypeBox - It SHALL contain at least one track using an [=IASampleEntry=] - It SHOULD indicate a structural ISOBMFF brand among the compatible brands' array of the FileTypeBox, such as 'iso6' - It MAY indicate other brands not specified in this specification provided that the associated requirements do not conflict with those given in this specification @@ -1864,7 +1864,7 @@ NOTE: Multiple sample entries may be used in a track, for example when the track ### IA Sample Entry ### {#iasampleentry-section}
-	Sample Entry Type: iamf
+	Sample Entry Type: iamf
 	Container:         Sample Description Box ('stsd')
 	Mandatory:         Yes
 	Quantity:          One or more.
@@ -2160,7 +2160,7 @@ When an [=IA Sequence=] contains multiple [=Mix Presentation=]s, the IA parser S
 1. If there are any user-selectable mixes, the IA parser SHOULD select the mix, or mixes, that match the user's preferences. An example might be a mix with a specific language. [=Mix Presentation=]s MAY use [=mix_presentation_friendly_label=] to describe such mixes.
 2. If there is more than one valid mix remaining, the IA parser SHOULD select an appropriate mix for rendering, in the following order.
 	1. If the playback device is headphones:
-		1. Select the mix with [=audio_element_id=] whose [=loudspeaker_layout=] is BINAURAL.
+		1. Select the mix with [=mix_presentation_obu/audio_element_id=] whose [=loudspeaker_layout=] is BINAURAL.
 		2. If there is no such mix, select the mix with [=loudness_layout=] = BINAURAL.
 		3. If there is no such mix, select the mix with the highest available [=loudness_layout=].
 	2. If the playback layout is loudspeakers:
@@ -2284,7 +2284,7 @@ This section describes how a set of parameter values is animated over a subblock
 
 If [=animation_type=] is equal to STEP, the parameter value provided by [=start_point_value=] SHOULD be applied to all time steps in the subblock.
 
-If [=animation_type=] is equal to LINEAR or BEZIER, the information provided in AnimatedParameterData() describes how the set of parameter values is animated as a Bezier curve. Let T be the [=subblock_duration=] defined in the parameter_block_obu and P0, P1 and P2 be 2D coordinates defined as
+If [=animation_type=] is equal to LINEAR or BEZIER, the information provided in AnimatedParameterData() describes how the set of parameter values is animated as a Bezier curve. Let T be the [=parameter_block_obu/subblock_duration=] defined in the parameter_block_obu and P0, P1 and P2 be 2D coordinates defined as
 
 ```
 P0 = (t0, start_point_value),
@@ -2564,18 +2564,18 @@ Step 1: [=Descriptors=] are generated as follows:
 - [=IA Sequence Header OBU=]: take the larger [=primary_profile=] field and the larger [=additional_profile=] field, respectively.
 - [=Codec Config OBU=]: take the [=Codec Config OBU=] of an [=IA Sequence=].
 - Two [=Audio Element OBU=]s: take both of them and make the following modifications:
-	- [=codec_config_id=] in each [=Audio Element OBU=] is updated to indicate the [=codec_config_id=] specified in the taken [=Codec Config OBU=].
-	- Each [=audio_element_id=] is updated to be unique between the two [=Audio Element OBU=]s.
-	- Each [=audio_substream_id=] is updated to be unique between the two [=Audio Element OBU=]s.
-	- [=parameter_id=]s in [=ParamDefinition()=]s carried in each [=Audio Element OBU=] are updated to be unique within the new [=IA Sequence=], if necessary.
+	- [=audio_element_obu/codec_config_id=] in each [=Audio Element OBU=] is updated to indicate the [=codec_config_obu/codec_config_id=] specified in the taken [=Codec Config OBU=].
+	- Each [=audio_element_obu/audio_element_id=] is updated to be unique between the two [=Audio Element OBU=]s.
+	- Each [=audio_element_obu/audio_substream_id=] is updated to be unique between the two [=Audio Element OBU=]s.
+	- [=ParamDefinition/parameter_id=]s in [=ParamDefinition()=]s carried in each [=Audio Element OBU=] are updated to be unique within the new [=IA Sequence=], if necessary.
 - [=Mix Presentation OBU=]s: generate new ones which are used for mixing the two [=Audio Element=]s.
-	- [=audio_element_id=]s in each [=Mix Presentation OBU=] are set to indicate the [=audio_element_id=]s of the referred [=Audio Element OBU=]s.
-	- [=parameter_id=]s in [=ParamDefinition()=]s carried in each [=Mix Presentation OBU=] are set to refer their associated [=Parameter Substream=]s.
+	- [=mix_presentation_obu/audio_element_id=]s in each [=Mix Presentation OBU=] are set to indicate the [=audio_element_obu/audio_element_id=]s of the referred [=Audio Element OBU=]s.
+	- [=ParamDefinition/parameter_id=]s in [=ParamDefinition()=]s carried in each [=Mix Presentation OBU=] are set to refer to their associated [=Parameter Substream=]s.
 
 Step 2: The ith [=Temporal Unit=] is generated as follows:
 - Place all [=Parameter Block OBU=]s for the ith frame, followed by the [=Audio Frame OBU=]s ith frame (grouped by [=Audio Element=]s). Make the following modifications:
-	- The [=obu_type=]s of the [=Audio Frame OBU=]s are updated to be aligned with the [=audio_substream_id=]s specified in the [=Audio Element OBU=]s.
-	- [=parameter_id=]s in [=Parameter Block OBU=]s are updated to identify their associated [=Parameter Substream=]s based on the [=parameter_id=]s carried in the [=Descriptors=].
+	- The [=obu_type=]s of the [=Audio Frame OBU=]s are updated to be aligned with the [=audio_element_obu/audio_substream_id=]s specified in the [=Audio Element OBU=]s.
+	- [=parameter_block_obu/parameter_id=]s in [=Parameter Block OBU=]s are updated to identify their associated [=Parameter Substream=]s based on the [=ParamDefinition/parameter_id=]s carried in the [=Descriptors=].
 - It may have the immediately preceding [=Temporal Delimiter OBU=].
 
 Step 3: Generate an [=IA Sequence=] which starts with [=Descriptors=] and is followed by [=Temporal Unit=]s in order.
@@ -2693,26 +2693,26 @@ The figure below shows the linking scheme among IDs in the obu_header
ID Linking Scheme
In the above figure, -- [=Codec Config OBU=] with [=codec_config_id=] = 0 is providing [=codec_id=] and its [=decoder_config()=]. +- [=Codec Config OBU=] with [=codec_config_obu/codec_config_id=] = 0 is providing [=codec_id=] and its [=decoder_config()=]. - [=Mix Presentation OBU=] with [=mix_presentation_id=] = 21 is saying: - - There are two [=Audio Element=]s([=audio_element_id=] = 11 and 12) which need to be mixed. The [=audio_element_id=] = 11 and the [=audio_element_id=] = 12 are linked to the [=Audio Element OBU=]s with [=audio_element_id=] = 11 and [=audio_element_id=] = 12, respectively. - - There are [=Parameter Block OBU=]s with [=parameter_id=] = 32 to be used for mixing of the [=Audio Element=] with [=audio_element_id=] = 11. - - There are [=Parameter Block OBU=]s with [=parameter_id=] = 33 to be used for mixing of the [=Audio Element=] with [=audio_element_id=] = 12. - - There are [=Parameter Block OBU=]s with [=parameter_id=] = 34 to be used for mixing of the two [=Audio Element=]s. -- [=Audio Element OBU=] with [=audio_element_id=] = 11 is saying: - - This [=Audio Element=] has been coded using [=Codec Config OBU=] with [=codec_config_id=] = 0. - - There are two [=Audio Substream=]s ([=audio_substream_id=] = 0 and 1) in this [=Audio Element=]. The [=audio_substream_id=] = 0 and the [=audio_substream_id=] = 1 are linked to the [=Audio Frame OBU=]s with [=audio_substream_id=] = 0 and [=audio_substream_id=] = 1(i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID0 and [=obu_type=] = OBU_IA_Audio_Frame_ID1), respectively. - - There are [=Parameter Block OBU=]s with [=parameter_id=] = 31 to be used for demixing of this [=Audio Element=]. -- [=Audio Element OBU=] with [=audio_element_id=] = 12 is saying: - - This [=Audio Element=] has been coded by using [=Codec Config OBU=] with [=codec_config_id=] = 0. - - There is one [=Audio Substream=] ([=audio_substream_id=] = 2) in this [=Audio Element=]. 
The [=audio_substream_id=] = 2 is linked to the [=Audio Frame OBU=]s with [=audio_substream_id=] = 2 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID2). -- [=Audio Frame OBU=] with [=audio_substream_id=] = 0 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID0) is providing the coded data which has been coded by using [=Codec Config OBU=] with [=codec_config_id=] = 0 of [=Audio Substream=] with [=audio_substream_id=] = 0. -- [=Audio Frame OBU=] with [=audio_substream_id=] = 1 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID1) is providing the coded data which has been coded by using [=Codec Config OBU=] with [=codec_config_id=] = 0 of [=Audio Substream=] with [=audio_substream_id=] = 1. -- [=Audio Frame OBU=] with [=audio_substream_id=] = 2 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID2) is providing the coded data which has been coded by using [=Codec Config OBU=] with [=codec_config_id=] = 0 of [=Audio Substream=] with [=audio_substream_id=] = 2. -- [=Parameter Block OBU=] with [=parameter_id=] = 31 is providing [=demixing_info_parameter_data()=] to be applied for demixing of the [=Audio Element=] with [=audio_element_id=] = 11. -- [=Parameter Block OBU=] with [=parameter_id=] = 32 is providing mix_gain_parameter_data() to be applied to the rendered [=Audio Element=] after rendering according to [=rendering_config()=] of the [=Audio Element=] with [=audio_element_id=] = 11. -- [=Parameter Block OBU=] with [=parameter_id=] = 33 is providing mix_gain_parameter_data() to be applied to the rendered [=Audio Element=] after rendering according to [=rendering_config()=] of the [=Audio Element=] with [=audio_element_id=] = 12. -- [=Parameter Block OBU=] with [=parameter_id=] = 34 is providing mix_gain_parameter_data() to be applied to the [=Rendered Mix Presentation=] of the two rendered [=Audio Element=]s. + - There are two [=Audio Element=]s ([=audio_element_obu/audio_element_id=] = 11 and 12) which need to be mixed. 
The [=mix_presentation_obu/audio_element_id=] = 11 and the [=mix_presentation_obu/audio_element_id=] = 12 are linked to the [=Audio Element OBU=]s with [=audio_element_obu/audio_element_id=] = 11 and [=audio_element_obu/audio_element_id=] = 12, respectively. + - There are [=Parameter Block OBU=]s with [=parameter_block_obu/parameter_id=] = 32 to be used for mixing the [=Audio Element=] with [=audio_element_obu/audio_element_id=] = 11. + - There are [=Parameter Block OBU=]s with [=parameter_block_obu/parameter_id=] = 33 to be used for mixing the [=Audio Element=] with [=audio_element_obu/audio_element_id=] = 12. + - There are [=Parameter Block OBU=]s with [=parameter_block_obu/parameter_id=] = 34 to be used for mixing the two [=Audio Element=]s. +- [=Audio Element OBU=] with [=audio_element_obu/audio_element_id=] = 11 is saying: + - This [=Audio Element=] has been coded using [=Codec Config OBU=] with [=codec_config_obu/codec_config_id=] = 0. + - There are two [=Audio Substream=]s ([=audio_substream/audio_substream_id=] = 0 and 1, respectively) in this [=Audio Element=]. They are linked to the [=Audio Frame OBU=]s with [=audio_substream/audio_substream_id=] = 0 and [=audio_substream/audio_substream_id=] = 1 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID0 and [=obu_type=] = OBU_IA_Audio_Frame_ID1), respectively. + - There are [=Parameter Block OBU=]s with [=parameter_block_obu/parameter_id=] = 31 to be used for demixing this [=Audio Element=]. +- [=Audio Element OBU=] with [=audio_element_obu/audio_element_id=] = 12 is saying: + - This [=Audio Element=] has been coded by using [=Codec Config OBU=] with [=codec_config_obu/codec_config_id=] = 0. + - There is one [=Audio Substream=] ([=audio_substream/audio_substream_id=] = 2) in this [=Audio Element=]. It is linked to the [=Audio Frame OBU=]s with [=audio_substream/audio_substream_id=] = 2 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID2). 
+- [=Audio Frame OBU=] with [=audio_substream/audio_substream_id=] = 0 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID0) is providing the coded data which has been coded by using [=Codec Config OBU=] with [=codec_config_obu/codec_config_id=] = 0. +- [=Audio Frame OBU=] with [=audio_substream/audio_substream_id=] = 1 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID1) is providing the coded data which has been coded by using [=Codec Config OBU=] with [=codec_config_obu/codec_config_id=] = 0. +- [=Audio Frame OBU=] with [=audio_substream/audio_substream_id=] = 2 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID2) is providing the coded data which has been coded by using [=Codec Config OBU=] with [=codec_config_obu/codec_config_id=] = 0. +- [=Parameter Block OBU=] with [=parameter_block_obu/parameter_id=] = 31 is providing [=demixing_info_parameter_data()=] to be applied for demixing the [=Audio Element=] with [=audio_element_obu/audio_element_id=] = 11. +- [=Parameter Block OBU=] with [=parameter_block_obu/parameter_id=] = 32 is providing mix_gain_parameter_data() to be applied to the rendered [=Audio Element=] after rendering according to [=rendering_config()=] of the [=Audio Element=] with [=audio_element_obu/audio_element_id=] = 11. +- [=Parameter Block OBU=] with [=parameter_block_obu/parameter_id=] = 33 is providing mix_gain_parameter_data() to be applied to the rendered [=Audio Element=] after rendering according to [=rendering_config()=] of the [=Audio Element=] with [=audio_element_obu/audio_element_id=] = 12. +- [=Parameter Block OBU=] with [=parameter_block_obu/parameter_id=] = 34 is providing mix_gain_parameter_data() to be applied to the [=Rendered Mix Presentation=] of the two rendered [=Audio Element=]s. ## Annex B: Rules for Scalable Channel Audio (Normative) ## {#Annex_B}