diff --git a/index.bs b/index.bs
index 92456f6d..1755341c 100644
--- a/index.bs
+++ b/index.bs
@@ -259,9 +259,7 @@ url: https://www.iso.org/standard/77752.html#; spec: MP4-PCM; type: property;
# Introduction # {#introduction}
-This specification defines an Immersive Audio Model and Formats (IAMF) to provide an immersive audio experience to end-users.
-- The term Immersive Audio (IA) means the combination of [=3D audio signal=]s recreating a sound experience close to that of a natural environment.
-- The term 3D audio signal means a representation of sound that incorporates additional information beyond traditional stereo or surround sound formats such as Ambisonics (Scene-based), Object-based audio and Channel-based audio (e.g., 3.1.2ch or 7.1.4ch).
+This specification defines an Immersive Audio Model and Formats (IAMF) to provide an [=Immersive Audio=] experience to end-users.
IAMF is used to provide [=Immersive Audio=] content for presentation on a wide range of devices in both streaming and offline applications. These applications include internet audio streaming, multicasting/broadcasting services, file download, gaming, communication, virtual and augmented reality, and others. In these applications, audio may be played back on a wide range of devices, e.g., headphones, mobile phones, tablets, TVs, sound bars, home theater systems, and big screens.
@@ -271,22 +269,22 @@ Here are some typical IAMF use cases and examples of how to instantiate the mode
- UC3: Two [=Audio Element=]s (e.g., FOA and Non-diegetic Stereo) are delivered to a mobile device through a unicast network. FOA is rendered to Binaural (or Stereo) and Non-diegetic is rendered to Stereo. After mixing them, it is processed with loudness normalization and is played back on headphones through the mobile device.
Example 1: UC1 with [=3D audio signal=] = 3.1.2ch.
-- Audio Substream: The Left (L) and Right (R) channels are coded as one audio stream, the Left top front (Ltf) and Right top front (Rtf) channels as one audio stream, the Center channel as one audio stream, and the low-frequency effects (LFE) channel as one audio stream.
-- Audio Element (3.1.2ch): Consists of 4 Audio Substreams which are grouped into one [=ChannelGroup=].
+- Audio Substream: The Left (L) and Right (R) channels are coded as one audio stream, the Left top front (Ltf) and Right top front (Rtf) channels as one audio stream, the Center channel as one audio stream, and the Low-Frequency Effects (LFE) channel as one audio stream.
+- Audio Element (3.1.2ch): Consists of 4 Audio Substreams which are grouped into one [=Channel Group=].
- Mix Presentation: Provides rendering algorithms for rendering the Audio Element to popular loudspeaker layouts and headphones, and the loudness information of the [=3D audio signal=].
Example 2: UC2 with two [=3D audio signal=]s = 5.1.2ch and Stereo.
- Audio Substream: The L and R channels are coded as one audio stream, the Left surround (Ls) and Right surround (Rs) channels as one audio stream, the Ltf and Rtf channels as one audio stream, the Center channel as one audio stream, and the LFE channel as one audio stream.
-- Audio Element 1 (5.1.2ch): Consists of 5 Audio Substreams which are grouped into one [=ChannelGroup=].
-- Audio Element 2 (Stereo): Consists of 1 Audio Substream which is grouped into one [=ChannelGroup=].
+- Audio Element 1 (5.1.2ch): Consists of 5 Audio Substreams which are grouped into one [=Channel Group=].
+- Audio Element 2 (Stereo): Consists of 1 Audio Substream which is grouped into one [=Channel Group=].
- Parameter Substream 1-1: Contains mixing parameter values that are applied to Audio Element 1 by considering the home environment.
- Parameter Substream 1-2: Contains mixing parameter values that are applied to Audio Element 2 by considering the home environment.
- Mix Presentation: Provides rendering algorithms for rendering Audio Elements 1 & 2 to popular loudspeaker layouts, mixing information based on Parameter Substreams 1-1 & 1-2, and loudness information of the [=Rendered Mix Presentation=].
Example 3: UC3 with two [=3D audio signal=]s = First Order Ambisonics (FOA) and Non-diegetic Stereo.
- Audio Substream: The L and R channels are coded as one audio stream and each channel of the FOA signal as one audio stream.
-- Audio Element 1 (FOA): Consists of 4 Audio Substreams which are grouped into one [=ChannelGroup=].
-- Audio Element 2 (Non-diegetic Stereo): Consists of 1 Audio Substream which is grouped into one [=ChannelGroup=].
+- Audio Element 1 (FOA): Consists of 4 Audio Substreams which are grouped into one [=Channel Group=].
+- Audio Element 2 (Non-diegetic Stereo): Consists of 1 Audio Substream which is grouped into one [=Channel Group=].
- Parameter Substream 1-1: Contains mixing parameter values that are applied to Audio Element 1 by considering the mobile environment.
- Parameter Substream 1-2: Contains mixing parameter values that are applied to Audio Element 2 by considering the mobile environment.
- Mix Presentation: Provides rendering algorithms for rendering Audio Elements 1 & 2 to popular loudspeaker layouts and headphones, mixing information based on Parameter Substreams 1-1 & 1-2, and loudness information of the [=Rendered Mix Presentation=].
@@ -294,6 +292,8 @@ Example 3: UC3 with two [=3D audio signal=]s = First Order Ambisonics (FOA) and
# Immersive Audio Model # {#iamodel}
+## Model Overview ## {#model-overview}
+
This specification defines a model for representing [=Immersive Audio=] contents based on [=Audio Substream=]s contributing to [=Audio Element=]s meant to be rendered and mixed to form one or more presentations as depicted in the figure below.
obu_header and an OBU payload.
-obu_header() and all OBU payloads including reserved_obu() are byte aligned.
+The obu_header and all OBU payloads including reserved_obu are byte aligned.
Syntax
```
-class ia_open_bitstream_unit() {
- obu_header();
+class IAOpenBitstreamUnit() {
+ ObuHeader obu_header;
if (obu_type == OBU_IA_Sequence_Header)
- ia_sequence_header_obu();
+ IaSequenceHeaderObu ia_sequence_header_obu;
else if (obu_type == OBU_IA_Codec_Config)
- codec_config_obu();
+ CodecConfigObu codec_config_obu;
else if (obu_type == OBU_IA_Audio_Element)
- audio_element_obu();
+ AudioElementObu audio_element_obu;
else if (obu_type == OBU_IA_Mix_Presentation)
- mix_presentation_obu();
+ MixPresentationObu mix_presentation_obu;
else if (obu_type == OBU_IA_Parameter_Block)
- parameter_block_obu();
+ ParameterBlockObu parameter_block_obu;
else if (obu_type == OBU_IA_Temporal_Delimiter)
- temporal_delimiter_obu();
+ TemporalDelimiterObu temporal_delimiter_obu;
else if (obu_type == OBU_IA_Audio_Frame)
- audio_frame_obu(true);
+ AudioFrameObu audio_frame_obu(true);
else if (obu_type >= 6 and <= 23)
- audio_frame_obu(false);
+ AudioFrameObu audio_frame_obu(false);
else if (obu_type >=24 and <= 30)
- reserved_obu();
+ ReservedObu reserved_obu;
}
```
@@ -443,7 +444,7 @@ If the syntax element [=obu_type=] is equal to OBU_IA_Sequence_Header, an ordere
Syntax
```
-class obu_header() {
+class ObuHeader() {
unsigned int (5) obu_type;
unsigned int (1) obu_redundant_copy;
unsigned int (1) obu_trimming_status_flag;
@@ -488,7 +489,7 @@ It SHALL always be set to 0 for the following [=obu_type=] values:
If a decoder encounters an OBU with [=obu_redundant_copy=] = 1, and it has also received the previous non-redundant OBU, it MAY ignore the redundant OBU. If the decoder has not received the previous non-redundant OBU, it SHALL treat the redundant copy as a non-redundant OBU and process the OBU accordingly.
-obu_trimming_status_flag indicates whether this OBU has audio samples to be trimmed. It SHALL be set only when [=obu_type=] is set to OBU_IA_Audio_Frame or OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID17.
+obu_trimming_status_flag indicates whether this OBU has audio samples to be trimmed. It SHALL be set to 0 or 1 if the [=obu_type=] is set to OBU_IA_Audio_Frame or OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID17. Otherwise, it SHALL be set to 0.
For a given coded [=Audio Substream=],
- If an [=Audio Frame OBU=] has its [=num_samples_to_trim_at_start=] field set to a non-zero value N, the decoder SHALL discard the first N audio samples.
@@ -500,8 +501,8 @@ NOTE: Because of coding dependency, discarding a sample can sometimes mean decod
NOTE: This means that if one of the values is set to the number of samples in the [=Audio Frame OBU=] (i.e., [=num_samples_per_frame=]), the other value is set to 0.
-- When [=num_samples_to_trim_at_start=] is non-zero, all [=Audio Frame OBU=]s with the same [=audio_substream_id=], and preceding this OBU back until the [=Codec Config OBU=] defining this [=Audio Substream=], SHALL have their [=num_samples_to_trim_at_start=] field equal to the number of samples in the corresponding [=Audio Frame OBU=] (i.e., [=num_samples_per_frame=]).
-- When [=num_samples_to_trim_at_end=] is non-zero in an [=Audio Frame OBU=], there SHALL be no subsequent [=Audio Frame OBU=] with the same [=audio_substream_id=] until a non-redundant [=Codec Config OBU=] defining an [=Audio Substream=] with the same [=audio_substream_id=].
+- When [=num_samples_to_trim_at_start=] is non-zero, all [=Audio Frame OBU=]s with the same [=audio_substream/audio_substream_id=], and preceding this OBU back until the [=Codec Config OBU=] defining this [=Audio Substream=], SHALL have their [=num_samples_to_trim_at_start=] field equal to the number of samples in the corresponding [=Audio Frame OBU=] (i.e., [=num_samples_per_frame=]).
+- When [=num_samples_to_trim_at_end=] is non-zero in an [=Audio Frame OBU=], there SHALL be no subsequent [=Audio Frame OBU=] with the same [=audio_substream/audio_substream_id=] until a non-redundant [=Codec Config OBU=] defining an [=Audio Substream=] with the same [=audio_substream/audio_substream_id=].
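The trimming rules above can be illustrated with a minimal Python sketch. The function name `trim_decoded_frame` and the list-of-samples representation are hypothetical, and the check that the two trim counts together do not exceed the frame length is an assumption inferred from the NOTE above rather than a quotation of the specification.

```python
def trim_decoded_frame(samples, trim_at_start, trim_at_end):
    """Drop trimmed samples from one decoded audio frame.

    `samples` is a hypothetical list of decoded samples for one Audio Frame OBU.
    `trim_at_start` / `trim_at_end` stand for num_samples_to_trim_at_start and
    num_samples_to_trim_at_end.
    """
    if trim_at_start < 0 or trim_at_end < 0:
        raise ValueError("trim counts must be non-negative")
    # Assumption: the two counts together never exceed the frame length.
    if trim_at_start + trim_at_end > len(samples):
        raise ValueError("cannot trim more samples than the frame holds")
    # Discard the first N samples and the last M samples.
    return samples[trim_at_start:len(samples) - trim_at_end]
```

A fully trimmed frame (for example, `trim_at_start` equal to the frame length and `trim_at_end` equal to 0) yields an empty list, which matches the case described in the NOTE.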
obu_extension_flag indicates whether the [=extension_header_size=] field is present. If it is set to 0, the [=extension_header_size=] field SHALL NOT be present. Otherwise, the [=extension_header_size=] field SHALL be present.
@@ -511,22 +512,22 @@ NOTE: A future version of the specification may use this flag to specify an exte
obu_size indicates the size in bytes of the OBU immediately following the obu_size field of the OBU. An OBU MAY have extra bytes after consuming all the bytes per the OBU syntax definition. Parsers compliant with this version of the specification SHOULD ignore the extra bytes.
-num_samples_to_trim_at_start indicates the number of samples that need to be trimmed from the start of the samples in this [=Audio Frame OBU=].
-
num_samples_to_trim_at_end indicates the number of samples that need to be trimmed from the end of the samples in this [=Audio Frame OBU=].
+num_samples_to_trim_at_start indicates the number of samples that need to be trimmed from the start of the samples in this [=Audio Frame OBU=].
+
extension_header_size indicates the size in bytes of the extension header immediately following this field.
extension_header_bytes indicates the byte representations of the syntaxes of the extension header.
## Reserved OBU Syntax and Semantics ## {#obu-reserved}
-Reserved OBUs SHOULD be ignored by parsers compliant with this version of the specification. Future versions of the specification MAY define semantics for these reserved OBUs that would only be supported by parsers compliant with these future versions.
+Reserved OBUs SHOULD be ignored by parsers compliant with this version of the specification. Future versions of the specification MAY define syntax and semantics for these reserved OBUs that would only be supported by parsers compliant with these future versions.
Syntax
```
-class reserved_obu() {
+class ReservedObu() {
}
```
@@ -539,12 +540,12 @@ This OBU is used to indicate the start of an [=IA Sequence=]. So, the first OBU
NOTE: When an [=IA Sequence=] is stored in a file, the [=IA Sequence Header OBU=] can be used to identify that the file contains an [=IA Sequence=].
-This OBU MAY be placed frequently within one single [=IA Sequence=] for an application such as broadcasting or multicasting. In that case, all [=IA Sequence Header OBU=]s except the first one SHALL be marked as redundant (i.e., [=obu_redundant_copy=] = 1).
+This OBU MAY be placed frequently within one single [=IA Sequence=] for an application such as broadcasting or multicasting. In that case, all [=IA Sequence Header OBU=]s except the first one SHALL be marked as redundant (i.e., [=obu_redundant_copy=] = 1). So, if a decoder encounters a non-redundant [=IA Sequence Header OBU=] (i.e., [=obu_redundant_copy=] = 0), and it has also received the previous [=IA Sequence Header OBU=], the non-redundant [=IA Sequence Header OBU=] indicates the start of a new [=IA Sequence=].
Syntax
```
-class ia_sequence_header_obu() {
+class IaSequenceHeaderObu() {
unsigned int (32) ia_code;
unsigned int (8) primary_profile;
unsigned int (8) additional_profile;
@@ -555,7 +556,7 @@ class ia_sequence_header_obu() {
ia_code is a ‘four-character code’ (4CC), iamf.
-NOTE: When IA OBUs are delivered over a protocol that does not provide explicit [=IA Sequence=] boundaries, a parser may locate the [=IA Sequence=] start by searching for the code iamf preceded by specific OBU header values. For example, by assuming that [=obu_extension_flag=] is set to 0 and because [=obu_trimming_status_flag=] is set to 0 for an [=IA Sequence Header OBU=], the OBU header can be 0xF806 or 0xFC06.
+NOTE: When IA OBUs are delivered over a protocol that does not provide explicit [=IA Sequence=] boundaries, a parser may locate the [=IA Sequence=] start by searching for the code iamf preceded by specific OBU header values. For example, by assuming that [=obu_extension_flag=] is set to 0 and because [=obu_trimming_status_flag=] is set to 0 for an [=IA Sequence Header OBU=], the OBU header can be either 0xF806 or 0xFC06.
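The search described in this NOTE might look like the following Python sketch. The function name `find_ia_sequence_start` is hypothetical. The two byte patterns are the header values from the NOTE (0xF806 and 0xFC06) followed by the ASCII code `iamf`. A match should be treated only as a candidate, since the same six bytes could in principle occur inside another OBU's payload.

```python
def find_ia_sequence_start(data: bytes) -> int:
    """Return the offset of the first candidate IA Sequence Header OBU, or -1.

    Assumes obu_extension_flag = 0, as in the NOTE, so the two header bytes
    are 0xF8 0x06 (non-redundant) or 0xFC 0x06 (redundant copy).
    """
    candidates = (b"\xf8\x06iamf", b"\xfc\x06iamf")
    # bytes.find returns -1 when the pattern is absent.
    hits = [pos for pos in (data.find(c) for c in candidates) if pos != -1]
    return min(hits) if hits else -1
```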
primary_profile indicates the primary profile that this [=IA Sequence=] complies with. Parsers compliant with this version of the specification SHOULD discard the [=IA Sequence=] if they do not support the value indicated here.
@@ -575,7 +576,7 @@ This section specifies the OBU payload of OBU_IA_Codec_Config.
Syntax
```
-class codec_config_obu() {
+class CodecConfigObu() {
leb128() codec_config_id;
codec_config();
}
@@ -590,7 +591,7 @@ class codec_config() {
Semantics
-codec_config_id defines an identifier for a codec configuration. Within an [=IA Sequence=], there SHALL be one unique [=codec_config_id=] per codec. There SHALL be exactly one [=Codec Config OBU=] with a given identifier in a set of [=Descriptors=]. [=Audio Element=]s use this identifier to indicate that its corresponding [=Audio Substream=]s are coded with this codec configuration.
+codec_config_id defines an identifier for a codec configuration. Within an [=IA Sequence=], there SHALL be one unique [=codec_config_obu/codec_config_id=] per codec. There SHALL be exactly one [=Codec Config OBU=] with a given identifier in a set of [=Descriptors=]. [=Audio Element=]s use this identifier to indicate that its corresponding [=Audio Substream=]s are coded with this codec configuration.
codec_id indicates a ‘four-character code’ (4CC) to identify the codec used to generate the coded [=Audio Substream=]s. For this version of the specification, it SHALL be set to one of the four [=codec_id=] values defined below:
- 'Opus': All coded [=Audio Substream=]s referred to by all [=Audio Element=]s with this codec configuration SHALL comply with the [[!RFC6716]] specification and the [=decoder_config()=] structure SHALL comply with the constraints given in [[#opus-specific]].
@@ -602,9 +603,9 @@ Parsers compliant with this version of the specification SHOULD ignore [=Codec C
NOTE: 'ipcm' should not be confused with lpcm, which is another 4CC to identify codecs in other container formats (e.g., QuickTime).
-num_samples_per_frame indicates the frame length, in samples, of the [=audio_frame()=] provided in the audio_frame_obu(). It SHALL NOT be set to zero. If the [=decoder_config()=] structure for a given codec specifies a value for the frame length, the two values SHALL be equal.
+num_samples_per_frame indicates the frame length, in samples, of the [=audio_frame()=] provided in the audio_frame_obu. It SHALL NOT be set to zero. If the [=decoder_config()=] structure for a given codec specifies a value for the frame length, the two values SHALL be equal.
-audio_roll_distance indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to set the encoder in a state that will produce the perfect decoded audio signal. It SHALL always be a negative value or zero. For some audio codecs, even if an audio frame can be decoded independently, the decoded signal after decoding only that frame may not represent a perfect, decoded audio signal, even ignoring compression artifacts. This can be due to overlap transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate.
+audio_roll_distance indicates how many audio frames prior to the current audio frame need to be decoded (and the decoded samples discarded) to set the decoder in a state that will produce the perfect decoded audio signal. It SHALL always be a negative value or zero. For some audio codecs, even if an audio frame can be decoded independently, the decoded signal after decoding only that frame may not represent a perfect, decoded audio signal, even ignoring compression artifacts. This can be due to overlap transforms. While potentially acceptable when starting to decode an [=Audio Substream=], it may be problematic when automatically switching between similar [=Audio Substream=]s of different quality and/or bitrate.
- It SHALL be set to -R when [=codec_id=] is set to 'Opus', where R is ceil(3840 / [=num_samples_per_frame=]).
- It SHALL be set to -1 when [=codec_id=] is set to 'mp4a'.
- It SHALL be set to 0 when [=codec_id=] is set to 'fLaC' or 'ipcm'.
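As an illustration of the three rules above, a validator could compute the expected audio_roll_distance as in this Python sketch. The function name `expected_audio_roll_distance` is hypothetical, and only the four [=codec_id=] values named in this section are handled.

```python
def expected_audio_roll_distance(codec_id: str, num_samples_per_frame: int) -> int:
    """Return the audio_roll_distance required for the given codec_id."""
    if num_samples_per_frame <= 0:
        raise ValueError("num_samples_per_frame SHALL NOT be zero")
    if codec_id == "Opus":
        # R = ceil(3840 / num_samples_per_frame), using integer arithmetic.
        return -((3840 + num_samples_per_frame - 1) // num_samples_per_frame)
    if codec_id == "mp4a":
        return -1
    if codec_id in ("fLaC", "ipcm"):
        return 0
    raise ValueError(f"codec_id {codec_id!r} is not covered by this sketch")
```

For example, an Opus stream with 960 samples per frame gives R = ceil(3840 / 960) = 4, so the field would be -4.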
@@ -619,7 +620,7 @@ This section specifies the OBU payload of OBU_IA_Audio_Element.
Syntax
```
-class audio_element_obu() {
+class AudioElementObu() {
leb128() audio_element_id;
unsigned int (3) audio_element_type;
unsigned int (5) reserved;
@@ -678,7 +679,7 @@ class ReconGainParamDefinition() extends ParamDefinition() {
Semantics
-audio_element_id defines an identifier for an [=Audio Element=]. Within an [=IA Sequence=], there SHALL be one unique [=audio_element_id=] per [=Audio Element=]. There SHALL be exactly one [=Audio Element OBU=] with a given identifier in a set of [=Descriptors=]. [=Mix Presentation=]s refer to a particular [=Audio Element=] using this identifier.
+audio_element_id defines an identifier for an [=Audio Element=]. Within an [=IA Sequence=], there SHALL be one unique [=audio_element_obu/audio_element_id=] per [=Audio Element=]. There SHALL be exactly one [=Audio Element OBU=] with a given identifier in a set of [=Descriptors=]. [=Mix Presentation=]s refer to a particular [=Audio Element=] using this identifier.
audio_element_type specifies the audio representation of this [=Audio Element=], which is constructed from one or more [=Audio Substream=]s. Parsers compliant with this version of the specification SHOULD ignore [=Audio Element OBU=]s with a reserved [=audio_element_type=].
@@ -689,17 +690,17 @@ audio_element_type: The type of audio representation.
2~7 : Reserved
-codec_config_id indicates the identifier for the codec configuration which this [=Audio Element=] refers to. Parsers compliant with this version of the specification SHOULD ignore [=Audio Element OBU=]s with a [=codec_config_id=] identifying an unknown [=codec_id=].
+codec_config_id indicates the identifier for the codec configuration which this [=Audio Element=] refers to. Parsers compliant with this version of the specification SHOULD ignore [=Audio Element OBU=]s with a [=audio_element_obu/codec_config_id=] identifying an unknown [=codec_id=].
num_substreams specifies the number of [=Audio Substream=]s that are used to reconstruct this [=Audio Element=]. It SHALL NOT be set to 0.
-audio_substream_id indicates the identifier for an [=Audio Substream=] which this [=Audio Element=] refers to.
+audio_substream_id indicates the identifier for an [=Audio Substream=] which this [=Audio Element=] refers to.
-Let a particular [=ChannelGroup=]'s [=Audio Substream=]s be indexed as [c, n_c], where a [=ChannelGroup=] generation rule is described in [[#iamfgeneration-scalablechannelaudio-channelgroupgenerationrule]] and
-- [=c=] = [1, ..., C] is the [=ChannelGroup=] index and C is the number of [=ChannelGroup=]s.
-- [=n_c=] = [1, ..., N_c] is the [=Audio Substream=] index in the c-th [=ChannelGroup=] and N_c is the number of [=Audio Substream=]s in the c-th [=ChannelGroup=].
+Let a particular [=Channel Group=]'s [=Audio Substream=]s be indexed as [c, n_c], where a [=Channel Group=] generation rule is described in [[#iamfgeneration-scalablechannelaudio-channelgroupgenerationrule]] and
+- [=c=] = [1, ..., C] is the [=Channel Group=] index and C is the number of [=Channel Group=]s.
+- [=n_c=] = [1, ..., N_c] is the [=Audio Substream=] index in the c-th [=Channel Group=] and N_c is the number of [=Audio Substream=]s in the c-th [=Channel Group=].
-Then, the i-th [=audio_substream_id=] maps to a [=ChannelGroup=]'s [=Audio Substream=]s as follows, where i is the index of the array:
+Then, the i-th [=audio_element_obu/audio_substream_id=] maps to a [=Channel Group=]'s [=Audio Substream=]s as follows, where i is the index of the array:
```
[
@@ -710,7 +711,7 @@ Then, the i-th [=audio_substream_id=] maps to a [=ChannelGroup=]'s [=Audio Subst
]
```
-The order of the [=Audio Substream=]s in each [=ChannelGroup=] (i.e., the semantics of n_c) is specified in [[#syntax-scalable-channel-layout-config]].
+The order of the [=Audio Substream=]s in each [=Channel Group=] (i.e., the semantics of n_c) is specified in [[#syntax-scalable-channel-layout-config]].
num_parameters specifies the number of [=Parameter Substream=]s that are used by the algorithms specified in this [=Audio Element=].
@@ -752,9 +753,9 @@ In this parameter definition,
- [=parameter_rate=] SHALL be set to the sample rate of this [=Audio Element=].
- [=param_definition_mode=] SHALL be set to 0.
-- [=duration=] SHALL be the same as [=num_samples_per_frame=] of this [=Audio Element=].
-- [=num_subblocks=] SHALL be set to 1.
-- [=constant_subblock_duration=] SHALL be the same as [=duration=].
+- [=ParamDefinition/duration=] SHALL be the same as [=num_samples_per_frame=] of this [=Audio Element=].
+- [=ParamDefinition/num_subblocks=] SHALL be set to 1.
+- [=ParamDefinition/constant_subblock_duration=] SHALL be the same as [=ParamDefinition/duration=].
recon_gain_info provides the parameter definition for the gain value, which is used to reconstruct a scalable channel audio representation. The parameter definition is provided by ReconGainParamDefinition() and the corresponding parameter data to be provided in parameter blocks is specified in [=recon_gain_info_parameter_data()=].
@@ -762,9 +763,9 @@ In this parameter definition,
- [=parameter_rate=] SHALL be set to the sample rate of this [=Audio Element=].
- [=param_definition_mode=] SHALL be set to 0.
-- [=duration=] SHALL be the same as [=num_samples_per_frame=] of this [=Audio Element=].
-- [=num_subblocks=] SHALL be set to 1.
-- [=constant_subblock_duration=] SHALL be same as [=duration=].
+- [=ParamDefinition/duration=] SHALL be the same as [=num_samples_per_frame=] of this [=Audio Element=].
+- [=ParamDefinition/num_subblocks=] SHALL be set to 1.
+- [=ParamDefinition/constant_subblock_duration=] SHALL be the same as [=ParamDefinition/duration=].
param_definition_size indicates the size in bytes of [=param_definition_bytes=].
@@ -779,7 +780,7 @@ In this parameter definition,
audio_element_config_bytes represents reserved bytes for future use when new [=audio_element_type=] values are defined. Parsers compliant with this version of the specification SHOULD ignore these bytes.
-default_demixing_info_parameter_data() provides the default demixing parameter data to apply to all audio samples when there are no [=Parameter Block OBU=]s (with the same [=parameter_id=] defined in this DemixingParamDefinition()) provided.
+default_demixing_info_parameter_data() provides the default demixing parameter data to apply to all audio samples when there are no [=Parameter Block OBU=]s (with the same [=ParamDefinition/parameter_id=] defined in this DemixingParamDefinition()) provided.
- In this class, [=w_idx_offset=] in [=demixing_info_parameter_data()=] SHALL be ignored.
- Instead, [=default_w=] directly indicates the weight value [=w(k)=].
@@ -802,7 +803,7 @@ The mapping of [=default_w=] to [=w(k)=] SHOULD be as follows:
11 ~ 15 : reserved
-A default recon gain value of 0 dB is implied when there are no [=Parameter Block OBU=]s (with the same [=parameter_id=] defined in this ReconGainParamDefinition()) provided.
+A default recon gain value of 0 dB is implied when there are no [=Parameter Block OBU=]s (with the same [=ParamDefinition/parameter_id=] defined in this ReconGainParamDefinition()) provided.
### Parameter Definition Syntax and Semantics ### {#parameter-definition}
@@ -831,31 +832,31 @@ abstract class ParamDefinition() {
Semantics
-parameter_id indicates the identifier for the [=Parameter Substream=] which this parameter definition refers to. There SHALL be one unique [=parameter_id=] per [=Parameter Substream=].
+parameter_id indicates the identifier for the [=Parameter Substream=] which this parameter definition refers to. There SHALL be one unique [=ParamDefinition/parameter_id=] per [=Parameter Substream=].
parameter_rate specifies the rate used by this [=Parameter Substream=], expressed as ticks per second. Time-related fields associated with this [=Parameter Substream=], such as durations, SHALL be expressed in the number of ticks.
- The rate SHALL be a value such that (the rate * [=num_samples_per_frame=]) / (the sample rate of [=Audio Element=]) is a non-zero integer.
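The constraint on [=parameter_rate=] can be checked with a short Python sketch. The function name `parameter_rate_is_valid` and its argument names are hypothetical. The check simply restates the rule above: one audio frame must span a non-zero, whole number of ticks.

```python
def parameter_rate_is_valid(parameter_rate: int,
                            num_samples_per_frame: int,
                            sample_rate: int) -> bool:
    """True if (rate * num_samples_per_frame) / sample_rate is a non-zero integer."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    product = parameter_rate * num_samples_per_frame
    # Non-zero and exactly divisible means the frame covers whole ticks.
    return product != 0 and product % sample_rate == 0
```

With 960 samples per frame at 48000 Hz, a rate of 100 ticks per second gives 2 ticks per frame and passes, whereas a rate of 25 gives half a tick per frame and fails.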
-param_definition_mode indicates whether this parameter definition specifies the [=duration=], [=num_subblocks=], [=constant_subblock_duration=] and [=subblock_duration=] fields for the parameter blocks with the same [=parameter_id=].
+param_definition_mode indicates whether this parameter definition specifies the [=ParamDefinition/duration=], [=ParamDefinition/num_subblocks=], [=ParamDefinition/constant_subblock_duration=] and [=ParamDefinition/subblock_duration=] fields for the parameter blocks with the same [=parameter_block_obu/parameter_id=].
-- When this field is set to 0, all of the [=duration=], [=num_subblocks=], [=constant_subblock_duration=], and [=subblock_duration=] fields SHALL be specified in this parameter definition. None of the parameter blocks with the same [=parameter_id=] SHALL specify these same fields.
+- When this field is set to 0, all of the [=ParamDefinition/duration=], [=ParamDefinition/num_subblocks=], [=ParamDefinition/constant_subblock_duration=], and [=ParamDefinition/subblock_duration=] fields SHALL be specified in this parameter definition. None of the parameter blocks with the same [=parameter_block_obu/parameter_id=] SHALL specify these same fields.
-- When this field is set to 1, none of the [=duration=], [=num_subblocks=], [=constant_subblock_duration=], and [=subblock_duration=] fields SHALL be specified in this parameter definition. Instead, each parameter block with the same [=parameter_id=] SHALL specify these same fields.
+- When this field is set to 1, none of the [=ParamDefinition/duration=], [=ParamDefinition/num_subblocks=], [=ParamDefinition/constant_subblock_duration=], and [=ParamDefinition/subblock_duration=] fields SHALL be specified in this parameter definition. Instead, each parameter block with the same [=parameter_block_obu/parameter_id=] SHALL specify these same fields.
-duration specifies the duration for which each parameter block with the same [=parameter_id=] is valid and applicable. It SHALL NOT be set to 0.
+duration specifies the duration for which each parameter block with the same [=parameter_block_obu/parameter_id=] is valid and applicable. It SHALL NOT be set to 0.
-constant_subblock_duration specifies the duration of each subblock, in the case where all subblocks except the last subblock have equal durations. If all subblocks except the last subblock do not have equal durations, the value of constant_subblock_duration SHALL be set to 0.
+constant_subblock_duration specifies the duration of each subblock, in the case where all subblocks except the last subblock have equal durations. If all subblocks except the last subblock do not have equal durations, the value of [=ParamDefinition/constant_subblock_duration=] SHALL be set to 0.
-Let D = the value of [=duration=], NS = the value of [=num_subblocks=], CSD = the value of [=constant_subblock_duration=] and SD = the value of [=subblock_duration=].
-- When [=CSD=] != 0, [=num_subblocks=] is implicitly calculated as [=NS=] = ceil([=D=] / [=CSD=]).
+Let D = the value of [=ParamDefinition/duration=], NS = the value of [=ParamDefinition/num_subblocks=], CSD = the value of [=ParamDefinition/constant_subblock_duration=] and SD = the value of [=ParamDefinition/subblock_duration=].
+- When [=CSD=] != 0, [=ParamDefinition/num_subblocks=] is implicitly calculated as [=NS=] = ceil([=D=] / [=CSD=]).
- If [=NS=] * [=CSD=] > [=D=], the actual duration of the last subblock SHALL be [=D=] - ([=NS=] - 1) * [=CSD=].
- When [=CSD=] = 0, the summation of all [=SD=]s in this parameter block SHALL be equal to [=D=].
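The subblock rules above can be sketched in Python as follows. The function name `resolve_subblock_durations` and the `explicit_durations` argument (standing for the per-subblock SD values) are hypothetical; the arithmetic follows the D, NS, CSD and SD definitions given above.

```python
def resolve_subblock_durations(duration, constant_subblock_duration,
                               explicit_durations=()):
    """Return the list of subblock durations, in ticks, for one parameter block."""
    D, CSD = duration, constant_subblock_duration
    if D <= 0:
        raise ValueError("duration SHALL NOT be 0")
    if CSD != 0:
        NS = -(-D // CSD)               # NS = ceil(D / CSD)
        last = D - (NS - 1) * CSD       # shorter last subblock when NS * CSD > D
        return [CSD] * (NS - 1) + [last]
    # CSD == 0: every subblock_duration is given explicitly and must sum to D.
    SDs = list(explicit_durations)
    if any(sd <= 0 for sd in SDs):
        raise ValueError("subblock_duration SHALL NOT be 0")
    if sum(SDs) != D:
        raise ValueError("subblock durations must sum to duration")
    return SDs
```

For D = 10 and CSD = 4, NS = 3 and the last subblock lasts 10 - 2 * 4 = 2 ticks.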
-num_subblocks specifies the number of different sets of parameter values specified in each parameter block with the same [=parameter_id=], where each set describes a different subblock of the timeline, contiguously.
+num_subblocks specifies the number of different sets of parameter values specified in each parameter block with the same [=parameter_block_obu/parameter_id=], where each set describes a different subblock of the timeline, contiguously.
-subblock_duration specifies the duration for the given subblock. It SHALL NOT be set to 0.
+subblock_duration specifies the duration for the given subblock. It SHALL NOT be set to 0.
-The values for [=duration=], [=constant_subblock_duration=], and [=subblock_duration=] SHALL be expressed as the number of ticks at the [=parameter_rate=] specified in the corresponding parameter definition.
+The values for [=ParamDefinition/duration=], [=ParamDefinition/constant_subblock_duration=], and [=ParamDefinition/subblock_duration=] SHALL be expressed as the number of ticks at the [=parameter_rate=] specified in the corresponding parameter definition.
### Scalable Channel Layout Config Syntax and Semantics ### {#syntax-scalable-channel-layout-config}
@@ -887,10 +888,10 @@ class channel_audio_layer_config(i) {
}
```
-When an [=Audio Element=] is composed of G(r) number of [=Audio Substream=]s, its scalable channel audio representation is layered into [=num_layers=] = r number of [=ChannelGroup=]s.
+When an [=Audio Element=] is composed of G(r) number of [=Audio Substream=]s, its scalable channel audio representation is layered into [=num_layers=] = r number of [=Channel Group=]s.
-- The order of the [=ChannelGroup=]s in each [=Temporal Unit=] SHALL be same as the order of channel_audio_layer_config()s in scalable_channel_layout_config().
-- The q-th [=ChannelGroup=] consists of G(q) - G(q-1) number of [=Audio Substream=]s, where q = 1, 2, ..., r and G(0) = 0.
+- The order of the [=Channel Group=]s in each [=Temporal Unit=] SHALL be the same as the order of channel_audio_layer_config()s in scalable_channel_layout_config().
+- The q-th [=Channel Group=] consists of G(q) - G(q-1) number of [=Audio Substream=]s, where q = 1, 2, ..., r and G(0) = 0.
- Let the term "Audio Frames" mean the set of all [=Audio Frame OBU=]s (for this [=Audio Element=]) that have the same start timestamp. All Audio Frames in an [=IA Sequence=] SHALL have the same number of [=Audio Frame OBU=]s.
- [=Parameter Block OBU=]s MAY be associated with Audio Frames.
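As an illustration of the layering rule above, the following Python sketch derives the number of Audio Substreams in each Channel Group from the cumulative counts G(1), ..., G(r). The function name and the example counts are hypothetical and are not defined by the specification.

```python
def substreams_per_channel_group(cumulative_counts):
    """Return G(q) - G(q-1) for q = 1..r, taking G(0) = 0.

    cumulative_counts is a hypothetical input: the list [G(1), ..., G(r)].
    """
    sizes = []
    previous = 0  # G(0) = 0
    for total in cumulative_counts:
        sizes.append(total - previous)
        previous = total
    return sizes


# Hypothetical three-layer example: G(1) = 1, G(2) = 3, G(3) = 5.
print(substreams_per_channel_group([1, 3, 5]))  # [1, 2, 2]
```

The sum of the returned sizes equals G(r), the total number of Audio Substreams in the Audio Element.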
@@ -898,20 +899,20 @@ When an [=Audio Element=] is composed of G(r) number of [=Audio Substream=]s, it
mix_presentation_obu
specifies how to render, process and mix one or more [=Audio Element=]s, with details provided in [[#processing-mixpresentation]].
An [=IA Sequence=] MAY have one or more [=Mix Presentation=]s specified. The IA parser SHALL select the appropriate [=Mix Presentation=] to process according to the rules specified in [[#processing-mixpresentation-selection]].
@@ -1066,7 +1067,7 @@ A [=Mix Presentation=] MAY contain one or more sub-mixes. Common use cases MAY s
Syntax
```
-class mix_presentation_obu() {
+class MixPresentationObu() {
leb128() mix_presentation_id;
leb128() count_label;
for (i = 0; i < count_label; i++) {
@@ -1113,7 +1114,7 @@ class mix_presentation_obu() {
num_audio_elements specifies the number of [=Audio Element=]s that are used in this [=Mix Presentation=] to generate the final output audio signal for playback. It SHALL NOT be set to 0.
-audio_element_id indicates the identifier for an [=Audio Element=] which this [=Mix Presentation=] refers to.
+audio_element_id indicates the identifier for an [=Audio Element=] which this [=Mix Presentation=] refers to.
mix_presentation_element_annotations() provides informational metadata that the playback system MAY use to display information to the user. It is not used in the rendering or mixing process to generate the final output audio signal.
@@ -1220,9 +1221,9 @@ class MixGainParamDefinition() extends ParamDefinition() {
Semantics
-mix_gain provides the parameter definition for the gain value that is applied to all channels of the rendered [=Audio Element=] signal. The parameter definition is provided by MixGainParamDefinition() and the corresponding parameter data to be provided in parameter blocks with the same [=parameter_id=] is specified in [=mix_gain_parameter_data()=].
+mix_gain provides the parameter definition for the gain value that is applied to all channels of the rendered [=Audio Element=] signal. The parameter definition is provided by MixGainParamDefinition() and the corresponding parameter data to be provided in parameter blocks with the same [=parameter_block_obu/parameter_id=] is specified in [=mix_gain_parameter_data()=].
-default_mix_gain specifies the default mix gain value to apply when there are no mix gain parameter blocks with the same [=parameter_id=] provided. This value is expressed in dB and SHALL be applied to all channels in the rendered [=Audio Element=]. It is stored as a 16-bit, signed, two's complement fixed-point value with 8 fractional bits (i.e., Q7.8 in [[!Q-Format]]).
+default_mix_gain specifies the default mix gain value to apply when there are no mix gain parameter blocks with the same [=parameter_block_obu/parameter_id=] provided. This value is expressed in dB and SHALL be applied to all channels in the rendered [=Audio Element=]. It is stored as a 16-bit, signed, two's complement fixed-point value with 8 fractional bits (i.e., Q7.8 in [[!Q-Format]]).
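A minimal sketch of how a reader might interpret the Q7.8 representation described above, assuming the 16-bit field has already been read from the bitstream as an unsigned integer. The helper name `decode_q7_8` is hypothetical.

```python
def decode_q7_8(raw):
    """Convert a 16-bit two's complement Q7.8 field to a gain in dB.

    raw is assumed to be the field value read as an unsigned 16-bit integer.
    """
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("expected an unsigned 16-bit field value")
    # Reinterpret the top bit as the sign (two's complement).
    signed = raw - 0x10000 if raw & 0x8000 else raw
    # 8 fractional bits: divide by 2**8.
    return signed / 256.0


print(decode_q7_8(0x0100))  # 1.0
print(decode_q7_8(0xFF80))  # -0.5
```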
### Output Mix Config Syntax and Semantics ### {#obu-mixpresentation-outputmix}
@@ -1239,7 +1240,7 @@ class output_mix_config() {
Semantics
-output_mix_gain provides the parameter definition for the gain value that is applied to all channels of the mixed audio signal. The parameter definition is provided by MixGainParamDefinition() and the corresponding parameter data to be provided in parameter blocks with the same [=parameter_id=] is specified in [=mix_gain_parameter_data()=].
+output_mix_gain provides the parameter definition for the gain value that is applied to all channels of the mixed audio signal. The parameter definition is provided by MixGainParamDefinition() and the corresponding parameter data to be provided in parameter blocks with the same [=parameter_block_obu/parameter_id=] is specified in [=mix_gain_parameter_data()=].
### Layout Syntax and Semantics ### {#syntax-layout}
@@ -1377,7 +1378,7 @@ The metadata specified in this OBU defines the parameter values for an algorithm
Syntax
```
-class parameter_block_obu() {
+class ParameterBlockObu() {
leb128() parameter_id;
(param_definition_type, param_definition_mode, duration, num_subblocks, constant_subblock_duration, subblock_duration) = get_param_definition(parameter_id);
@@ -1418,23 +1419,23 @@ class parameter_block_obu() {
Semantics
-parameter_id indicates the identifier for a [=Parameter Substream=] which this [=Parameter Block OBU=] refers to. If no [=Audio Element OBU=]s or [=Mix Presentation OBU=]s refer to this [=parameter_id=], parsers compliant with this version of the specification SHOULD ignore [=Parameter Block OBU=]s with this identifier.
+parameter_id indicates the identifier for a [=Parameter Substream=] which this [=Parameter Block OBU=] refers to. If no [=Audio Element OBU=]s or [=Mix Presentation OBU=]s refer to this [=parameter_block_obu/parameter_id=], parsers compliant with this version of the specification SHOULD ignore [=Parameter Block OBU=]s with this identifier.
-get_param_definition() is a run-time function to get the [=param_definition_type=] and [=param_definition_mode=] from the [=Audio Element OBU=] or [=Mix Presentation OBU=] that references this [=parameter_id=].
+get_param_definition() is a run-time function to get the [=param_definition_type=] and [=param_definition_mode=] from the [=Audio Element OBU=] or [=Mix Presentation OBU=] that references this [=parameter_block_obu/parameter_id=].
-If [=param_definition_mode=] = 0, this function additionally gets the following fields from the same [=Audio Element OBU=]: [=duration=], [=num_subblocks=], [=constant_subblock_duration=], and [=subblock_duration=].
+If [=param_definition_mode=] = 0, this function additionally gets the following fields from the same [=Audio Element OBU=] or [=Mix Presentation OBU=]: [=ParamDefinition/duration=], [=ParamDefinition/num_subblocks=], [=ParamDefinition/constant_subblock_duration=], and [=ParamDefinition/subblock_duration=].
When it gets an unknown [=param_definition_type=], parsers compliant with this version of the specification SHOULD ignore the [=Parameter Block OBU=].
-duration specifies the duration for which this parameter block is valid and applicable. It SHALL NOT be set to 0.
+duration specifies the duration for which this parameter block is valid and applicable. It SHALL NOT be set to 0.
-constant_subblock_duration specifies the duration of each subblock, in the case where all subblocks except the last subblock have equal durations. If all subblocks except the last subblock do not have equal durations, the value of constant_subblock_duration SHALL be set to 0.
+constant_subblock_duration specifies the duration of each subblock, in the case where all subblocks except the last subblock have equal durations. If all subblocks except the last subblock do not have equal durations, the value of [=parameter_block_obu/constant_subblock_duration=] SHALL be set to 0.
-num_subblocks specifies the number of different sets of parameter values specified in this parameter block, where each set describes a different subblock of the timeline, contiguously. When [=constant_subblock_duration=] != 0, [=num_subblocks=] is implicitly calculated as [=num_subblocks=] = ceil([=duration=] / [=constant_subblock_duration=]).
+num_subblocks specifies the number of different sets of parameter values specified in this parameter block, where each set describes a different subblock of the timeline, contiguously. When [=parameter_block_obu/constant_subblock_duration=] != 0, [=parameter_block_obu/num_subblocks=] is implicitly calculated as [=parameter_block_obu/num_subblocks=] = ceil([=parameter_block_obu/duration=] / [=parameter_block_obu/constant_subblock_duration=]).
-subblock_duration specifies the duration for the given subblock. It SHALL NOT be set to 0.
+subblock_duration specifies the duration for the given subblock. It SHALL NOT be set to 0.
-The values of [=duration=], [=constant_subblock_duration=], and [=subblock_duration=] SHALL be expressed as the number of ticks at the [=parameter_rate=] specified in the corresponding parameter definition.
+The values of [=parameter_block_obu/duration=], [=parameter_block_obu/constant_subblock_duration=], and [=parameter_block_obu/subblock_duration=] SHALL be expressed as the number of ticks at the [=parameter_rate=] specified in the corresponding parameter definition.
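The relationship between these fields can be sketched as follows. This is an illustration only: the function name is hypothetical, and the sketch assumes that, when constant_subblock_duration is non-zero, the last subblock covers whatever remains of the duration. All values are tick counts at the parameter_rate.

```python
def subblock_durations(duration, constant_subblock_duration, explicit_durations=None):
    """Return the list of subblock durations, in ticks, for one parameter block."""
    if duration == 0:
        raise ValueError("duration SHALL NOT be set to 0")
    if constant_subblock_duration != 0:
        # num_subblocks = ceil(duration / constant_subblock_duration),
        # computed with integers to avoid floating-point rounding.
        num_subblocks = -(-duration // constant_subblock_duration)
        # Assumption: the last subblock takes the remaining ticks.
        last = duration - constant_subblock_duration * (num_subblocks - 1)
        return [constant_subblock_duration] * (num_subblocks - 1) + [last]
    # constant_subblock_duration == 0: each subblock_duration is given explicitly.
    durations = list(explicit_durations or [])
    if any(d == 0 for d in durations):
        raise ValueError("subblock_duration SHALL NOT be set to 0")
    return durations


print(subblock_durations(10, 4))             # [4, 4, 2]
print(subblock_durations(10, 0, [3, 5, 2]))  # [3, 5, 2]
```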
parameter_data_size indicates the size in bytes of [=parameter_data_bytes=].
@@ -1567,12 +1568,12 @@ class recon_gain_info_parameter_data() {
This section specifies the OBU payloads of OBU_IA_Audio_Frame and OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID17.
-audio_substream_id is an identifier for an [=Audio Substream=] associated with this audio frame. Within an [=IA Sequence=], there SHALL be one unique [=audio_substream_id=] per [=Audio Substream=]. There SHALL be exactly one [=Audio Element OBU=] with a given [=audio_substream_id=] in a set of [=Descriptors=].
+audio_substream_id defines an identifier for an [=Audio Substream=] associated with this audio frame. Within an [=IA Sequence=], there SHALL be one unique [=audio_substream/audio_substream_id=] per [=Audio Substream=]. There SHALL be exactly one [=Audio Element OBU=] with a given [=audio_element_obu/audio_substream_id=] in a set of [=Descriptors=].
Syntax
```
-class audio_frame_obu(audio_substream_id_in_bitstream) {
+class AudioFrameObu(audio_substream_id_in_bitstream) {
if (audio_substream_id_in_bitstream) {
leb128() explicit_audio_substream_id;
}
@@ -1582,14 +1583,14 @@ class audio_frame_obu(audio_substream_id_in_bitstream) {
Semantics
-The variable audio_substream_id_in_bitstream does not exist in an [=IA Sequence=]. It is an indicator of whether this OBU payload includes an explicit [=audio_substream_id=] and its value is based on the [=obu_type=], as follows:
+The variable audio_substream_id_in_bitstream does not exist in an [=IA Sequence=]. It is an indicator of whether this OBU payload includes an explicit [=audio_substream/audio_substream_id=] and its value is based on the [=obu_type=], as follows:
- true
for [=obu_type=] = OBU_IA_Audio_Frame.
- false
for [=obu_type=] = OBU_IA_Audio_Frame_ID0, OBU_IA_Audio_Frame_ID1, ..., or OBU_IA_Audio_Frame_ID17.
-explicit_audio_substream_id defines the [=audio_substream_id=] of this frame. The value SHALL be greater than 17. When this field is not present, [=audio_substream_id=] is implicit and is defined as a value from 0 to 17 for OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID17, respectively.
+explicit_audio_substream_id indicates the [=audio_substream/audio_substream_id=] of this frame. The value SHALL be greater than 17. When this field is not present, [=audio_substream/audio_substream_id=] is implicit and is defined as a value from 0 to 17 for OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID17, respectively.
-NOTE: The first 18 [=Audio Substream=]s in an [=IA Sequence=] MAY use the OBU types OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID17, which have predefined [=audio_substream_id=]s associated with them. This reduces bitrate by avoiding the extra [=explicit_audio_substream_id=] field in the bitstream.
+NOTE: The first 18 [=Audio Substream=]s in an [=IA Sequence=] MAY use the OBU types OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID17, which have predefined [=audio_substream/audio_substream_id=]s associated with them. This reduces bitrate by avoiding the extra [=explicit_audio_substream_id=] field in the bitstream.
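The implicit and explicit cases above can be sketched as follows. The sketch works on the symbolic OBU type names rather than numeric obu_type values, and the function name is hypothetical.

```python
import re


def resolve_audio_substream_id(obu_type_name, explicit_audio_substream_id=None):
    """Return the audio_substream_id for an Audio Frame OBU.

    obu_type_name is the symbolic name, e.g. "OBU_IA_Audio_Frame_ID3".
    """
    if obu_type_name == "OBU_IA_Audio_Frame":
        # The ID is carried explicitly and SHALL be greater than 17.
        if explicit_audio_substream_id is None or explicit_audio_substream_id <= 17:
            raise ValueError("explicit_audio_substream_id SHALL be greater than 17")
        return explicit_audio_substream_id
    # OBU_IA_Audio_Frame_ID0 .. OBU_IA_Audio_Frame_ID17 imply IDs 0 .. 17.
    match = re.fullmatch(r"OBU_IA_Audio_Frame_ID(1[0-7]|[0-9])", obu_type_name)
    if match is None:
        raise ValueError(f"not an Audio Frame OBU type: {obu_type_name}")
    return int(match.group(1))


print(resolve_audio_substream_id("OBU_IA_Audio_Frame_ID3"))  # 3
print(resolve_audio_substream_id("OBU_IA_Audio_Frame", 42))  # 42
```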
coded_frame_size is the size of [=audio_frame()=] in bytes.
@@ -1602,7 +1603,7 @@ This section specifies the OBU payload of OBU_IA_Temporal_Delimiter.
Syntax
```
-class temporal_delimiter_obu() {
+class TemporalDelimiterObu() {
}
```
@@ -1691,7 +1692,7 @@ class decoder_config(ipcm) {
sample_size complies with [=PCM_sample_size=] specified in [[!MP4-PCM]]. In other words, it SHALL take a value from the set {16, 24, 32}.
-sample_rate indicates the sample rate of the input audio in Hz. It SHALL take a value from the set {44.1k, 16k, 32k, 48k, 96k}.
+sample_rate indicates the sample rate of the input [=3D audio signal=] in Hz. It SHALL take a value from the set {44.1k, 16k, 32k, 48k, 96k}.
The format of [=audio_frame()=] is only one single mono or stereo PCM audio frame.
- If [=audio_frame()=] contains a stereo PCM audio frame, the ith audio sample of the Left channel is followed by the ith audio sample of the Right channel, and then the (i+1)th audio sample of the Left channel is followed by the (i+1)th audio sample of the Right channel, where i = 1, 2, ..., [=num_samples_per_frame=] - 1.
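As a sketch of the stereo sample ordering described above, the following Python function splits an interleaved frame into Left and Right channels. It assumes the PCM samples have already been unpacked into a flat list of integers; the function name is hypothetical, and byte-level unpacking is out of scope here.

```python
def deinterleave_stereo(samples):
    """Split interleaved stereo samples [L1, R1, L2, R2, ...] into (left, right)."""
    if len(samples) % 2 != 0:
        raise ValueError("a stereo frame needs an even number of samples")
    left = samples[0::2]   # every Left sample comes first in its pair
    right = samples[1::2]  # followed by the matching Right sample
    return left, right


print(deinterleave_stereo([10, -10, 20, -20, 30, -30]))
# ([10, 20, 30], [-10, -20, -30])
```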
@@ -1825,7 +1826,7 @@ NOTE: In a typical case, the OBUs in the first [=Descriptors=] of an [=IA Sequen
A file conformant to this specification satisfies the following:
- It SHALL conform to the normative requirements of [[!ISOBMFF]]
-- It SHALL have the iamf brand among the compatible brands array of the FileTypeBox
+- It SHALL have the iamf brand among the compatible brands array of the FileTypeBox
- It SHALL contain at least one track using an [=IASampleEntry=]
- It SHOULD indicate a structural ISOBMFF brand among the compatible brands' array of the FileTypeBox, such as 'iso6'
- It MAY indicate other brands not specified in this specification provided that the associated requirements do not conflict with those given in this specification
@@ -1864,7 +1865,7 @@ NOTE: Multiple sample entries may be used in a track, for example when the track
### IA Sample Entry ### {#iasampleentry-section}
- Sample Entry Type: iamf + Sample Entry Type: iamf Container: Sample Description Box ('stsd') Mandatory: Yes Quantity: One or more. @@ -2041,12 +2042,12 @@ The figure below shows the decoding and reconstruction flowchart.For a given loudspeaker layout (i.e., CL #i) among the list of [=loudspeaker_layout=] in [=scalable_channel_layout_config()=], -- The OBU Parser SHALL output the [=Audio Substream=]s for [=ChannelGroup=] #1 to [=ChannelGroup=] #i and pass them to the Codec Decoder, along with [=decoder_config()=]. +- The OBU Parser SHALL output the [=Audio Substream=]s for [=Channel Group=] #1 to [=Channel Group=] #i and pass them to the Codec Decoder, along with [=decoder_config()=]. - The Codec Decoder SHALL output the decoded PCM channels. - For non-scalable audio (i.e., i = [=num_layers=] = 1), its order SHALL be converted to the loudspeaker location order for CL #1. - For scalable audio (i.e., i > 1), the output channels SHALL have the same order as the originally transmitted order of the coded channels. - For scalable audio (i.e., i > 1), the decoded PCM channels are further processed as: - - When [=output_gain_is_present_flag=](j) for [=ChannelGroup=] #j (j = 1, 2, …, i-1) is set to 1, the Gain module SHALL apply [=output_gain=](j) to all audio samples of the mixed channels in [=ChannelGroup=] #j indicated by [=output_gain_flag=](j). + - When [=output_gain_is_present_flag=](j) for [=Channel Group=] #j (j = 1, 2, …, i-1) is set to 1, the Gain module SHALL apply [=output_gain=](j) to all audio samples of the mixed channels in [=Channel Group=] #j indicated by [=output_gain_flag=](j). - The De-Mixer SHALL output de-mixed PCM channels for CL #i generated through de-mixing of the mixed channels from the Gain module by using non-mixed channels and demixing parameters for each frame. - The Recon_Gain module SHALL output smoothed PCM channels by applying [=recon_gain=] to each frame of the de-mixed channels. 
- The order for the Non-mixed channels and Smoothed channels SHALL be converted to the loudspeaker location order for CL #i after going through the necessary modules such as Gain, De-Mixer, Recon_Gain, etc. @@ -2055,7 +2056,7 @@ The following sections, [[#processing-scalablechannelaudio-gain]], [[#processing ### Gain ### {#processing-scalablechannelaudio-gain} -The Gain module is the mirror process of the Attenuation module (described in [[#iamfgeneration-scalablechannelaudio]]). It recovers the reduced sample values using [=output_gain=](i) when its [=output_gain_is_present_flag=](i) for [=ChannelGroup=] #i is set to 1. When its [=output_gain_is_present_flag=](i) is set to 0, then this module SHALL be bypassed for [=ChannelGroup=] #i. The value of [=output_gain=](i) for [=ChannelGroup=] #i SHALL be applied to all samples of the mixed channels in [=ChannelGroup=] #i, where a mixed channel means the channel created by mixing multiple channels of an input channel audio when generating [=down-mixed audio=] from the input channel audio (i.e., the channel audio for CL #n). +The Gain module is the mirror process of the Attenuation module (described in [[#iamfgeneration-scalablechannelaudio]]). It recovers the reduced sample values using [=output_gain=](i) when its [=output_gain_is_present_flag=](i) for [=Channel Group=] #i is set to 1. When its [=output_gain_is_present_flag=](i) is set to 0, then this module SHALL be bypassed for [=Channel Group=] #i. The value of [=output_gain=](i) for [=Channel Group=] #i SHALL be applied to all samples of the mixed channels in [=Channel Group=] #i, where a mixed channel means the channel created by mixing multiple channels of an input channel audio when generating [=down-mixed audio=] from the input channel audio (i.e., the channel audio for CL #n). To apply the gain, an implementation SHALL use the following: @@ -2160,7 +2161,7 @@ When an [=IA Sequence=] contains multiple [=Mix Presentation=]s, the IA parser S 1. 
If there are any user-selectable mixes, the IA parser SHOULD select the mix, or mixes, that match the user's preferences. An example might be a mix with a specific language. [=Mix Presentation=]s MAY use [=mix_presentation_friendly_label=] to describe such mixes. 2. If there is more than one valid mix remaining, the IA parser SHOULD select an appropriate mix for rendering, in the following order. 1. If the playback device is headphones: - 1. Select the mix with [=audio_element_id=] whose [=loudspeaker_layout=] is BINAURAL. + 1. Select the mix with [=mix_presentation_obu/audio_element_id=] whose [=loudspeaker_layout=] is BINAURAL. 2. If there is no such mix, select the mix with [=loudness_layout=] = BINAURAL. 3. If there is no such mix, select the mix with the highest available [=loudness_layout=]. 2. If the playback layout is loudspeakers:
@@ -2280,11 +2281,11 @@ Finally, the output mix gain SHALL be applied using the value specified in [=out
## Animated Parameters ## {#processing-animated-params}
-This section describes how a set of parameter values is animated over a subblock in a parameter_block_obu() and applied to the corresponding audio samples, using the information provided in AnimatedParameterData().
+This section describes how a set of parameter values is animated over a subblock in a parameter_block_obu and applied to the corresponding audio samples, using the information provided in AnimatedParameterData().
If [=animation_type=] is equal to STEP, the parameter value provided by [=start_point_value=] SHOULD be applied to all time steps in the subblock.
-If [=animation_type=] is equal to LINEAR or BEZIER, the information provided in AnimatedParameterData() describes how the set of parameter values is animated as a Bezier curve. Let T be the [=subblock_duration=] defined in the parameter_block_obu() and P0, P1 and P2 be 2D coordinates defined as
+If [=animation_type=] is equal to LINEAR or BEZIER, the information provided in AnimatedParameterData() describes how the set of parameter values is animated as a Bezier curve. Let T be the [=parameter_block_obu/subblock_duration=] defined in the parameter_block_obu and P0, P1 and P2
be 2D coordinates defined as ``` P0 = (t0, start_point_value), @@ -2406,7 +2407,7 @@ In the matrices above, p1 = 0.707. Implementations MAY use a limiter defined in # IAMF Generation Process (Informative) # {#iamfgeneration} -This section provides a guideline for encoding an [=IA Sequence=] that conforms to [[#obu-syntax]], given a set of input audio and user inputs. +This section provides a guideline for encoding an [=IA Sequence=] that conforms to [[#obu-syntax]], given a set of input [=3D audio signal=] and user inputs. The RECOMMENDED input audio formats for IA encoding are as follows: - Ambisonics audio: a full-order Ambisonics signal with ACN channel ordering and SN3D normalization @@ -2424,15 +2425,15 @@ Example user inputs include: The figure below shows an example architecture for an IA encoder that generates an [=IA Sequence=] with one [=Audio Element=]. The IA encoder is composed of the Pre-processor, Codec encoder, and OBU packetizer modules. -- Pre-processor outputs one or more [=ChannelGroup=]s, [=Descriptors=] and optional [=Parameter Substream=]s based on the input audio and user inputs. - - It outputs one single [=ChannelGroup=] for a scene-based [=Audio Element=]. - - It outputs one or more [=ChannelGroup=]s for a channel-based [=Audio Element=]. +- Pre-processor outputs one or more [=Channel Group=]s, [=Descriptors=] and optional [=Parameter Substream=]s based on the input [=3D audio signal=] and user inputs. + - It outputs one single [=Channel Group=] for a scene-based [=Audio Element=]. + - It outputs one or more [=Channel Group=]s for a channel-based [=Audio Element=]. - It outputs [=Descriptors=] which are composed of one [=IA Sequence Header OBU=], one [=Codec Config OBU=], one [=Audio Element OBU=], and one or more [=Mix Presentation OBU=]s. - It may output [=Parameter Substream=]s - For a channel-based [=Audio Element=] with [=num_layers=] = 1, it may output a [=Parameter Substream=] with demixing info. 
- For a channel-based [=Audio Element=] with [=num_layers=] > 1, it outputs [=Parameter Substream=]s with demixing info and recon gain info. - It may further output [=Parameter Substream=]s with mixing gain. -- Codec encoder generates one or more [=Audio Substream=]s from each [=ChannelGroup=] based on [=Codec Config OBU=]. +- Codec encoder generates one or more [=Audio Substream=]s from each [=Channel Group=] based on [=Codec Config OBU=]. - OBU packetizer packetizes [=Descriptors=], [=Parameter Substream=]s and [=Audio Substream=]s into OBUs, and outputs an [=IA Sequence=]. - Temporal unit generator generates a [=Temporal Unit=] for each frame from [=Audio Frame OBU=]s and [=Parameter Block OBU=]s (if present). @@ -2443,15 +2444,15 @@ The IA encoder is composed of the Pre-processor, Codec encoder, and OBU packetiz For Ambisonics encoding: -- The Pre-Processor outputs one [=ChannelGroup=] and one set of [=Descriptors=]. It is composed of only the Meta Generator. +- The Pre-Processor outputs one [=Channel Group=] and one set of [=Descriptors=]. It is composed of only the Meta Generator. - The Meta Generator generates [=Descriptors=] based on the Ambisonics mode and the number of channels. - [=ambisonics_mode=] is set as follows: - 0 if [=ChannelMappingFamily=] = 2, as specified in [[RFC8486]]. - 1 if [=ChannelMappingFamily=] = 3, as specified in [[RFC8486]]. - [=ambisonics_config()=] is set as follows: - [=output_channel_count=] is set to the number of Ambisonics channels, e.g., 4, 9, or 16. - - [=channel_mapping=] for [=ambisonics_mode=] = 0 is assigned based on the order of the [=Audio Substream=]s in the [=ChannelGroup=]. - - [=demixing_matrix=] for [=ambisonics_mode=] = 1 is assigned based on the order of the [=Audio Substream=]s in the [=ChannelGroup=]. + - [=channel_mapping=] for [=ambisonics_mode=] = 0 is assigned based on the order of the [=Audio Substream=]s in the [=Channel Group=]. 
+ - [=demixing_matrix=] for [=ambisonics_mode=] = 1 is assigned based on the order of the [=Audio Substream=]s in the [=Channel Group=]. - Codec Enc. outputs [=substream_count=] number of [=Audio Substream=]s. - The i-th [=Temporal Unit=] is composed of the [=Audio Frame OBU=]s for the i-th frame. - It may have an immediately preceding [=Temporal Delimiter OBU=]. @@ -2460,42 +2461,42 @@ For Ambisonics encoding: For Scalable Channel Audio encoding: -- The Pre-processor outputs N [=ChannelGroup=]s ([=num_layers=] = N), [=Descriptors=] and [=Parameter Substream=]s. It is composed of a down-mix parameter generator, down-mixer, Loudness, ChannelGroup generator, Attenuation, and Meta generator. +- The Pre-processor outputs N [=Channel Group=]s ([=num_layers=] = N), [=Descriptors=] and [=Parameter Substream=]s. It is composed of a down-mix parameter generator, down-mixer, Loudness, Channel Group generator, Attenuation, and Meta generator. - For non-scalable channel audio (i.e., [=num_layers=] = 1): - [=Parameter Substream=] for recon gain is not generated. - [=Parameter Substream=] for demixing info may be generated by implementers who assume it to be recommended for dynamic downmixing on the decoder side. - - Down-mixer, ChannelGroup generator, and Attenuation modules are not needed. + - Down-mixer, Channel Group generator, and Attenuation modules are not needed. - Down-mix parameter generator generates 5 down-mix parameters (α(k), β(k), γ(k), δ(k) and w(k)) by analyzing the input channel audio. - Down-mixer generates [=down-mixed audio=]s according to the list of channel layouts and the down-mix parameters. - Loudness module outputs the loudness level ([=LKFS=]) of each [=down-mixed audio=] based on [[ITU1770-4]]. - - ChannelGroup generator transforms the input channel audio to N [=ChannelGroup=]s for scalable channel audio with [=num_layers=] = N by using the down-mix parameters and the list of channel layouts. 
- - The Attenuation module applies a gain to the transformed [=ChannelGroup=]s to prevent clipping. + - Channel Group generator transforms the input channel audio to N [=Channel Group=]s for scalable channel audio with [=num_layers=] = N by using the down-mix parameters and the list of channel layouts. + - The Attenuation module applies a gain to the transformed [=Channel Group=]s to prevent clipping. - Meta generator generates [=Descriptors=] and [=Parameter Substream=]s. - [=Descriptors=] are set as follows: - [=num_layers=] is set to N (i.e., the number of channel layouts). - [=channel_audio_layer_config()=] is set as follows: - - [=loudspeaker_layout=] is set to the ith list of channel layouts for the ith [=ChannelGroup=]. - - [=output_gain_is_present_flag=] is set to 1 for the ith [=ChannelGroup=] if attenuation is applied to the mixed channels of the ith [=ChannelGroup=]. Otherwise, it is set to 0 for the ith [=ChannelGroup=]. - - [=recon_gain_is_present_flag=] is set to 1 for the ith [=ChannelGroup=] if the preceding [=ChannelGroup=]s has one or more mixed channels from the [=down-mixed audio=] for the ith channel layout. Otherwise, it is set to 0 for the ith [=ChannelGroup=]. Especially, when [=num_layers=] = 1, this flag is set to 0. + - [=loudspeaker_layout=] is set to the ith list of channel layouts for the ith [=Channel Group=]. + - [=output_gain_is_present_flag=] is set to 1 for the ith [=Channel Group=] if attenuation is applied to the mixed channels of the ith [=Channel Group=]. Otherwise, it is set to 0 for the ith [=Channel Group=]. + - [=recon_gain_is_present_flag=] is set to 1 for the ith [=Channel Group=] if the preceding [=Channel Group=]s has one or more mixed channels from the [=down-mixed audio=] for the ith channel layout. Otherwise, it is set to 0 for the ith [=Channel Group=]. Especially, when [=num_layers=] = 1, this flag is set to 0. - This flag is set to 0 for lossless codecs including LPCM. 
- - [=substream_count=] is set to the number of [=Audio Substream=]s in the ith [=ChannelGroup=]. - - [=coupled_substream_count=] is set to the number of coupled substreams among the [=Audio Substream=]s that make up the ith [=ChannelGroup=]. - - Each bit of [=output_gain_flags=] is set to 1 for the ith [=ChannelGroup=] if attenuation is applied to the relevant channel of the ith [=ChannelGroup=]. Otherwise, it is set to 0 for the ith [=ChannelGroup=]. + - [=substream_count=] is set to the number of [=Audio Substream=]s in the ith [=Channel Group=]. + - [=coupled_substream_count=] is set to the number of coupled substreams among the [=Audio Substream=]s that make up the ith [=Channel Group=]. + - Each bit of [=output_gain_flags=] is set to 1 for the ith [=Channel Group=] if attenuation is applied to the relevant channel of the ith [=Channel Group=]. Otherwise, it is set to 0 for the ith [=Channel Group=]. - [=output_gain=] is set to the gain (i.e., the inverse of attenuation gain) which is applied to the channels which are indicated by [=output_gain_flags=]. - - [=Parameter Substream=]s can be composed of one for demixing info and the other for recon gain. When [=recon_gain_is_present_flag=] = 0 for all [=ChannelGroup=]s, no [=Parameter Block OBU=]s for recon gain info are present in [=IA Sequence=]. + - [=Parameter Substream=]s can be composed of one for demixing info and the other for recon gain. When [=recon_gain_is_present_flag=] = 0 for all [=Channel Group=]s, no [=Parameter Block OBU=]s for recon gain info are present in [=IA Sequence=]. - [=dmixp_mode=] of [=demixing_info_parameter_data()=] for the kth frame is set to indicate (α(k), β(k), γ(k), δ(k)) and w_idx_offset(k), where w_idx_offset(k) = 1 or -1. - [=recon_gain_flags=] of [=recon_gain_info_parameter_data()=] is set to indicate the de-mixed channels which need to apply [=recon_gain=] among the output channels after demixing for the ith channel layout. 
- - [=recon_gain=] is set to the gain value to be applied to the channel which is indicated by [=recon_gain_flags=] for the ith [=ChannelGroup=]. + - [=recon_gain=] is set to the gain value to be applied to the channel which is indicated by [=recon_gain_flags=] for the ith [=Channel Group=]. - [=Temporal Unit=] for the kth frame is composed of zero or more [=Parameter Block OBU=]s and followed by the [=Audio Frame OBU=]s for the kth frames. - It may have the immediately preceding [=Temporal Delimiter OBU=]. - - [=ChannelGroup=]s in a [=Temporal Unit=] are placed in order. In other words, the [=ChannelGroup=] for the first channel layout comes first, followed by the [=ChannelGroup=] for the second channel layout, followed by the [=ChannelGroup=] for the third channel layout, and so on. + - [=Channel Group=]s in a [=Temporal Unit=] are placed in order. In other words, the [=Channel Group=] for the first channel layout comes first, followed by the [=Channel Group=] for the second channel layout, followed by the [=Channel Group=] for the third channel layout, and so on. The figure below shows the IA encoding flowchart for Scalable Channel Audio. - For a given input channel audio and a given list of channel layouts for scalability, PCMs for the input channel audio are passed to the CG Generation module. - CG Generation module generates the transformed audio according to the CG generation rule based on the list of CLs and the down-mix parameters. - - The transformed audio is structured as [=ChannelGroup=]s. + - The transformed audio is structured as [=Channel Group=]s. - Non-mixed channels of the transformed audio (i.e., the original channels of the input channel audio) are directly input to the Codec encoder, but the mixed channels may be input first to the Attenuation module and then to the Codec encoder. -- The Attenuation module reduces all sample values of the mixed channels in the same [=ChannelGroup=] at a uniform rate ([=output_gain=]). 
+- The Attenuation module reduces all sample values of the mixed channels in the same [=Channel Group=] at a uniform rate ([=output_gain=]). - A range of 0 dB to -6 dB is recommended for attenuation. (i.e., a range of 0 dB to 6 dB for [=output_gain=]) - Codec Enc. generates the coded [=Audio Substream=]s from PCMs and passes the coded [=Audio Substream=]s and one single [=decoder_config()=] to OBU Packetizer. - OBU packetizer generates [=Descriptors=] which consists of one [=IA Sequence Header OBU=], one [=Codec Config OBU=], one [=Audio Element OBU=] and one or more [=Mix Presentation OBU=]. @@ -2564,18 +2565,18 @@ Step 1: [=Descriptors=] are generated as follows: - [=IA Sequence Header OBU=]: take the larger [=primary_profile=] field and the larger [=additional_profile=] field, respectively. - [=Codec Config OBU=]: take the [=Codec Config OBU=] of an [=IA Sequence=]. - Two [=Audio Element OBU=]s: take both of them and make the following modifications: - - [=codec_config_id=] in each [=Audio Element OBU=] is updated to indicate the [=codec_config_id=] specified in the taken [=Codec Config OBU=]. - - Each [=audio_element_id=] is updated to be unique between the two [=Audio Element OBU=]s. - - Each [=audio_substream_id=] is updated to be unique between the two [=Audio Element OBU=]s. - - [=parameter_id=]s in [=ParamDefinition()=]s carried in each [=Audio Element OBU=] are updated to be unique within the new [=IA Sequence=], if necessary. + - [=audio_element_obu/codec_config_id=] in each [=Audio Element OBU=] is updated to indicate the [=codec_config_obu/codec_config_id=] specified in the taken [=Codec Config OBU=]. + - Each [=audio_element_obu/audio_element_id=] is updated to be unique between the two [=Audio Element OBU=]s. + - Each [=audio_element_obu/audio_substream_id=] is updated to be unique between the two [=Audio Element OBU=]s. 
+ - [=ParamDefinition/parameter_id=]s in [=ParamDefinition()=]s carried in each [=Audio Element OBU=] are updated to be unique within the new [=IA Sequence=], if necessary. - [=Mix Presentation OBU=]s: generate new ones which are used for mixing the two [=Audio Element=]s. - - [=audio_element_id=]s in each [=Mix Presentation OBU=] are set to indicate the [=audio_element_id=]s of the referred [=Audio Element OBU=]s. - - [=parameter_id=]s in [=ParamDefinition()=]s carried in each [=Mix Presentation OBU=] are set to refer their associated [=Parameter Substream=]s. + - [=mix_presentation_obu/audio_element_id=]s in each [=Mix Presentation OBU=] are set to indicate the [=audio_element_obu/audio_element_id=]s of the referred [=Audio Element OBU=]s. + - [=ParamDefinition/parameter_id=]s in [=ParamDefinition()=]s carried in each [=Mix Presentation OBU=] are set to refer their associated [=Parameter Substream=]s. Step 2: The ith [=Temporal Unit=] is generated as follows: - Place all [=Parameter Block OBU=]s for the ith frame, followed by the [=Audio Frame OBU=]s ith frame (grouped by [=Audio Element=]s). Make the following modifications: - - The [=obu_type=]s of the [=Audio Frame OBU=]s are updated to be aligned with the [=audio_substream_id=]s specified in the [=Audio Element OBU=]s. - - [=parameter_id=]s in [=Parameter Block OBU=]s are updated to identify their associated [=Parameter Substream=]s based on the [=parameter_id=]s carried in the [=Descriptors=]. + - The [=obu_type=]s of the [=Audio Frame OBU=]s are updated to be aligned with the [=audio_element_obu/audio_substream_id=]s specified in the [=Audio Element OBU=]s. + - [=parameter_block_obu/parameter_id=]s in [=Parameter Block OBU=]s are updated to identify their associated [=Parameter Substream=]s based on the [=ParamDefinition/parameter_id=]s carried in the [=Descriptors=]. - It may have the immediately preceding [=Temporal Delimiter OBU=]. 
Step 3: Generate an [=IA Sequence=] which starts with [=Descriptors=] and is followed by [=Temporal Unit=]s in order. @@ -2687,32 +2688,32 @@ The pow() function returns the value of x to the power of y. ## Annex A: ID Linking Scheme (Informative) ## {#Annex_A} -The figure below shows the linking scheme among IDs in the obu_header or OBU payload. +The figure below shows the linking scheme among IDs in the obu_header
or OBU payload. In the above figure, -- [=Codec Config OBU=] with [=codec_config_id=] = 0 is providing [=codec_id=] and its [=decoder_config()=]. +- [=Codec Config OBU=] with [=codec_config_obu/codec_config_id=] = 0 is providing [=codec_id=] and its [=decoder_config()=]. - [=Mix Presentation OBU=] with [=mix_presentation_id=] = 21 is saying: - - There are two [=Audio Element=]s([=audio_element_id=] = 11 and 12) which need to be mixed. The [=audio_element_id=] = 11 and the [=audio_element_id=] = 12 are linked to the [=Audio Element OBU=]s with [=audio_element_id=] = 11 and [=audio_element_id=] = 12, respectively. - - There are [=Parameter Block OBU=]s with [=parameter_id=] = 32 to be used for mixing of the [=Audio Element=] with [=audio_element_id=] = 11. - - There are [=Parameter Block OBU=]s with [=parameter_id=] = 33 to be used for mixing of the [=Audio Element=] with [=audio_element_id=] = 12. - - There are [=Parameter Block OBU=]s with [=parameter_id=] = 34 to be used for mixing of the two [=Audio Element=]s. -- [=Audio Element OBU=] with [=audio_element_id=] = 11 is saying: - - This [=Audio Element=] has been coded using [=Codec Config OBU=] with [=codec_config_id=] = 0. - - There are two [=Audio Substream=]s ([=audio_substream_id=] = 0 and 1) in this [=Audio Element=]. The [=audio_substream_id=] = 0 and the [=audio_substream_id=] = 1 are linked to the [=Audio Frame OBU=]s with [=audio_substream_id=] = 0 and [=audio_substream_id=] = 1(i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID0 and [=obu_type=] = OBU_IA_Audio_Frame_ID1), respectively. - - There are [=Parameter Block OBU=]s with [=parameter_id=] = 31 to be used for demixing of this [=Audio Element=]. -- [=Audio Element OBU=] with [=audio_element_id=] = 12 is saying: - - This [=Audio Element=] has been coded by using [=Codec Config OBU=] with [=codec_config_id=] = 0. - - There is one [=Audio Substream=] ([=audio_substream_id=] = 2) in this [=Audio Element=].
The [=audio_substream_id=] = 2 is linked to the [=Audio Frame OBU=]s with [=audio_substream_id=] = 2 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID2). -- [=Audio Frame OBU=] with [=audio_substream_id=] = 0 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID0) is providing the coded data which has been coded by using [=Codec Config OBU=] with [=codec_config_id=] = 0 of [=Audio Substream=] with [=audio_substream_id=] = 0. -- [=Audio Frame OBU=] with [=audio_substream_id=] = 1 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID1) is providing the coded data which has been coded by using [=Codec Config OBU=] with [=codec_config_id=] = 0 of [=Audio Substream=] with [=audio_substream_id=] = 1. -- [=Audio Frame OBU=] with [=audio_substream_id=] = 2 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID2) is providing the coded data which has been coded by using [=Codec Config OBU=] with [=codec_config_id=] = 0 of [=Audio Substream=] with [=audio_substream_id=] = 2. -- [=Parameter Block OBU=] with [=parameter_id=] = 31 is providing [=demixing_info_parameter_data()=] to be applied for demixing of the [=Audio Element=] with [=audio_element_id=] = 11. -- [=Parameter Block OBU=] with [=parameter_id=] = 32 is providing mix_gain_parameter_data() to be applied to the rendered [=Audio Element=] after rendering according to [=rendering_config()=] of the [=Audio Element=] with [=audio_element_id=] = 11. -- [=Parameter Block OBU=] with [=parameter_id=] = 33 is providing mix_gain_parameter_data() to be applied to the rendered [=Audio Element=] after rendering according to [=rendering_config()=] of the [=Audio Element=] with [=audio_element_id=] = 12. -- [=Parameter Block OBU=] with [=parameter_id=] = 34 is providing mix_gain_parameter_data() to be applied to the [=Rendered Mix Presentation=] of the two rendered [=Audio Element=]s. + - There are two [=Audio Element=]s ([=audio_element_obu/audio_element_id=] = 11 and 12) which need to be mixed. 
The [=mix_presentation_obu/audio_element_id=] = 11 and the [=mix_presentation_obu/audio_element_id=] = 12 are linked to the [=Audio Element OBU=]s with [=audio_element_obu/audio_element_id=] = 11 and [=audio_element_obu/audio_element_id=] = 12, respectively. + - There are [=Parameter Block OBU=]s with [=parameter_block_obu/parameter_id=] = 32 to be used for mixing the [=Audio Element=] with [=audio_element_obu/audio_element_id=] = 11. + - There are [=Parameter Block OBU=]s with [=parameter_block_obu/parameter_id=] = 33 to be used for mixing the [=Audio Element=] with [=audio_element_obu/audio_element_id=] = 12. + - There are [=Parameter Block OBU=]s with [=parameter_block_obu/parameter_id=] = 34 to be used for mixing the two [=Audio Element=]s. +- [=Audio Element OBU=] with [=audio_element_obu/audio_element_id=] = 11 is saying: + - This [=Audio Element=] has been coded using [=Codec Config OBU=] with [=codec_config_obu/codec_config_id=] = 0. + - There are two [=Audio Substream=]s ([=audio_substream/audio_substream_id=] = 0 and 1, respectively) in this [=Audio Element=]. They are linked to the [=Audio Frame OBU=]s with [=audio_substream/audio_substream_id=] = 0 and [=audio_substream/audio_substream_id=] = 1 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID0 and [=obu_type=] = OBU_IA_Audio_Frame_ID1), respectively. + - There are [=Parameter Block OBU=]s with [=parameter_block_obu/parameter_id=] = 31 to be used for demixing this [=Audio Element=]. +- [=Audio Element OBU=] with [=audio_element_obu/audio_element_id=] = 12 is saying: + - This [=Audio Element=] has been coded by using [=Codec Config OBU=] with [=codec_config_obu/codec_config_id=] = 0. + - There is one [=Audio Substream=] ([=audio_substream/audio_substream_id=] = 2) in this [=Audio Element=]. It is linked to the [=Audio Frame OBU=]s with [=audio_substream/audio_substream_id=] = 2 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID2). 
+- [=Audio Frame OBU=] with [=audio_substream/audio_substream_id=] = 0 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID0) is providing the coded data which has been coded by using [=Codec Config OBU=] with [=codec_config_obu/codec_config_id=] = 0. +- [=Audio Frame OBU=] with [=audio_substream/audio_substream_id=] = 1 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID1) is providing the coded data which has been coded by using [=Codec Config OBU=] with [=codec_config_obu/codec_config_id=] = 0. +- [=Audio Frame OBU=] with [=audio_substream/audio_substream_id=] = 2 (i.e., [=obu_type=] = OBU_IA_Audio_Frame_ID2) is providing the coded data which has been coded by using [=Codec Config OBU=] with [=codec_config_obu/codec_config_id=] = 0. +- [=Parameter Block OBU=] with [=parameter_block_obu/parameter_id=] = 31 is providing [=demixing_info_parameter_data()=] to be applied for demixing the [=Audio Element=] with [=audio_element_obu/audio_element_id=] = 11. +- [=Parameter Block OBU=] with [=parameter_block_obu/parameter_id=] = 32 is providing mix_gain_parameter_data() to be applied to the rendered [=Audio Element=] after rendering according to [=rendering_config()=] of the [=Audio Element=] with [=audio_element_obu/audio_element_id=] = 11. +- [=Parameter Block OBU=] with [=parameter_block_obu/parameter_id=] = 33 is providing mix_gain_parameter_data() to be applied to the rendered [=Audio Element=] after rendering according to [=rendering_config()=] of the [=Audio Element=] with [=audio_element_obu/audio_element_id=] = 12. +- [=Parameter Block OBU=] with [=parameter_block_obu/parameter_id=] = 34 is providing mix_gain_parameter_data() to be applied to the [=Rendered Mix Presentation=] of the two rendered [=Audio Element=]s. 
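
The linking scheme described in this annex can be sketched as a small reference check. The following is a minimal illustration, not part of the specification: plain Python dicts stand in for parsed OBUs, the ID values are the ones used in the example above, and the container layout and the helper name `find_unresolved_links` are hypothetical. The check that every `parameter_id` has a matching [=Parameter Block OBU=] reflects this example only and is an assumption, not a rule stated in the text.

```python
from typing import Dict, List, Set

# IDs copied from the Annex A example. The dict layout is hypothetical and
# only models which ID points at which other ID.
CODEC_CONFIG_IDS: Set[int] = {0}
AUDIO_ELEMENTS: Dict[int, dict] = {
    11: {"codec_config_id": 0, "audio_substream_ids": [0, 1], "parameter_ids": [31]},
    12: {"codec_config_id": 0, "audio_substream_ids": [2], "parameter_ids": []},
}
MIX_PRESENTATIONS: Dict[int, dict] = {
    # 32 and 33 are per-element mix gains, 34 applies to the mix of both.
    21: {"audio_element_ids": [11, 12], "parameter_ids": [32, 33, 34]},
}
AUDIO_FRAME_SUBSTREAM_IDS: Set[int] = {0, 1, 2}
PARAMETER_BLOCK_IDS: Set[int] = {31, 32, 33, 34}


def find_unresolved_links(
    codec_config_ids: Set[int],
    audio_elements: Dict[int, dict],
    mix_presentations: Dict[int, dict],
    audio_frame_substream_ids: Set[int],
    parameter_block_ids: Set[int],
) -> List[str]:
    """Return one message per ID reference that points at nothing."""
    problems: List[str] = []
    for element_id, element in audio_elements.items():
        if element["codec_config_id"] not in codec_config_ids:
            problems.append(f"audio_element_id {element_id}: unknown codec_config_id")
        for substream_id in element["audio_substream_ids"]:
            if substream_id not in audio_frame_substream_ids:
                problems.append(
                    f"audio_element_id {element_id}: no Audio Frame OBU "
                    f"for audio_substream_id {substream_id}"
                )
        for parameter_id in element["parameter_ids"]:
            if parameter_id not in parameter_block_ids:
                problems.append(
                    f"audio_element_id {element_id}: no Parameter Block OBU "
                    f"for parameter_id {parameter_id}"
                )
    for mix_id, mix in mix_presentations.items():
        for element_id in mix["audio_element_ids"]:
            if element_id not in audio_elements:
                problems.append(
                    f"mix_presentation_id {mix_id}: unknown audio_element_id {element_id}"
                )
        for parameter_id in mix["parameter_ids"]:
            if parameter_id not in parameter_block_ids:
                problems.append(
                    f"mix_presentation_id {mix_id}: no Parameter Block OBU "
                    f"for parameter_id {parameter_id}"
                )
    return problems
```

Calling `find_unresolved_links` with the five example collections returns an empty list, since every ID in the example resolves. Dropping `audio_substream_id` = 2 from the Audio Frame set yields a single message for the [=Audio Element=] with `audio_element_id` = 12.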
## Annex B: Rules for Scalable Channel Audio (Normative) ## {#Annex_B} @@ -2727,16 +2728,16 @@ The figure below shows a block diagram for the down-mix parameter and loudness m ID Linking Scheme -For a given channel-based input audio (e.g., 7.1.4ch) and a given list of channel layouts based on the input audio, -- Down-mix parameter generator SHALL generate 5 down-mix parameters (α(k), β(k), γ(k), δ(k) and w(k), where k is the frame index) by analyzing the input audio and referring to [[AI-CAD-Mixing]]. +For a given channel-based input [=3D audio signal=] (e.g., 7.1.4ch) and a given list of channel layouts based on the input [=3D audio signal=], +- Down-mix parameter generator SHALL generate 5 down-mix parameters (α(k), β(k), γ(k), δ(k) and w(k), where k is the frame index) by analyzing the input [=3D audio signal=] and referring to [[AI-CAD-Mixing]]. - It is composed of an Audio Scene Classification module and a Height Energy Quantification module as depicted in Figure 11-2. - - Audio Scene Classification module generates 4 parameters (α(k), β(k), γ(k), δ(k)) by classifying audio scenes of the input audio in three modes. + - Audio Scene Classification module generates 4 parameters (α(k), β(k), γ(k), δ(k)) by classifying audio scenes of the input [=3D audio signal=] in three modes. - Default scene: Neither Dialog nor Effect - Dialog scene: Center-channel oriented and clear dialog/voice sounds - Effect scene: Directional and spatially moving sounds. - - The Height Energy Quantification module generates a surround-to-height mixing parameter (w(k)) which is decided according to the relative energy difference between the top and surround channels of the input audio. + - The Height Energy Quantification module generates a surround-to-height mixing parameter (w(k)) which is decided according to the relative energy difference between the top and surround channels of the input [=3D audio signal=]. 
- If the energy of top channels is bigger than that of surround ones, then w_idx_offset(k) is set to 1. Otherwise, it is set to -1. And, w(k) is calculated based on w_idx_offset(k) and conforms to [[#processing-scalablechannelaudio]]. -- Down-mixer generates [=down-mixed audio=] from the input audio according to the list of channel layouts and the down-mix parameters, and outputs [=down-mixed audio=] for each channel layout to the Loudness module. +- Down-mixer generates [=down-mixed audio=] from the input [=3D audio signal=] according to the list of channel layouts and the down-mix parameters, and outputs [=down-mixed audio=] for each channel layout to the Loudness module. - It is not depicted in the figure but down-mixer further generates [=dmixp_mode=] and [=recon_gain=] for each frame to be passed to the OBU packetizer. - Loudness module measures the loudness level ([=LKFS=]) of each [=down-mixed audio=] based on [[ITU1770-4]], and passes them to OBU packetizer. @@ -2744,9 +2745,9 @@ For a given channel-based input audio (e.g., 7.1.4ch) and a given list of channe This section specifies the down-mixing mechanism to generate down-mixed audio for scalable channel audio. -For a given channel-based input audio that conforms to [=loudspeaker_layout=], the surround and top channels (if any) are separately down-mixed and especially step by step until to get a target channels. +For a given channel-based input [=3D audio signal=] that conforms to [=loudspeaker_layout=], the surround and top channels (if any) are separately down-mixed and especially step by step until to get a target channels. -Implementers MAY use another method to get the [=down-mixed audio=] from the given input audio, but the [=down-mixed audio=] SHALL comply with that by this section. +Implementers MAY use another method to get the [=down-mixed audio=] from the given input [=3D audio signal=], but the [=down-mixed audio=] SHALL comply with that by this section. 
Therefore, a down-mixer based on the down-mix mechanism is a combination of the following surround down-mixer(s) and top down-mixer(s) as depicted in the figure below. - Surround down-mixers @@ -2771,7 +2772,7 @@ For example, to get [=down-mixed audio=] 3.1.2ch from 7.1.4ch: This section describes the generation rule for channel layouts for scalable channel audio. -For a given channel layout (CL #n) of channel-based input audio, any list of CLs ({CL #i: i = 1, 2, ..., n}) for scalable channel audio SHALL conform with the following rules: +For a given channel layout (CL #n) of channel-based input [=3D audio signal=], any list of CLs ({CL #i: i = 1, 2, ..., n}) for scalable channel audio SHALL conform with the following rules: - Si ≤ Si+1 and Wi ≤ Wi+1 and Ti ≤ Ti+1 except Si = Si+1, Wi = Wi+1 and Ti = Ti+1 for i = n-1, n-2, …, 1. Where the ith channel layout CL #i = Si.Wi.Ti. - CL #i is one of [=loudspeaker_layout=]s supported in this version of the specification. @@ -2798,38 +2799,38 @@ If 10*log10(level Ok / maxL^2) is less than the first threshold value (-80dB is If 10*log10(level Ok / level Mk ) is less than the second threshold value (-6dB is RECOMMENDED), Recon_Gain (k, i) is set to the value which makes level Ok = Recon_Gain (k, i)^2 * level Dk. Otherwise, Recon_Gain (k, i) = 1. Actual value (i.e., [=recon_gain=]) to be delivered is floor(255*Recon_Gain). For example, if we assume CL #i = 7.1.4ch and CL #i-1 = 5.1.2ch, then de-mixed channels are D_Lrs7, D_Rrs7, D_Ltb4 and D_Rtb4. -- D_Lrs7 and D_Rrs7 are de-mixed from Ls5 and Rs5 in the (i-1)th [=ChannelGroup=] by using Lss7 and Rss7 in the ith [=ChannelGroup=] and its relevant demixing parameters (i.e., α(k) and β(k)) , respectively. -- D_Ltb4 and D_Rtb4 are de-mixed from Ltf2 and Rtf2 in the (i-1)th [=ChannelGroup=] by using Ltf4 and Rtf4 in the ith [=ChannelGroup=] and its relevant demixing parameter (i.e., γ(k)), respectively. 
+- D_Lrs7 and D_Rrs7 are de-mixed from Ls5 and Rs5 in the (i-1)th [=Channel Group=] by using Lss7 and Rss7 in the ith [=Channel Group=] and its relevant demixing parameters (i.e., α(k) and β(k)) , respectively. +- D_Ltb4 and D_Rtb4 are de-mixed from Ltf2 and Rtf2 in the (i-1)th [=Channel Group=] by using Ltf4 and Rtf4 in the ith [=Channel Group=] and its relevant demixing parameter (i.e., γ(k)), respectively. Recon_Gain for D_Lrs7: -- Level Ok is the signal power for the frame #k of Lrs7 in the ith [=ChannelGroup=]. -- Level Mk is the signal power for the frame #k of Ls5 in the (i-1)th [=ChannelGroup=]. +- Level Ok is the signal power for the frame #k of Lrs7 in the ith [=Channel Group=]. +- Level Mk is the signal power for the frame #k of Ls5 in the (i-1)th [=Channel Group=]. - Level Dk is the signal power for the frame #k of D_Lrs7. Recon_Gain for D_Rrs7: -- Level Ok is the signal power for the frame #k of Rrs7 in the ith [=ChannelGroup=]. -- Level Mk is the signal power for the frame #k of Rs5 in the (i-1)th [=ChannelGroup=]. +- Level Ok is the signal power for the frame #k of Rrs7 in the ith [=Channel Group=]. +- Level Mk is the signal power for the frame #k of Rs5 in the (i-1)th [=Channel Group=]. - Level Dk is the signal power for the frame #k of D_Rrs7. Recon_Gain for D_Ltb4: -- Level Ok is the signal power for the frame #k of Ltf4 in the ith [=ChannelGroup=]. -- Level Mk is the signal power for the frame #k of Ltf2 in the (i-1)th [=ChannelGroup=]. +- Level Ok is the signal power for the frame #k of Ltf4 in the ith [=Channel Group=]. +- Level Mk is the signal power for the frame #k of Ltf2 in the (i-1)th [=Channel Group=]. - Level Dk is the signal power for the frame #k of D_Ltb4. Recon_Gain for D_Rtb4: -- Level Ok is the signal power for the frame #k of Rtf4 in the ith [=ChannelGroup=]. -- Level Mk is the signal power for the frame #k of Rtf2 in the (i-1)th [=ChannelGroup=]. 
+- Level Ok is the signal power for the frame #k of Rtf4 in the ith [=Channel Group=]. +- Level Mk is the signal power for the frame #k of Rtf2 in the (i-1)th [=Channel Group=]. - Level Dk is the signal power for the frame #k of D_Rtb4. -### Annex B-5: ChannelGroup Generation Rule ### {#iamfgeneration-scalablechannelaudio-channelgroupgenerationrule} +### Annex B-5: Channel Group Generation Rule ### {#iamfgeneration-scalablechannelaudio-channelgroupgenerationrule} -This section describes the generation rule for [=ChannelGroup=]. +This section describes the generation rule for [=Channel Group=]. -For a given channel-based input audio and the list of CLs ({CL #i: i = 1, 2, ..., n}), the CG Generation module outputs the transformed audio (i.e., ChannelGroups) which SHALL conform to the following rules: -- It consists of C number of channels and is structured to n number of [=ChannelGroup=]s, where C is the number of channels for the input audio. -- [=ChannelGroup=] #1 (as called BCG): This [=ChannelGroup=] is the [=down-mixed audio=] itself for CL #1 generated from the input audio. It contains a C1 number of channels. -- [=ChannelGroup=] #i (as called DCG, i = 2, 3, …, n): This [=ChannelGroup=] contains (Ci – Ci-1) number of channels. (Ci – Ci-1) channel(s) consists of as follows: +For a given channel-based input audio and the list of CLs ({CL #i: i = 1, 2, ..., n}), the CG Generation module outputs the transformed audio (i.e., [=Channel Group=]s) which SHALL conform to the following rules: +- It consists of C number of channels and is structured to n number of [=Channel Group=]s, where C is the number of channels for the input [=3D audio signal=]. +- [=Channel Group=] #1 (as called BCG): This [=Channel Group=] is the [=down-mixed audio=] itself for CL #1 generated from the input [=3D audio signal=]. It contains a C1 number of channels. +- [=Channel Group=] #i (as called DCG, i = 2, 3, …, n): This [=Channel Group=] contains (Ci – Ci-1) number of channels. 
(Ci – Ci-1) channel(s) consists of as follows: - (Si – Si-1) surround channel(s) if Si > Si-1 . When S_set = { x | Si-1 < x ≤ Si and x is an integer}, - If 2 is an element of S_set, the L2 channel is contained in this CG #i. - If 3 is an element of S_set, the Center channel is contained in this CG #i. @@ -2837,8 +2838,8 @@ For a given channel-based input audio and the list of CLs ({CL #i: i = 1, 2, ... - If 7 is an element of S_set, the Lss7 and Rss7 channels are contained in this CG #i. - The LFE channel if Wi > Wi-1. - (Ti – Ti-1) top channels if Ti > Ti-1 . - - If Ti-1 = 0, the top channels of the [=down-mixed audio=] for CL #i are contained in this [=ChannelGroup=] #i. - - If Ti-1 = 2, the Ltf and Rtf channels of the [=down-mixed audio=] for CL #i are contained in this [=ChannelGroup=] #i. + - If Ti-1 = 0, the top channels of the [=down-mixed audio=] for CL #i are contained in this [=Channel Group=] #i. + - If Ti-1 = 2, the Ltf and Rtf channels of the [=down-mixed audio=] for CL #i are contained in this [=Channel Group=] #i. The figure below shows one example of a transformation matrix with 4 CGs (2ch/3.1.2ch/5.1.2ch/7.1.4ch). IA Down-mix Parameter and Loudness
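
The channel counts implied by the generation rule above can be sketched as follows. This is an illustration under stated assumptions, not a normative procedure: each channel layout CL #i is modelled as a tuple (Si, Wi, Ti), the channel count is assumed to be Ci = Si + Wi + Ti, Stereo (2ch) is written as (2, 0, 0), and the helper names `check_layout_order` and `channel_group_sizes` are hypothetical. The sketch checks only the ordering rule for the list of CLs. It does not check that each CL is a supported [=loudspeaker_layout=], and it does not decide which named channels go into each group.

```python
from typing import List, Tuple

# (S, W, T): number of surround, LFE, and top channels of one channel layout.
Layout = Tuple[int, int, int]


def check_layout_order(layouts: List[Layout]) -> None:
    """Raise ValueError unless S, W and T never decrease and no layout repeats."""
    if not layouts:
        raise ValueError("at least one channel layout is required")
    for lower, higher in zip(layouts, layouts[1:]):
        if any(a > b for a, b in zip(lower, higher)):
            raise ValueError(f"{lower} is not a subset layout of {higher}")
        if lower == higher:
            raise ValueError(f"layout {lower} appears twice in a row")


def channel_group_sizes(layouts: List[Layout]) -> List[int]:
    """Return the number of channels in Channel Group #1..#n.

    Group #1 carries C1 channels; group #i (i > 1) carries Ci - C(i-1).
    """
    check_layout_order(layouts)
    counts = [s + w + t for s, w, t in layouts]
    return [counts[0]] + [b - a for a, b in zip(counts, counts[1:])]
```

For the 4-CG example (2ch/3.1.2ch/5.1.2ch/7.1.4ch), `channel_group_sizes([(2, 0, 0), (3, 1, 2), (5, 1, 2), (7, 1, 4)])` gives group sizes of 2, 4, 2 and 4 channels, which sum to the 12 channels of the 7.1.4ch input.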