Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert VideoCodecParameters.extra_data to vector #323

Open
wants to merge 5 commits into
base: dev-0.6
Choose a base branch
from

Conversation

sscobici
Copy link

it is based on constants renaming commit from PR #322
This PR is to get early feedback, mkv changes are not finished, I will add another commit later
Notes:

  • used let mut VideoCodecParameters and write values directly into it, instead of creating many temp variables and then copy them to VideoCodecParameters. This is not following other code pattern, but with the growing number of variables can be more suitable/readable. Please let me know your opinion.
  • removed VisualCodecSpecific, initially this followed AudioCodecSpecific, but I'm not sure it is needed. VideoCodecParameters should have codec_id_tag (FourCC code) where VisualCodecSpecific variations can be stored.

@sscobici
Copy link
Author

added mkv dolby vision configuration extra data parsing

@sscobici sscobici marked this pull request as ready for review October 20, 2024 23:01
@sscobici
Copy link
Author

sscobici commented Oct 21, 2024

fixed mkv dovi parsing with two BlockAdditionMapping elements,
added DOVIDecoderConfigurationRecord parsing to common

@pdeljanov
Copy link
Owner

Did a quick skim of this and everything seems very reasonable to me!

  • used let mut VideoCodecParameters and write values directly into it, instead of creating many temp variables and then copy them to VideoCodecParameters. This is not following other code pattern, but with the growing number of variables can be more suitable/readable. Please let me know your opinion.

That's fine. I generally like immutable variables, but past a few it can become messy and writing into a mutable struct is cleaner.

Further thought: Do you think the builder methods should just be removed in general on the codec parameters structs? I believe a proper builder pattern would actually use a separate object all together.

  • removed VisualCodecSpecific, initially this followed AudioCodecSpecific, but I'm not sure it is needed. VideoCodecParameters should have codec_id_tag (FourCC code) where VisualCodecSpecific variations can be stored.

I'm not attached to Audio/VideoCodecSpecific, however, it would be preferable if the same pattern is used for both.

@sscobici sscobici force-pushed the video-extra-data branch 3 times, most recently from 5474b40 to b5a3ae9 Compare October 22, 2024 20:57
@sscobici
Copy link
Author

Further thought: Do you think the builder methods should just be removed in general on the codec parameters structs? I believe a proper builder pattern would actually use a separate object all together.

I think the existing methods in the core codec parameters should be removed if we agree to adopt the struct initialization pattern. While a proper builder pattern does require a separate object, I’m not sure if you want to introduce the builder pattern in the library due to the potential runtime overhead.

I'm not attached to Audio/VideoCodecSpecific, however, it would be preferable if the same pattern is used for both.

Ok, let's try pattern alignment ;-). I've added two commits for your high-level review. These might be too many changes and potentially go against the contribution policy or introduce regressions, so feel free to reject them. I can then try to follow more closely to the existing code patterns in MP4.

Anyway, here are my reasons for the changes:

  1. They align video and audio codec parameter creation in MP4.
  2. They remove AudioCodecSpecific and treat AudioSampleEntry as a generic container for codec data, which is later transformed into AudioCodecParameters. I reviewed the MKV code, and it doesn't require additional enums to carry codec-specific info; instead, it uses TrackElement as a container. This aligns the approaches between MP4 and MKV.
  3. At a high level, we now have:
  • format model (TrackElement/Audio(Video)Element for mkv and StsdAtom/Audio(Video)SampleEntry for mp4).
  • generic model (Audio(Video)CodecParameters
    We should be able to create a trait to convert a format model into a generic model, which each format model can implement (see StsdAtom::make_codec_params).
  1. It should be possible to create a trait to convert Atom data into SampleEntry, simplifying some of the MP4 code (see AlacAtom::fill_audio_sample_entry, FlacAtom::fill_audio_sample_entry, etc.).

This PR lacks the following:

  • Removal of new() and builder methods from core codec parameters.
  • moving make_codec_params into TrackElement class.
  • A trait for converting the format model to the generic model
  • A trait for filling audio/video sample entries in MP4

Please propose trait names if you liked the idea and want them now, or this can be done later.

@sscobici
Copy link
Author

Forgot to mention that AudioSampleEntry.sample_rate was changed from f64 to u32. It is u32 in AudioCodecParameters, so not sure why it was created as f64.
How can I verify the changes with regression testing?
Any test file pack? Do you know how other libs do regression testing?

@pdeljanov
Copy link
Owner

Hi, thanks for your work!

Hopefully responding to everything below:

Further thought: Do you think the builder methods should just be removed in general on the codec parameters structs? I believe a proper builder pattern would actually use a separate object all together.

I think the existing methods in the core codec parameters should be removed if we agree to adopt the struct initialization pattern. While a proper builder pattern does require a separate object, I’m not sure if you want to introduce the builder pattern in the library due to the potential runtime overhead.

It doesn't seem particularly worth it to have a separate builder.

I recall the builder was introduced to force the user to set an AudioCodecId. However, nowadays, the entire struct can be be defaulted so maintaining the builder seems less relevant.

This pattern seems reasonable:

let params: VideoCodecParameters = Default::default();
params.codec = // CODEC_ID_ ...
// ...etc...

I'm not attached to Audio/VideoCodecSpecific, however, it would be preferable if the same pattern is used for both.

Ok, let's try pattern alignment ;-). I've added two commits for your high-level review. These might be too many changes and potentially go against the contribution policy or introduce regressions, so feel free to reject them. I can then try to follow more closely to the existing code patterns in MP4.

Anyway, here are my reasons for the changes:

  1. They align video and audio codec parameter creation in MP4.
  2. They remove AudioCodecSpecific and treat AudioSampleEntry as a generic container for codec data, which is later transformed into AudioCodecParameters. I reviewed the MKV code, and it doesn't require additional enums to carry codec-specific info; instead, it uses TrackElement as a container. This aligns the approaches between MP4 and MKV.

I'm not quite sure why AudioSampleEntry/VisualSampleEntry in MP4 just duplicate the contents of the AudioCodecParamters/VideoCodecParameters? Seems like at that point you'd just fill the codec parameters instead? I'm probably missing something.

  1. At a high level, we now have:
  • format model (TrackElement/Audio(Video)Element for mkv and StsdAtom/Audio(Video)SampleEntry for mp4).
  • generic model (Audio(Video)CodecParameters

I like how this makes a solid distinction between what is read from the media source, and how we represent that in Symphonia. 👍

If the sample entry structs were intended to be the format-model object, then you can likely disregard my last comment. However, then the sample_rate should be an f64 since the container stores sample rate as a 16.16 fixed point number.

In this distinction it would make sense for the format-model objects to store their fields in the data type that most closely matched the data type in the file. That would help debugging, and adds value to what could be a lot of duplication. Then, the conversion would occur when converting into the generic-model.

We should be able to create a trait to convert a format model into a generic model, which each format model can implement (see StsdAtom::make_codec_params).

  1. It should be possible to create a trait to convert Atom data into SampleEntry, simplifying some of the MP4 code (see AlacAtom::fill_audio_sample_entry, FlacAtom::fill_audio_sample_entry, etc.).

Why would we need a trait for this? Is there value in it?

This PR lacks the following:

  • Removal of new() and builder methods from core codec parameters.
  • moving make_codec_params into TrackElement class.
  • A trait for converting the format model to the generic model
  • A trait for filling audio/video sample entries in MP4

Please propose trait names if you liked the idea and want them now, or this can be done later.

We might be able to cross some of these off based on the discussion on the points above.

Forgot to mention that AudioSampleEntry.sample_rate was changed from f64 to u32. It is u32 in AudioCodecParameters, so not sure why it was created as f64. How can I verify the changes with regression testing? Any test file pack? Do you know how other libs do regression testing?

This should now be answered above!


On an administrative note, I think this entire PR can be reorganized. You can really squash everything down into 4 commits:

  1. Update VideoCodecParameters in symphonia-core
  2. Changes to symphonia-common
  3. Changes to MP4
  4. Changes to MKV

Also, let's not increase the scope of this PR any further than what we've discussed here so far. It is already going to be on the bigger side! Once we've reached a conclusion on all the points above, you can polish this PR off and I can then review and merge.

@sscobici
Copy link
Author

Thanks for review.

It doesn't seem particularly worth it to have a separate builder.

I will not add more cleanups in this PR. Will make a separate PR.

However, then the sample_rate should be an f64 since the container stores sample rate as a 16.16 fixed point number.

Agree, I just observed that in some places it is read as u32, I will change it back to f64.

Why would we need a trait for this? Is there value in it?

I hope that it might remove some boiler plate code and encourage further formats to follow the common pattern of conversion. I haven't tried it to see if it really worth the benefits. But I agree not to introduce them now.

On an administrative note, I think this entire PR can be reorganized. You can really squash everything down into 4 commits:

This might make some commits to be non-buildable. I'm not sure this is a good, but if you insist I can do it.

@sscobici
Copy link
Author

should be ready for detailed review now. Thanks for f64 note, the way I changed it to u32 was not correct.

@pdeljanov
Copy link
Owner

On an administrative note, I think this entire PR can be reorganized. You can really squash everything down into 4 commits:

This might make some commits to be non-buildable. I'm not sure this is a good, but if you insist I can do it.

Ah true, I guess you would have to make changes to adapt to extra_data now being a Vec. If you can fix that in 5-10 minutes then I think it's worthwhile for the cleaner history, however, if it would take longer than that, you can leave the commits as is. All commits being buildable is more important than a clean history.

I'll try to review this thoroughly in the next couple of days!

@sscobici
Copy link
Author

rebased, kept Dolby Vision changes commits separate from the refactoring commits.

Copy link
Owner

@pdeljanov pdeljanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again. This all looks very clean to me. I have noted a couple nit-picks for your consideration, and one potential hazard.

For testing, I generally use my media library. Unfortunately, since it's copyrighted, I can't share it. However, here's what I generally test:

  • AAC, ALAC, FLAC, PCM in MP4
    • Test by passing file path to symphonia-play, but also by piping in via cat.
  • Movies
    • Audio tracks should be playable. Other tracks detected.
  • Fragmented MP4
    • These should also work when played by file path, or piped in.
    • Easiest way to get a fragmented MP4 is from YouTube using yt-dlp. Pass -f 140 --fixup never to get a fragmented AAC audio file.

If these tests pass, it's generally good enough for me to merge.

When it comes to cutting a release, I run my entire library through symphonia-check and make sure there haven't been any regressions since the last release.

symphonia-common/src/mpeg/video/mod.rs Outdated Show resolved Hide resolved
symphonia-format-isomp4/src/atoms/alac.rs Outdated Show resolved Hide resolved
symphonia-format-isomp4/src/atoms/dovi.rs Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants