@rolandd rolandd commented Aug 28, 2025

I have a Canon camera that produces CR3 raw files and I wanted nom-exif support
for EXIF extraction so I hacked this up. I tried to imitate the HEIC/HEIF handling
since CR3 files also use ISOBMFF, but I definitely don't have a deep understanding
of the right way to implement this, so please let me know if anything needs to be
fixed up.

As more file formats are added, it gets messy if exif.rs has code
that handles the specific internals of each file format. Move most
of the logic from heif_extract_exif() into heif::extract_exif_data()
in heif.rs.
@rolandd
Author

rolandd commented Aug 28, 2025

One question: CR3 files have more useful EXIF data in a second "CMT2" box (in addition to the CMT1 box that holds the most basic info). What would be the best way to handle merging two discontiguous TIFF structures when parsing a single file?

@mindeng
Owner

mindeng commented Aug 29, 2025

There are some code formatting issues; please refer to the output of cargo fmt --check for details.

@mindeng
Owner

mindeng commented Aug 29, 2025

> One question: CR3 files have more useful EXIF data in a second "CMT2" box (in addition to the CMT1 box that holds the most basic info). What would be the best way to handle merging two discontiguous TIFF structures when parsing a single file?

ExifIter/IfdIter should already provide the possibility to traverse multiple IFDs.

@rolandd
Author

rolandd commented Sep 2, 2025

Thanks, I took all the changes suggested by "cargo fmt" and pushed that out.

Will look at using the ifds member of ExifIter to handle the EXIF data in the CMT2 box as well, but I think this is ready to land if you think it would be useful.

@rolandd
Author

rolandd commented Sep 3, 2025

Looking more at the ExifIter implementation, I do see how it handles multiple IfdIters in the ifds member, but it seems we need a new way of constructing an ExifIter that takes something like a Vec<&[u8]> instead of an Option<&[u8]> to handle multiple discontiguous boxes with EXIF data in them? Does that make sense as a way to proceed?

@mindeng
Owner

mindeng commented Sep 4, 2025

> Looking more at the ExifIter implementation, I do see how it handles multiple IfdIters in the ifds member, but it seems we need a new way of constructing an ExifIter that takes something like a Vec<&[u8]> instead of an Option<&[u8]> to handle multiple discontiguous boxes with EXIF data in them? Does that make sense as a way to proceed?

If you want to process all IFDs in CR3, I think the relatively straightforward approach for now would be to parse all CMT* boxes, extract all the necessary Exif data, and then merge them into a single Vec<u8>. However, I'm not sure whether the offsets in the Exif data within CMT2 and subsequent boxes might cause issues—this point needs to be carefully verified.
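The offset concern can be illustrated with a minimal sketch. It assumes each CMT* payload starts with its own standard 8-byte TIFF header, so that IFD offsets are relative to that header rather than to the start of a merged buffer. The names `tiff_header`, `first_ifd_entry_count`, and `minimal_tiff` are hypothetical helpers for illustration and are not part of nom-exif.

```rust
/// Byte order declared by a TIFF header ("II" or "MM").
#[derive(Debug, Clone, Copy, PartialEq)]
enum Endian {
    Little,
    Big,
}

/// Parses the 8-byte TIFF header at the start of `block`.
/// Returns the byte order and the first-IFD offset, which is relative to
/// the start of this header, not to the start of any enclosing buffer.
fn tiff_header(block: &[u8]) -> Option<(Endian, u32)> {
    if block.len() < 8 {
        return None;
    }
    let endian = match &block[0..2] {
        b"II" => Endian::Little,
        b"MM" => Endian::Big,
        _ => return None,
    };
    let magic = [block[2], block[3]];
    let off = [block[4], block[5], block[6], block[7]];
    let (magic, off) = match endian {
        Endian::Little => (u16::from_le_bytes(magic), u32::from_le_bytes(off)),
        Endian::Big => (u16::from_be_bytes(magic), u32::from_be_bytes(off)),
    };
    if magic == 42 {
        Some((endian, off))
    } else {
        None
    }
}

/// Entry count of the first IFD of the TIFF block that starts at `base`
/// inside `buf`. The IFD offset is resolved against `base`, not against 0.
fn first_ifd_entry_count(buf: &[u8], base: usize) -> Option<u16> {
    let block = buf.get(base..)?;
    let (endian, off) = tiff_header(block)?;
    let pos = off as usize;
    let end = pos.checked_add(2)?;
    let raw = block.get(pos..end)?;
    let raw = [raw[0], raw[1]];
    Some(match endian {
        Endian::Little => u16::from_le_bytes(raw),
        Endian::Big => u16::from_be_bytes(raw),
    })
}

/// Builds a tiny little-endian TIFF block whose first IFD holds `entries`
/// zeroed 12-byte entries (synthetic test data only).
fn minimal_tiff(entries: u16) -> Vec<u8> {
    let mut v = vec![b'I', b'I', 42, 0, 8, 0, 0, 0];
    v.extend_from_slice(&entries.to_le_bytes());
    v.resize(v.len() + 12 * entries as usize + 4, 0);
    v
}
```

If two such blocks are concatenated, the second block's offsets only resolve correctly when the parser is told where that block begins (`base`). Parsing the merged buffer as a single TIFF structure would find only the first block's IFD.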

@rolandd
Author

rolandd commented Sep 4, 2025

> If you want to process all IFDs in CR3, I think the relatively straightforward approach for now would be to parse all CMT* boxes, extract all the necessary Exif data, and then merge them into a single Vec<u8>. However, I'm not sure whether the offsets in the Exif data within CMT2 and subsequent boxes might cause issues—this point needs to be carefully verified.

Just to be clear - CR3 handling should copy the EXIF data into a new consolidated buffer and then parse that with the existing code?

@mindeng
Owner

mindeng commented Sep 5, 2025

Hi @rolandd

After studying the structure of CR3 files, I found that it differs somewhat from my initial understanding. The CMT1/CMT2 sections here are independent TIFF structures. Therefore, I have added a MultiExifIter to support this scenario.

Additionally, I have implemented ParseOutput for MultiExifIter. To enable traversing all TIFF/Exif data, you can parse the CR3 file as follows: let mut iter: MultiExifIter = parser.parse(ms).unwrap();.

Note that the current implementation of parse_multi_exif_iter is likely incomplete. Please refer to the comments and refine the relevant CR3 processing logic.

The code has been submitted to the PR branch.

Other issues:

  • In the uuid.rs file, CMT1/CMT2/CMT3 are hard-coded. I am uncertain whether this approach is appropriate. For example, are these names fixed? Could there be additional CMT* boxes (e.g., CMT4/CMT5) that need to be parsed?
  • The file size of testdata/canon-r6.cr3 is too large. Please try to strip out image data and other non-essential information, retaining only the minimal data required for parsing TIFF/Exif information.

@rolandd
Author

rolandd commented Sep 6, 2025

Thanks, will look at this code and work on it. Meanwhile I updated the branch with a minimized CR3 file (down to ~400K) that still has all the metadata and is accepted by exiftool.

I looked back at https://github.com/lclevy/canon_cr3 and think it's safe to hard-code CMT1/CMT2/CMT3/CMT4 (CMT4 is not in my code but seems to be used for GPS info, I'll try to get a test file with GPS data in it). But anyway I think Canon has probably frozen the file structure for now.
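As a rough sketch of what hard-coding the four names could look like, the following walks a flat run of ISOBMFF-style boxes (32-bit big-endian size followed by a 4-byte type) and collects the payloads of CMT1 through CMT4. The names `CMT_BOX_TYPES`, `cmt_payloads`, and `make_box` are hypothetical and do not reflect the actual code in uuid.rs; the sketch also ignores the special size values 0 and 1 and does not descend into the container boxes that hold the CMT* boxes in a real file.

```rust
/// Box types treated as carrying TIFF/Exif data (assumed fixed list).
const CMT_BOX_TYPES: [&[u8; 4]; 4] = [b"CMT1", b"CMT2", b"CMT3", b"CMT4"];

/// Walks sibling boxes in `buf` and returns (type, payload) for each CMT* box.
/// Stops at the first box whose size field is malformed or overruns `buf`.
fn cmt_payloads(mut buf: &[u8]) -> Vec<([u8; 4], &[u8])> {
    let mut found = Vec::new();
    while buf.len() >= 8 {
        let size = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
        let kind = [buf[4], buf[5], buf[6], buf[7]];
        // Sizes 0 ("to end of file") and 1 (64-bit size) are not handled here.
        if size < 8 || size > buf.len() {
            break;
        }
        if CMT_BOX_TYPES.iter().any(|t| **t == kind) {
            found.push((kind, &buf[8..size]));
        }
        buf = &buf[size..];
    }
    found
}

/// Serializes one box with a 32-bit size header (synthetic test data only).
fn make_box(kind: &[u8; 4], payload: &[u8]) -> Vec<u8> {
    let mut v = ((8 + payload.len()) as u32).to_be_bytes().to_vec();
    v.extend_from_slice(kind);
    v.extend_from_slice(payload);
    v
}
```

Keeping the accepted names in one constant means a later CMT5, should one ever appear, would be a one-line change.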

rolandd and others added 4 commits September 6, 2025 21:40
See https://github.com/lclevy/canon_cr3 for information about the CR3
file format.

 - Add testdata/canon-r6.cr3: minimized valid CR3 file based on an image
   from a Canon R6 camera
 - Update to detect CR3 files in file.rs based on brand name 'crx '
 - Add bbox/cr3_moov.rs to handle 'moov' boxes and bbox/uuid.rs to handle
   Canon UUID sub-boxes that contain EXIF data for CR3 files
 - Add cr3.rs to handle extracting EXIF from CR3 files
 - Add basic test cases for CR3 parsing
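
The brand-based detection mentioned above can be sketched as follows, assuming the file starts with an 'ftyp' box whose major brand occupies bytes 8..12. The functions `major_brand` and `is_cr3` are hypothetical illustrations, not the code in file.rs.

```rust
/// Returns the major brand of a leading 'ftyp' box, if the buffer has one.
/// Layout assumed: 4-byte size, 4-byte type "ftyp", 4-byte major brand.
fn major_brand(buf: &[u8]) -> Option<[u8; 4]> {
    if buf.len() < 12 || &buf[4..8] != b"ftyp" {
        return None;
    }
    Some([buf[8], buf[9], buf[10], buf[11]])
}

/// True when the major brand is 'crx ' (note the trailing space).
fn is_cr3(buf: &[u8]) -> bool {
    major_brand(buf) == Some(*b"crx ")
}
```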
@rolandd
Author

rolandd commented Sep 18, 2025

Sorry, just getting back to this, I'm a little confused what the intention is with MultiExifIter. Is the idea that the consumer of the library needs to know which file types are multi-exif and which are fully parsed with a simple ExifIter? Wouldn't it be more ergonomic to extend the ExifIter internals to allow for multiple TIFF / IFD structures as in CR3 files, but leave a unified API for consumers parsing image files?

@mindeng
Owner

mindeng commented Sep 21, 2025

> Sorry, just getting back to this, I'm a little confused what the intention is with MultiExifIter. Is the idea that the consumer of the library needs to know which file types are multi-exif and which are fully parsed with a simple ExifIter? Wouldn't it be more ergonomic to extend the ExifIter internals to allow for multiple TIFF / IFD structures as in CR3 files, but leave a unified API for consumers parsing image files?

Yes, reusing and extending ExifIter is indeed an approach that maintains API consistency, but it comes with two issues:

  1. It would introduce additional complexity to ExifIter, making it harder to maintain and less aligned with the "Single Responsibility Principle."
  2. In practice, users may need to explicitly know they are handling multiple Exif data blocks. For example, if there are tag conflicts between multiple Exif blocks, what strategy should be adopted to handle them? MultiExifIter includes a duplicate_strategy field to address such cases, currently defaulting to the IgnoreDuplicates strategy (perhaps a set method should be added to allow users to control this strategy).

Under the current design, even if users continue to use ExifIter to receive parsing results for CR3 files, it will still work and provide access to the first Exif block's information. If they want to process all Exif blocks, they can optionally use MultiExifIter to receive the results.
Therefore, introducing MultiExifIter at this stage does not break API compatibility and preserves the option to extend ExifIter with built-in support for multiple Exif blocks in the future (if truly necessary).
However, if we were to directly modify ExifIter now to support multiple Exif structures, I still have the two concerns mentioned above, so I cannot yet make a decision.
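
To make the tag-conflict question concrete, here is a small self-contained sketch of one plausible reading of an "ignore duplicates" policy: when several Exif blocks define the same tag, the first occurrence wins and later ones are skipped. The types `DuplicateStrategy`, `Entry`, and the function `merge_blocks` are defined locally for illustration; only the names duplicate_strategy and IgnoreDuplicates come from the discussion above, and the real MultiExifIter behavior may differ.

```rust
use std::collections::HashSet;

/// How to treat a tag that appears in more than one Exif block (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DuplicateStrategy {
    /// Keep the first occurrence of a tag, skip later ones.
    IgnoreDuplicates,
    /// Yield every occurrence, leaving resolution to the caller.
    AllowDuplicates,
}

/// One merged entry, remembering which block it came from.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    block: usize,
    tag: u16,
    value: String,
}

/// Flattens per-block (tag, value) lists in block order under `strategy`.
fn merge_blocks(blocks: &[Vec<(u16, &str)>], strategy: DuplicateStrategy) -> Vec<Entry> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for (block, entries) in blocks.iter().enumerate() {
        for &(tag, value) in entries {
            // `insert` returns false when the tag was already recorded.
            let first_time = seen.insert(tag);
            if first_time || strategy == DuplicateStrategy::AllowDuplicates {
                out.push(Entry { block, tag, value: value.to_string() });
            }
        }
    }
    out
}
```

Recording the source block on each entry is one way to let callers resolve conflicts themselves when duplicates are allowed.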

These are my thoughts. Feel free to discuss and share your ideas anytime. Thank you.
