feat(xlsx): add pictures_with_positions method#592
feat(xlsx): add pictures_with_positions method#592mgcrea wants to merge 1 commit intotafia:masterfrom
Conversation
Add support for extracting embedded pictures along with their anchor positions (sheet name, row, column). Parses DrawingML files to get picture positions and supports both standard DrawingML anchors and Excel 365 Rich Data format (cell images). - Add Picture struct with position metadata - Add pictures_with_positions() method to Reader trait - Implement for Xlsx with DrawingML and Rich Data parsing - Add helper methods: anchor_cell(), col_to_letter(), parse_cell_ref() - Add test case with Rich Data Excel file
| } | ||
|
|
||
| // Read pictures with position information from DrawingML. | ||
| // sheets must be added before this is called!! |
There was a problem hiding this comment.
If "sheets must be added before this is called" is there a better way to ensure that happened? A flag or something else. What happens if it isn't called? Is there a panic?
| #[cfg(feature = "picture")] | ||
| xlsx.read_pictures()?; | ||
| #[cfg(feature = "picture")] | ||
| xlsx.read_pictures_with_positions()?; |
There was a problem hiding this comment.
This approach causes the image related XML to be parsed twice: for xlsx.read_pictures() and xlsx.read_pictures_with_positions() which is inefficient. I get that you are trying to avoid breaking backward compatibility but I think it is the lesser of the two evils here. I'd suggest expanding the xlsx.read_pictures() function to add the new functionality.
| pictures: Option<Vec<(String, Vec<u8>)>>, | ||
| /// Pictures with position information | ||
| #[cfg(feature = "picture")] | ||
| pictures_with_positions: Option<Vec<Picture>>, |
There was a problem hiding this comment.
Rather than an Option<Vec> would it make sense to just have Vec? If necessary the user can check if pictures_with_positions is empty.
| let mut pictures = Vec::new(); | ||
|
|
||
| // Step 2: For each sheet, find drawing relationships and parse drawings | ||
| for (sheet_name, sheet_path) in &self.sheets.clone() { |
There was a problem hiding this comment.
Is sheets.clone() necessary here?
| /// Convert a 0-based column index to Excel column letter(s) | ||
| fn col_to_letter(col: u32) -> String { | ||
| let mut result = String::new(); | ||
| let mut n = col; | ||
| loop { | ||
| result.insert(0, (b'A' + (n % 26) as u8) as char); | ||
| if n < 26 { | ||
| break; | ||
| } | ||
| n = n / 26 - 1; | ||
| } | ||
| result | ||
| } | ||
|
|
There was a problem hiding this comment.
I think the existing push_column() function can be used instead.
| #[cfg(feature = "picture")] | ||
| #[cfg_attr(docsrs, doc(cfg(feature = "picture")))] | ||
| #[derive(Debug, Clone)] | ||
| pub struct Picture { | ||
| /// File extension (e.g., "png", "jpeg") | ||
| pub extension: String, | ||
| /// Raw image data | ||
| pub data: Vec<u8>, | ||
| /// Sheet name where the picture is anchored | ||
| pub sheet_name: Option<String>, | ||
| /// Row index (0-based) where picture is anchored | ||
| pub anchor_row: Option<u32>, | ||
| /// Column index (0-based) where picture is anchored | ||
| pub anchor_col: Option<u32>, | ||
| /// Original filename in the media folder (e.g., "image1.png") | ||
| pub media_name: Option<String>, | ||
| } |
There was a problem hiding this comment.
Avoid the word anchor in the public facing functions. That won't make sense to end users. Just use row and col instead.
Also, sheet_name, anchor_row and anchor_col don't have to be Options is you replace the read_pictures() method with the new code.
media_name should be just name.
| /// Get pictures with position information | ||
| /// | ||
| /// Returns embedded pictures along with their anchor positions (sheet name, row, column). | ||
| /// This method parses DrawingML files to extract position information. |
There was a problem hiding this comment.
No need to mention DrawingML to the end user. Most won't know/care what it is.
| /// ```ignore | ||
| /// use calamine::{Reader, open_workbook, Xlsx}; | ||
| /// | ||
| /// let mut workbook: Xlsx<_> = open_workbook("file.xlsx")?; | ||
| /// if let Some(pics) = workbook.pictures_with_positions() { | ||
| /// for pic in pics { | ||
| /// println!( | ||
| /// "Image: {}.{}, Sheet: {:?}, Cell: {:?}", | ||
| /// pic.media_name.as_deref().unwrap_or("unknown"), | ||
| /// pic.extension, | ||
| /// pic.sheet_name, | ||
| /// pic.anchor_cell() | ||
| /// ); | ||
| /// } | ||
| /// } | ||
| /// ``` |
There was a problem hiding this comment.
Use a real sample file (existing or added to the tests directory) so that this example can be run as a test.
| Ok(()) | ||
| } | ||
|
|
||
| // cargo test --features picture pictures_with_positions_drawingml |
There was a problem hiding this comment.
Add a short explanation about what is being tested, not how to run it.
| use zip::result::ZipError; | ||
|
|
||
| use crate::datatype::DataRef; | ||
| use crate::formats::{builtin_format_by_id, detect_custom_number_format, CellFormat}; |
There was a problem hiding this comment.
Overall comment. Make sure to rustfmt the code additions/changes.
|
@mgcrea Any update on this? |
Hi!
Thanks for the great lib, I did need to properly extract images cell position info from a project I'm working on. Only works for xlsx files (either DrawingML anchors or 365 Rich Data). No breaking changes as I added a new
pictures_with_positionsmethod.