
feat(xlsx): add pictures_with_positions method#592

Open
mgcrea wants to merge 1 commit into tafia:master from mgcrea:feat-pictures_with_positions



@mgcrea mgcrea commented Dec 17, 2025

Hi!

Thanks for the great lib! I needed to extract the cell positions of images for a project I'm working on. This only works for xlsx files (either DrawingML anchors or Excel 365 Rich Data). There are no breaking changes, since I added a new pictures_with_positions method.

Add support for extracting embedded pictures along with their anchor
positions (sheet name, row, column). Parses DrawingML files to get
picture positions and supports both standard DrawingML anchors and
Excel 365 Rich Data format (cell images).

- Add Picture struct with position metadata
- Add pictures_with_positions() method to Reader trait
- Implement for Xlsx with DrawingML and Rich Data parsing
- Add helper methods: anchor_cell(), col_to_letter(), parse_cell_ref()
- Add test case with Rich Data Excel file
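The body of the `parse_cell_ref()` helper isn't shown in this thread. For the Rich Data case, where a picture is presumably tied to a cell reference such as `B3` rather than to a drawing anchor, a helper like that has to turn an A1-style reference into the 0-based (row, col) pair the `Picture` struct documents. The sketch below is hypothetical (the name and signature are assumptions, not the PR's code) and ignores absolute references like `$B$3`.

```rust
/// Hypothetical sketch: parse an A1-style reference such as "B3" into a
/// 0-based (row, col) pair. Absolute references like "$B$3" are not handled.
fn parse_cell_ref(cell: &str) -> Option<(u32, u32)> {
    // Split the leading column letters from the trailing row digits.
    let letters_len = cell.bytes().take_while(|b| b.is_ascii_alphabetic()).count();
    let (letters, digits) = cell.split_at(letters_len);
    if letters.is_empty() || digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }

    // Column letters are bijective base 26: A = 1, ..., Z = 26, AA = 27.
    let mut col: u32 = 0;
    for b in letters.bytes() {
        let digit = u32::from(b.to_ascii_uppercase() - b'A') + 1;
        col = col.checked_mul(26)?.checked_add(digit)?;
    }

    // Rows are 1-based in A1 notation; row 0 does not exist.
    let row: u32 = digits.parse().ok()?;
    if row == 0 {
        return None;
    }
    Some((row - 1, col - 1))
}
```

Returning `Option` rather than panicking keeps a malformed `r` attribute from aborting the whole workbook read.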
@jmcnamara
Collaborator

Thanks. Overall it looks good, and useful. I'll try to get to a review soon.

In the meantime, have a look at the in-progress Contributor guide (#584), or this initial, more comprehensive version (d204579), and make any required changes.

Maintainer's note: This would close #381

}

// Read pictures with position information from DrawingML.
// sheets must be added before this is called!!
Collaborator


If "sheets must be added before this is called", is there a better way to ensure that has happened, such as a flag or something else? What happens if it isn't done? Is there a panic?
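One way to remove the "must be called first" rule is to make the dependency part of the signature: the picture reader takes the sheet list as an argument instead of reading it from `self`, so it cannot run before that list exists. A runtime flag returning an error would be another option. This hypothetical sketch shows the signature approach; the function name is an assumption, and the `(sheet name, sheet path)` tuple layout is taken from the `for (sheet_name, sheet_path) in &self.sheets` loop quoted later in this thread. It maps each sheet part to its relationships part, which is where the drawing references live.

```rust
/// Hypothetical sketch: the picture reader receives the sheet list as an
/// argument instead of reading `self.sheets`, so it cannot run before that
/// list has been built. Returns (sheet name, relationships part path) pairs.
fn sheet_rels_paths(sheets: &[(String, String)]) -> Vec<(String, String)> {
    sheets
        .iter()
        .map(|(name, path)| {
            // OPC convention: "xl/worksheets/sheet1.xml" keeps its relationships in
            // "xl/worksheets/_rels/sheet1.xml.rels"; those point at the drawing parts.
            let rels = match path.rsplit_once('/') {
                Some((dir, file)) => format!("{dir}/_rels/{file}.rels"),
                None => format!("_rels/{path}.rels"),
            };
            (name.clone(), rels)
        })
        .collect()
}
```

With no sheets the result is simply empty, so nothing can panic on a missing entry.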

#[cfg(feature = "picture")]
xlsx.read_pictures()?;
#[cfg(feature = "picture")]
xlsx.read_pictures_with_positions()?;
Collaborator


This approach causes the image-related XML to be parsed twice, once for xlsx.read_pictures() and once for xlsx.read_pictures_with_positions(), which is inefficient. I get that you are trying to avoid breaking backward compatibility, but I think that is the lesser of the two evils here. I'd suggest expanding the xlsx.read_pictures() function to add the new functionality.
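If a single pass produces the positioned pictures, the existing `(extension, data)` list can be derived from them instead of being parsed separately. The sketch below assumes a `Picture` type with the field names suggested later in this review; it is illustrative, not the PR's code. Because one media file can be placed in several cells, it keeps only the first occurrence of each media name.

```rust
use std::collections::HashSet;

/// Hypothetical position-aware picture (field names follow the review's suggestions).
#[derive(Debug, Clone, PartialEq)]
struct Picture {
    name: String,
    extension: String,
    data: Vec<u8>,
    sheet_name: String,
    row: u32,
    col: u32,
}

/// Derive `(extension, data)` pairs from the positioned pictures, so the drawing
/// XML is only read once. A media file placed in several cells appears once per
/// placement in `pictures`, so duplicates are dropped by media name.
fn legacy_pictures(pictures: &[Picture]) -> Vec<(String, Vec<u8>)> {
    let mut seen = HashSet::new();
    pictures
        .iter()
        .filter(|p| seen.insert(p.name.as_str()))
        .map(|p| (p.extension.clone(), p.data.clone()))
        .collect()
}
```

One caveat: if the current method also returns media files that aren't placed on any sheet, the single pass would need to keep those too for the results to match.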

pictures: Option<Vec<(String, Vec<u8>)>>,
/// Pictures with position information
#[cfg(feature = "picture")]
pictures_with_positions: Option<Vec<Picture>>,
Collaborator


Rather than an Option<Vec>, would it make sense to just have a Vec? If necessary, the user can check whether pictures_with_positions is empty.
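A minimal sketch of that suggestion, assuming a hypothetical `PictureStore` that holds the data: store a plain `Vec` internally and expose a slice, where an empty slice means "no pictures". If an existing `Option<Vec<_>>` return type has to be kept for compatibility, it can still be produced from the `Vec` by mapping empty to `None`.

```rust
/// Hypothetical sketch: pictures stored as a plain `Vec` instead of `Option<Vec>`.
struct PictureStore {
    pictures: Vec<(String, Vec<u8>)>,
}

impl PictureStore {
    /// Slice accessor: callers check `is_empty()` instead of matching on `None`.
    fn pictures(&self) -> &[(String, Vec<u8>)] {
        &self.pictures
    }

    /// Compatibility shim for an `Option<Vec<_>>` signature: empty maps to `None`.
    fn pictures_opt(&self) -> Option<Vec<(String, Vec<u8>)>> {
        if self.pictures.is_empty() {
            None
        } else {
            Some(self.pictures.clone())
        }
    }
}
```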

let mut pictures = Vec::new();

// Step 2: For each sheet, find drawing relationships and parse drawings
for (sheet_name, sheet_path) in &self.sheets.clone() {
Collaborator


Is sheets.clone() necessary here?
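A clone like this is usually there to satisfy the borrow checker: iterating `&self.sheets` while calling a `&mut self` method does not compile. If the reader keeps its zip archive in its own field (an assumption here; reading an entry through `zip::ZipArchive::by_name` needs `&mut` access), a helper that borrows only that field avoids the clone, because borrows of disjoint fields can coexist. `ZipSketch`, `XlsxSketch` and `parse_sheet_drawings` below are hypothetical stand-ins, not calamine types.

```rust
/// Stand-in for a zip archive: reading an entry needs `&mut self`.
struct ZipSketch {
    reads: Vec<String>,
}

impl ZipSketch {
    fn read(&mut self, path: &str) -> usize {
        self.reads.push(path.to_string());
        path.len() // pretend the entry's size is the path length
    }
}

struct XlsxSketch {
    zip: ZipSketch,
    sheets: Vec<(String, String)>,
}

/// Borrows only the archive, not the whole reader.
fn parse_sheet_drawings(zip: &mut ZipSketch, sheet_name: &str, sheet_path: &str) -> (String, usize) {
    (sheet_name.to_string(), zip.read(sheet_path))
}

impl XlsxSketch {
    fn read_pictures(&mut self) -> Vec<(String, usize)> {
        let mut found = Vec::new();
        // `&self.sheets` (shared) and `&mut self.zip` (exclusive) borrow disjoint
        // fields, so no clone of `sheets` is needed.
        for (name, path) in &self.sheets {
            found.push(parse_sheet_drawings(&mut self.zip, name, path));
        }
        found
    }
}
```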

Comment on lines +196 to +209
/// Convert a 0-based column index to Excel column letter(s)
fn col_to_letter(col: u32) -> String {
    let mut result = String::new();
    let mut n = col;
    loop {
        result.insert(0, (b'A' + (n % 26) as u8) as char);
        if n < 26 {
            break;
        }
        n = n / 26 - 1;
    }
    result
}

Collaborator


I think the existing push_column() function can be used instead.
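The signature of calamine's existing `push_column()` isn't shown in this thread, so the sketch below uses a hypothetical `push_col_letters()` to illustrate the push-into-a-buffer shape that name suggests. It uses the same 0-based mapping as `col_to_letter()` (0 → `A`, 26 → `AA`) but builds the letters in a small stack array instead of calling `insert(0, ..)` on each step.

```rust
/// Hypothetical push-style helper: append the Excel column letters for a
/// 0-based column index to `buf`.
fn push_col_letters(col: u32, buf: &mut String) {
    // Seven letters cover every u32 value in bijective base 26.
    let mut letters = [0u8; 7];
    let mut len = 0;
    // Work in u64 so `col + 1` cannot overflow for u32::MAX.
    let mut n = u64::from(col) + 1;
    while n > 0 {
        n -= 1;
        letters[len] = b'A' + (n % 26) as u8;
        len += 1;
        n /= 26;
    }
    // Letters were produced least-significant first, so emit them reversed.
    for &b in letters[..len].iter().rev() {
        buf.push(char::from(b));
    }
}
```

Appending to a caller's buffer lets a cell reference like `$B$3` be built without intermediate `String`s.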

Comment on lines +167 to +183
#[cfg(feature = "picture")]
#[cfg_attr(docsrs, doc(cfg(feature = "picture")))]
#[derive(Debug, Clone)]
pub struct Picture {
    /// File extension (e.g., "png", "jpeg")
    pub extension: String,
    /// Raw image data
    pub data: Vec<u8>,
    /// Sheet name where the picture is anchored
    pub sheet_name: Option<String>,
    /// Row index (0-based) where picture is anchored
    pub anchor_row: Option<u32>,
    /// Column index (0-based) where picture is anchored
    pub anchor_col: Option<u32>,
    /// Original filename in the media folder (e.g., "image1.png")
    pub media_name: Option<String>,
}
Collaborator


Avoid the word anchor in the public-facing functions. That won't make sense to end users. Just use row and col instead.

Also, sheet_name, anchor_row and anchor_col don't have to be Options if you replace the read_pictures() method with the new code.

media_name should just be name.
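Applying those three suggestions, the struct might look like the sketch below. This is one reading of the review, not the merged API: the non-`Option` fields assume the positional reader replaces `read_pictures()`, and the `cell()` accessor is a hypothetical convenience returning the same 0-based `(row, col)` order calamine uses for cell positions.

```rust
/// A picture embedded in a worksheet, with the cell it is placed at.
#[derive(Debug, Clone, PartialEq)]
pub struct Picture {
    /// Original file name in the media folder (e.g. "image1.png").
    pub name: String,
    /// File extension (e.g. "png", "jpeg").
    pub extension: String,
    /// Raw image data.
    pub data: Vec<u8>,
    /// Name of the sheet the picture is placed on.
    pub sheet_name: String,
    /// 0-based row of the cell the picture is placed at.
    pub row: u32,
    /// 0-based column of the cell the picture is placed at.
    pub col: u32,
}

impl Picture {
    /// Hypothetical accessor: the picture's cell as 0-based (row, col).
    pub fn cell(&self) -> (u32, u32) {
        (self.row, self.col)
    }
}
```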

/// Get pictures with position information
///
/// Returns embedded pictures along with their anchor positions (sheet name, row, column).
/// This method parses DrawingML files to extract position information.
Collaborator


No need to mention DrawingML to the end user. Most won't know/care what it is.

Comment on lines +428 to +443
/// ```ignore
/// use calamine::{Reader, open_workbook, Xlsx};
///
/// let mut workbook: Xlsx<_> = open_workbook("file.xlsx")?;
/// if let Some(pics) = workbook.pictures_with_positions() {
///     for pic in pics {
///         // media_name already includes the extension (e.g. "image1.png").
///         println!(
///             "Image: {} ({}), Sheet: {:?}, Cell: {:?}",
///             pic.media_name.as_deref().unwrap_or("unknown"),
///             pic.extension,
///             pic.sheet_name,
///             pic.anchor_cell()
///         );
///     }
/// }
/// ```
Collaborator


Use a real sample file (existing or added to the tests directory) so that this example can be run as a test.

Ok(())
}

// cargo test --features picture pictures_with_positions_drawingml
Collaborator


Add a short explanation about what is being tested, not how to run it.

use zip::result::ZipError;

use crate::datatype::DataRef;
use crate::formats::{builtin_format_by_id, detect_custom_number_format, CellFormat};
Collaborator


Overall comment. Make sure to rustfmt the code additions/changes.

@jmcnamara
Collaborator

@mgcrea Any update on this?
