Skip to content
This repository has been archived by the owner on Jan 2, 2019. It is now read-only.

Write conditional formatting styles to worksheet instead of copying to e... #172

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

sorpigal
Copy link

Write conditional formatting styles to worksheet instead of copying to each individual cell.

This change reduces the size of the generated XML, especially for large ranges, and does not appear to cause any harm. The original behavior looked kind of crazy, to me, and definitely makes Excel2012 unhappy.

Conditional formatting can now apply to a very large range without ridiculously inflating the time it takes to rewrite the XLSX. I have successfully used range sizes up to the maximum that Excel 2010 supports.

@sorpigal
Copy link
Author

If you create a workbook with a large number of conditional formatting styles that apply to an expression or a set of cells and the ranges involved are very larget (encompassing hundreds or thousands of cells) and then load the XLSX file with PHPExcel and write it out again the resulting sheet.xml for the sheet containing the conditional formatting will be excessively large.

In my real world case where this issue was encountered a sheet.xml which was 2.2M in size as written by Excel 2010 became 230M+ when read and then written by PHPExcel.

It seems like PHPExcel is generally used either to create workbooks from scratch or to read them to extract values and is not often used in my scenario, which is to read a complex template, modify it and write it back out. It is only in this last scenario where the problem becomes obvious.

In the Excel2007 reader after conditional formatting is read each range is, for reasons I cannot identify, expanded into individual cell references and then each conditional formatting block is applied to each cell. This can be highly non-performant and does not appear to be necessary.

@MarkBaker
Copy link
Member

I'll definitely be taking a look at this, but will want to run a good few tests before I merge it; but it could really be helpful. Thanks

@sorpigal
Copy link
Author

Looking at this further today I can say that it's definitely not right as it is. Splitting on space is wrong, instead the ref should be written as-is at read time and the write-time code must be updated to write the correct sort of cell ref. I have some code for this but it's not quite commit-ready. When it is I'll push to the same branch.

…roper cell reference when writing <formula> blocks. Support for stopIfTrue and {notC,c}ontainsBlanks was added.
@syntax53
Copy link

Extremely interested in being able to apply conditional formatting at the worksheet level. Has there been any more development on this? I have a spreadsheet with conditional formatting on 5 columns of 3500 rows. As you can imagine, applying CF to each cell of those columns is not pretty in many ways.

@sorpigal
Copy link
Author

I can attest that conditional formatting for a large number of rows and a small number of columns is not hard in Excel2007. In the worksheet XML you write a conditionalFormatting stanza with an sqref that is a range (e.g. A1:B10). Each actual cfRule formula should reference any row (doesn't matter) as if it applied just to one row. Getting them in the right order is something I haven't figured out (you'd think it was just straight priority= on the cfRule, but it doesn't seem to work to just write it out sequentially. Perhaps reverse-sequentially as seems common in Excel-made sheets? Needs investigation). Fortunately for most people the order in which distinct rulesets are tested is not important.

PHPExcel at the moment is missing support for some formula types, but if you use each one systematically in an XLSX and then unpack the sheet1.xml (or whichever) you'll find the structure quite easy to duplicate.

Sadly the code for this that I have written is not redistributable. Anyone with a PHP dev environment and a copy of Excel 2007 or later should be able to implement this in a couple days. Keep in mind that older versions of Excel have a hard limit on the number of conditional formatting rules for a worksheet(16, IIRC), but for Excel2007 the limit is much higher (or maybe gone).

foreach ($aReferences as $reference) {
$docSheet->getStyle($reference)->setConditionalStyles($conditionalStyles);
foreach(explode(' ', $ref) as $single_ref){
docSheet->setConditionalStyles($single_ref, $conditionalStyles);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docSheet => $docSheet

@syntax53
Copy link

Briefly looking at that it looks like that will move the conditionals from the cells to the worksheet, but the larger issue is they need to be specified differently in the first place. e.g. Instead of applying 5000 conditional statements on 5000 cells, apply one conditional that covers the range of those cells. I assume this wouldn't change the current behavior of the duplicateConditionalStyle function?

@sorpigal
Copy link
Author

So as it turns out I do have some improvements for this sitting in my local repo but I have absolutely no way to verify correctness (no Excel2007) and don't have a php dev environment at the moment. I will commit this anyway in case someone with the wherewithal to look in to it wants to do so.

@sorpigal
Copy link
Author

So, just to summarize: The first commit was wrong (split on space is wrong). The second commit fixes the reader and added some of the missing cfRule types (this works OK). The last commit just adds stopIfTrue and priority writing. Priority as it stands is wrong but not harmful. The assumption that the conditional string cell reference for the *ConditionalStyles functions would be a single cell has implicitly changed to allow it to be a single cell or a range of cells. In theory $_conditionalStylesCollection could contain any sort of cell ref (even a range), but I don't know if that would cause problems elsewhere in the code.

Yes, duplicateConditionalStyle doesn't match what Excel2007 expects from real range styles (it just copies cell styles around). It might be appropriate to update duplicateConditionalStyle to simply create one entry for the range in the conditionalStylesCollection, but at that point it would be no different than calling setConditionalStyles with the first param as a range string.

I expect this all to work correctly but I just cannot test. Sorry.

$objConditional->setStyle(clone $dxfs[intval($cfRule["dxfId"])]);

// the rule may unset formatting in which case dxfId will not be present
if(isset($cfRule["dxfId"]])){

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parse error: syntax error, unexpected ']' in PHPExcel\Reader\Excel2007.php on line 944

@rentalhost
Copy link

After fix some fatal errors (cited above) it fixed a infinity looping ocurring on Writer.
Thanks!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants