At work we had some memory problems with exporting spreadsheets with Apache POI. Talking about solutions, we thought on creating a Proof of Concept with a more optimized solution, like closing and reopening the spreadsheet at every X rows somehow.
With this goal in mind, I created this project to generate a large amount of random fake data and export it with the implementation we have been using at work and with another more optimized implementation, after some research. Fortunately, this was easier than I thought.
My focus here is exporting spreadsheets, so there is not database or API consumption. I used Java Faker library to generate data.
By default the application will generate 300000 random "Data" objects, but you can use another amount by passing a data.quantity
property or JVM arguments. The DataRepositoryFakeImpl is responsible for the data generation and prints the progress when it's working.
In this project, I implemented a "not optimized" spreadsheet export logic similar to the one we use where I work, and a "optimized" implementation that uses much less memory. Luckily, POI already have something useful for this: http://poi.apache.org/components/spreadsheet/limitations.html.
The "not optimized" logic uses the XSSFWorkbook implementation for Workbook, which keep everything on memory until it's written somewhere. So it's not hard to know that it can use a lot of memory:
Not optimized run without any max heap size. I remember that when I implemented it, it managed to successfully exported files without memory limit.
HSSFWorkbook Workbook's implementation has a limit of 65536 rows, so I couldn't use it for this test.
To run this implementation on this project, execute the main class with the profile not_optimized
. Also, check the How to run section.
The "optimized" logic uses SXSSFWorkbook implementation that keeps only a certain number of rows in memory, reducing drastically the memory usage, as in the image below:
Optimized run without any max heap size.
Still, I could use autoSizeColumn(), that I believed would not be possible with this Workbook implementation. So far, this implementation has no drawbacks for my needs.
To run this implementation on this project, execute the main class with the profile optimized
. Also, check the How to run section.
With the optimized implementation I could export 300000 rows while limiting the JVM max heap size to 150 MB (-Xmx150m
):
Optimized run with 150 MB max heap size. It couldn't successfully export files with values lower than this.
But the POI implementation wasn't everything: setting null to each list position after inserted on the sheet also helped a lot (but not enough to export without SXSSFWorkbook
).
- Clone this repository and open it on your favorite Java IDE (like IntelliJ or Eclipse);
- Run Maven to import all dependencies;
- Configure the profile to
not-optimized
oroptimized
. You can manually set thespring.profiles.active
property inside application.properties or when configuring your IDE to run the application, like here; - (Optional) You can also set the amount of data to generate, as described in Data generation;
- Configure the output destination through Command Line argument
destination
. It must be a folder, and a spreadsheet file will be written on it with the patternExported spreadsheet yyyy-MM-ddTHH:mm:ss.ssssss.xlsx
; - Run the main class;
- (Optional) Monitor the performance and memory usage through a Profiler, like VisualVM or JProfiler.