apache-poi-batch

An experiment to explore ways to export large amounts of data with Apache POI.

At work we had some memory problems with exporting spreadsheets with Apache POI. Talking about solutions, we thought on creating a Proof of Concept with a more optimized solution, like closing and reopening the spreadsheet at every X rows somehow.

With this goal in mind, I created this project to generate a large amount of random fake data and export it with the implementation we have been using at work and with another more optimized implementation, after some research. Fortunately, this was easier than I thought.

Data generation

My focus here is exporting spreadsheets, so there is not database or API consumption. I used Java Faker library to generate data.

By default the application will generate 300000 random "Data" objects, but you can use another amount by passing a data.quantity property or JVM arguments. The DataRepositoryFakeImpl is responsible for the data generation and prints the progress when it's working.

Spreadsheet export

In this project, I implemented a "not optimized" spreadsheet export logic similar to the one we use where I work, and a "optimized" implementation that uses much less memory. Luckily, POI already have something useful for this: http://poi.apache.org/components/spreadsheet/limitations.html.

Not optimized

The "not optimized" logic uses the XSSFWorkbook implementation for Workbook, which keep everything on memory until it's written somewhere. So it's not hard to know that it can use a lot of memory:

Not optimized run without any max heap size. I remember that when I implemented it, it managed to successfully exported files without memory limit.

HSSFWorkbook Workbook's implementation has a limit of 65536 rows, so I couldn't use it for this test.

To run this implementation on this project, execute the main class with the profile not_optimized. Also, check the How to run section.

Optimized

The "optimized" logic uses SXSSFWorkbook implementation that keeps only a certain number of rows in memory, reducing drastically the memory usage, as in the image below:

Optimized run without any max heap size.

Still, I could use autoSizeColumn(), that I believed would not be possible with this Workbook implementation. So far, this implementation has no drawbacks for my needs.

To run this implementation on this project, execute the main class with the profile optimized. Also, check the How to run section.

Results

With the optimized implementation I could export 300000 rows while limiting the JVM max heap size to 150 MB (-Xmx150m):

Optimized run with 150 MB max heap size. It couldn't successfully export files with values lower than this.

But the POI implementation wasn't everything: setting null to each list position after inserted on the sheet also helped a lot (but not enough to export without SXSSFWorkbook).

How to run

Clone this repository and open it on your favorite Java IDE (like IntelliJ or Eclipse);
Run Maven to import all dependencies;
Configure the profile to not-optimized or optimized. You can manually set the spring.profiles.active property inside application.properties or when configuring your IDE to run the application, like here;
(Optional) You can also set the amount of data to generate, as described in Data generation;
Configure the output destination through Command Line argument destination. It must be a folder, and a spreadsheet file will be written on it with the pattern Exported spreadsheet yyyy-MM-ddTHH:mm:ss.ssssss.xlsx;
Run the main class;
(Optional) Monitor the performance and memory usage through a Profiler, like VisualVM or JProfiler.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.mvn/wrapper		.mvn/wrapper
doc		doc
src		src
.gitignore		.gitignore
README.md		README.md
mvnw		mvnw
mvnw.cmd		mvnw.cmd
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

apache-poi-batch

An experiment to explore ways to export large amounts of data with Apache POI.

Data generation

Spreadsheet export

Not optimized

Optimized

Results

How to run

About

Releases

Packages

Languages

FernandoAlvarez28/apache-poi-batch

Folders and files

Latest commit

History

Repository files navigation

apache-poi-batch

An experiment to explore ways to export large amounts of data with Apache POI.

Data generation

Spreadsheet export

Not optimized

Optimized

Results

How to run

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages