
@DevinTDHa
Member

Description

This PR introduces further optimizations for NerDLApproach:

  1. Data can now be fed through a threaded dataloader when setEnableMemoryOptimizer(true) and setPrefetchBatches(int) are used.
  2. Dataframe partitioning is now optimized for NerDLApproach training by default; it can be disabled with setOptimizePartitioning(false).
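The threaded-dataloader idea can be sketched as follows. This is a minimal, illustrative sketch in plain Python, not Spark NLP's actual implementation: a background thread fills a bounded queue with batches so the training loop rarely waits on data loading. The names `prefetch` and `prefetch_batches` are hypothetical stand-ins for what setPrefetchBatches(int) controls.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the batch stream

def prefetch(batches, prefetch_batches=2):
    """Yield batches while a background thread loads ahead.

    The queue's maxsize bounds how many batches are buffered,
    which caps memory use while still overlapping loading with
    training on the consumer side.
    """
    q = queue.Queue(maxsize=prefetch_batches)

    def loader():
        for batch in batches:
            q.put(batch)      # blocks when the buffer is full
        q.put(_SENTINEL)      # signal that no more batches follow

    threading.Thread(target=loader, daemon=True).start()
    while True:
        batch = q.get()
        if batch is _SENTINEL:
            break
        yield batch

# Usage: batches arrive in order; loading overlaps consumption.
result = list(prefetch(iter([[1, 2], [3, 4], [5, 6]]), prefetch_batches=2))
```

The bounded queue is the key design choice: an unbounded buffer would reintroduce the memory pressure the optimizer is meant to avoid.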

Motivation and Context

  1. Training is slow on clusters, and the threaded dataloader improves training times.
  2. When using large partitions, the driver node risks running out of memory. The optimized partitioning prevents this.
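The intuition behind the partitioning optimization can be illustrated with a small sketch. This is a hedged, stdlib-only illustration, not Spark NLP's actual heuristic: if the driver processes one partition at a time, capping the rows per partition bounds peak driver memory, and the partition count follows from that cap. The function name and the row-cap parameter are hypothetical.

```python
import math

def target_partitions(total_rows: int, max_rows_per_partition: int) -> int:
    """Smallest partition count keeping every partition under the row cap.

    With this count, collecting one partition at a time never brings
    more than max_rows_per_partition rows onto the driver at once.
    """
    return max(1, math.ceil(total_rows / max_rows_per_partition))

# Example: 10 million rows capped at 50k rows per partition.
n = target_partitions(10_000_000, 50_000)  # -> 200 partitions
```

A single huge partition would instead force the driver to materialize all rows at once, which is exactly the out-of-memory failure mode described above.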

How Has This Been Tested?

Existing and newly added tests pass.

The threaded NerDLDataLoader fetches batches in the background while NerDLApproach is training, reducing idle time in the driver thread.

NerDLApproach can now repartition the input dataset, so the driver does not run out of memory when training on large partitions.