Approach to Parallelize Blocking Calls for Large Multi Stream #1208

murphye · 2023-02-05T18:14:00Z

I could use some guidance on how I should implement a complex scenario to process a large amount of data, potentially using two Multi streams.

For my use case, I have very CPU-intensive work that I need to parallelize. I plan to run a point-in-polygon GIS function and expect to spread the load across 16 CPU cores. The specific call I plan to make is PreparedGeometry.contains (https://locationtech.github.io/jts/javadoc/org/locationtech/jts/geom/prep/PreparedGeometry.html#contains-org.locationtech.jts.geom.Geometry-), which is a blocking call.

If I were to just implement some basic imperative code for this scenario, it might look like this:

for (Polygon polygon : polygons) { // Large number of polygons

    for (Point point : points) { // Large number of points

        if (polygon.contains(point)) { // Blocking call that I want to spread across multiple CPU cores

            // Perform a non-blocking update of an in-memory cache

        }
    }
}

My first thought is to use a Vert.x Verticle scaled to 16 instances, calling executeBlocking with ordered set to false and a dedicated thread pool of size 16. This would let me build a loop structure that processes every Point against every Polygon and makes non-blocking calls around the blocking call (such as updating a thread-safe cache).
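For comparison, the worker-pool idea described above can be sketched with the JDK alone. This is a minimal sketch under stated assumptions, not Vert.x or Mutiny code: `Box` and `Pt` are hypothetical stand-ins for the JTS `PreparedGeometry` and `Point` types, `countHits` is an illustrative name, and a `ConcurrentHashMap` plays the role of the thread-safe cache. Each polygon becomes one task on a fixed pool, and tasks finish in no particular order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

class PooledPointInPolygon {

    // Hypothetical stand-ins for the JTS types: an axis-aligned box and a 2D point.
    record Pt(double x, double y) {}

    record Box(String id, double minX, double minY, double maxX, double maxY) {
        boolean contains(Pt p) { // placeholder for PreparedGeometry.contains
            return p.x() >= minX && p.x() <= maxX && p.y() >= minY && p.y() <= maxY;
        }
    }

    /** Counts, per box id, how many points fall inside it, using `threads` workers. */
    static Map<String, Long> countHits(List<Box> boxes, List<Pt> points, int threads) {
        Map<String, LongAdder> cache = new ConcurrentHashMap<>(); // thread-safe cache
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Void>> tasks = new ArrayList<>();
            for (Box box : boxes) {
                // One task per polygon; completion order is not guaranteed.
                tasks.add(CompletableFuture.runAsync(() -> {
                    for (Pt p : points) {
                        if (box.contains(p)) {
                            cache.computeIfAbsent(box.id(), k -> new LongAdder()).increment();
                        }
                    }
                }, pool));
            }
            // Wait for every task; a failure in any task is rethrown here.
            CompletableFuture.allOf(tasks.toArray(new CompletableFuture<?>[0])).join();
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
        Map<String, Long> result = new TreeMap<>();
        cache.forEach((id, n) -> result.put(id, n.sum()));
        return result;
    }

    public static void main(String[] args) {
        List<Box> boxes = List.of(new Box("a", 0, 0, 10, 10), new Box("b", 5, 5, 15, 15));
        List<Pt> points = List.of(new Pt(1, 1), new Pt(6, 6), new Pt(12, 12), new Pt(20, 20));
        System.out.println(countHits(boxes, points, 16));
    }
}
```

One task per polygon keeps scheduling overhead low, but it only spreads well when there are at least as many polygons as threads. With few polygons, splitting the point list into chunks would presumably balance the load better.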

Now, I am thinking this should all be possible with Mutiny's Multi. I would have a Multi<Point> stream and a Multi<Polygon> stream. I would need to iterate across both Multi streams so I process each and every point for each and every polygon. I would also need to make the blocking calls to PreparedGeometry.contains as the stream is processed.
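The "every point for every polygon" pairing is a cross product, which can be sketched lazily with plain JDK streams. This is a sketch of the shape only, not Mutiny code; `CrossProduct`, `Pair`, and `pairs` are hypothetical names. It also makes one constraint visible: the inner source is traversed once per outer element, so at least one of the two sources has to be replayable or held in memory.

```java
import java.util.List;
import java.util.stream.Stream;

class CrossProduct {

    // Hypothetical holder for one (polygon, point) combination.
    record Pair<A, B>(A left, B right) {}

    /** Lazily pairs every element of `outer` with every element of `inner`. */
    static <A, B> Stream<Pair<A, B>> pairs(List<A> outer, List<B> inner) {
        // `inner` is streamed again for each element of `outer`,
        // so it must support repeated traversal.
        return outer.stream()
                .flatMap(a -> inner.stream().map(b -> new Pair<A, B>(a, b)));
    }

    public static void main(String[] args) {
        pairs(List.of("polyA", "polyB"), List.of(1, 2, 3))
                .forEach(System.out::println);
    }
}
```

Because the pairs are produced on demand, nothing the size of polygons × points is ever materialized. With Mutiny, the analogous shape would presumably be a flat-map style operator such as transformToMultiAndMerge over one stream, assuming the other stream can be re-subscribed for each element.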

I did read https://smallrye.io/smallrye-mutiny/2.0.0/guides/emit-on-vs-run-subscription-on/ and I am still unsure whether emitOn or runSubscriptionOn with a dedicated ThreadPoolExecutor is the better choice for my use case, which needs a large number of parallel blocking calls while the Multi is processed. Once again, the order of processing does not matter.

What guidance might you have for me to implement this complex scenario?

How to process all Multi<Point> against all Multi<Polygon>?
How to correctly handle the blocking call with runSubscriptionOn or emitOn?
How to distribute load across 16 CPU cores using a ThreadPoolExecutor?

murphye · 2023-02-05T22:28:33Z

OK, I thought about this some more, and here is a very contrived solution that doesn't seem ideal to me but does simplify things a bit. I create a Multi<PointInPolygon> from the two lists and process that instead.

This solution doesn't really take full advantage of Mutiny, and I would prefer to source the data from a Multi<PreparedPolygon> and a Multi<Point> for this example rather than from an ArrayList.

Thank you for any thoughts you may have on a better way to do this!

Untested Code:

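    import io.smallrye.mutiny.Multi;
    import java.util.ArrayList;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.atomic.AtomicBoolean;
    import org.locationtech.jts.geom.Point;
    import org.locationtech.jts.geom.prep.PreparedPolygon;

    record PointInPolygon(
        PreparedPolygon polygon,
        Point point,
        AtomicBoolean isPointInPolygon // Set to true once the point is found inside the polygon
    ) {}

    void process() {

        ExecutorService executor = Executors.newFixedThreadPool(16);

        var polygonList = new ArrayList<PreparedPolygon>(); // Pretend this is populated with data
        var pointList = new ArrayList<Point>(); // Pretend this is populated with data
        var pointInPolygonList = new ArrayList<PointInPolygon>();

        // Create a very long list of PointInPolygon to process
        for (var polygon : polygonList) {
            for (var point : pointList) {
                pointInPolygonList.add(new PointInPolygon(polygon, point, new AtomicBoolean(false)));
            }
        }

        Multi.createFrom().iterable(pointInPolygonList)
                .onItem().invoke(pip -> {
                    if (pip.polygon().contains(pip.point())) { // Blocking call
                        pip.isPointInPolygon().set(true);
                    }
                })
                // Moves the subscription onto an executor thread; items are still handled one at a time
                .runSubscriptionOn(executor)
                .onTermination().invoke(executor::shutdown) // Release the pool once the stream ends
                .subscribe().with(
                        item -> System.out.println("Item: " + item),
                        Throwable::printStackTrace);
    }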
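A JDK-only illustration (no Mutiny involved) of a distinction that seems relevant to the code above: handing a whole pipeline to a pool, which is roughly what a single runSubscriptionOn(executor) amounts to, keeps one worker busy, while submitting one task per item lets the pool fan the work out. `OffloadVsFanOut` and both method names are hypothetical; each method returns the names of the threads that actually did the work.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OffloadVsFanOut {

    /** Runs the whole loop as ONE task on the pool: a single worker handles every item. */
    static Set<String> wholePipelineOnPool(List<Integer> items, int threads) {
        Set<String> workers = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletableFuture.runAsync(() -> {
                for (Integer item : items) {
                    workers.add(Thread.currentThread().getName()); // stand-in for the blocking call
                }
            }, pool).join();
        } finally {
            pool.shutdown();
        }
        return workers;
    }

    /** Submits ONE task per item: up to `threads` items can be in flight at once. */
    static Set<String> oneTaskPerItem(List<Integer> items, int threads) {
        Set<String> workers = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletableFuture<?>[] tasks = items.stream()
                    .map(item -> CompletableFuture.runAsync(
                            () -> workers.add(Thread.currentThread().getName()), pool))
                    .toArray(CompletableFuture[]::new);
            CompletableFuture.allOf(tasks).join(); // wait for every item
        } finally {
            pool.shutdown();
        }
        return workers;
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println("whole pipeline: " + wholePipelineOnPool(items, 4));
        System.out.println("task per item:  " + oneTaskPerItem(items, 4));
    }
}
```

If the per-item behaviour is what is wanted from Mutiny, the equivalent would presumably wrap each blocking call in its own asynchronous unit and merge the results with a bounded concurrency of 16, rather than applying a single runSubscriptionOn to the whole stream. That reading is an assumption to verify against the Mutiny guide linked earlier.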
