RichardScottOZ/Pairwise

Pairwise

Open-source parallel geospatial processing library

Reverse-engineered from ESRI's pairwise functionality for open use.

Overview

Pairwise is a Python library that provides parallel processing capabilities for geospatial operations. Inspired by ESRI ArcGIS Pro's pairwise tools, this library enables efficient processing of large geospatial datasets by distributing work across multiple CPU cores.

Key Features

  • Parallel Processing: Automatically divides work across available CPU cores
  • Flexible Configuration: Control the number of processes used (all cores, specific count, or percentage)
  • Multiple Operations: Support for all major ESRI pairwise tools
    • Buffer: Create buffer zones around features
    • Clip: Extract features within a boundary
    • Dissolve: Aggregate features by attributes
    • Erase: Remove overlapping portions
    • Intersect: Compute geometric intersections
    • Integrate: Snap vertices within tolerance
  • Multiple Input Types: Works with GeoDataFrame, Shapely geometries, and more
  • Extensible Architecture: Core processing engine can be used for various geospatial operations
  • ESRI-Compatible API: Similar interface to ESRI's pairwise tools for easy migration

Installation

pip install -r requirements.txt

Quick Start

Basic Operations

Buffer Operation

from pairwise import pairwise_buffer
import geopandas as gpd
from shapely.geometry import Point

# Create some sample data
gdf = gpd.GeoDataFrame(
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2)]
)

# Buffer with automatic parallel processing
buffered = pairwise_buffer(gdf, distance=1.0)

Clip Operation

from pairwise import pairwise_clip
from shapely.geometry import box

# Clip features to a boundary
clip_boundary = box(0, 0, 2, 2)
clipped = pairwise_clip(gdf, clip_boundary)

Dissolve Operation

from pairwise import pairwise_dissolve

# Create features with categories
gdf = gpd.GeoDataFrame({
    'category': ['A', 'A', 'B'],
    'geometry': [box(0,0,1,1), box(1,0,2,1), box(0,1,1,2)]
})

# Dissolve by category
dissolved = pairwise_dissolve(gdf, by='category')

Intersect Operation

from pairwise import pairwise_intersect

# Find intersections between two feature sets
gdf1 = gpd.GeoDataFrame(geometry=[box(0,0,2,2)])
gdf2 = gpd.GeoDataFrame(geometry=[box(1,1,3,3)])
intersections = pairwise_intersect(gdf1, gdf2)

Erase Operation

from pairwise import pairwise_erase

# Remove portions that overlap with erase features
erase_area = box(0.5, 0.5, 1.5, 1.5)
erased = pairwise_erase(gdf, erase_area)

Integrate Operation

from pairwise import pairwise_integrate
from shapely.geometry import LineString

# Snap vertices within tolerance
lines = gpd.GeoDataFrame(geometry=[
    LineString([(0, 0), (1, 0)]),
    LineString([(1.001, 0), (2, 0)])
])
integrated = pairwise_integrate(lines, tolerance=0.01)

Configure Parallel Processing

from pairwise import pairwise_buffer, ParallelConfig

# Use specific number of processes
config = ParallelConfig(factor=4)
buffered = pairwise_buffer(gdf, distance=1.0, config=config)

# Use 50% of available cores
config = ParallelConfig(factor=0.5)
buffered = pairwise_buffer(gdf, distance=1.0, config=config)

# Use percentage as string
config = ParallelConfig(factor="75%")
buffered = pairwise_buffer(gdf, distance=1.0, config=config)

Advanced Usage with Core Processor

from pairwise import PairwiseProcessor, ParallelConfig
import geopandas as gpd
import pandas as pd  # needed for pd.concat in merge_function below

# Initialize processor
config = ParallelConfig(factor=4)
processor = PairwiseProcessor(config)

# Define custom operation
def custom_operation(batch):
    # Your custom processing logic here
    return batch.buffer(1.0)

# Process in parallel
results = processor.process_features(
    features=gdf,
    operation=custom_operation,
    merge_function=lambda results: pd.concat(results)
)

How It Works

The library implements ESRI's pairwise parallel processing pattern:

  1. Batch Division: Input features are divided into batches
  2. Parallel Processing: Each batch is processed on a separate CPU core
  3. Result Merging: Results from all batches are combined

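Because batches run concurrently, this approach can substantially reduce wall-clock time for large datasets on multi-core systems. Small datasets may see little or no benefit, since starting worker processes and moving data between them has a fixed cost.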
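The three steps above can be sketched with the standard library alone. This is a minimal illustration of the split, process, merge pattern, not the library's actual implementation: `split_into_batches`, `process_in_batches`, `double_all`, and the `use_processes` flag are hypothetical names, and plain lists of numbers stand in for features.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import chain


def split_into_batches(items, n_batches):
    """Divide items into at most n_batches contiguous, near-equal batches."""
    items = list(items)
    if not items:
        return []
    n_batches = max(1, min(n_batches, len(items)))
    size, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches take one additional item each.
        end = start + size + (1 if i < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches


def process_in_batches(items, operation, n_workers=2, use_processes=True):
    """Apply operation to each batch in a worker pool; merge in input order."""
    batches = split_into_batches(items, n_workers)      # 1. batch division
    if not batches:
        return []
    # Assumption: processes for CPU-bound work, threads as a lightweight fallback.
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=n_workers) as pool:
        results = list(pool.map(operation, batches))     # 2. parallel processing
    return list(chain.from_iterable(results))            # 3. result merging


def double_all(batch):
    """Stand-in for a geometric operation on one batch.

    Defined at module level so it can be pickled and sent to worker processes.
    """
    return [2 * x for x in batch]
```

Calling `process_in_batches(range(7), double_all, n_workers=3)` splits the input into three batches and returns the doubled values in their original order. With process-based workers, the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn rather than fork.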

Comparison to ESRI Pairwise Tools

Feature                   ESRI Pairwise     This Library
Parallel Processing
Buffer Operations
Clip Operations
Dissolve Operations
Erase Operations
Intersect Operations
Integrate Operations
Configurable CPU Usage
Open Source
Python API                Limited           Full
Works with GeoPandas      Via Conversion    Native

Performance

Performance improvements depend on:

  • Dataset size (larger datasets benefit more)
  • Number of CPU cores available
  • Complexity of geometric operations
  • System memory

Typical performance improvements: 2-8x faster on 4-8 core systems with large datasets (10,000+ features).

Requirements

  • Python 3.7+
  • numpy
  • (Optional) geopandas - for GeoDataFrame support
  • (Optional) shapely - for geometry operations

Examples

See the examples/ directory for more detailed examples:

  • basic_buffer.py - Simple buffer operations
  • advanced_usage.py - Custom operations with core processor
  • performance_comparison.py - Performance benchmarks

API Reference

Core Operations

All pairwise operations support parallel processing with configurable CPU usage.

pairwise_buffer(features, distance, config=None, dissolve=False, **kwargs)

Create buffer polygons using parallel processing.

Parameters:

  • features: Input features (GeoDataFrame, list of geometries, etc.)
  • distance: Buffer distance in coordinate system units
  • config: ParallelConfig object for controlling parallelism
  • dissolve: Whether to dissolve overlapping buffers
  • **kwargs: Additional arguments passed to buffer operation

Returns: Buffered features (same type as input)

pairwise_clip(input_features, clip_features, config=None)

Extract features that fall within clip boundary using parallel processing.

Parameters:

  • input_features: Features to clip
  • clip_features: Clip boundary (GeoDataFrame, geometry, or list)
  • config: ParallelConfig object

Returns: Clipped features

pairwise_dissolve(features, by=None, aggfunc='first', config=None)

Aggregate features based on attributes using parallel processing.

Parameters:

  • features: Features to dissolve
  • by: Field name(s) to dissolve by (None = dissolve all)
  • aggfunc: Aggregation function for attributes
  • config: ParallelConfig object

Returns: Dissolved features

pairwise_erase(input_features, erase_features, config=None)

Remove portions that overlap with erase features using parallel processing.

Parameters:

  • input_features: Features to erase from
  • erase_features: Features defining areas to erase
  • config: ParallelConfig object

Returns: Erased features

pairwise_intersect(input_features, intersect_features=None, config=None)

Compute geometric intersections using parallel processing.

Parameters:

  • input_features: First feature set
  • intersect_features: Second feature set (None = self-intersect)
  • config: ParallelConfig object

Returns: Intersection results
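To make the two-set and self-intersect modes concrete, here is a small sketch that uses axis-aligned boxes given as `(minx, miny, maxx, maxy)` tuples instead of real geometries. The helper names `box_intersection` and `intersect_sets` are hypothetical, and the self-intersect behaviour shown (each unordered pair once, no feature paired with itself) is an assumption rather than the library's documented behaviour.

```python
from itertools import combinations, product


def box_intersection(a, b):
    """Return the overlap of two (minx, miny, maxx, maxy) boxes.

    Returns None when the boxes do not overlap with positive area.
    """
    minx, miny = max(a[0], b[0]), max(a[1], b[1])
    maxx, maxy = min(a[2], b[2]), min(a[3], b[3])
    if minx >= maxx or miny >= maxy:
        return None
    return (minx, miny, maxx, maxy)


def intersect_sets(boxes1, boxes2=None):
    """Intersect every box in boxes1 with every box in boxes2.

    With boxes2=None, intersect boxes1 with itself (assumed semantics:
    each unordered pair is tested once and self-pairs are skipped).
    Returns a list of (index_in_first, index_in_second, overlap_box).
    """
    if boxes2 is None:
        second = boxes1
        pairs = combinations(range(len(boxes1)), 2)
    else:
        second = boxes2
        pairs = product(range(len(boxes1)), range(len(boxes2)))
    results = []
    for i, j in pairs:
        overlap = box_intersection(boxes1[i], second[j])
        if overlap is not None:
            results.append((i, j, overlap))
    return results
```

For the boxes used in the Quick Start example, `intersect_sets([(0, 0, 2, 2)], [(1, 1, 3, 3)])` yields a single overlap covering `(1, 1, 2, 2)`. The candidate pairs are independent of one another, which is what makes this kind of operation straightforward to split across workers.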

pairwise_integrate(features, tolerance, config=None)

Adjust vertices within tolerance for alignment using parallel processing.

Parameters:

  • features: Features to integrate
  • tolerance: Distance tolerance for snapping vertices
  • config: ParallelConfig object

Returns: Integrated features with adjusted vertices
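As a rough illustration of what snapping within a tolerance means, the sketch below moves each vertex onto the first earlier vertex that lies within `tolerance` of it. It is a deliberately simplified, order-dependent stand-in, not the algorithm used by this library or by ESRI's Integrate tool; `snap_vertices` is a hypothetical name and lines are plain lists of `(x, y)` tuples.

```python
import math


def snap_vertices(lines, tolerance):
    """Snap each vertex to the first previously kept vertex within tolerance.

    lines is a list of lines, each a list of (x, y) tuples.
    The search is quadratic in the number of vertices; a spatial index
    would be needed for large inputs.
    """
    anchors = []          # vertices kept so far, in order of first appearance
    snapped_lines = []
    for line in lines:
        snapped = []
        for x, y in line:
            for ax, ay in anchors:
                if math.hypot(x - ax, y - ay) <= tolerance:
                    x, y = ax, ay      # reuse the existing vertex
                    break
            else:
                anchors.append((x, y))  # no nearby vertex: keep this one
            snapped.append((x, y))
        snapped_lines.append(snapped)
    return snapped_lines
```

Applied to the two lines from the Quick Start example with `tolerance=0.01`, the vertex at `(1.001, 0)` is moved onto `(1, 0)`, so the two lines share an endpoint.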

ParallelConfig(factor=None)

Configuration for parallel processing.

Parameters:

  • factor: Controls number of processes
    • None or "auto": Use all available cores
    • int: Use exactly this many processes
    • float (0.0-1.0): Use this fraction of the available cores (for example, 0.5 means 50%)
    • str: A percentage such as "50%", or a number given as a string
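One way the `factor` values listed above could map to a process count is sketched below. `resolve_process_count` is a hypothetical helper, not part of the library's API, and the rounding rule (round down, but never below one process) is an assumption.

```python
import os


def resolve_process_count(factor=None, cpu_count=None):
    """Translate a factor value into a number of worker processes.

    cpu_count can be passed explicitly to make the result reproducible.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    if isinstance(factor, bool):
        raise TypeError("factor must not be a bool")
    if factor is None or factor == "auto":
        return cpu_count
    if isinstance(factor, str):
        text = factor.strip()
        if text.endswith("%"):
            factor = float(text[:-1]) / 100.0   # "75%" -> 0.75
        else:
            try:
                factor = int(text)              # "4" -> 4
            except ValueError:
                factor = float(text)            # "0.5" -> 0.5
    if isinstance(factor, float):
        if not 0.0 < factor <= 1.0:
            raise ValueError("fractional factor must be in (0.0, 1.0]")
        # Assumption: round down, but always keep at least one process.
        return max(1, int(cpu_count * factor))
    if isinstance(factor, int):
        if factor < 1:
            raise ValueError("process count must be at least 1")
        return factor
    raise TypeError(f"unsupported factor type: {type(factor).__name__}")
```

On an 8-core machine this sketch gives 8 for `None`, 4 for `0.5`, and 6 for `"75%"`. An integer is returned unchanged, matching the "exactly this many processes" wording above.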

PairwiseProcessor(config=None)

Core parallel processor for custom operations.

Methods:

  • process_features(features, operation, batch_size=None, merge_function=None): Process features in parallel batches
  • process_pairwise(features1, features2, operation, merge_function=None): Process pairwise operations between two feature sets

Contributing

Contributions are welcome! This is an open-source reverse engineering project aimed at providing free alternatives to proprietary GIS tools.

License

MIT License - See LICENSE file for details

Acknowledgments

This library is inspired by ESRI's ArcGIS Pro pairwise tools, reverse-engineered for open-source use. It is not affiliated with or endorsed by ESRI.

About

First pass at parallel version
