Open-source parallel geospatial processing library
Reverse-engineered from ESRI's pairwise functionality for open use.
Pairwise is a Python library that provides parallel processing capabilities for geospatial operations. Inspired by ESRI ArcGIS Pro's pairwise tools, this library enables efficient processing of large geospatial datasets by distributing work across multiple CPU cores.
- Parallel Processing: Automatically divides work across available CPU cores
- Flexible Configuration: Control the number of processes used (all cores, specific count, or percentage)
- Multiple Operations: Support for all major ESRI pairwise tools
  - Buffer: Create buffer zones around features
  - Clip: Extract features within a boundary
  - Dissolve: Aggregate features by attributes
  - Erase: Remove overlapping portions
  - Intersect: Compute geometric intersections
  - Integrate: Snap vertices within tolerance
- Multiple Input Types: Works with GeoDataFrame, Shapely geometries, and more
- Extensible Architecture: Core processing engine can be used for various geospatial operations
- ESRI-Compatible API: Similar interface to ESRI's pairwise tools for easy migration
pip install -r requirements.txt

from pairwise import pairwise_buffer
import geopandas as gpd
from shapely.geometry import Point
# Create some sample data
gdf = gpd.GeoDataFrame(
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2)]
)
# Buffer with automatic parallel processing
buffered = pairwise_buffer(gdf, distance=1.0)

from pairwise import pairwise_clip
from shapely.geometry import box
# Clip features to a boundary
clip_boundary = box(0, 0, 2, 2)
clipped = pairwise_clip(gdf, clip_boundary)

from pairwise import pairwise_dissolve
# Create features with categories
gdf = gpd.GeoDataFrame({
    'category': ['A', 'A', 'B'],
    'geometry': [box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 1, 2)]
})
# Dissolve by category
dissolved = pairwise_dissolve(gdf, by='category')

from pairwise import pairwise_intersect
# Find intersections between two feature sets
gdf1 = gpd.GeoDataFrame(geometry=[box(0,0,2,2)])
gdf2 = gpd.GeoDataFrame(geometry=[box(1,1,3,3)])
intersections = pairwise_intersect(gdf1, gdf2)

from pairwise import pairwise_erase
# Remove portions that overlap with erase features
erase_area = box(0.5, 0.5, 1.5, 1.5)
erased = pairwise_erase(gdf, erase_area)

from pairwise import pairwise_integrate
from shapely.geometry import LineString
# Snap vertices within tolerance
lines = gpd.GeoDataFrame(geometry=[
    LineString([(0, 0), (1, 0)]),
    LineString([(1.001, 0), (2, 0)])
])
integrated = pairwise_integrate(lines, tolerance=0.01)

from pairwise import pairwise_buffer, ParallelConfig
# Use specific number of processes
config = ParallelConfig(factor=4)
buffered = pairwise_buffer(gdf, distance=1.0, config=config)
# Use 50% of available cores
config = ParallelConfig(factor=0.5)
buffered = pairwise_buffer(gdf, distance=1.0, config=config)
# Use percentage as string
config = ParallelConfig(factor="75%")
buffered = pairwise_buffer(gdf, distance=1.0, config=config)

from pairwise import PairwiseProcessor, ParallelConfig
import geopandas as gpd
import pandas as pd  # needed for pd.concat in the merge function
# Initialize processor
config = ParallelConfig(factor=4)
processor = PairwiseProcessor(config)
# Define custom operation
def custom_operation(batch):
    # Your custom processing logic here
    return batch.buffer(1.0)
# Process in parallel
results = processor.process_features(
    features=gdf,
    operation=custom_operation,
    merge_function=lambda results: pd.concat(results)
)

The library implements ESRI's pairwise parallel processing pattern:
- Batch Division: Input features are divided into batches
- Parallel Processing: Each batch is processed on a separate CPU core
- Result Merging: Results from all batches are combined
This approach provides significant performance improvements for large datasets, especially on multi-core systems.
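The three steps above can be sketched with the standard library alone. This is a minimal illustration, not the library's actual engine: `split_into_batches`, `process_in_batches`, `double_batch`, and the `use_threads` flag are hypothetical names introduced here, and plain numbers stand in for geometries.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def split_into_batches(features, n_batches):
    """Split features into at most n_batches contiguous, near-equal batches."""
    features = list(features)
    if not features:
        return []
    n_batches = max(1, min(n_batches, len(features)))
    size, extra = divmod(len(features), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches take one additional feature each.
        end = start + size + (1 if i < extra else 0)
        batches.append(features[start:end])
        start = end
    return batches


def process_in_batches(features, operation, n_workers, use_threads=False):
    """Run operation on each batch in a worker pool and concatenate the results."""
    batches = split_into_batches(features, n_workers)
    if not batches:
        return []
    # Processes can use separate cores; threads are only a lightweight stand-in.
    executor_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    with executor_cls(max_workers=len(batches)) as pool:
        # map() yields results in batch order, so the merge keeps input order.
        batch_results = list(pool.map(operation, batches))
    return [item for result in batch_results for item in result]


def double_batch(batch):
    """Stand-in for a geometric operation: doubles each value in a batch."""
    return [2 * value for value in batch]
```

When the process pool is used, the operation has to be defined at module top level so it can be pickled, and on platforms that spawn workers the call belongs under an `if __name__ == "__main__":` guard.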
| Feature | ESRI Pairwise | This Library |
|---|---|---|
| Parallel Processing | ✓ | ✓ |
| Buffer Operations | ✓ | ✓ |
| Clip Operations | ✓ | ✓ |
| Dissolve Operations | ✓ | ✓ |
| Erase Operations | ✓ | ✓ |
| Intersect Operations | ✓ | ✓ |
| Integrate Operations | ✓ | ✓ |
| Configurable CPU Usage | ✓ | ✓ |
| Open Source | ✗ | ✓ |
| Python API | Limited | Full |
| Works with GeoPandas | Via Conversion | Native |
Performance improvements depend on:
- Dataset size (larger datasets benefit more)
- Number of CPU cores available
- Complexity of geometric operations
- System memory
Typical performance improvements: 2-8x faster on 4-8 core systems with large datasets (10,000+ features).
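Because the gain depends on the factors listed above, it is worth measuring on your own data. The helper below is a rough sketch using only the standard library; `best_time` and `speedup` are hypothetical names, not part of this library.

```python
import time


def best_time(func, *args, repeats=3, **kwargs):
    """Return the fastest wall-clock time (seconds) over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(serial_seconds, parallel_seconds):
    """Ratio of serial to parallel run time; values above 1 mean a gain."""
    return serial_seconds / parallel_seconds
```

For example, timing the same buffer call once with a single process and once with the default configuration, then passing both timings to `speedup`, gives a figure comparable to the 2-8x range quoted above.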
- Python 3.7+
- numpy
- (Optional) geopandas - for GeoDataFrame support
- (Optional) shapely - for geometry operations
See the examples/ directory for more detailed examples:
- basic_buffer.py: Simple buffer operations
- advanced_usage.py: Custom operations with core processor
- performance_comparison.py: Performance benchmarks
All pairwise operations support parallel processing with configurable CPU usage.
pairwise_buffer: Create buffer polygons using parallel processing.
Parameters:
- features: Input features (GeoDataFrame, list of geometries, etc.)
- distance: Buffer distance in coordinate system units
- config: ParallelConfig object for controlling parallelism
- dissolve: Whether to dissolve overlapping buffers
- **kwargs: Additional arguments passed to the buffer operation
Returns: Buffered features (same type as input)
pairwise_clip: Extract features that fall within the clip boundary using parallel processing.
Parameters:
- input_features: Features to clip
- clip_features: Clip boundary (GeoDataFrame, geometry, or list)
- config: ParallelConfig object
Returns: Clipped features
pairwise_dissolve: Aggregate features based on attributes using parallel processing.
Parameters:
- features: Features to dissolve
- by: Field name(s) to dissolve by (None = dissolve all)
- aggfunc: Aggregation function for attributes
- config: ParallelConfig object
Returns: Dissolved features
pairwise_erase: Remove portions that overlap with erase features using parallel processing.
Parameters:
- input_features: Features to erase from
- erase_features: Features defining areas to erase
- config: ParallelConfig object
Returns: Erased features
pairwise_intersect: Compute geometric intersections using parallel processing.
Parameters:
- input_features: First feature set
- intersect_features: Second feature set (None = self-intersect)
- config: ParallelConfig object
Returns: Intersection results
pairwise_integrate: Adjust vertices within tolerance for alignment using parallel processing.
Parameters:
- features: Features to integrate
- tolerance: Distance tolerance for snapping vertices
- config: ParallelConfig object
Returns: Integrated features with adjusted vertices
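To make "snapping vertices within tolerance" concrete, here is a deliberately simplified, single-process sketch on plain coordinate tuples. It is an assumption about the general idea only, not this library's or ESRI's algorithm: `snap_vertices` is a hypothetical name, and the greedy first-seen-wins rule is chosen for brevity.

```python
import math


def snap_vertices(lines, tolerance):
    """Snap each vertex to the first earlier vertex within `tolerance`.

    `lines` is a list of lines, each a list of (x, y) tuples.
    """
    anchors = []  # vertices that later vertices may snap onto
    snapped_lines = []
    for line in lines:
        snapped = []
        for x, y in line:
            for ax, ay in anchors:
                if math.hypot(x - ax, y - ay) <= tolerance:
                    x, y = ax, ay  # reuse the existing anchor's coordinates
                    break
            else:
                # No anchor is close enough, so this vertex becomes one.
                anchors.append((x, y))
            snapped.append((x, y))
        snapped_lines.append(snapped)
    return snapped_lines
```

With the two lines from the integrate example earlier and a tolerance of 0.01, the vertex at (1.001, 0) moves onto (1, 0), so the lines share an endpoint. The nested scan is quadratic in the number of vertices; a spatial index would be the usual replacement for large inputs.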
ParallelConfig: Configuration for parallel processing.
Parameters:
- factor: Controls the number of processes
  - None or "auto": Use all available cores
  - int: Use exactly this many processes
  - float (0.0-1.0): Use this fraction of the available cores
  - str: A percentage such as "50%", or a number
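One way the `factor` rules above could map to a process count is sketched below. `resolve_process_count` and its `cpu_count` argument are hypothetical and exist only for illustration. Rounding fractions down and always returning at least one process are assumptions, not documented behavior.

```python
import os


def resolve_process_count(factor=None, cpu_count=None):
    """Translate a `factor` value into a number of worker processes."""
    total = cpu_count or os.cpu_count() or 1
    if factor is None or factor == "auto":
        return total
    if isinstance(factor, str):
        text = factor.strip()
        if text.endswith("%"):
            # "75%" -> 0.75, handled by the float branch below.
            factor = float(text[:-1]) / 100.0
        else:
            # A plain number in a string: "4" -> 4, "0.5" -> 0.5.
            factor = float(text) if "." in text else int(text)
    if isinstance(factor, int):
        return max(1, factor)  # use exactly this many processes
    if isinstance(factor, float) and 0.0 < factor <= 1.0:
        # Assumption: round down, but never below one process.
        return max(1, int(total * factor))
    raise ValueError(f"Unsupported factor: {factor!r}")
```

On an 8-core machine this sketch yields 8 for `None`, 4 for `4` or `0.5`, and 6 for `"75%"`.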
PairwiseProcessor: Core parallel processor for custom operations.
Methods:
- process_features(features, operation, batch_size=None, merge_function=None): Process features in parallel batches
- process_pairwise(features1, features2, operation, merge_function=None): Process pairwise operations between two feature sets
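The README does not spell out how `process_pairwise` splits the work. One plausible scheme, shown here purely as an assumption, batches the first feature set and runs each batch against the whole second set. The names `process_pairwise_sketch`, `intersect_intervals`, `n_batches`, and `use_threads` are hypothetical, and 1D intervals stand in for geometries.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial


def intersect_intervals(batch, others):
    """Stand-in operation: overlap of each (lo, hi) in batch with each in others."""
    out = []
    for a_lo, a_hi in batch:
        for b_lo, b_hi in others:
            lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
            if lo < hi:  # keep only overlaps with positive length
                out.append((lo, hi))
    return out


def process_pairwise_sketch(features1, features2, operation,
                            n_batches=2, use_threads=False):
    """Batch features1 and run operation(batch, others=features2) per batch."""
    features1, features2 = list(features1), list(features2)
    n_batches = max(1, n_batches)
    size = max(1, -(-len(features1) // n_batches))  # ceiling division
    batches = [features1[i:i + size] for i in range(0, len(features1), size)]
    executor_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    with executor_cls(max_workers=n_batches) as pool:
        # Every worker receives the full second set alongside its own batch.
        parts = list(pool.map(partial(operation, others=features2), batches))
    return [item for part in parts for item in part]
```

Sending the full second set to every worker is simple but copies it once per batch; partitioning both sets spatially would reduce that cost at the price of more bookkeeping.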
Contributions are welcome! This is an open-source reverse engineering project aimed at providing free alternatives to proprietary GIS tools.
MIT License - See LICENSE file for details
This library is inspired by ESRI's ArcGIS Pro pairwise tools, reverse-engineered for open-source use. It is not affiliated with or endorsed by ESRI.