A Python package for modeling and forecasting the effectiveness of identification techniques at scale. It provides tools to predict how the accuracy of identification methods changes as the population size increases.
This package helps analyze three types of identification methods:
- Exact matching: Identification using exact matches of attributes (e.g., demographics)
- Sparse matching: Identification using sparse data points (e.g., location history)
- Robust matching: Machine-learning-based identification that handles noisy or approximate data
Key terminology:
- κ (kappa): The fraction of people accurately identified in a population
- Gallery size: The number of individuals against which identification is attempted
- k-anonymity: A privacy measure ensuring each combination of attributes appears at least k times
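To make these three quantities concrete for exact matching, here is a minimal sketch. It assumes that when several people share the target's exact attribute combination, the identifier picks one of them uniformly at random. Under that assumption, a person in a group of size s is identified correctly with probability 1/s. The helper exact_matching_metrics is hypothetical and not part of the dataless API, and the package's own definitions may differ in detail.

```python
import numpy as np


def exact_matching_metrics(records: np.ndarray, k: int = 5) -> dict:
    """Identification metrics for exact matching on a (n_people, n_attributes) array.

    Assumes uniform random guessing among all people who share the
    target's exact attribute combination (a hypothetical helper).
    """
    # Group identical rows; counts[j] is the size of the j-th distinct combination.
    _, inverse, counts = np.unique(records, axis=0, return_inverse=True, return_counts=True)
    # Size of each person's group (ravel() guards against a 2-D inverse in some NumPy versions).
    group_size = counts[inverse.ravel()]
    return {
        # Probability 1/s of a correct guess, averaged over people
        # (equivalently: number of distinct combinations / number of people).
        "kappa": float(np.mean(1.0 / group_size)),
        # Fraction of people whose attribute combination is unique.
        "uniqueness": float(np.mean(group_size == 1)),
        # Fraction of people in groups smaller than k (k-anonymity violations).
        "k_anonymity_violations": float(np.mean(group_size < k)),
    }


# Four people described by (birth year, sex code); two share the same combination.
people = np.array([[1990, 0], [1990, 0], [1985, 1], [1970, 1]])
print(exact_matching_metrics(people, k=2))
# {'kappa': 0.75, 'uniqueness': 0.5, 'k_anonymity_violations': 0.5}
```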
Key features:
- Empirical Analysis: Fast NumPy code to analyze identification accuracy across different gallery sizes
- Scaling Prediction: A two-parameter Bayesian model to forecast identification correctness (κ), uniqueness, and the percentage of k-anonymity violations at larger scales
- Extrapolation: Methods to extrapolate small-scale experimental results to real-world scenarios
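As an illustration of the kind of empirical analysis described above, the following sketch measures κ for exact matching on random galleries of increasing size drawn from a larger population. It assumes uniform guessing among people with identical attributes, so κ for one gallery equals the number of distinct attribute combinations divided by the gallery size. The function kappa_curve and the synthetic population are hypothetical and not the package's implementation.

```python
import numpy as np
import pandas as pd


def kappa_curve(records: np.ndarray, sizes, n_trials: int = 20, seed: int = 0) -> pd.DataFrame:
    """Average exact-matching kappa over random galleries of each size (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    rows = []
    for n in sizes:
        kappas = []
        for _ in range(n_trials):
            # Draw a gallery of n distinct people from the population.
            idx = rng.choice(len(records), size=n, replace=False)
            gallery = records[idx]
            # Under uniform guessing among matches, kappa = distinct combinations / n.
            n_groups = len(np.unique(gallery, axis=0))
            kappas.append(n_groups / n)
        rows.append({"n": n, "κ": float(np.mean(kappas))})
    return pd.DataFrame(rows)


# Synthetic population: 10,000 people with two categorical attributes of 50 levels each.
rng = np.random.default_rng(1)
population = rng.integers(0, 50, size=(10_000, 2))
d = kappa_curve(population, sizes=[1, 10, 100, 1000])
print(d)
```

The resulting table uses the same 'n' and 'κ' columns as the DataFrame passed to PYPExtrapolation in the usage example.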
Installation:
This project uses pixi for package management to ensure reproducible environments:
pixi install
Requirements:
- Python ≥ 3.11
- numpy ≥ 2.0.0
- pandas ≥ 2.2.2
- scipy ≥ 1.14.0
Usage example:
from dataless.extrapolate import PYPExtrapolation
import pandas as pd
import numpy as np
# Create sample data: identification accuracy at different gallery sizes
d = pd.DataFrame({'n': [1, 10, 100], 'κ': [1, 0.99, 0.95]})
# Train model and predict accuracy at larger scales
model = PYPExtrapolation(d)
model.train()
model.test(np.array([1, 10, 100, 1000, 10000]))
# array([1. , 0.99000117, 0.95000214, 0.88420427, 0.81462242])
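A common follow-up question is at what gallery size the predicted κ drops below an acceptable level. The sketch below assumes only that the prediction function behaves like model.test in the example above: it takes a NumPy array of gallery sizes and returns an array of κ values. The helper first_size_below is hypothetical and not part of the dataless API.

```python
import numpy as np


def first_size_below(predict, sizes, threshold: float):
    """Return the smallest gallery size whose predicted kappa is below `threshold`.

    `predict` maps an array of gallery sizes to an array of predicted kappas.
    Returns None if no size in `sizes` falls below the threshold.
    """
    sizes = np.sort(np.asarray(sizes))
    kappas = np.asarray(predict(sizes))
    below = np.flatnonzero(kappas < threshold)
    return int(sizes[below[0]]) if below.size else None
```

With the trained model above, a call such as first_size_below(model.test, np.geomspace(1, 1e8, 33).astype(int), threshold=0.5) would scan gallery sizes from 1 up to about 10^8 on a logarithmic grid. A coarse grid keeps the number of predictions small; refine it around the returned size if more precision is needed.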
Running tests:
pixi run test
Contributing:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run the tests to ensure they pass
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a pull request
Please report bugs and request features using the issue tracker. When reporting bugs:
- Describe what you expected to happen
- Describe what actually happened
- Include code samples and error messages if relevant
- Include version information (Python, dataless, key dependencies)
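To collect the version information requested above, a short script like the following can help. It uses only the standard library (platform and importlib.metadata). It assumes the package is installed under the distribution name dataless, which may differ in your environment; version_report is a hypothetical helper.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def version_report(packages=("dataless", "numpy", "pandas", "scipy")) -> str:
    """Build a plain-text block of version information for a bug report."""
    lines = [f"Python: {platform.python_version()}"]
    for name in packages:
        try:
            lines.append(f"{name}: {version(name)}")
        except PackageNotFoundError:
            # The distribution is missing or installed under a different name.
            lines.append(f"{name}: not installed")
    return "\n".join(lines)


print(version_report())
```

Paste the printed output into the issue alongside the code sample and error message.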
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.