Development guide for contributing to InsightfulPy v0.2.0.
See CONTRIBUTING.md for setup instructions including prerequisites, installation, and pre-commit hooks.
src/insightfulpy/
    __init__.py        # Public API and help functions
    core.py            # Core utilities and imports
    constants.py       # Configuration constants
    eda.py             # Backward compatibility layer
    summary.py         # Summary statistics
    statistics.py      # Statistical calculations
    data_quality.py    # Data quality checks
    visualization.py   # Basic visualizations
    advanced_viz.py    # Pair-wise visualizations
    analysis.py        # Individual column analysis
    comparison.py      # Multi-dataset comparison
Core Infrastructure:
- core.py: Environment detection, dependency imports, warning suppression
- constants.py: Centralized configuration values
- __init__.py: Public API definition, function categorization, help system
Function Modules:
- summary.py: DataFrame summaries and grouping
- statistics.py: Statistical calculations
- data_quality.py: Missing values, outliers, data types
- visualization.py: Single-variable visualizations
- advanced_viz.py: Multi-variable visualizations
- analysis.py: Individual column analysis
- comparison.py: Multi-dataset operations
Compatibility Layer:
eda.py: Import hub for backward compatibility (no implementations)
All function modules follow this pattern:
from __future__ import annotations
from typing import Any, Dict, List, Optional, Tuple, Union
from .core import *

All functions use type hints:
def detect_outliers(
    data: pd.DataFrame,
    max_display: int = constants.DEFAULT_MAX_DISPLAY_OUTLIERS
) -> pd.DataFrame:
    """Detects outliers using the IQR method."""
    # Implementation

Use concise docstrings:
def mad(data: pd.Series) -> float:
    """Calculate the mean absolute deviation."""
    result = np.mean(np.abs(data - data.mean()))
    return float(result)

Use constants from constants.py instead of magic numbers:
# Good
if data[col].nunique() > constants.DEFAULT_HIGH_CARDINALITY_THRESHOLD:
    print("High cardinality")

Use _safe_display() for DataFrame output (works in Jupyter and terminal):
from .core import _safe_display
_safe_display(df)

Never implement functions in eda.py. Only import and re-export from specialized modules.
Three-step pattern for backward compatibility:
- Implement in specialized module (e.g., statistics.py)
- Import in eda.py and add to __all__
- Re-export in __init__.py and add to category list
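The rule that eda.py holds no implementations can be checked mechanically. The following is a minimal sketch, not part of InsightfulPy: the helper name `find_implementations` is hypothetical, and it assumes the rule means "no top-level `def` or `class` statements". It parses module source with the standard-library `ast` module and reports any definitions it finds. The `calc_stats` import in the sample strings is only illustrative.

```python
import ast
from typing import List


def find_implementations(source: str) -> List[str]:
    """Return names of top-level functions and classes defined in source.

    Hypothetical helper: an empty list means the module only imports,
    re-exports, or assigns (for example, sets __all__).
    """
    tree = ast.parse(source)
    definition_types = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return [node.name for node in tree.body if isinstance(node, definition_types)]


# A compliant import hub: imports plus __all__, nothing implemented.
hub_source = '''
from .statistics import calc_stats
__all__ = ["calc_stats"]
'''

# A non-compliant hub: a function body has crept in.
bad_source = '''
from .statistics import calc_stats

def helper(x):
    return x
'''

print(find_implementations(hub_source))  # []
print(find_implementations(bad_source))  # ['helper']
```

In a test suite, the same helper could read src/insightfulpy/eda.py with `pathlib.Path.read_text()` and assert that the result is empty.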
Minimum coverage: 80%. Fixtures in conftest.py.
pytest # All tests
pytest tests/test_statistics.py::test_calc_stats # Specific test
pytest -n auto # Parallel
pytest --cov=src/insightfulpy # Coverage

black src/ tests/ # Format
isort --profile=black src/ tests/ # Sort imports
flake8 src/ tests/ # Lint
mypy src/ # Type check
pre-commit run --all-files # Run all checks

See CONTRIBUTING.md for commit message conventions.
- Create feature branch from main
- Implement changes
- Run tests and code quality checks
- Commit with conventional format
- Push and create pull request
Branch naming:
- feature/description - New features
- fix/description - Bug fixes
- docs/description - Documentation updates
- refactor/description - Code refactoring
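Branch names in this scheme are easy to validate before pushing. A minimal sketch follows. The function name `is_valid_branch` is hypothetical, and the requirement that the description be lowercase words joined by hyphens is an assumption, since the guide only fixes the four prefixes.

```python
import re

# The four prefixes come from the branch naming list in this guide.
# The description format (lowercase letters and digits joined by single
# hyphens) is an assumption made for this sketch.
BRANCH_PATTERN = re.compile(r"(feature|fix|docs|refactor)/[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_branch(name: str) -> bool:
    """Return True if name has the form '<prefix>/<description>'."""
    return BRANCH_PATTERN.fullmatch(name) is not None


# Example names are invented for illustration.
for branch in ("feature/add-violin-plot", "fix/iqr-bounds", "hotfix/urgent", "docs/"):
    print(branch, is_valid_branch(branch))
```

The first two names are accepted. The last two are rejected: one uses an unlisted prefix and the other has an empty description.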
See CONTRIBUTING.md for pre-commit hook details.
Before submitting:
- All tests pass (pytest)
- Coverage meets 80% threshold
- Code formatted with Black
- Imports sorted with isort
- No linting errors (Flake8)
- Type hints added (mypy passes)
- Documentation updated
- Commit messages follow convention
- Pre-commit hooks pass
- CONTRIBUTING.md - Contribution workflow
- Configuration - Constants reference
Version: 0.2.0 | Status: Beta | Python: 3.8-3.12
Copyright 2025 dhaneshbb | License: MIT | Homepage: https://github.com/dhaneshbb/insightfulpy