You can install LintData via pip:
pip install lintdata

Via UV:
uv add lintdata

Or install from source:
git clone https://github.com/patelheet30/lintdata.git
cd lintdata
pip install -e .

✅ 20+ Data Quality Checks - Missing values, duplicates, outliers, type consistency, and more
✅ Zero Configuration - Works out of the box with sensible defaults
✅ Highly Configurable - Customize thresholds and select specific checks
✅ Multiple Export Formats - Text, HTML, JSON, and CSV reports
✅ Custom Checks API - Extend with your own validation logic
✅ Pandas Native - Integrates seamlessly via .lint accessor
import pandas as pd
import lintdata
# Load your DataFrame
df = pd.read_csv("your_data.csv")
# Run quality checks
report = df.lint.report()
print(report)

Example Output:
--- LintData Quality Report ---
Shape: (1000, 8)
Running Checks:
Found 5 issue(s):
1. [Missing Values] Column 'age': 45 missing values (4.5%)
2. [Duplicates] Found 12 duplicate rows (1.2% of data)
3. [Outliers] Column 'salary': 8 potential outliers detected (IQR method)
4. [Mixed Types] Column 'phone' contains both numeric and string values
5. [High Cardinality] Column 'user_id' has 987 unique values (98.7%)
--- End of Report ---
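
To make the percentages in the sample report concrete, here is a minimal sketch of how the missing-value and duplicate-row figures could be computed with plain pandas. It is not LintData's internal implementation; the function name `summarize_missing_and_duplicates` and the sample data are hypothetical, and the message wording simply mirrors the example output above.

```python
import pandas as pd


def summarize_missing_and_duplicates(df: pd.DataFrame) -> list[str]:
    """Build report-style messages for missing values and duplicate rows.

    Hypothetical helper for illustration; not part of LintData.
    """
    messages = []
    n_rows = len(df)
    if n_rows == 0:
        return messages

    # Count NaN/None entries per column.
    missing_counts = df.isna().sum()
    for col, count in missing_counts.items():
        if count > 0:
            pct = 100 * count / n_rows
            messages.append(
                f"[Missing Values] Column '{col}': {count} missing values ({pct:.1f}%)"
            )

    # duplicated() flags every repeat of an earlier row (the first occurrence is kept).
    n_dupes = int(df.duplicated().sum())
    if n_dupes > 0:
        pct = 100 * n_dupes / n_rows
        messages.append(f"[Duplicates] Found {n_dupes} duplicate rows ({pct:.1f}% of data)")

    return messages


# Made-up sample data: one missing age, one repeated row.
sample = pd.DataFrame(
    {"age": [25, None, 31, 31], "city": ["Oslo", "Rome", "Lima", "Lima"]}
)
for line in summarize_missing_and_duplicates(sample):
    print(line)
```

Percentages here are taken relative to the total row count, which matches the sample report (45 of 1000 rows is 4.5%).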
LintData includes 22+ built-in checks across multiple categories:
- Missing Data: Missing values, missing patterns
- Duplicates: Duplicate rows, duplicate columns
- Data Types: Mixed types, type consistency
- Statistical: Outliers, skewness, correlation warnings
- Categorical: Cardinality, rare categories, case consistency
- Numerical: Negative values, zero inflation
- Strings: Whitespace, special characters, length outliers
- Dates: Format consistency, future dates, date range anomalies
- Multi-table: Referential integrity (foreign key validation)
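
As an illustration of what a statistical check from the list above might look like, the following sketch flags outliers using the IQR method named in the sample report. It is an assumption-laden example, not LintData's own code: the function name `check_iqr_outliers`, the `k = 1.5` multiplier (the common Tukey fence), and the sample data are all hypothetical.

```python
import pandas as pd


def check_iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> list[str]:
    """Flag numeric columns with values outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical sketch; LintData's built-in outlier check may differ.
    """
    warnings = []
    for col in df.select_dtypes(include="number").columns:
        values = df[col].dropna()
        if values.empty:
            continue

        q1 = values.quantile(0.25)
        q3 = values.quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr

        # Count values falling outside the fences.
        n_outliers = int(((values < lower) | (values > upper)).sum())
        if n_outliers > 0:
            warnings.append(
                f"[Outliers] Column '{col}': {n_outliers} potential outliers detected (IQR method)"
            )
    return warnings


# Made-up sample data: one salary far above the rest.
salaries = pd.DataFrame({"salary": [40, 42, 45, 47, 50, 52, 55, 400]})
print(check_iqr_outliers(salaries))
```

The function takes a DataFrame and returns a list of warning strings, the same shape as the custom check shown later in this README, so a function like this could presumably be registered the same way.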
Save reports in multiple formats:
# HTML report with visualizations
df.lint.report(report_format='html', output='report.html')
# JSON for programmatic access
df.lint.report(report_format='json', output='report.json')
# CSV for spreadsheet analysis
df.lint.report(report_format='csv', output='issues.csv')

Extend LintData with your own validation logic:
def check_email_format(df):
    """Validate email addresses."""
    warnings = []
    for col in df.select_dtypes(include='object').columns:
        if 'email' in col.lower():
            invalid = df[~df[col].str.contains('@', na=False)]
            if len(invalid) > 0:
                warnings.append(f"[Email] Column '{col}': {len(invalid)} invalid emails")
    return warnings
# Register and use
df.lint.register_check(check_email_format)
df.lint.report()

Full documentation available at: LintData Documentation
For general help or to report bugs, please open an issue on GitHub: LintData Issues.
If you have questions or need assistance, feel free to reach out via Discord: patelheet30