Fiscal Patriots

Team (Fiscal Patriots)

Name	Role	Bio
Khaled Alkurd	Team Lead	GMU Spring '26 Accounting and Business Analytics, Alteryx, Tableau, R
Pranavi Doodala	Project Manager, ML Lead	GMU Spring '26 Business Analytics; Data Mining, Predictive Modeling, Project Management
Mariam Debas	Visualization Lead	GMU Spring '26 Accounting, Data Analytics, Tableau, R
Nikita Chandrasing	Product Manager	GMU Fall '25 Business Analytics and MIS; Data Mining, Data Visualization, Product Design
Andy Yaro	Product Developer	GMU Fall '28 Cybersecurity Engineering; python, git, aws

AuditsMadeSimple • AGA Datathon 2026

A public-facing educational platform helping citizens understand federal financial assistance and audit data.

🌐 Live Website: https://gmufiscalpatriots.bytechisel.com

📊 Presentation: View Presentation on Canva

▶️ Video Walkthrough: Watch on YouTube

Development Tools

Category	Tools	Used For
Data Processing	Alteryx, Python	Data cleaning, merging, Audit Health Score pipeline
Visualization	Tableau Public	Interactive dashboards embedded on the platform
Machine Learning	scikit-learn	HistGradientBoosting classifier, permutation importance
Interpretability	SHAP	Feature-level impact analysis for model transparency
Website	HTML, CSS, AWS	Static site hosting and deployment
AI Assistance	Claude, Google Gemini	Code generation assistance and iterative development

Python Libraries

Library	Purpose
`scikit-learn`	Model training, evaluation (ROC-AUC), permutation importance
`pandas`	Data cleaning, feature engineering, entity-year aggregation
`numpy`	Numerical operations and array handling
`shap`	SHAP value analysis for model interpretability
`matplotlib`	Feature importance and performance visualization

Key Findings

$8.58 trillion in federal financial assistance distributed (FY2019–2024)
Top 10 states received 50.9% of all federal grants
California alone received $1.05 trillion (12% of total)
HHS and USDA account for 67% of all federal assistance
57,448 entities required to submit Single Audits (FY2016–2024)
16,300+ entities had at least one audit finding

Why This Exists

Federal funding data is large, fragmented, and hard to interpret quickly. Audit data is even harder because it uses specialized terminology and the impact is not always obvious. Our goal is to make these systems approachable for everyone.

This project helps users:

Explore funding patterns by geography and time
Learn audit terminology with plain-language definitions
Follow guided examples (case studies and next steps) to investigate audit data more deeply

Key Features of AuditsMadeSimple

1) Interactive Exploration (Tableau)

Dashboards that highlight:

Funding distribution across states
Funding trends over time
Major funding sources by federal agency
Higher-risk entities with funding context

2) Financial Literacy Glossary

A glossary translating common audit and spending terms into short, usable definitions.

3) Dataset Guide

Plain descriptions of USAspending, FAC, and SAM: what each tracks, what each misses, and how to use them responsibly.

4) Guided Engagement (Case Studies + What’s Next)

Short walkthroughs and a checklist-style “What’s next?” page that keeps visitors engaged after their first chart.

Integrated Data Sources

We connect official public sources using UEI (Unique Entity Identifier) whenever possible.

Source	What It Tracks
USAspending.gov	Federal spending and award transactions: who received funds, how much, and from which agency
FAC.gov	Single Audit submissions for non-federal entities expending $750K+ in federal awards
SAM.gov	Governmentwide exclusions: debarments, suspensions, and other exclusion actions

Repo Structure (Main Branch)

deliverables/   Submission artifacts (dashboards, slides, report, video)
webapp/         Website source (HTML pages + assets)
data/           Clean outputs (CSV + Hyper) organized by domain
pipeline/       Alteryx workflows + pipeline notes
docs/           Data dictionaries + appendix hubs + team/competition docs
models/         ML model notebooks, outputs, and screenshots

Methodology Screenshots (Alteryx)

These live in: docs/appendix_hubs/methodology/screenshots/alteryx/

USAspending transformations

USAspending transaction-level data was transformed into recipient-level summaries for each fiscal year (2019–2024), then combined for cumulative analysis. Outputs were exported to CSV and Tableau Hyper formats for dashboard integration.

FAC Master Clean

Four FAC tables (General, Findings, Corrective Action Plans, Federal Awards) were joined and aggregated to entity level by auditee_uei, producing 57.4K clean audited entity records. Outputs preserve audit flags, finding counts, and federal expenditure amounts for risk scoring and USAspending integration.

Audit Health Score construction

The Audit Health Score was calculated for each entity using weighted risk factors including going concern (25 pts), material weakness (20 pts), repeat findings (15 pts), and significant deficiencies (10 pts), with mitigating factors applied. Entities were tiered into Red, Yellow, and Green categories to support dashboard visualizations and funding analysis.

FAC + USAspending Merge

FAC audit data was joined to USAspending financial assistance records on UEI to link audit findings with federal funding received. This merge enables analysis of how much taxpayer money flowed to entities with material weaknesses, repeat findings, or going concern flags.

SAM + FAC Merge

SAM exclusion data was joined to FAC audit records on UEI to identify entities with both exclusion history and audit findings. This cross-reference supports transparency gap analysis of debarred or suspended entities.

SAM Exclusion Cleaning

SAM exclusion records were split by UEI availability (38K with UEI, 120K legacy records without), with date parsing to calculate exclusion duration and active status. Outputs support transparency gap analysis and cross-referencing of excluded entities against federal award recipients.

Methodology Screenshots (Tableau)

These live in: docs/appendix_hubs/methodology/screenshots/tableau/

Funding Distribution & Trends

These visualizations show where federal financial assistance flows geographically and how funding levels changed over the six-year period. The Top 10 states account for over half of all federal grants, with California alone receiving 12% of the national total.

Audit Oversight Signals

These maps highlight where audit findings are concentrated relative to funding received. Material weakness rates vary significantly by state, revealing geographic patterns in internal control quality.

Risk Tiering

This visualization shows the relationship between Audit Health Score and federal funding received. Red-tier entities received disproportionately more funding than Green-tier entities with cleaner audit records.

ML Model & Predictive Findings

We built a predictive model that acts as an early warning signal. It flags organizations that are more likely to have audit findings next year (t+1), using what we already know this year (t). The goal is to help oversight teams prioritize review and support when resources are limited.
Full case study: https://gmufiscalpatriots.bytechisel.com/ml_model_casestudy.html

Data and setup

Data source: Federal Audit Clearinghouse (FAC), audit year files from 2019 to 2022.
Grain: one row per entity per audit year (entity-year panel). 110,739 total observations across 56,333 unique entities.
Entity key: Employer Identification Number (EIN), since Unique Entity Identifier (UEI) has many placeholders in earlier years.
Label: whether the entity has any findings in the next year (t+1), created by shifting outcomes forward within each entity.
Train/test split: time-based. Trained on 2019–2020, tested on 2021 to prevent data leakage.

Model and performance

Model: Histogram-based Gradient Boosting (HistGradientBoostingClassifier), from scikit-learn.
Primary metric: ROC-AUC = 0.7656 — the model correctly ranks a risky entity above a non-risky one 77% of the time.
Secondary metric: PR-AUC = 0.5439 (3x the random baseline of 0.18), reflecting strong positive-class concentration despite class imbalance in audit findings.

What the model learned

We used permutation feature importance and SHAP (SHapley Additive Explanations) to keep results explainable. Top signals included:

prior findings indicators (whether findings happened, and how many)
total federal expenditures
breadth across agencies (distinct agencies)
award structure (direct award lines)

SHAP analysis confirmed that prior findings are the dominant predictor, with high feature values pushing predictions strongly toward positive (finding expected). This directional insight informed the weighting used in the Audit Health Score.

Figures

How to Explore the Project

Visit the live site: https://gmufiscalpatriots.bytechisel.com
Open the Tableau dashboards (linked from the site)
Read the glossary to understand audit terminology
Use “What’s next?” to follow guided questions and case studies

Disclaimer

This project is for educational and public understanding purposes. Audit and exclusion signals require context. The presence of a finding or a higher-risk tier is not proof of wrongdoing. Always validate conclusions using primary documentation, program context, and appropriate investigative standards.

Fiscal Patriots • AuditsMadeSimple • AGA Datathon 2026
George Mason University • Costello College of Business

Provide feedback

Saved searches