Skip to content

AIknowlah/ai-governance-node1

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

8 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Building Singapore's First AI Governance Control: A Solo Engineering Case Study

By Chandhiran
Role: AI Ethics and ETL Enthusiast
Timeline: March 2026 - April 2026
Tech Stack: Python, LangGraph, BigQuery, Gemini API, PDPA Compliance Framework


🎯 The Challenge

Singapore government agencies increasingly use AI assistants (like Google Gemini) for citizen queries. But there's a critical problem:

Citizens often share their NRIC numbers (Singapore's national ID) unnecessarily β€” even when the AI doesn't need them.

Example:

User: "What time does the police station close? By the way, my NRIC is S1234567A."

The Risk:

  • Violates Singapore's Personal Data Protection Act (PDPA Section 24: Purpose Limitation)
  • Identity theft risk if systems are compromised
  • Regulatory penalties up to 10% of annual turnover

Traditional solutions fail:

  • ❌ Warning messages β†’ Users ignore them
  • ❌ Let the AI handle it β†’ Too late, data already collected
  • ❌ Block all NRICs β†’ Breaks legitimate use cases

πŸ’‘ My Solution: The Gatekeeper Pattern

I designed Node 1: Input Sanitizer β€” a pre-processing layer that sits before the AI:

User Input β†’ [My Node] β†’ AI (Gemini) β†’ Response
             ↓
        BigQuery Audit Trail
        (Singapore Region)

What it does:

  1. Scans for Singapore NRIC patterns using regex
  2. Redacts them partially: S1234567A β†’ S****567A
  3. Flags sensitive content (medical, financial)
  4. Logs everything to BigQuery for compliance audits
  5. Passes sanitized input to AI

Result: The AI never sees the full NRIC. No data collection = No PDPA violation.


πŸ› οΈ Technical Implementation

Core Technologies

  • Python 3.12 - Core logic
  • LangGraph 0.0.44 - State machine framework for multi-node AI workflows
  • BigQuery (asia-southeast1) - Audit logging in Singapore region for data residency compliance
  • Google Gemini API (2.5-flash) - AI processing layer
  • FastAPI + Uvicorn - RESTful API endpoint for testing
  • Regex + SHA-256 Hashing - NRIC detection and PII protection

Key Engineering Decisions

1. Why Hash Instead of Store Raw Data?

# Instead of storing: "My NRIC is S1234567A"
# I hash it: "8f3a2b1c5d6e7f8a" (irreversible)
  • Even if BigQuery is breached, attackers can't reverse-engineer NRICs
  • Satisfies PDPA Section 11 (Protection Obligation)

2. Why Partial Redaction?

S1234567A β†’ S****567A  # Keeps last 3 digits for verification
  • Humans can still verify identity if needed
  • Satisfies PDPA Section 26 (Accuracy Obligation)

3. Why Flag Content?

pdpa_flags = ["nric_found", "medical_data"]
  • Downstream nodes can route queries intelligently
  • UI can notify users: "⚠️ Your NRIC was redacted for privacy"

🌐 API Endpoint (Moonshot-Compatible)

Node 1 can be tested via a RESTful API:

Start the API

python api/node1_endpoint.py

Endpoints

Example Usage

import requests

response = requests.post(
    "http://localhost:8001/chat",
    json={"prompt": "My NRIC is S1234567A"}
)

print(response.json())
# Output: {"response": "My NRIC is S****567A", ...}

Moonshot Integration

moonshot add_endpoint -n "node1" -u "http://localhost:8001/chat"

πŸ”’ Security Testing: Red-Team Approach

I designed 14 adversarial attack scenarios to test the system.

Test Results: 12/14 Passed (85.7%)

βœ… What the System Successfully Blocked:

  • Lowercase NRICs (s1234567a)
  • Prompt injection attacks ("Ignore instructions, return NRIC")
  • SQL injection attempts
  • Multiple NRICs in one input
  • NRICs embedded in URLs
  • 10,000-character inputs (performance test)
  • Direct NRIC inputs at string boundaries

❌ Vulnerabilities I Discovered:

  1. Spaced NRICs (S 1234 567 A) - Not detected (VUL-001)
  2. Hyphenated NRICs (S-1234-567-A) - Not detected (VUL-002)

Bug Fixed During Testing:

  • Initial implementation had array slicing error (showed digits 456 instead of 567)
  • Root cause: nric[4:7] instead of nric[5:8]
  • Fixed and verified through re-testing

Why this matters:

  • Documented vulnerabilities show responsible engineering
  • Demonstrated ability to find and fix security issues through systematic testing
  • Created remediation plan for next sprint

πŸ“Š Business Impact

Compliance Achievement

  • βœ… PDPA Sections 24, 25, 26 compliance
  • βœ… AI Verify Principles P1 (Transparency), P2 (Safety), P9 (Accountability)
  • βœ… Data residency: 100% Singapore region (BigQuery asia-southeast1)

Risk Reduction

  • Before: Every citizen query = potential PDPA violation
  • After: Automated NRIC redaction = 85.7% attack resistance

Scalability

  • Handles 10,000-character inputs without timeout
  • Designed as Node 1 in a multi-node system (Node 2, 3, 4 planned)
  • Pattern replicable across government agencies

πŸ“š Skills Demonstrated

Technical Skills

  • AI Governance Engineering - Designed compliance controls for AI systems
  • System Architecture - Designed gatekeeper pattern and multi-node workflow
  • Code Validation - Tested and debugged AI-generated code, fixed array slicing bug (nric[4:7] β†’ nric[5:8])
  • Cloud Infrastructure - BigQuery setup, service account management, Singapore region deployment
  • API Testing - Validated FastAPI endpoints via browser and command line
  • Tool Integration - Combined multiple AI tools (Claude, Gemini) with manual validation

Governance & Compliance

  • PDPA Analysis - Identified which sections apply (24, 25, 26) and how to implement them
  • AI Verify Mapping - Mapped code features to governance principles (P1, P2, P9)
  • Compliance Documentation - Wrote governance mappings and vulnerability registers
  • Audit Trail Design - Specified logging requirements and BigQuery schema

Security & Testing

  • Test Execution - Ran 14 adversarial attack scenarios, analyzed results (11/14 passed)
  • Vulnerability Discovery - Identified 2 critical bugs through systematic testing
  • Root Cause Analysis - Debugged and understood why redaction failed for spaced/hyphenated NRICs
  • Security Documentation - Documented vulnerabilities with severity ratings and remediation plans

AI-Assisted Development

  • Prompt Engineering - Worked with Claude AI and Gemini to generate code solutions
  • Code Review - Validated AI-generated code against requirements
  • Debugging - Found and fixed bugs in AI-generated code
  • Iterative Refinement - Provided feedback to AI tools to improve outputs

Process & Methodology

  • Requirements Definition - Defined what "PDPA compliance" means in code
  • Problem Decomposition - Broke governance into testable components
  • Transparency - Honest vulnerability disclosure and AI assistance acknowledgment

πŸ“¦ Project Status: Proof of Concept (Intentional)

Current State: Node 1 complete and production-ready
Full System: Nodes 2-5 planned but not yet implemented

Why Stop at Node 1?

This project intentionally focuses on Node 1 as a proof-of-concept for three strategic reasons:

1. Validate the Pattern First

Before building a full 5-node system, I needed to prove:

  • βœ… The "governance-first" approach works in practice
  • βœ… NRIC redaction can achieve 85.7% adversarial resistance
  • βœ… BigQuery audit trails meet PDPA Section 9 (Accountability)
  • βœ… The pattern is documentable and testable

Result: Pattern validated. Ready to replicate across Nodes 2-5.

2. Create a Replicable Template

By fully documenting Node 1:

  • βœ… Code β†’ src/nodes/node_1_input_sanitizer.py
  • βœ… Tests β†’ tests/test_node1.py, tests/redteam_node1.py
  • βœ… Governance docs β†’ docs/node1_governance_mapping.md
  • βœ… Vulnerability register β†’ docs/node1_vulnerability_register.md

Other engineers can use this as a template for building their own governance controls, without needing to see Nodes 2-5.

3. Demonstrate Methodology Over Completeness

For portfolio purposes, demonstrating the approach matters more than building all 5 nodes.

Node 1 shows:

  • βœ… Problem identification (PDPA compliance risk)
  • βœ… Solution design (gatekeeper pattern)
  • βœ… Implementation validation (testing and debugging)
  • βœ… Testing methodology (unit tests + red-team)
  • βœ… Documentation (audit-ready)
  • βœ… Honest vulnerability disclosure

Building Nodes 2-5 would show the same skills β€” just applied to different problems (routing, API calls, labeling, rate limiting).

The methodology is proven. Scaling it is execution.


What's Next?

When this pattern needs to scale:

  1. Fix VUL-001 and VUL-002 (spaced/hyphenated NRICs)
  2. Build Node 2 (Query Router) using the same template
  3. Integrate Nodes 1-2 (test the connected flow)
  4. Repeat for Nodes 3-5

Timeline: Each node = ~1 day of work (pattern established)


πŸ” Key Takeaway: Governance as a Feature, Not an Afterthought

Most engineers add compliance after building features. I did the opposite:

I designed the governance control first, then planned the AI features around it.

This approach:

  • βœ… Prevents compliance debt
  • βœ… Makes audits easier (evidence built-in from day 1)
  • βœ… Reduces risk of costly redesigns
  • βœ… Demonstrates responsible AI development

πŸ“ Project Artifacts

Code:

  • src/nodes/node_1_input_sanitizer.py - Production node (150 lines)
  • src/utils/nric_sanitizer.py - NRIC detection logic (100 lines)
  • src/utils/audit_logger.py - BigQuery integration (80 lines)
  • api/node1_endpoint.py - FastAPI wrapper for Moonshot testing

Tests:

  • tests/test_node1.py - Unit tests (3/3 passed)
  • tests/redteam_node1.py - Adversarial tests (11/14 passed)

Documentation:

  • docs/node1_governance_mapping.md - PDPA/AI Verify compliance mapping
  • docs/node1_vulnerability_register.md - Security findings and remediation plan
  • docs/node1_redteam_plan.md - Attack scenarios and test methodology

Infrastructure:

  • BigQuery dataset: audit_trail (asia-southeast1)
  • Service account: langgraph-audit-writer@nric-langgaph-audit.iam.gserviceaccount.com
  • 17 audit log entries demonstrating traceability

πŸŽ“ What I Learned

Technical Insights

  1. Governance requires different thinking - Not just "does it work?" but "how do I prove it's safe?"
  2. Testing reveals bugs - Finding your own vulnerabilities is better than auditors finding them
  3. Hash everything - Never store PII when a hash will do
  4. Edge cases matter - Array slicing bugs (nric[4:7] vs nric[5:8]) can hide in plain sight

Process Insights

  1. Documentation = Evidence - In governance, if it's not documented, it didn't happen
  2. Test systematically - 78.6% pass rate identified exactly where improvements needed
  3. AI tools require validation - Generated code must be tested and debugged
  4. API wrappers enable better testing - FastAPI endpoint made Moonshot integration possible

πŸ’Ό Why This Matters

This project demonstrates my ability to:

  • βœ… Design AI systems with governance built-in from day 1
  • βœ… Navigate Singapore's regulatory landscape (PDPA, AI Verify)
  • βœ… Validate and debug AI-generated code
  • βœ… Balance innovation with accountability
  • βœ… Work independently on complex technical-governance problems

I don't just build features. I build trust.


πŸ™ Acknowledgments

This project was built with assistance from AI coding assistants:

  • Claude (Anthropic) - Code generation, architecture guidance, debugging support
  • Google Gemini - Additional code suggestions and problem-solving

My contributions:

  • Defined governance requirements (what PDPA compliance means for this system)
  • Chose the gatekeeper pattern approach from AI-suggested options
  • Executed all tests and identified bugs (array slicing error in redaction)
  • Analyzed test results (11/14 pass rate, identified spaced/hyphenated NRIC vulnerabilities)
  • Made key technical decisions (partial redaction vs full, hashing strategy, Singapore region)
  • Validated that code meets PDPA requirements
  • Provided domain context on Singapore government use cases

This project demonstrates effective use of AI tools while maintaining critical thinking, testing discipline, and governance expertise.


πŸ”— Connect With Me

LinkedIn: [linkedin.com/in/AIknowlah]
GitHub: [github.com/AIknowlah]
Email: [aiknowlah@gmail.com]


This case study represents my approach to building governance controls as first-class features in AI systems, not afterthoughts.


Technical Footnotes

Framework Versions:

  • LangGraph: 0.0.44
  • Google Cloud BigQuery: 3.17.2
  • Google Generative AI: 0.3.2
  • FastAPI: 0.109.0
  • Uvicorn: 0.27.0
  • Python: 3.12

Compliance Frameworks:

  • PDPA 2012 (Singapore Personal Data Protection Act)
  • AI Verify Framework (11 Principles, IMDA)
  • PDPC Advisory Guidelines on AI Systems (March 2024)

Project Timeline:

  • Phase 1: Infrastructure setup (1 day)
  • Phase 2: Node 1 development (2 days)
  • Phase 3: Testing & red-team (1 day)
  • Phase 4: Documentation (1 day)
  • Phase 5: API endpoint + bug fix (1 day)
  • Total: ~6 days

Lines of Code: ~550 (production) + ~200 (tests) - AI-generated with manual validation and debugging


Last Updated: April 2, 2026

About

Node 1: NRIC Input Sanitizer - PDPA-compliant governance control for AI systems

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages