
OWASP GenAI Data Security Initiative

Part of the OWASP GenAI Security Project · Data Security Initiative Page


Overview

The OWASP GenAI Data Security Initiative addresses the data security risks unique to Large Language Models, Generative AI, and Agentic AI systems. As AI introduces new data surfaces — prompts, context windows, embeddings, vector stores, agent traces, tool payloads — and new failure modes — prompt-driven extraction, cross-session bleed, inference attacks, plugin data drains — traditional data security frameworks no longer map cleanly to what needs protection.

This initiative produces community-developed, peer-reviewed guidance to help organizations understand and address these emerging challenges. All materials are released under CC BY-SA 4.0.


Key Deliverables

GenAI Data Security Risks and Mitigations 2026 (v1.0)

📄 Download PDF · Released March 2026

A comprehensive enumeration of 21 data security risks specific to GenAI systems, each with tiered mitigations (Foundational → Hardening → Advanced) designed for organizations at different maturity levels. This is not a Top 10 — it is a structured risk taxonomy following data as it moves through a GenAI system.

Cross-referenced to the OWASP Top 10 for LLM Applications and the OWASP Top 10 for Agentic Applications 2026.

DSGAI Risk Taxonomy (21 entries)
| ID | Risk |
|---------|------|
| DSGAI01 | Sensitive Data Leakage |
| DSGAI02 | Agent Identity & Credential Exposure |
| DSGAI03 | Shadow AI & Unsanctioned Data Flows |
| DSGAI04 | Data, Model & Artifact Poisoning |
| DSGAI05 | Data Integrity & Validation Failures |
| DSGAI06 | Tool, Plugin & Agent Data Exchange Risks |
| DSGAI07 | Data Governance, Lifecycle & Classification for AI Systems |
| DSGAI08 | Non-Compliance & Regulatory Violations |
| DSGAI09 | Multimodal Capture & Cross-Channel Data Leakage |
| DSGAI10 | Synthetic Data, Anonymization & Transformation Pitfalls |
| DSGAI11 | Cross-Context & Multi-User Conversation Bleed |
| DSGAI12 | Unsafe Natural-Language Data Gateways (LLM-to-SQL/Graph) |
| DSGAI13 | Vector Store Platform Data Security |
| DSGAI14 | Excessive Telemetry & Monitoring Leakage |
| DSGAI15 | Over-Broad Context Windows & Prompt Over-Sharing |
| DSGAI16 | Endpoint & Browser Assistant Overreach |
| DSGAI17 | Data Availability & Resilience Failures in AI Pipelines |
| DSGAI18 | Inference & Data Reconstruction |
| DSGAI19 | Human-in-the-Loop & Labeler Overexposure |
| DSGAI20 | Model Exfiltration & IP Replication |
| DSGAI21 | Disinformation & Integrity Attacks via Data Poisoning |

Each entry follows a consistent structure: attack scenario in GenAI-specific terms, attacker capabilities, impact, and tiered mitigations with scope annotations (Buy / Build / Both).
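As an illustration only, a single entry following that structure could be represented in machine-readable form roughly like this. The field names and control names below are hypothetical, not the published schema:

```python
# Hypothetical machine-readable shape for one taxonomy entry, mirroring the
# structure described above. Field and control names are illustrative only.
entry = {
    "id": "DSGAI01",
    "risk": "Sensitive Data Leakage",
    "attack_scenario": "A prompt-driven extraction attack surfaces PII "
                       "retained in a RAG vector store.",
    "attacker_capabilities": ["query access to the chat interface"],
    "impact": "Disclosure of regulated personal data",
    "mitigations": {
        # Tiers follow the Foundational -> Hardening -> Advanced progression;
        # each control carries a Buy / Build / Both scope annotation.
        "Foundational": [{"control": "Output filtering", "scope": "Both"}],
        "Hardening":    [{"control": "Retrieval-time redaction", "scope": "Build"}],
        "Advanced":     [{"control": "Differentially private embeddings", "scope": "Build"}],
    },
}

assert set(entry["mitigations"]) == {"Foundational", "Hardening", "Advanced"}
```

A structured shape like this also makes it straightforward to filter entries by mitigation tier or scope annotation when planning a rollout.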

LLM and GenAI Data Security Best Practices 2025 (v1.0)

📄 Download PDF · Released February 2025

The companion implementation guide covering data security principles, secure deployment architectures, monitoring and auditing guidelines, governance models, and future trends. Topics include data minimization, encryption strategies, access control for LLM pipelines, securing data flows in LLM agents, and regulatory compliance alignment.
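One of those topics, data minimization, can be sketched as a toy pre-processing step that redacts obvious identifiers before user input reaches the model. The regex patterns below are simplistic placeholders, not production-grade redaction and not the guide's prescribed method:

```python
import re

# Toy data-minimization step for an LLM pipeline: strip obvious PII from
# user input before it is sent to the model. The patterns are illustrative
# placeholders; real redaction needs far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def minimize(text: str) -> str:
    """Replace each matched identifier with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(minimize("Contact jane@example.com, SSN 123-45-6789"))
# → Contact [EMAIL], SSN [SSN]
```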


Framework Crosswalk

The initiative maintains a comprehensive crosswalk of GenAI data security risks against 20 widely recognized cybersecurity and AI frameworks. The crosswalk covers all 41 entries across the OWASP LLM Top 10 2025, Agentic Top 10 2026, and DSGAI Risk Taxonomy.

The web application provides an interactive interface for exploring crosswalk mappings, framework coverage, incident data, security tooling, implementation recipes, and a glossary of key terms.

Crosswalk source files are maintained in the /crosswalk directory.
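To illustrate the kind of data such a crosswalk holds, here is a hypothetical sketch. The framework names are real, but these specific risk-to-control mappings are invented for illustration and are not the initiative's published data:

```python
# Illustrative crosswalk rows mapping DSGAI risks to framework controls.
# The mappings below are hypothetical examples, not published crosswalk data.
crosswalk = [
    {"risk": "DSGAI01", "framework": "NIST CSF 2.0",  "control": "PR.DS-01"},
    {"risk": "DSGAI01", "framework": "MITRE ATLAS",   "control": "AML.T0024"},
    {"risk": "DSGAI13", "framework": "ISO/IEC 27001", "control": "A.8.3"},
]

def controls_for(risk_id: str):
    """Return all mapped (framework, control) pairs for one DSGAI risk."""
    return [(r["framework"], r["control"]) for r in crosswalk if r["risk"] == risk_id]

print(controls_for("DSGAI01"))
# → [('NIST CSF 2.0', 'PR.DS-01'), ('MITRE ATLAS', 'AML.T0024')]
```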

Frameworks Covered

Cybersecurity & AI Standards — NIST CSF 2.0, NIST AI RMF 1.0, ISO/IEC 27001, ISO/IEC 42001, ISO/IEC 23894, ISO/IEC 5338

Threat Modeling & Adversarial — MITRE ATT&CK, MITRE ATLAS, STRIDE, FAIR

Application Security & Compliance — CIS Controls, ASVS, SOC 2, PCI DSS, ENISA, CycloneDX ML SBOM

Governance & Architecture — COBIT, OPENCRE, SAMM, BSIMM

OT / Industrial AI Security — ISA/IEC 62443


Workstreams

Data Collection — Public open call for real-world vulnerability data and incident reports related to LLMs and GenAI applications. Submit data through the Slack channel or by opening an issue in this repository.

Framework Crosswalk — Crosswalking the OWASP Top 10 for LLM Applications 2025 and Agentic Top 10 2026 to recognized cybersecurity and AI frameworks. See the Interactive Crosswalk Explorer and the /crosswalk directory for current crosswalk data.

Data Security Risks & Best Practices — Research, authoring, and maintenance of the initiative's two published white papers: the DSGAI risk taxonomy and the companion implementation guide. This workstream also maintains alignment with the broader OWASP GenAI Security Project deliverables.

Community Datasets — Development of open, community-contributed datasets for research, benchmarking, and security testing. See the /datasets directory for current and proposed datasets.

Data Validation — Automated and peer-reviewed validation of all contributed data. Scripts in /data_validation ensure integrity and accuracy.
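The kind of check such a validation script might run can be sketched as follows. The field names and ID pattern here are assumptions made for illustration, not the actual /data_validation implementation:

```python
import re

# Minimal sketch of a contributed-record check: verify required fields are
# present and the risk ID is a valid DSGAI01–DSGAI21 identifier.
# Field names and schema are illustrative assumptions.
DSGAI_ID = re.compile(r"^DSGAI(0[1-9]|1[0-9]|2[01])$")
REQUIRED_FIELDS = {"id", "risk", "source"}

def validate(record: dict) -> list[str]:
    """Return a list of problems found in one contributed record."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - record.keys())]
    if "id" in record and not DSGAI_ID.match(record["id"]):
        problems.append(f"invalid DSGAI id: {record['id']}")
    return problems

assert validate({"id": "DSGAI21", "risk": "Model Exfiltration", "source": "report-42"}) == []
```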


Datasets

Current

  • Vulnerability Dataset — Real-world vulnerabilities affecting LLM applications
  • Exploit Dataset — Documented exploits and attack techniques targeting LLMs
  • Risk Assessment Dataset — Mapped risk assessments for various LLM deployments

Proposed Community Datasets

We are looking for contributors to help build datasets for the broader AI security community. Priority areas include:

  • GenAI Data Security Incident Database — Structured, anonymized catalog of real-world data security incidents in GenAI deployments (leakage events, cross-tenant bleed, RAG exfiltration, agent credential exposure), focused on failure modes enumerated in DSGAI01–DSGAI21.
  • Prompt Injection & Data Extraction Test Cases — Adversarial prompts and extraction techniques mapped to the DSGAI risk taxonomy for red-teaming and regression testing.
  • RAG Poisoning & Retrieval Integrity Dataset — Benign and adversarial document sets for testing vector store integrity, retrieval-time redaction, and poisoning detection.
  • Cross-Framework Control Crosswalk Dataset — Machine-readable crosswalk of DSGAI risks to controls across NIST CSF 2.0, NIST AI RMF, MITRE ATLAS, ISO/IEC 42001, and OWASP LLM Top 10 for automated compliance gap analysis.
  • Agent Data Flow & Tool Exchange Traces — Sanitized traces of agentic AI tool calls, plugin data exchanges, and delegation chains supporting research into DSGAI06 and related agent security patterns.
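As a sketch of the automated compliance gap analysis a machine-readable crosswalk dataset could support (the specific mappings below are invented for illustration, not real crosswalk data):

```python
# Given machine-readable risk-to-framework mappings, find frameworks with no
# mapped control for a DSGAI risk. The mappings here are illustrative only.
FRAMEWORKS = {"NIST CSF 2.0", "NIST AI RMF", "MITRE ATLAS",
              "ISO/IEC 42001", "OWASP LLM Top 10"}

mappings = {
    "DSGAI06": {"NIST CSF 2.0", "MITRE ATLAS"},
    "DSGAI18": {"NIST AI RMF", "MITRE ATLAS", "ISO/IEC 42001"},
}

def coverage_gaps(risk_id: str) -> list[str]:
    """Frameworks with no mapped control for the given risk."""
    return sorted(FRAMEWORKS - mappings.get(risk_id, set()))

print(coverage_gaps("DSGAI06"))
# → ['ISO/IEC 42001', 'NIST AI RMF', 'OWASP LLM Top 10']
```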

If you are interested in contributing, join the Slack channel or reach out directly.


AI Risk Database Collaboration

The initiative collaborates with leading AI risk authorities to consolidate efforts and avoid fragmented approaches to risk identification.

Community members are encouraged to report new GenAI data security risks to these organizations as well as to this initiative.


How to Contribute

All contributions are welcome — from security practitioners, AI engineers, researchers, compliance professionals, and anyone working to secure GenAI systems.


OWASP GenAI Security Project — Initiatives

This initiative is one of several under the OWASP GenAI Security Project:

| Initiative | Description | Link |
|------------|-------------|------|
| Agentic App Security | Securing autonomous and agentic AI systems, including the Top 10 for Agentic Applications 2026 | Initiative Page |
| AI Red Teaming & Evaluation | Methodology, benchmarks, and tools for adversarial testing of GenAI systems | Initiative Page |
| AI Security Solutions Landscape | Vendor-agnostic mapping of the GenAI security tooling ecosystem | Solutions Directory |
| AIBOM Generator | Open-source tool for generating AI Bills of Materials for supply chain transparency | Initiative Page |
| Data Security | GenAI data security risks, mitigations, best practices, and framework crosswalks (this initiative) | Initiative Page |
| Governance Checklist (COMPASS) | Cybersecurity and governance checklist for LLM and GenAI deployments | Resource Page |
| Secure AI Adoption | Center of Excellence guidance for safe, ethical, and secure organizational AI adoption | Initiative Page |
| Threat Intelligence | Research into LLM-enabled exploit generation and deepfake threat preparation | Initiative Page |

Acknowledgments

Initiative Lead: Emmanuel Guilherme Junior

This initiative is made possible by the contributions of its authors, contributors, and reviewers from across the global AI security community. Thank you to everyone who has helped build and shape this community resource. Full contributor lists are included in each published document.


License

All materials produced by this initiative are licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

You are free to share and adapt the material for any purpose, including commercial, under the following terms: provide appropriate attribution including the project name and asset name, and distribute any derivative works under the same license.
