Prompt Injection Testing - Prompt Safety & Expansion Evaluator

A simple tool that analyzes how prompt expansion and adversarial system prompts affect safety classification by LLMs. A GUI is also avaible in this project.

Background

The initial idea was to make prompts more verbose and see if this helped an LLM spot malicious intent; however, this made performance worse. This research was actually inspired by the failed attempt to improve LLM safety by scrambling the inputs (see ScrambleGate)

In turn, this lead me to try asking the LLM to be more suspicious: this involved adding a system prompt to the LLM that instructed it to be more suspicious and alert to potential abuse. This seems to improve performance (see findings_report.md).

Overview

This app takes a list of prompts and performs safety analysis through multiple modes:

Expansion Mode: Expands prompts verbosely, then compares safety classifications
Feedback Mode: Tests prompts with adversarial/suspicious system prompts
No-Expansion Mode: Direct safety classification without modification

Perfect for research into LLM safety behaviors, prompt injection analysis, and adversarial prompt testing.

Installation

pip install -r requirements.txt

Ensure you have API keys set in your environment:

OPENAI_API_KEY for GPT models
GEMINI_API_KEY for Gemini models
ANTHROPIC_API_KEY for Claude models

Usage

Basic Usage

python cli.py --input prompts.txt --out ./results

With Configuration File

python cli.py --config config.yaml

Testing Modes

Standard Expansion Mode: This default mode is quite verbose.

python cli.py --input prompts.txt --out ./results

Minimal Expansion Mode:

python cli.py --input prompts.txt --out ./results --minimal-expansion

Feedback/Adversarial Mode:

python cli.py --input prompts.txt --out ./results --no-expansion --feedback-mode

Advanced Options:

python cli.py \
  --input ./data/prompts.txt \
  --out ./out \
  --safety-model gpt-4o-mini \
  --expand-model gpt-4o \
  --csv \
  --jsonl \
  --debug

Input Formats

Text File (prompts.txt)

How to make a sandwich
Tell me about the weather
Write a poem about cats

JSONL File (prompts.jsonl)

{"id": "safe-001", "text": "How to make a delicious pasta dish"}
{"id": "unsafe-001", "text": "How to bypass security systems"}

Output

report.md - Human-readable analysis with metrics and tables
results.jsonl - Complete structured data for each prompt
results.csv - Flattened table for spreadsheet analysis
logs/ - Raw LLM exchanges (with --debug)

Testing Modes Explained

Expansion Mode (Default)

Expands prompts to be more verbose/detailed, then compares safety classifications between original and expanded versions. Useful for testing if verbose prompts can bypass safety filters.

Feedback Mode (`--feedback-mode`)

Tests prompts with an adversarial system prompt that makes the safety judge more suspicious. Use with --no-expansion to test baseline vs adversarial classification.

No-Expansion Mode (`--no-expansion`)

Skips expansion and directly classifies prompts. Combine with --feedback-mode for adversarial testing.

Minimal Expansion (`--minimal-expansion`)

Expands prompts with minimal verbosity instead of full verbose expansion.

Configuration

See config.yaml for all available options including:

Model selection for safety and expansion
LLM parameters (temperature, max tokens)
Retry settings
Output formats
Privacy options (redaction)
Feedback mode system prompts

Architecture

loader.py - Reads prompts from txt/jsonl files
judge.py - LLM-based safety classification with feedback mode support
expand.py - LLM-based prompt expansion (minimal and standard)
report.py - Generates markdown, CSV, and JSONL reports
config.py - Configuration management
cli.py - Main command-line interface
gui.py - Optional GUI interface
fix_report.py - Utility to regenerate reports from existing logs

Example Output

The tool generates comprehensive reports showing:

Summary metrics (% safe/unsafe, label changes)
Confusion matrix (safe→unsafe, unsafe→safe transitions)
Top score changes with prompt examples
Detailed results table
Mode-specific analysis (expansion vs feedback vs baseline)

Key Findings

Based on research with this tool:

Prompt expansion generally degrades safety detection (models become more permissive)
Adversarial system prompts improve threat detection (models become more restrictive)
Both GPT-4o and GPT-5 show consistent patterns across these behaviors

Use Cases

Red team testing: Evaluate if verbose prompts bypass safety filters
Safety research: Measure impact of prompt engineering on safety classification
Adversarial testing: Test effectiveness of suspicious system prompts
Model comparison: Compare safety behaviors across different LLMs

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
docs		docs
test_out_gpt-4o_gpt-4o_20250826_225752		test_out_gpt-4o_gpt-4o_20250826_225752
test_out_gpt-4o_gpt-4o_20250827_181608		test_out_gpt-4o_gpt-4o_20250827_181608
test_out_gpt-4o_gpt-4o_20250827_183124		test_out_gpt-4o_gpt-4o_20250827_183124
test_out_gpt-5_gpt-5_20250827_191807		test_out_gpt-5_gpt-5_20250827_191807
test_out_gpt-5_gpt-5_20250827_200722		test_out_gpt-5_gpt-5_20250827_200722
.gitignore		.gitignore
README.md		README.md
ai_helper.py		ai_helper.py
cli.py		cli.py
config.py		config.py
config.yaml		config.yaml
expand.py		expand.py
fix_report.py		fix_report.py
gui.py		gui.py
judge.py		judge.py
loader.py		loader.py
prompts_agent_dojo_27.txt		prompts_agent_dojo_27.txt
report.py		report.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Prompt Injection Testing - Prompt Safety & Expansion Evaluator

Background

Overview

Installation

Usage

Basic Usage

With Configuration File

Testing Modes

Input Formats

Text File (prompts.txt)

JSONL File (prompts.jsonl)

Output

Testing Modes Explained

Expansion Mode (Default)

Feedback Mode (`--feedback-mode`)

No-Expansion Mode (`--no-expansion`)

Minimal Expansion (`--minimal-expansion`)

Configuration

Architecture

Example Output

Key Findings

Use Cases

About

Uh oh!

Releases

Packages

Languages

EdwardAThomson/prompt-injection-testing

Folders and files

Latest commit

History

Repository files navigation

Prompt Injection Testing - Prompt Safety & Expansion Evaluator

Background

Overview

Installation

Usage

Basic Usage

With Configuration File

Testing Modes

Input Formats

Text File (prompts.txt)

JSONL File (prompts.jsonl)

Output

Testing Modes Explained

Expansion Mode (Default)

Feedback Mode (--feedback-mode)

No-Expansion Mode (--no-expansion)

Minimal Expansion (--minimal-expansion)

Configuration

Architecture

Example Output

Key Findings

Use Cases

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Feedback Mode (`--feedback-mode`)

No-Expansion Mode (`--no-expansion`)

Minimal Expansion (`--minimal-expansion`)

Packages