Context:

These scripts are intended for the creation a training dataset to fine-tune an LLM nad make it resistant to prompt injection attacks, as described on WithSecure labs research blog: This was the output of research described in this WithSecure Labs article: https://labs.withsecure.com/publications/llama3-prompt-injection-hardening..

For further details see this TogetherAi API documentation.

For examination of the model created from this experiment see this link to view the WithSecure Huggingface profile.

Training - Usage guide [Simple]:

For an easy use, formatted for TogetherAi Llama3.1-8b:

Populate your emails.jsonl file with email "text", related "instruction" and desired "output"

Populate your Breakouts.txt file with the desired prompt-injection breakout text, placement of the malicious prompt should be marked by [XXX].

Populate Prompts.txt with your desired malicious prompts, making sure these can easily be examined for 'canary' values.

Execute Run.py, to run the following scripts automatically.

Training - Usage guide [Custom]:

The purposes of each provided python script are as follows:

Combine.py - combine each possible combination of breakout and prompt injection input - Output.txt is created containing -ALL- of these combinations.

Compile.py - randomly compile each prompt created from combine.py into an email payload. - this script will randomly decide if an email is selected to have a payload added. - the payload is added either, in the middle, at the end or at the end following two line breaks; for greater variety of injections. - dataset.jsonl is created.

Prep.py - renames the section labells and shuffles the order of your dataset. - adds data tags and the context question to the dataset. - datasetReady.jsonl is created.

Format.py - adds the system prompt and formats the dataset to suit the Llama3.1-8b formatting reqauired for fine-tuning with togetherAi - FormattedFinal.jsonl is created.

Calc.py - to calculate the dataset size, number of samples and details. - ensure the size of your dataset is appropriate before beginning training.

Validate.py - to ensure correct Together pip formatting (pass = good)

Testing - Usage guide:

Ensure a portion of the training dataset created is removed from your file, this is necessary for testing purposes. Your testing dataset should be saved under TESTselection.jsonl.

Compare.py - to cycle through and compare outputs from your base and fine-tuned models. - Ensure canary_words is upto date, warning: this variable may contain profanity. - TestOutputs.jsonl is created.

Examine.py - to examine the contents of output.jsonl file via excel for data collection purposes.

Notes:

Together PiP package is required.

LangDetect PiP package is required.

Never hard-code your API keys.

Always check foreign scripts before running them.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
Breakouts.txt		Breakouts.txt
FormattedFinal.jsonl		FormattedFinal.jsonl
LICENSE		LICENSE
Prompts.txt		Prompts.txt
README.md		README.md
Run.py		Run.py
TESTselection.jsonl		TESTselection.jsonl
calc.py		calc.py
combine.py		combine.py
compare.py		compare.py
compile.py		compile.py
dataset.jsonl		dataset.jsonl
datasetReady.jsonl		datasetReady.jsonl
emails.jsonl		emails.jsonl
examine.py		examine.py
format.py		format.py
output.jsonl		output.jsonl
output.txt		output.txt
prep.py		prep.py
processed_emails.xlsx		processed_emails.xlsx
validate.py		validate.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Context:

Training - Usage guide [Simple]:

Training - Usage guide [Custom]:

Testing - Usage guide:

Notes:

About

Releases

Packages

Languages

License

WithSecureLabs/llama-3-prompt-injection-fine-tuning

Folders and files

Latest commit

History

Repository files navigation

Context:

Training - Usage guide [Simple]:

Training - Usage guide [Custom]:

Testing - Usage guide:

Notes:

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages