Skip to content

WithSecureLabs/llama-3-prompt-injection-fine-tuning

Repository files navigation

Context:

These scripts are intended for the creation a training dataset to fine-tune an LLM nad make it resistant to prompt injection attacks, as described on WithSecure labs research blog: This was the output of research described in this WithSecure Labs article: https://labs.withsecure.com/publications/llama3-prompt-injection-hardening..

For further details see this TogetherAi API documentation.

For examination of the model created from this experiment see this link to view the WithSecure Huggingface profile.

Training - Usage guide [Simple]:

For an easy use, formatted for TogetherAi Llama3.1-8b:

Populate your emails.jsonl file with email "text", related "instruction" and desired "output"

Populate your Breakouts.txt file with the desired prompt-injection breakout text, placement of the malicious prompt should be marked by [XXX].

Populate Prompts.txt with your desired malicious prompts, making sure these can easily be examined for 'canary' values.

Execute Run.py, to run the following scripts automatically.

Training - Usage guide [Custom]:

The purposes of each provided python script are as follows:

Combine.py - combine each possible combination of breakout and prompt injection input - Output.txt is created containing -ALL- of these combinations.

Compile.py - randomly compile each prompt created from combine.py into an email payload. - this script will randomly decide if an email is selected to have a payload added. - the payload is added either, in the middle, at the end or at the end following two line breaks; for greater variety of injections. - dataset.jsonl is created.

Prep.py - renames the section labells and shuffles the order of your dataset. - adds data tags and the context question to the dataset. - datasetReady.jsonl is created.

Format.py - adds the system prompt and formats the dataset to suit the Llama3.1-8b formatting reqauired for fine-tuning with togetherAi - FormattedFinal.jsonl is created.

Calc.py - to calculate the dataset size, number of samples and details. - ensure the size of your dataset is appropriate before beginning training.

Validate.py - to ensure correct Together pip formatting (pass = good)

Testing - Usage guide:

Ensure a portion of the training dataset created is removed from your file, this is necessary for testing purposes. Your testing dataset should be saved under TESTselection.jsonl.

Compare.py - to cycle through and compare outputs from your base and fine-tuned models. - Ensure canary_words is upto date, warning: this variable may contain profanity. - TestOutputs.jsonl is created.

Examine.py - to examine the contents of output.jsonl file via excel for data collection purposes.

Notes:

Together PiP package is required.

LangDetect PiP package is required.

Never hard-code your API keys.

Always check foreign scripts before running them.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages