Evaluation on the Vulnerability of Current Generative Models

This is my semester project in EPFL, 2024 Spring. It is a pleasure to work at IVRL and to be supervised by Dr. Daichi Zhang and Prof. Sabine Süsstrunk.

I use this Google Doc to record the progress of my semester project.

Description

The rapid development of generative models has brought great changes in our daily life, such as large language models, diffusion models and even vision foundation models. However, are those models always safe enough to use? Will they cause harm to users, such as data leakage, generating biased results, or simply attacked or manipulated by attackers?

With the rapid development of Large Language Model (LLM), the concept of "jailbreak" has become popular. We would like to first study the existing attacks against LLM.
The existing Text-to-Image models (e.g. diffusion models) are often trained on biased datasets, leading to the generation of biased content. Hence, we will also work on biases inherent in the models.

In this project, we are interested in the safety and bias problems of current generative models and aim to evaluate how vulnerable they are.

Key Questions

Data Leakage: since the generatve models could access to large-scale training data as well as the user input data, will it cause data leakage when genereating results?
Biased Results: will the generated results be fair enough or just a biased view of trained generative models.
Attack by users: can we perturbate or attack target generative models to make it generating wrong or desired manipulated results?

Timeline

Week	Planned Work
1	Set up the cluster and environment required for the project.
2 & 3	Experiment extensively with existing attack methods on current models and test for effectiveness.
4 & 5	Determine the target models and technical baseline.
6 & 7	Perform preliminary implementation of existing ideas and compare the experimental results. Filter and define the final technology path.
8	Collate previous results and prepare for the midterm presentation.
April 19 Friday	Midterm presentation (11:15am BC410)
9	Count the success rate of attacks by existing models on different models.
10	Test existing diffusion models for potential bias issues.
11	Replicate existing bias mitigation approaches.
12 & 13	Attempt to propose our own solutions for bias mitigation.
14	Complete the project report.
June 07 Friday	Final report due
June 27 Thursday	Final representation

This part could be changed at any time.

Evaluated Models

In the models folder, I recorded the models used by this project for evaluation.

Dataset

In the tests for diffusion models, I used a dataset of images that I created myself.

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
bias-stat		bias-stat
modules		modules
pre & report		pre & report
promptApproach		promptApproach
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
config.yaml		config.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Evaluation on the Vulnerability of Current Generative Models

Description

Key Questions

Timeline

Evaluated Models

Dataset

About

Releases

Packages

Languages

License

YanY-Henry/Evaluation-on-the-Vulnerability-of-Current-Generative-Models

Folders and files

Latest commit

History

Repository files navigation

Evaluation on the Vulnerability of Current Generative Models

Description

Key Questions

Timeline

Evaluated Models

Dataset

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages