This is my semester project in EPFL, 2024 Spring. It is a pleasure to work at IVRL and to be supervised by Dr. Daichi Zhang and Prof. Sabine Süsstrunk.
I use this Google Doc to record the progress of my semester project.
The rapid development of generative models has brought great changes in our daily life, such as large language models, diffusion models and even vision foundation models. However, are those models always safe enough to use? Will they cause harm to users, such as data leakage, generating biased results, or simply attacked or manipulated by attackers?
-
With the rapid development of Large Language Model (LLM), the concept of "jailbreak" has become popular. We would like to first study the existing attacks against LLM.
-
The existing Text-to-Image models (e.g. diffusion models) are often trained on biased datasets, leading to the generation of biased content. Hence, we will also work on biases inherent in the models.
In this project, we are interested in the safety and bias problems of current generative models and aim to evaluate how vulnerable they are.
- Data Leakage: since the generatve models could access to large-scale training data as well as the user input data, will it cause data leakage when genereating results?
- Biased Results: will the generated results be fair enough or just a biased view of trained generative models.
- Attack by users: can we perturbate or attack target generative models to make it generating wrong or desired manipulated results?
Week | Planned Work |
---|---|
1 | Set up the cluster and environment required for the project. |
2 & 3 | Experiment extensively with existing attack methods on current models and test for effectiveness. |
4 & 5 | Determine the target models and technical baseline. |
6 & 7 | Perform preliminary implementation of existing ideas and compare the experimental results. Filter and define the final technology path. |
8 | Collate previous results and prepare for the midterm presentation. |
April 19 Friday | Midterm presentation (11:15am BC410) |
9 | Count the success rate of attacks by existing models on different models. |
10 | Test existing diffusion models for potential bias issues. |
11 | Replicate existing bias mitigation approaches. |
12 & 13 | Attempt to propose our own solutions for bias mitigation. |
14 | Complete the project report. |
June 07 Friday | Final report due |
June 27 Thursday | Final representation |
- This part could be changed at any time.
In the models folder, I recorded the models used by this project for evaluation.
In the tests for diffusion models, I used a dataset of images that I created myself.