KAIST 2020 Fall CS454 Artificial Intelligence Based Software Engineering
Authored by Jongchan Park
Authored by Jungyeon Jeon
Authored by Donghwan Kim
Authored by Seungil Lee
Generating test data is necessary for debugging, but it is a time-consuming and tedious process. We generate test data automatically using a parallel genetic algorithm.
Test Data Generation Using Parallel Genetic Algorithm
A genetic algorithm (GA) is a search heuristic inspired by Charles Darwin's theory of natural evolution.
GA can explore a larger search region thanks to crossover and mutation. Nevertheless, like other search-based algorithms, GA has a weakness: it tends to converge to local optima.
To overcome this weakness, we introduce a new variant of the genetic algorithm, which we named the parallel genetic algorithm (PGA). It has a tree-like architecture.
This is the pseudocode of PGA.
Lines 10 to 15: perform GA in parallel n times.
Lines 16 to 22: perform inter-crossover, so that each population can share information with the others.
To bound memory use and computation time, we limit the number of populations in one generation to at most k. Pruning picks the best k populations in each generation.
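The branch, inter-crossover, and prune steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation. The bit-string genome, the count-of-ones fitness, the exact inter-crossover rule, the ranking of populations by their best individual, and all function names are assumptions made for the example. The n GA runs are also executed one after another here rather than in parallel.

```python
import random


def fitness(ind):
    # Toy objective (assumption): number of ones in the bit string.
    return sum(ind)


def ga_step(pop, rng, mutation_rate=0.05):
    """One GA generation: elitism, tournament selection,
    one-point crossover, and bit-flip mutation."""
    children = [max(pop, key=fitness)[:]]  # keep a copy of the best individual
    while len(children) < len(pop):
        p1 = max(rng.sample(pop, 2), key=fitness)
        p2 = max(rng.sample(pop, 2), key=fitness)
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        children.append(child)
    return children


def inter_crossover(pops, rng):
    """Share information between populations (assumed rule): cross the best
    individual of each population with a random individual from another
    population, and replace the worst individual if the child is no worse."""
    if len(pops) < 2:
        return
    for i, pop in enumerate(pops):
        j = rng.choice([x for x in range(len(pops)) if x != i])
        mate = rng.choice(pops[j])
        own = max(pop, key=fitness)
        cut = rng.randrange(1, len(own))
        child = own[:cut] + mate[cut:]
        worst = min(range(len(pop)), key=lambda idx: fitness(pop[idx]))
        if fitness(child) >= fitness(pop[worst]):
            pop[worst] = child


def pop_score(pop):
    # Rank a population by its best individual (assumption).
    return max(fitness(ind) for ind in pop)


def pga(n=3, k=4, pop_size=10, genome_len=20, generations=15, seed=0):
    rng = random.Random(seed)
    root = [[rng.randint(0, 1) for _ in range(genome_len)]
            for _ in range(pop_size)]
    pops = [root]
    for _ in range(generations):
        # Branch: every surviving population spawns n independent GA runs.
        branched = [ga_step(p, rng) for p in pops for _ in range(n)]
        # Populations exchange genetic material.
        inter_crossover(branched, rng)
        # Prune: keep only the best k populations of this generation.
        branched.sort(key=pop_score, reverse=True)
        pops = branched[:k]
    return max((ind for p in pops for ind in p), key=fitness)
```

Calling `pga()` returns the best individual found. Because each population branches into n children and only k survive pruning, the tree never holds more than n * k populations at once.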
- Generate an instance of the evaluator
evaluator = Tester.instance()
- Initialize the conditions
evaluator.reset(argnum, max_value, condition_range, error_rate, correction_range)
There are 5 arguments that control the error conditions. See the code for a detailed explanation.
- Run the experiment
evaluator.run(input)
For the experiments, we use the evaluator from genetic_CIT. We compare the performance of PGA with that of GA in terms of population size and time. We run PGA and GA up to a population size of 80,000 and up to 400 seconds. For the population-size experiments, we execute 5 times and report the average; for the time experiments, we execute once. We also run the experiments with an evaluator without a correction range and an evaluator with a correction range: error region 0 to 3 at 70% for parameters 0 to 2, error region 3 to 6 at 70% for parameters 0 to 2, and error region 6 to 9 at 70% for parameters 0 to 2.
These are the results in terms of population size. The left one is with a correction range, and the right one is without.
These are the results in terms of elapsed time. Again, the left one is with a correction range, and the right one is without.
We implement both GA and PGA to compare the performance.
- Clone this repository:
git clone https://github.com/ChoiIseungil/CS454Project.git
cd CS454Project
- Experiment environments:
correction range for evaluator
save performance with population size or with time in GA
save performance with population size or with time in PGA
- Hyperparameters
python main.py -p True -m 0.1 -n 3 -l 100 -c 15 -r 0.5
Arguments when running the program:
- -p: "True" for PGA and "False" for GA (default = "False")
- -m: mutation rate (default = 0.05)
- -n: arg_num for evaluator (default = 5)
- -l: max_value for evaluator (default = 20)
- -c: condition_range for evaluator (default = 5)
- -r: error_rate for evaluator (default = 0.3)
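As an illustration of how these flags and defaults fit together, here is a minimal sketch using Python's standard `argparse` module. It is not taken from `main.py`, which may parse its arguments differently. The function name `build_parser` and the destination names (`pga`, `mutation_rate`, and so on) are assumptions chosen for readability. Only the flags, their meanings, and their defaults come from the list above.

```python
import argparse


def build_parser():
    # Flags and defaults mirror the argument list above; dest names are assumed.
    parser = argparse.ArgumentParser(description="Run GA or PGA test data generation")
    parser.add_argument("-p", dest="pga", choices=["True", "False"], default="False",
                        help='"True" for PGA and "False" for GA')
    parser.add_argument("-m", dest="mutation_rate", type=float, default=0.05,
                        help="mutation rate")
    parser.add_argument("-n", dest="arg_num", type=int, default=5,
                        help="arg_num for evaluator")
    parser.add_argument("-l", dest="max_value", type=int, default=20,
                        help="max_value for evaluator")
    parser.add_argument("-c", dest="condition_range", type=int, default=5,
                        help="condition_range for evaluator")
    parser.add_argument("-r", dest="error_rate", type=float, default=0.3,
                        help="error_rate for evaluator")
    return parser


def parse_example():
    # Parse the same command line as the example invocation above.
    argv = "-p True -m 0.1 -n 3 -l 100 -c 15 -r 0.5".split()
    args = build_parser().parse_args(argv)
    # -p is passed as a string, so compare it explicitly instead of using bool().
    use_pga = args.pga == "True"
    return args, use_pga
```

The `-p` flag is kept as a string with two allowed choices because `bool("False")` is `True` in Python, so converting the value with `type=bool` would silently select PGA for both inputs.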