-
Notifications
You must be signed in to change notification settings - Fork 5
fitness functions
- Introduction
- Fitness functions
- Encodings
- Algorithms
- Genetic operators
- Stop conditions
- Metrics
- Miscellaneous
The solve
methods of the GAs expect a fitness function
as their first argument. This fitness function defines
the optimization problem that will be solved by the GA.
Fitness functions have to be implemented as a class derived
either from FitnessFunctionBase
or FitnessFunction
, overriding
their invoke
method.
There are several things that have to be considered when defining a
fitness function, all of which depend on the particular problem we
are trying to find the solution for.
The first thing that should be considered is the encoding type,
meaning how the solutions to the problem should be represented
in the population. There are several options provided by the
library, but user-defined encodings can also be used in the cases
where none of these fit the problem well enough.
The representation is determined by the gene type used in the
fitness function and in the genetic algorithm classes: the solutions
will be encoded as a vector of the specified gene type.
template<typename GeneType>
using Chromosome = std::vector<GeneType>;
The options for the gene type already provided by the library are:
- BinaryGene
- RealGene
- PermutationGene
- IntegerGene
The gene type is specified as the type parameter of the
FitnessFunctionBase
and FitnessFunction
classes, and it will
also determine the GA class that has to be used to find the optimum
of the fitness function.
Note
See encodings.md for more information about the encoding types.
The length of the chromosomes is another parameter that has to be
specified for the fitness function. Generally, the chromosome length
is going to be equal to the number of variables in the problem, but
depending on the encoding used, this will not always be true.
For example, if the fitness function uses binary-encoding, a
single variable will most likely be represented by multiple binary
genes instead of just a single one.
The chromosome length will also determine the base class that
should be used for defining the fitness function. FitnessFunctionBase
can be used for every problem, and the chromosome length is specified
in its constructor. FitnessFunction
can only be used if the chromosome
length is known at compile-time, and the chromosome length is specified
as a template parameter of the class in addition to the gene type.
template<typename GeneType>
class FitnessFunctionBase;
template<typename GeneType, size_t ChromLen>
class FitnessFunction;
The number of objectives is another important property of the fitness functions, but it doesn't have to be specified explicitly in the definition of the fitness function. Instead, it will be deduced from the legth of the fitness vectors returned by it. Note that the fitness function always returns a fitness vector, even for single-objective problems. In the single-objective case the length of the fitness vectors will simply be 1.
The library assumes that a fitness function will always return fitness vectors of the same length, which is equal to the number of objectives. This can't be changed.
The exact number of objectives is not generally relevant to the GAs, but whether it is single- or multi-objective will determine which algorithms can be used in the GAs for the particular fitness function.
Note
See algorithms.md for more information about the different algorithms.
The genetic algorithms in the library will try to find the maximum of the fitness function, which means that the fitness functions always have to be implemented for maximization. When trying to find the minimum of some function, the problem can easily be transformed into a maximization problem by multiplying the function by -1. This multiplication is not an issue as all parts of the GAs are implemented in a way so that they can handle negative fitness values. This includes operators which might not usually support negative values, such as the roulette selection operator.
As a simple example, let's say that we are trying to find the minimum
of the function f(x) = x^2
. The implementation of the fitness
function for this problem would be the following:
class XSquare : public FitnessFunction</* GeneType = */ RealGene, /* ChromLen = */ 1>
{
FitnessVector invoke(const Chromosome<RealGene>& x) const override
{
return { -x[0] * x[0] };
}
};
A fitness function for a multi-objective problem would be implemented
the same way, with the only difference being the size of the fitness vector
returned by invoke
.
There are some additional parameters of the fitness function that can be specified, but these have default values which will typically work for most problems. More complex fitness functions, however, might have to set their values differently from the defaults.
The fitness functions are, by default, assumed to always return the same fitness vector for a given chromosome passed to them as an argument. This assumption is used to prevent unnecessary fitness function calls, but it would also cause potentially incorrect fitness vectors to be assigned to some solutions if the assumption is not true.
In order to prevent this, the fitness functions have a Type
associated
with them, which can either be Static
or Dynamic
. The type of a fitness
function can be set in its constructor, with the default type being Static
.
For fitness functions where this default behaviour would be incorrect, the
value of the type
parameter in the constructor of the fitness function
should to be set to Dynamic
. This will disable any kind of caching that
might normally be used in the GAs, and cause the solutions to be evaluated
using the fitness function every time it's needed.
// Implementation of a dynamic fitness function
class MyFitnessFunction : public FitnessFunction<RealGene, 1>
{
MyFitnessFunction() : FitnessFunction(/* type = */ Type::Dynamic) {}
FitnessVector invoke(const Chromosome<RealGene>& x) const override;
};
The library allows for variable chromosome lengths to be used to encode
the solutions. However, the chromosome length parameter of the fitness
function still has to be specified even when variable length chromosomes
are used to represent the solutions.
In this case the value will only be used to generate the solutions of the
initial population (assuming that the initial population is generated instead
of being explicitly specified by the user).
Note
When using variable chromosome lengths, every part of the GAs must be able to handle them. This usually means implementing custom crossover and mutation methods, as most of the operators already implemented in the library can only be used with static chromosome lengths.
The number of times the fitness function is evaluated during a run of the GA is determined by the number of candidates in the population, and the number of generations. Naively, the number of fitness function calls during a run would be:
N = population_size * generations
While there are cases where this will really be the number of fitness function calls, such as when the fitness function is dynamic, the library generally tries to minimize the number of calls to the fitness function where possible, which means that the actual number will typically be lower than this.
By default, only a simple method is used to achieve this, with minimal overhead during the runs, but it is also possible to cache the fitnesses of the candidate solutions during a run to further reduce the number of fitness function calls. Doing this has a larger overhead though, so it's only worth doing if the fitness function is relatively expensive to evaluate.
This cache can be turned on using the cache_size
method of the GAs:
GA.cache_size(2); // cache the last 2 generations
The candidate solutions of a population are evaluated concurrently
in each generation of a run. This means that the implementation
of the invoke
method in the fitness functions must be thread-safe.
The library, in general, only assumes that the fitness values returned
by the fitness function are valid numbers (i.e. no NaN
values will
be returned by the fitness function).
Whether infinite fitness values are allowed or not depends on the selection method used in the GA. If the fitness function can return fitness vectors that contain infinite values, the selection method (or the algorithm) will have to be chosen accordingly.
Similarly, whether negative fitness values are allowed depends on the selection method used, but all selection methods implemented as part of the library are able to handle negative values without any issues.
Check the documentations of the selection methods and the algorithms for information on what fitness values they support. The default methods used support both negative and infinite values.
Next: Encodings