In the context of Industry 4.0, Supply Chain Management (SCM) faces challenges in adopting advanced optimization techniques due to the "black-box" nature of AI-based solvers 🤖. This complexity often leads to reluctance among company stakeholders. In response, this study employs an Interpretable Artificial Intelligence (IAI) approach called ELDT, which combines evolutionary computation with reinforcement learning to create interpretable decision-making policies in the form of decision trees (DT) 🌳.
This interpretable solver is integrated into a simulation-based optimization framework specifically designed to manage the inherent uncertainties and stochastic behaviors of modern supply chains 🔄. To our knowledge, this represents the first attempt to combine IAI solvers with simulation-based optimization for SCM decision-making.
The methodology has been tested on two supply chain optimization problems: one fictional (make-or-buy) and one from the real world (Hybrid Flow Shop Scheduling, or HFS) ⚙️. Its performance has been compared against widely used algorithms. The results indicate that the interpretable approach not only delivers competitive performance but, in some cases, even outperforms traditional methods 📈. This challenges the common belief that there is a trade-off between interpretability and optimization efficiency.
Furthermore, the developed framework shows strong potential for use in industry, offering seamless integration with various Python-based solvers 🐍.
Before getting started, make sure you have installed all the requirements.
pip install -r requirements.txt
The optimization algorithms we implemented are divided into two categories based on how they handle the input list of customer orders. Schedule-as-a-whole approaches consider the entire list of orders collectively. In contrast, policy-generating approaches concentrate on training a policy that processes each order individually, making decisions based on the specific features of that order.
For the make-or-buy decision problem, the RS baseline generates a random binary vector x of length n_iterations
, the algorithm tracks and updates the highest-revenue solution.
For the HFS task, the RS baseline generates random job permutations and evaluates each over n_iterations
, tracking the best-performing solution based on the revenue achieved.
python optimization_random_search.py --n_iterations <n_iterations> --out_dir <output_directory> --no_runs <no_runs>
⚠️ Note: For the make-or-buy decision problem, the list of orders is provided in an Excel document. Please ensure that the "population_orders" in the AnyLogic model is correctly referencing the appropriate Excel file.
python optimization_random_search.py --n_iterations <n_iterations> --out_dir <output_directory> --no_runs <no_runs> --m <M> --e <E> --r <R> --dataset <dataset_path>
For the HFS problem we include in our experimental evaluation a simple deterministic greedy heuristic, which prioritizes jobs based on their due dates. Specifically, jobs with the earliest due dates are given the highest priority.
python optimization_heuristic.py --out_dir <output_directory> --m <M> --e <E> --r <R> --dataset <dataset_path>
Optuna is an open-source optimizer that employs state-of-the-art algorithms for solving optimization problems. It includes a sampler, which generates trial configurations to explore promising solution regions, and a pruner, which stops unpromising trials. By default, Optuna uses the Tree-structured Parzen Estimator for sampling and the Median Pruner for early stopping.
In the make-or-buy decision problem, OPTUNA generates a binary vector x of length n_trials
trials.
In the HFS task, OPTUNA is executed for n_trials
trials, where each iteration assigns a priority value between
python optimization_optuna.py --n_trials <n_trials> --out_dir <output_directory> --no_runs <no_runs>
⚠️ Note: For the make-or-buy decision problem, the list of orders is provided in an Excel document. Please ensure that the "population_orders" in the AnyLogic model is correctly referencing the appropriate Excel file.
python optimization_optuna.py --n_trials <n_trials> --out_dir <output_directory> --no_runs <no_runs> --m <M> --e <E> --r <R> --dataset <dataset_path>
We developed GA using the structure provided by the open-source inspyred Python library.
For the make-or-buy task, each individual candidate solution, denoted as x, is represented by a fixed-length binary genotype, where the length corresponds to the total number of orders. Each gene in x encodes a decision for a specific order: either "make" (0) or "buy" (1). The fitness of an individual is determined by the revenue ga_psize
randomly generated individuals. Following the default setting of inspyred, our GA employs rank selection, one-point crossover, bit-flip mutation. Rank selection indicates that individuals are ranked based on their fitness and higher-ranked individuals have a higher chance of being selected for reproduction. One-point crossover is a popular crossover technique. Given two parent individuals, a single crossover point is randomly chosen along the length of the chromosome, and the offspring inherit the genetic material from one parent up to that point and from the other parent after it. The operator is applied with a probability denoted by ga_cp
. Bit-flip mutation is a mutation operator where the chromosomes are randomly flipped from 0 to 1 or from 1 to 0. Each bit is mutated with probability ga_mp
. Evolution follows the generational replacement with elitism approach, where the top ga_elities
individuals, meaning those with the highest fitness, are preserved across generations. The remaining population is replaced with new offspring generated through selection, crossover, and mutation, ensuring the introduction of novel genetic material in each generation and guiding the algorithm towards convergence. The evolutionary process is controlled by a computational budget defined as a maximum number of generations, ga_gen
.
In line with current trends in HFS research, we also include a GA implementation for the HFS problem. The population consists of ga_psize
individuals, where each candidate solution is a list of input jobs arranged in descending priority order. The fitness of each individual is determined by the makespan achieved at the end of the AnyLogic simulation based on the proposed job sequence. The evolutionary algorithm runs for ga_gen
generations. Rank selection is used to choose parent individuals for crossover, meaning individuals are selected based on their rank in the population, with higher-ranked individuals having a greater chance of being selected. Generational replacement with elitism is employed at each generation, where the entire population is replaced by new offspring, except for the best ga_elites
individuals, who are preserved to ensure high-quality solutions carry over to the next generation. Mutation and crossover are applied with probabilities ga_mp
and ga_cp
, respectively. Since individuals represent a permutation of the input jobs, both evolutionary operators must avoid introducing duplicates in the genotype of the newly generated offspring. The mutation operator randomly swaps genes in an individual a specified number of times, ga_mn
. For crossover, we use the order crossover (OX) method originally proposed by Davis. The operation of the OX operator can be illustrated using an example from Zbigniew Michalewicz’s book "Genetic Algorithms + Data Structures = Evolution Programs".
Given two parent solutions, p₁
and p₂
, with two randomly selected cut points marked by '|':
p₁
= (1 2 3 | 4 5 6 7 | 8 9)
p₂
= (4 5 2 | 1 8 7 6 | 9 3)
First, the segments between the cut points are copied directly into the offspring:
o₁
= (x x x | 4 5 6 7 | x x)
o₂
= (x x x | 1 8 7 6 | x x)
Next, starting from the second cut point of one parent, the genes from the other parent are copied in the same order, skipping any genes already present in the offspring. Once the end of the string is reached, the procedure continues from the first place of the list. The final offspring are:
o₁
= (2 1 8 | 4 5 6 7 | 9 3)
o₂
= (3 4 5 | 1 8 7 6 | 9 2)
This method effectively transfers information about the relative order of jobs from parents to offspring, emphasizing the importance of job sequencing in the resulting solutions.
python optimization_ga.py --out_dir <output_directory> --no_runs <no_runs> --population_size <ga_psize> --offspring_size <ga_psize> --max_generations <ga_gen>
⚠️ Note: For the make-or-buy decision problem, the list of orders is provided in an Excel document. Please ensure that the "population_orders" in the AnyLogic model is correctly referencing the appropriate Excel file.
python optimization_ga.py --out_dir <output_directory> --no_runs <no_runs> --m <M> --e <E> --r <R> --dataset <dataset_path> --max_generations <ga_gen> --population_size <ga_psize> --offspring_size <ga_psize>
To develop our optimization algorithm ACO, we adopt the implementation structure provided by the inspyred Python library. In the inspyred framework, a swarm of aco_psize
artificial ants constructs potential solutions by iteratively selecting components from a set of trial components
At each step, each ant q_0
, the ant chooses the edge q_0
, a probabilistic transition rule is used:
In the equation, the probability that ant
Finally, the pheromone levels on the components comprising the best solution path
These updates modify the environment, guiding the evolution of subsequent ant generations.
We found that in the case of the make-or-buy decision problem, incorporating problem-specific heuristic information is not necessary for achieving good results, so we set aco_gen
.
The authors of [paper] were the first to apply ant colony optimization to solve the HFS problem. Their experiments used benchmarks with a small number of jobs, suggesting that limiting the number of input jobs is necessary to avoid excessive execution times. Initially, in our implementation we defined the discrete set of trial components as
where aco_psize
ants that evolve over aco_gen
generations.
python optimization_aco.py --out_dir <output_directory> --no_runs <no_runs> --population_size <aco_psize> --max_generations <aco_gen>
⚠️ Note: For the make-or-buy decision problem, the list of orders is provided in an Excel document. Please ensure that the "population_orders" in the AnyLogic model is correctly referencing the appropriate Excel file.
python optimization_aco.py --out_dir <output_directory> --dataset <dataset_path> --no_runs <no_runs> --m <M> --e <E> --r <R> --max_generations <aco_gen> --population_size <aco_psize>
For our RL approach, we utilize the PPO implementation from RLlib, an open-source library designed for reinforcement learning applications. The training environment, where the reinforcement learning agent learns its policy, is built using the OpenAI Gymnasium Python library.
As regards the make-or-buy decision task, the specific environment to be solved is a list of orders, and the policy the agent aims to learn is a deep neural network that decides for each order whether to outsource it or produce it in-house. Since the agent processes one order at a time, the observation space is a 5-dimensional vector encoding the input order. The first entry is the order's unique identifier, the second is the number of components of type A required, the third is the number of components of type B, the fourth is the number of components of type C, and the fifth is the order's deadline timestamp. The action space consists of two discrete options: 0 indicates the decision to produce the order internally, while 1 represents the decision to outsource the order to an external supplier. A key aspect of the training process is the use of delayed rewards. The agent receives no immediate feedback after processing each order; a zero reward is given until the entire list of orders has been processed. After the agent has made a "make-or-buy" decision (0 or 1) for each order, the final reward is determined by the total revenue obtained, as calculated through the AnyLogic simulation. An alternative approach could involve providing intermediate rewards by running multiple AnyLogic simulations after a set number of processed orders. Notably, this would significantly slow down the training due to the increased computational cost. Therefore, despite the challenges that are usually associated with delayed rewards, such as potential training instability, we do not adopt this alternative to avoid the excessive execution time and implementation overhead that it would introduce. During training, we count the number of AnyLogic simulations executed and terminate the process after a predefined number rl_eval
of AnyLogic evaluations.
For the HFS, the task assigned to the reinforcement learning agent involves optimizing the scheduling of the list of laser cutting machine orders rl_eval
.
python optimization_rl.py --out_dir <output_directory> --no_runs <no_runs> --no_evaluations <rl_eval>
⚠️ Note: For the make-or-buy decision problem, the list of orders is provided in an Excel document. Please ensure that the "population_orders" in the AnyLogic model is correctly referencing the appropriate Excel file.
⚠️ Note: In the RL implementation for solving the make-or-buy decision problem, the linedf = pd.read_excel("orders.xlsx")
has to be modified to reference the specific Excel file containing the input orders.
python optimization_rl.py --out_dir <output_directory> --no_runs <no_runs> --no_evaluations <rl_eval> --m <M> --e <E> --r <R> --num_machine_types <num_machine_types> --max_makespan <max_makespan> --dataset <dataset_path>
⚠️ Note: To properly determine the value of$\text{makespan}_{\text{max}}$ for a given problem instance, we recommend executing the optimization process using another optimizer. This will provide insight into the makespan values obtained, allowing you to set the$\text{makespan}_{\text{max}}$ argument accordingly. For the datasets used in this paper, we established the following$\text{makespan}_{\text{max}}$ values:$\text{makespan}_{\text{max}} = 500$ for datasets d1, d2, and d3;$\text{makespan}_{\text{max}} = 650$ for dataset d4; and$\text{makespan}_{\text{max}} = 2500$ for dataset d5.
⚠️ Note: As previously explained, machine types are represented by integer values. However, in the dataset files, these machine types are indicated using strings. Before running the code, it is crucial to update the dictionaryself.machine_type_encoding: Dict[str, int]
in theAdigeEnv(gym.Env)
environment to accurately reflect the mapping between machine types and their corresponding integer values.
To implement GP, we use the open-source Python library deap.
For the make-or-buy decision problem, the goal of the evolutionary algorithm is to evolve a population of gp_psize
decision trees that take an order as input and return a binary decision: whether to "make" or "buy" the order. Each order is represented by three integers, corresponding to the number of components of type A, B, and C required for processing. We intentionally exclude the order's deadline from the input features to reduce the solution space, thereby simplifying training, speeding up execution times, and discouraging the generation of complex tree structures that would be difficult to interpret and therefore less useful. The fitness of a decision tree is determined by the revenue generated in the AnyLogic simulation, based on the "make-or-buy" decisions made by the tree. The evolutionary algorithm aims to guide the population toward high-fitness solutions. We define a strongly typed grammar, where each function and terminal has a specified type. The output type of a function must match the input type of another function for them to be connected. The function set
and(a, b): 𝔹 × 𝔹 → 𝔹
or(a, b): 𝔹 × 𝔹 → 𝔹
not(a): 𝔹 → 𝔹
lt(x, y): ℤ × ℤ → 𝔹
gt(x, y): ℤ × ℤ → 𝔹
if_then_else(c, x, y): 𝔹 × 𝕊 × 𝕊 → 𝕊
Where 𝔹 represents boolean values (True/False), ℤ denotes integers, and 𝕊 refers to strings.
The terminal set
"outsource" ∈ 𝕊
"not_outsource" ∈ 𝕊
False ∈ 𝔹
True ∈ 𝔹
The constant integer values are restricted to the interval gp_psize
individuals is randomly generated using the genHalfAndHalf
method. Half of the population is created using the genGrow
method, where tree depths vary between gp_minT
and gp_maxT
. The other half is generated using getFull
, where all trees have a fixed depth between gp_minT
and gp_maxT
. For mutation, we employ a uniform mutation operator. In this approach, a random point in a tree is replaced by a new subtree generated using the genHalfAndHalf
method, parameterized by a minimum depth of gp_minM
and a maximum depth of gp_maxM
. For crossover, we use one-point crossover, where randomly selected subtrees from two parent individuals are swapped. The mutation and crossover operators are applied with probabilities gp_mp
and gp_cp
, respectively. We use tournament selection with tournament size gp_tsize
to choose the individuals for mating. The evolutionary process terminates after a maximum of gp_gen
generations.
In the case of the HFS problem, GP evolves a population of gp_psize
decision trees that take a laser cutting machine encoded as a 1-D vector of features and return a priority level from 0 to priority_levels
. The range
eq(x, y): {mt} × {mt} → 𝔹
lt(x, y): {db, de, dd} × {db, de, dd} → 𝔹
gt(x, y): {db, de, dd} × {db, de, dd} → 𝔹
if_then_else(c, x, y): 𝔹 × 𝕊 × 𝕊 → 𝕊
The terminal set
False ∈ 𝔹
True ∈ 𝔹
priority_levels
.
The GP approach begins with a randomly generated initial population of gp_psize
individuals using the genHalfAndHalf method. Uniform mutation and one-point crossover operators are applied, with tournament selection of size gp_tsize
used to select individuals for mating. The evolutionary process terminates after a maximum of gp_gen
generations.
python optimization_gp.py --out_dir <output_directory> --no_runs <no_runs> --population_size <gp_psize> --max_generations <gp_gen>
⚠️ Note: For the make-or-buy decision problem, the list of orders is provided in an Excel document. Please ensure that the "population_orders" in the AnyLogic model is correctly referencing the appropriate Excel file.
⚠️ Note: In the GP implementation for solving the make-or-buy decision problem, the linedf = pd.read_excel("orders.xlsx")
has to be modified to reference the specific Excel file containing the input orders.
python optimization_gp.py --out_dir <output_directory> --no_runs <no_runs> --m <M> --e <E> --r <R> --num_machine_types <num_machine_types> --priority_levels <priority_levels> --dataset <dataset_path> --population_size <gp_psize> --max_generations <gp_gen>
The outer loop of the optimization process in ELDT is an evolutionary algorithm, which evolves a population of p_size
individuals. Each individual is encoded as a fixed-length list of integers. To evaluate an individual, it is first translated into a corresponding decision tree, based on the production rules of a problem-specific BNF grammar. The grammar implemented for the make-or-buy decision task is as follows.
⟨bt⟩ ::= ⟨if⟩
⟨if⟩ ::= if ⟨condition⟩ then ⟨a⟩ else ⟨a⟩
⟨a⟩ ::= leaf | ⟨if⟩
⟨condition⟩ ::= ⟨in₀⟩ ⟨comp⟩ ⟨c⟩ | ⟨in₁⟩ ⟨comp⟩ ⟨c⟩ | ⟨in₂⟩ ⟨comp⟩ ⟨c⟩
⟨comp⟩ ::= < | >
⟨c⟩ ::= v ∈ [0, 21]
In this grammar, <bt> represents the starting symbol. The <if> rule is used to generate if-then-else statements. The outcome of an if-then-else statement, denoted by <a>, can either be a terminal leaf or another <if> statement. Within each <if> statement, the <condition> is expressed as an inequality comparison (<comp>) between the input variables ⟨in₀⟩, ⟨in₁⟩, or ⟨in₂⟩, which represent the number of components of types A, B, or C required to process the input order, and a constant ⟨c⟩, an integer in the range [1, 21]. At the terminal leaves of the decision tree, the agent must decide between two possible actions (n_episodes
episodes. Similar to RL and GP, the environment is the list of orders enumerated in the Excel document. The agent perceives and processes these orders one at a time.
In the case of the HFS problem, the production rules are defined as follows:
⟨bt⟩ ::= ⟨if⟩
⟨if⟩ ::= if ⟨condition⟩ then ⟨a⟩ else ⟨a⟩ | if ⟨conditioneq⟩ then ⟨a⟩ else ⟨a⟩
⟨a⟩ ::= leaf | ⟨if⟩
⟨condition⟩ ::= ⟨in₁⟩ ⟨comp⟩ ⟨c₁⟩ | ⟨in₂⟩ ⟨comp⟩ ⟨c₂⟩ | ⟨in₃⟩ ⟨comp⟩ ⟨c₃⟩ | ⟨in₄⟩ ⟨comp⟩ ⟨c₄⟩
⟨conditioneq⟩ ::= ⟨in₀⟩ = ⟨c₀⟩
⟨comp⟩ ::= < | >
⟨c₀⟩ ::= v ∈ [0, num_machine_types]
⟨c₁⟩ ::= v ∈ [0, date_basement_arrival_max]
⟨c₂⟩ ::= v ∈ [0, date_electrical_panel_arrival_max]
⟨c₃⟩ ::= v ∈ [0, date_delivery_max]
⟨c₄⟩ ::= v ∈ [0, n]
In this grammar, ⟨bt⟩ is the starting symbol. ⟨if⟩ statements are used to produce conditional logic, where the outcome, denoted by ⟨a⟩, can either be a terminal leaf or another ⟨if⟩ statement, allowing for recursive branching within the tree. The condition production generates inequalities (⟨comp⟩) between constant integer values of types ⟨c₁⟩,...,⟨c₄⟩ and the input order features ⟨in₁⟩, ..., ⟨in₄⟩. The ⟨conditioneq⟩, on the other hand, generates equalities between integer constants of type ⟨c₀⟩ and the input feature ⟨in₀⟩. For a given input order priority_levels
priority levels. The range n_episodes
to update the content of the leaves using Q-learning. The environment in this problem is a list of jobs. To avoid evolving decision trees that overfit to the specific order in which jobs appear, the Gym environment is reset at each generation, and the list of input orders is randomly shuffled. This ensures that the evolved interpretable decision trees are capable of making decisions based on the intrinsic properties of the orders, rather than the order in which they are presented.
python optimization_ge.py --out_dir <output_directory> --no_runs <no_runs> --population_size <p_size> --max_generations <ge_gen> --episodes <n_episodes>
⚠️ Note: For the make-or-buy decision problem, the list of orders is provided in an Excel document. Please ensure that the "population_orders" in the AnyLogic model is correctly referencing the appropriate Excel file.
⚠️ Note: In the ELDT implementation for solving the make-or-buy decision problem, the linedf = pd.read_excel("orders.xlsx")
has to be modified to reference the specific Excel file containing the input orders.
python optimization_ge.py --m <M> --e <E> --r <R> --num_machine_types <num_machine_types> --max_makespan <max_makespan> --max_generations <ge_gen> --dataset <dataset_path> --out_dir <output_directory> --no_runs <no_runs> --population_size <p_size> --episodes <n_episodes> --priority_levels <priority_levels>
⚠️ Note: To properly determine the value of$\text{makespan}_{\text{max}}$ for a given problem instance, we recommend executing the optimization process using another optimizer. This will provide insight into the makespan values obtained, allowing you to set the$\text{makespan}_{\text{max}}$ argument accordingly. For the datasets used in this paper, we established the following$\text{makespan}_{\text{max}}$ values:$\text{makespan}_{\text{max}} = 500$ for datasets d1, d2, and d3;$\text{makespan}_{\text{max}} = 650$ for dataset d4; and$\text{makespan}_{\text{max}} = 2500$ for dataset d5.
⚠️ Note: As previously explained, machine types are represented by integer values. However, in the dataset files, these machine types are indicated using strings. Before running the code, it is crucial to update the dictionaryself.machine_type_encoding: Dict[str, int]
in theAdigeEnv(gym.Env)
environment to accurately reflect the mapping between machine types and their corresponding integer values.
To ensure a fair comparison, each algorithm was allocated 5000 AnyLogic simulation executions. Hyperparameters affecting the computational budget were adjusted accordingly, while other parameters were set to their library defaults.
Parameter | Value |
---|---|
n_iterations |
5000 |
Parameter | Value |
---|---|
n_trials |
5000 |
Parameter | Value |
---|---|
ga_psize |
10 |
ga_gen |
500 |
ga_mp |
1.0 |
ga_cp |
1.0 |
ga_mn |
1 |
ga_elites |
1 |
Parameter | Value |
---|---|
aco_gen |
500 |
aco_psize |
10 |
q_0 |
0.5 |
α |
0.1 |
δ |
1.0 |
ρ |
0.1 |
Parameter | Value |
---|---|
rl_eval |
5000 |
Parameter | Value |
---|---|
gp_gen |
250 |
gp_psize |
20 |
gp_mp |
0.2 |
gp_cp |
0.5 |
gp_tsize |
3 |
gp_minT |
2 |
gp_maxT |
5 |
gp_minM |
0 |
gp_maxM |
2 |
gp_hof_size |
1 |
Parameter | Value |
---|---|
g_l |
100 |
g_max |
40000 |
p_size |
5 |
c_p |
0.5 |
m_p |
0.5 |
ge_gen |
200 |
t_size |
2 |
α |
0.001 |
γ |
0.05 |
ε |
0.05 |
n_episodes |
5 |
Authors:
- Stefano Genetti, MSc Student University of Trento (Italy), stefano.genetti@unitn.it
- Giovanni Iacca, Associate Professor University of Trento (Italy), giovanni.iacca@unitn.it
For every type of doubt/question about the repository please do not hesitate to contact us.