SDM'Studio is a C++ library that provides efficient solvers for sequential decision-making problems.
| POSG | Dec-POMDP | ZS-POSG | ND-POMDP | SG | Dec-MDP |
|------|-----------|---------|----------|----|---------|
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ |

| POMDP | MDP |
|-------|-----|
| ✔️ | ✔️ |
| HSVI | Q-Learning | Value Iteration | A* | Policy Iteration | JESP |
|------|------------|-----------------|----|------------------|------|
| ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ |
Follow the installation instructions at https://sdmstudio.github.io/tutorials/install
Note: if you have issues linking toulbar2 (error: cannot link -ltb2), copy the toulbar2 shared library (.so) from lib/ into /usr/lib.
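The workaround above can be sketched as shell commands. The exact filename of the toulbar2 shared library is an assumption here (written as libtb2.so); check what is actually present in lib/ on your system:

```shell
# Assumption: the toulbar2 shared library in lib/ is named libtb2.so.
sudo cp lib/libtb2.so /usr/lib/
# Refresh the dynamic linker cache so that -ltb2 resolves.
sudo ldconfig
```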
Several scripts are available after installing SDMS. The main program, SDMStudio, should cover most basic usage. If this is not enough, you may have a look at the other SDMS programs (sdms-xxxx).
```
SDMStudio algorithms
SDMStudio worlds
```
```
SDMStudio solve [ARG...]
SDMStudio solve [-a ALGO] [-p PROBLEM] [-f FORMALISM] [-e ERROR] [-d DISCOUNT] [-h HORIZON] [-t TRIALS] [-n EXP_NAME]
SDMStudio solve [--algorithm ALGO] [--problem PROBLEM] [--formalism FORMALISM] [--error ERROR] [--discount DISCOUNT] [--horizon HORIZON] [--trials TRIALS] [--name EXP_NAME]
```
Example: solve the multi-agent problem called tiger as if it were a single-agent problem. HSVI is used by default.

```shell
cd sdms/
SDMStudio solve -p data/world/dpomdp/tiger.dpomdp -f pomdp -e 0.001 -d 1.0 -h 4
```
```
SDMStudio learn [ARG...]
SDMStudio learn [-a ALGO] [-p PROBLEM] [-f FORMALISM] [-l LEARNING_RATE] [-d DISCOUNT] [-h HORIZON] [-t NUM_TIMESTEPS] [-n EXP_NAME]
SDMStudio learn [--algorithm ALGO] [--problem PROBLEM] [--formalism FORMALISM] [--lr LEARNING_RATE] [--discount DISCOUNT] [--horizon HORIZON] [--nb_timesteps NUM_TIMESTEPS] [--name EXP_NAME]
```
Example: solve the multi-agent problem called tiger as if it were a single-agent problem. Q-learning is used by default.

```shell
cd sdms/
SDMStudio learn -p data/world/dpomdp/tiger.dpomdp -f pomdp -l 0.01 -d 1.0 -h 4 -t 30000
```
```
SDMStudio test [ARG...]
```
```cpp
#include <iostream>

#include <sdm/parser/parser.hpp>
#include <sdm/worlds.hpp>

int main(int argc, char **argv)
{
    // Parse a Dec-POMDP problem definition from a .dpomdp file.
    auto dpomdp_world = sdm::parser::parse_file("my_problem.dpomdp");

    // Display the main components of the parsed problem.
    std::cout << "Nb Agents : " << dpomdp_world->getNumAgents() << std::endl;
    std::cout << "State Space : " << *dpomdp_world->getStateSpace() << std::endl;
    std::cout << "Action Space : " << *dpomdp_world->getActionSpace() << std::endl;
    std::cout << "Observation Space : " << *dpomdp_world->getObsSpace() << std::endl;

    return 0;
}
```
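Assuming the example above is saved as main.cpp and the SDMS headers and libraries are installed system-wide, it might be compiled along these lines (the library flag -lsdm and the use of no extra include paths are assumptions; adjust them to your installation):

```shell
# Hypothetical build command; adapt -I/-L/-l flags to your SDMS installation.
g++ -std=c++17 main.cpp -o parse_example -lsdm
./parse_example
```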
- Keep the simultaneous state linked to the serialized state
- Compute equivalence over the simultaneous state
- Store $a^{\kappa}$ linked to $s^{\kappa}_{t}$ and the resulting $s^{\kappa}_{t+1}$
- Only update $a^{\kappa}$ at a fixed frequency
- Truncation = m, so with an infinite horizon we only have m+1 time steps
- Make it possible to load any simultaneous problem
- Be able to serialize it
- Factor the representation of occupancy states and of the representation structures (value function, dynamics, reward)
- Move the definitions to the states: transition and reward
- Reward:
  - dot product
  - or vector-valued reward
- Transition:
  - SG, ZS-SG, ZS-POSG, POSG