Pipeline extras and example configuration files #55
Signed-off-by: Otavio Napoli <otavio.napoli@gmail.com>
…utils Signed-off-by: Otavio Napoli <otavio.napoli@gmail.com>
…culcate seismic attributes (train and eval) Signed-off-by: Otavio Napoli <otavio.napoli@gmail.com>
configs/pipelines/lightning_pipeline/unet_f3_reconstruct_evaluate.yaml
configs/pipelines/lightning_pipeline/unet_f3_reconstruct_train.yaml
Indeed, elements related to the experiments should not be part of the library. In the last commit, I moved the configuration files to the minerva-seismic repository. There, we can keep the configuration files used to run experiments, the results and outputs (if needed), and other customization operations specific to seismic data.
Note: I moved all the configuration files to the minerva-seismic repository.
LGTM
Pipelines and Examples
In this PR we add a simple, generic pipeline named `SimpleLightningPipeline`. This class is a subclass of `Pipeline`, and it is designed to work with PyTorch Lightning models (train/test/predict and evaluate) and with the jsonargparse CLI module.

The `SimpleLightningPipeline` receives a model and a trainer as init_args. The entry point (the `run` method) receives the data and the task (fit, test, predict, or evaluate) and runs the corresponding method of the trainer.

The evaluate task relies on `regression_metrics` and `classification_metrics`, which use the torchmetrics Metric API. The evaluate method should be further customized for complex evaluation tasks that do not fit the torchmetrics API, such as overlapping samples, plotting images, or creating a confusion matrix.

The `SimpleLightningPipeline` can be called from standard Python code or from the command line using jsonargparse.

In the `SimpleLightningPipeline`, we already added a CLI in the main code that exposes the class init arguments and the `run` method arguments. The CLI is generated using jsonargparse. It can be used to run the pipeline from the command line and to generate the command-line help and documentation.

All the arguments of the `SimpleLightningPipeline` are exposed in the CLI, and the user can pass them as command-line arguments or as a JSON/YAML file. In the config folder, we added some configurations that are useful for the pipeline, such as the model, the trainer, and the data.

Example of training a model to compute seismic attributes
Suppose we have the `original.zarr` and `envelope.zarr` files, which correspond to the original and envelope seismic F3 data, respectively. We want to train a model to compute the seismic attributes of the envelope data. We can use the `SimpleLightningPipeline` to train the model and evaluate it.

We can create a config file, `config.yaml`, and run the pipeline from the command line with it. Alternatively, we can use the existing configuration files, which are organized in a modular format.
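For calling the pipeline from Python rather than from the CLI, the dispatch performed by the `run` entry point described above can be sketched as follows. This is a minimal sketch with stand-in classes, not the code added in this PR: `StubTrainer`, `SketchPipeline`, `mean_absolute_error`, and the way `evaluate` combines predictions with metrics are all assumptions made for illustration, and the stub methods do not follow the real Lightning `Trainer` signatures.

```python
from typing import Callable, Literal

# Task names taken from the PR description
Task = Literal["fit", "test", "predict", "evaluate"]
# Hypothetical metric type: a function of (predictions, targets)
Metric = Callable[[list, list], float]


class StubTrainer:
    """Stand-in for a Lightning Trainer; it only records which method ran."""

    def __init__(self):
        self.calls = []

    def fit(self, model, data):
        self.calls.append("fit")

    def test(self, model, data):
        self.calls.append("test")

    def predict(self, model, data):
        self.calls.append("predict")
        return [model(x) for x, _ in data]


def mean_absolute_error(preds, targets):
    """Plain-Python metric used here in place of a torchmetrics Metric."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


class SketchPipeline:
    """Hypothetical, simplified analogue of SimpleLightningPipeline."""

    def __init__(self, model, trainer, regression_metrics=None):
        self.model = model
        self.trainer = trainer
        self.regression_metrics = regression_metrics or {}

    def run(self, data, task: Task):
        # fit/test/predict map directly onto the trainer method of the same name
        if task in ("fit", "test", "predict"):
            return getattr(self.trainer, task)(self.model, data)
        if task == "evaluate":
            return self.evaluate(data)
        raise ValueError(f"Unknown task: {task!r}")

    def evaluate(self, data):
        # Assumed behavior: predict, then apply every metric to the predictions
        preds = self.trainer.predict(self.model, data)
        targets = [y for _, y in data]
        return {name: fn(preds, targets) for name, fn in self.regression_metrics.items()}


# Toy data as (input, target) pairs and a toy "model" that doubles its input
data = [(1.0, 2.0), (2.0, 4.5), (3.0, 6.0)]
trainer = StubTrainer()
pipeline = SketchPipeline(
    model=lambda x: 2 * x,
    trainer=trainer,
    regression_metrics={"mae": mean_absolute_error},
)
pipeline.run(data, task="fit")
scores = pipeline.run(data, task="evaluate")
```

Keeping `evaluate` as a separate method mirrors the point made in the description: it is the natural place to customize when an evaluation does not fit the metric API.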
Configuration Files
The configuration files are very flexible and can be used to run the pipeline in different ways. We have structured the configuration files in a modular way, using the following directory structure:
- `configs/callbacks/`: Contains default configurations for callbacks. This is used when instantiating the Trainer.
- `configs/data/`: Contains default configurations for data modules. This is used when instantiating the DataModule, for each dataset/task.
- `configs/logger/`: Contains default configurations for loggers. This is used when instantiating the Trainer.
- `configs/models/`: Contains default configurations for models. This is used when instantiating the model.
- `configs/pipelines/`: Contains configurations for the pipeline. This is used when instantiating the pipeline. Inside this folder, we have the following subfolders:
  - `configs/pipelines/lightning_pipeline/`: Contains configurations for the SimpleLightningPipeline. This is used when instantiating the pipeline.
  - `configs/pipelines/other_pipeline/`: Contains configurations for other pipelines. This is used when instantiating the pipeline.
- `configs/trainer/`: Contains default configurations for trainers. This is used when instantiating the Trainer.

The configurations for `SimpleLightningPipeline` are in the `configs/pipelines/lightning_pipeline/` folder. These are the configurations used when instantiating the `SimpleLightningPipeline`, and they usually contain all the CLI arguments that are passed to the `SimpleLightningPipeline` class, so you can simply run the pipeline with one of them.

Others
- Customize the `SimpleLightningPipeline.evaluate` method when the torchmetrics API is not enough to evaluate the model. This method should be further customized for complex evaluation tasks that do not fit the torchmetrics API, such as overlapping samples, plotting images, or creating a confusion matrix.
- `class_path` and `init_args` are used to instantiate the classes. The `class_path` is the path to the class, and the `init_args` are the arguments passed to the class constructor. The `init_args` can contain references to other configuration files. For more information about the configuration files, see the jsonargparse documentation.
- Variables are annotated with type hints from the `typing` module. This lets the user know the expected type of each variable and lets the IDE provide code completion and type checking. It also allows seamless integration with the jsonargparse module, which uses the type hints to infer the types of the variables. In fact, the jsonargparse CLI will fail if the types of the variables are not correctly defined or assigned.
- The data module's `predict_dataloader` method returns a split of the dataset to make predictions (usually the test part).
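To illustrate the `class_path` and `init_args` convention described above, here is a minimal sketch of how such a mapping can be turned into an object. It is not how jsonargparse is implemented: the `instantiate` helper is hypothetical, it skips the type validation that jsonargparse performs, and it does not resolve references to other configuration files. Standard-library classes stand in for the model and trainer classes.

```python
import importlib
from typing import Any


def instantiate(config: dict) -> Any:
    """Build an object from a {'class_path': ..., 'init_args': {...}} mapping.

    Hypothetical helper for illustration only.
    """
    # Split "package.module.ClassName" into module path and class name
    module_name, _, class_name = config["class_path"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)

    init_args = {}
    for name, value in config.get("init_args", {}).items():
        # A nested mapping with its own class_path is instantiated first
        if isinstance(value, dict) and "class_path" in value:
            value = instantiate(value)
        init_args[name] = value
    return cls(**init_args)


# Python equivalent of what a YAML file with class_path/init_args would load to.
# The keys "scale" and "max_epochs" are made up for this example.
config = {
    "class_path": "types.SimpleNamespace",
    "init_args": {
        "max_epochs": 10,
        "scale": {
            "class_path": "fractions.Fraction",
            "init_args": {"numerator": 3, "denominator": 4},
        },
    },
}

obj = instantiate(config)
print(type(obj).__name__, obj.max_epochs, obj.scale)
```

The recursion is what makes the modular layout possible: any constructor argument can itself be described by another `class_path`/`init_args` pair.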