diff --git a/docs/source/tutorials.rst b/docs/source/tutorials.rst index 74449acd..0a44dcf5 100644 --- a/docs/source/tutorials.rst +++ b/docs/source/tutorials.rst @@ -15,5 +15,6 @@ Tutorials tutorials/nn_SiC tutorials/parameter_transform tutorials/uq_mcmc + tutorials/uq_bootstrap tutorials/lennard_jones tutorials/linear_regression diff --git a/docs/source/tutorials/uq_bootstrap.ipynb b/docs/source/tutorials/uq_bootstrap.ipynb new file mode 100644 index 00000000..37d3e0ef --- /dev/null +++ b/docs/source/tutorials/uq_bootstrap.ipynb @@ -0,0 +1,269 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "656f7afc", + "metadata": {}, + "source": [ + "# Bootstrapping\n", + "\n", + "In this example, we demonstrate how to perform uncertainty quantification (UQ) using\n", + "bootstrap method. We use a Stillinger-Weber (SW) potential for silicon that is archived\n", + "in OpenKIM_.\n", + "\n", + "For simplicity, we only set the energy-scaling parameters, i.e., ``A`` and ``lambda`` as\n", + "the tunable parameters. These parameters will be calibrated to energies and forces of a\n", + "small dataset, consisting of 4 compressed and stretched configurations of diamond silicon\n", + "structure." + ] + }, + { + "cell_type": "markdown", + "id": "98b590d7", + "metadata": {}, + "source": [ + "To start, let's first install the SW model::\n", + "\n", + "$ kim-api-collections-management install user SW_StillingerWeber_1985_Si__MO_405512056662_006\n", + "\n", + ".. seealso::\n", + " This installs the model and its driver into the ``User Collection``. See\n", + " :ref:`install_model` for more information about installing KIM models." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e617de91", + "metadata": { + "ExecuteTime": { + "end_time": "2024-10-07T12:37:31.393276Z", + "start_time": "2024-10-07T12:37:29.472146Z" + } + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "\n", + "from kliff.calculators import Calculator\n", + "from kliff.dataset import Dataset\n", + "from kliff.loss import Loss\n", + "from kliff.models import KIMModel\n", + "from kliff.uq.bootstrap import BootstrapEmpiricalModel\n", + "from kliff.utils import download_dataset\n", + "\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "id": "57f71678", + "metadata": {}, + "source": [ + "Before running bootstrap, we need to define a loss function and train the model. More\n", + "detail information about this step can be found in :ref:`tut_kim_sw` and :ref:`tut_params_transform`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a3aa1d13", + "metadata": { + "ExecuteTime": { + "end_time": "2024-10-07T12:37:59.004347Z", + "start_time": "2024-10-07T12:37:58.979490Z" + } + }, + "outputs": [], + "source": [ + "# Create the model\n", + "model = KIMModel(model_name=\"SW_StillingerWeber_1985_Si__MO_405512056662_006\")\n", + "\n", + "# Set the tunable parameters and the initial guess\n", + "opt_params = {\"A\": [[\"default\"]], \"lambda\": [[\"default\"]]}\n", + "\n", + "model.set_opt_params(**opt_params)\n", + "model.echo_opt_params()\n", + "\n", + "# Get the dataset\n", + "dataset_path = download_dataset(dataset_name=\"Si_training_set_4_configs\")\n", + "# Read the dataset\n", + "tset = Dataset(dataset_path)\n", + "configs = tset.get_configs()\n", + "\n", + "# Create calculator\n", + "calc = Calculator(model)\n", + "# Only use the forces data\n", + "ca = calc.create(configs, use_energy=False, use_forces=True)\n", + "\n", + "# Instantiate the loss function\n", + "residual_data = {\"normalize_by_natoms\": False}\n", + "loss = Loss(calc, residual_data=residual_data)" + ] + }, + { + "cell_type": "markdown", + "id": "39a95904", + "metadata": {}, + "source": [ + "To perform UQ by bootstrapping, the general workflow starts by instantiating :class:`~kliff.uq.bootstrap.BootstrapEmpiricalModel`, or :class:`~kliff.uq.bootstrap.BootstrapNeuralNetworkModel` if using a neural network\n", + "potential." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "966614ab", + "metadata": { + "ExecuteTime": { + "end_time": "2024-10-07T12:38:38.479190Z", + "start_time": "2024-10-07T12:38:38.475357Z" + } + }, + "outputs": [], + "source": [ + "# Instantiate bootstrap class object\n", + "BS = BootstrapEmpiricalModel(loss, seed=1717)" + ] + }, + { + "cell_type": "markdown", + "id": "b8be6029", + "metadata": {}, + "source": [ + "Then, we generate some bootstrap compute arguments. This is equivalent to generating\n", + "bootstrap data. Typically, we just need to specify how many bootstrap data samples to\n", + "generate. Additionally, if we call ``generate_bootstrap_compute_arguments`` multiple\n", + "times, the new generated data samples will be appended to the previously generated data\n", + "samples. This is also the behavior if we read the data samples from the previously\n", + "exported file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e660eb87", + "metadata": { + "ExecuteTime": { + "end_time": "2024-10-07T12:39:14.455217Z", + "start_time": "2024-10-07T12:39:14.442511Z" + } + }, + "outputs": [], + "source": [ + "# Generate bootstrap compute arguments\n", + "BS.generate_bootstrap_compute_arguments(100)" + ] + }, + { + "cell_type": "markdown", + "id": "898350eb", + "metadata": {}, + "source": [ + "Finally, we will iterate over these bootstrap data samples and train the potential\n", + "using each data sample. The resulting optimal parameters from each data sample give a\n", + "single sample of parameters. By iterating over all data samples, then we will get an\n", + "ensemble of parameters.\n", + "\n", + "Note that the mapping from the bootstrap dataset to the parameters involve optimization.\n", + "We suggest to use the same mapping, i.e., the same optimizer setting, in each iteration.\n", + "This includes using the same set of initial parameter guess. In the case when the loss\n", + "function has multiple local minima, we don't want the parameter ensemble to be biased\n", + "on the results of the other optimizations. For neural network model, we need to reset\n", + "the initial parameter value, which is done internally.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d347a576", + "metadata": { + "ExecuteTime": { + "end_time": "2024-10-07T12:39:53.510993Z", + "start_time": "2024-10-07T12:39:48.359289Z" + } + }, + "outputs": [], + "source": [ + "# Run bootstrap\n", + "min_kwargs = dict(method=\"lm\") # Optimizer setting\n", + "initial_guess = calc.get_opt_params() # Initial guess in the optimization\n", + "BS.run(min_kwargs=min_kwargs, initial_guess=initial_guess)" + ] + }, + { + "cell_type": "markdown", + "id": "e2526a32", + "metadata": {}, + "source": [ + "The resulting parameter ensemble can be accessed in `BS.samples` as a `np.ndarray`.\n", + "Then, we can plot the distribution of the parameters, as an example, or propagate the\n", + "error to the target quantities we want to study." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e33a7732", + "metadata": { + "ExecuteTime": { + "end_time": "2024-10-07T12:40:23.927758Z", + "start_time": "2024-10-07T12:40:23.759710Z" + } + }, + "outputs": [], + "source": [ + "# Plot the distribution of the parameters\n", + "plt.figure()\n", + "plt.plot(*(BS.samples.T), \".\", alpha=0.5)\n", + "param_names = list(opt_params.keys())\n", + "plt.xlabel(param_names[0])\n", + "plt.ylabel(param_names[1])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "fe68cf9b", + "metadata": {}, + "source": [ + ".. _OpenKIM: https://openkim.org" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": false, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tests/uq/conftest.py b/tests/uq/conftest.py new file mode 100644 index 00000000..0b18c706 --- /dev/null +++ b/tests/uq/conftest.py @@ -0,0 +1,40 @@ +from pathlib import Path + +import pytest + +from kliff.dataset import Dataset +from kliff.models import KIMModel + + +@pytest.fixture(scope="session") +def uq_test_dir(): + # Directory of uq test files + return Path(__file__).resolve().parent + + +@pytest.fixture(scope="session") +def uq_test_data_dir(): + # Directory of uq test data + return Path(__file__).resolve().parents[1] / "test_data/configs/Si_4" + + +@pytest.fixture(scope="session") +def uq_test_configs(uq_test_data_dir): + # Load test configs + data = Dataset(uq_test_data_dir) + return data.get_configs() + + +@pytest.fixture(scope="session") +def uq_kim_model(): + # Load a KIM model + modelname = "SW_StillingerWeber_1985_Si__MO_405512056662_006" + model = KIMModel(modelname) + model.set_opt_params(A=[["default"]]) + return model + + +@pytest.fixture(scope="session") +def uq_nn_orig_state_filename(uq_test_dir): + """Return the original state filename for the NN model.""" + return uq_test_dir / "orig_model.pkl" diff --git a/tests/uq/test_bootstrap_empirical.py b/tests/uq/test_bootstrap_empirical.py index ef25bf93..0bf5a26b 100644 --- a/tests/uq/test_bootstrap_empirical.py +++ b/tests/uq/test_bootstrap_empirical.py @@ -1,13 +1,9 @@ -from pathlib import Path - import numpy as np import pytest from scipy.optimize import OptimizeResult from kliff.calculators.calculator import Calculator, _WrapperCalculator -from kliff.dataset import Dataset from kliff.loss import Loss -from kliff.models import KIMModel from kliff.uq.bootstrap import ( Bootstrap, BootstrapEmpiricalModel, @@ -15,61 +11,72 @@ bootstrap_cas_generator_empirical, ) -seed = 1717 -np.random.seed(seed) +np.random.seed(1717) + +# Some variables +min_kwargs = dict(method="lm") # Optimizer settings +nsamples = np.random.randint(1, 5) # Number of samples -# model -modelname = "SW_StillingerWeber_1985_Si__MO_405512056662_006" -model = KIMModel(modelname) -model.set_opt_params(A=[["default"]]) -# training set -FILE_DIR = Path(__file__).absolute().parent # Directory of test file -path = FILE_DIR.parent.joinpath("test_data/configs/Si_4") -data = Dataset(path) -configs = data.get_configs() +@pytest.fixture(scope="module") +def calc_forces(uq_kim_model, uq_test_configs): + """Calculator that only computes forces.""" + calculator_forces = Calculator(uq_kim_model) + cas_forces = calculator_forces.create( + uq_test_configs, use_energy=False, use_forces=True + ) + ncas_forces = len(cas_forces) + return calculator_forces, cas_forces, ncas_forces -# calculators -# forces -calc_forces = Calculator(model) -cas_forces = calc_forces.create(configs, use_energy=False, use_forces=True) -ncas_forces = len(calc_forces.get_compute_arguments()) -calc_energy = Calculator(model) -cas_energy = calc_energy.create(configs, use_energy=True, use_forces=False) -ncas_energy = len(calc_energy.get_compute_arguments()) -calc_comb = _WrapperCalculator(calculators=[calc_energy, calc_forces]) +@pytest.fixture(scope="module") +def calc_energy(uq_kim_model, uq_test_configs): + """Calculator that only computes energy.""" + calculator_energy = Calculator(uq_kim_model) + cas_energy = calculator_energy.create( + uq_test_configs, use_energy=True, use_forces=False + ) + ncas_energy = len(cas_energy) + return calculator_energy, ncas_energy -# Single calculator -# loss -min_kwargs = dict(method="lm") -loss_forces = Loss(calc_forces) -loss_forces.minimize(**min_kwargs) -# initial state -orig_params = calc_forces.get_opt_params() -# Bootstrap class -BS_1calc = Bootstrap(loss_forces) -nsamples = np.random.randint(1, 5) +@pytest.fixture(scope="module") +def BS_1calc(calc_forces): + """Bootstrap class with a single calculator.""" + loss_forces = Loss(calc_forces[0]) + loss_forces.minimize(**min_kwargs) + return Bootstrap(loss_forces) -# Multiple calculators -# loss -loss_comb = Loss(calc_comb) -loss_comb.minimize(**min_kwargs) +@pytest.fixture(scope="module") +def orig_params(calc_forces, BS_1calc): + """Original parameters of the model.""" + return calc_forces[0].get_opt_params() -# Bootstrap class -BS_2calc = Bootstrap(loss_comb) +@pytest.fixture(scope="module") +def BS_2calc(uq_kim_model, uq_test_configs): + """Bootstrap class with multiple calculators.""" + # Combined calculators + calculator_forces = Calculator(uq_kim_model) + _ = calculator_forces.create(uq_test_configs, use_energy=False, use_forces=True) + calculator_energy = Calculator(uq_kim_model) + _ = calculator_energy.create(uq_test_configs, use_energy=True, use_forces=False) + calc_comb = _WrapperCalculator(calculators=[calculator_energy, calculator_forces]) -def test_wrapper(): + loss_comb = Loss(calc_comb) + loss_comb.minimize(**min_kwargs) + return Bootstrap(loss_comb) + + +def test_wrapper(BS_1calc): """Test if the Bootstrap class wrapper instantiate the correct class.""" assert isinstance( BS_1calc, BootstrapEmpiricalModel ), "Wrapper should instantiate BootstrapEmpiricalModel" -def test_error(): +def test_error(BS_1calc): """Test if BootstrapError is raised when we try to call run before generating bootstrap compute arguments. """ @@ -77,10 +84,11 @@ def test_error(): BS_1calc.run(min_kwargs=min_kwargs) -def test_bootstrap_cas_generator(): +def test_bootstrap_cas_generator(BS_1calc, calc_forces): """Test the shape of generated bootstrap compute arguments.""" BS_1calc.generate_bootstrap_compute_arguments(nsamples) bootstrap_cas = BS_1calc.bootstrap_compute_arguments + _, cas_forces, ncas_forces = calc_forces # Test the shape of bootstrap cas samples with default arguments assert ( len(bootstrap_cas) == nsamples @@ -108,7 +116,7 @@ def test_bootstrap_cas_generator(): ), "Generator doesn't generate the same number of cas as requested for each sample" -def test_callback(): +def test_callback(BS_1calc): """Test if callback function works and can break the loop in run method.""" def callback(_, opt): @@ -121,7 +129,7 @@ def callback(_, opt): ), "Callback function cannot break the loop in run method." -def test_run(): +def test_run(BS_1calc, orig_params): """Test the method to run the calculation.""" BS_1calc.run(min_kwargs=min_kwargs) # Test the shape of ensembles @@ -138,7 +146,7 @@ def test_run(): ), "Parameters are not restored to initial state" -def test_appending_cas(): +def test_appending_cas(BS_1calc): """If we call the generate method again, test if it appends the newly generated bootstrap compute arguments to the previously generated list. """ @@ -148,9 +156,9 @@ def test_appending_cas(): ), "The newly generated cas is not appended to the old list" -def test_save_load_cas(): +def test_save_load_cas(BS_1calc, uq_test_dir): """Test the function to load bootstrap compute arguments.""" - filename = FILE_DIR / "bootstrap_cas.json" + filename = uq_test_dir / "bootstrap_cas.json" BS_1calc.save_bootstrap_compute_arguments(filename) BS_1calc.load_bootstrap_compute_arguments(filename) filename.unlink() @@ -159,7 +167,7 @@ def test_save_load_cas(): ), "Problem with function to load bootstrap cas" -def test_reset(): +def test_reset(BS_1calc, orig_params): """Test the reset method.""" BS_1calc.reset() # Check reset parameters @@ -171,12 +179,14 @@ def test_reset(): assert BS_1calc._nsamples_done == 0, "Reset ensembles failed" -def test_multi_calc_cas_generator(): +def test_multi_calc_cas_generator(BS_2calc, calc_energy, calc_forces): """Test the shape of generated bootstrap compute arguments when we use multiple calculators. """ BS_2calc.generate_bootstrap_compute_arguments(nsamples) bootstrap_cas = BS_2calc.bootstrap_compute_arguments + ncas_energy = calc_energy[1] + ncas_forces = calc_forces[2] assert ( len(bootstrap_cas) == nsamples ), "The number of generated cas is not the same as requested, check the generator" @@ -185,4 +195,7 @@ def test_multi_calc_cas_generator(): ), "For each sample, the generator should generate cas for each calculator" assert ( len(bootstrap_cas[0][0]) + len(bootstrap_cas[0][1]) == ncas_energy + ncas_forces - ), "For each sample, generator should generate the same number of cas in total as the original" + ), ( + "For each sample, generator should generate the same number of cas in total " + "as the original" + ) diff --git a/tests/uq/test_bootstrap_nn.py b/tests/uq/test_bootstrap_nn.py index 27fc337b..2f3fbae8 100644 --- a/tests/uq/test_bootstrap_nn.py +++ b/tests/uq/test_bootstrap_nn.py @@ -1,16 +1,8 @@ -""" -TODO: -- [] Test if the implementation works with model with multiple elements. -""" - -from pathlib import Path - import numpy as np import pytest from kliff import nn from kliff.calculators import CalculatorTorch -from kliff.dataset import Dataset from kliff.descriptors import SymmetryFunction from kliff.loss import Loss from kliff.models import NeuralNetwork @@ -24,49 +16,59 @@ seed = 1717 np.random.seed(seed) -# descriptor -descriptor = SymmetryFunction( - cut_name="cos", cut_dists={"Si-Si": 5.0}, hyperparams="set30", normalize=True -) +# Number of bootstrap samples +nsamples = np.random.randint(1, 5) +# Optimizer settings +min_kwargs = dict(method="Adam", num_epochs=10, batch_size=100, lr=0.001) -# model -N1 = np.random.randint(5, 10) -model = NeuralNetwork(descriptor) -model.add_layers( - nn.Linear(descriptor.get_size(), N1), - nn.Tanh(), - nn.Linear(N1, 1), -) -# training set -FILE_DIR = Path(__file__).absolute().parent # Directory of test file -path = FILE_DIR.parent.joinpath("test_data/configs/Si_4") -data = Dataset(path) -configs = data.get_configs() +@pytest.fixture(scope="session") +def model(): + # descriptor + descriptor = SymmetryFunction( + cut_name="cos", cut_dists={"Si-Si": 5.0}, hyperparams="set30", normalize=True + ) + + # model + N1 = np.random.randint(5, 10) + nn_model = NeuralNetwork(descriptor) + nn_model.add_layers( + nn.Linear(descriptor.get_size(), N1), + nn.Tanh(), + nn.Linear(N1, 1), + ) + return nn_model -# calculators -calc = CalculatorTorch(model) -_ = calc.create(configs, use_energy=False, use_forces=True) -fingerprints = calc.get_fingerprints() -nfingerprints = len(fingerprints) -loss = Loss(calc) -min_kwargs = dict(method="Adam", num_epochs=10, batch_size=100, lr=0.001) -result = loss.minimize(**min_kwargs) +@pytest.fixture(scope="session") +def calc(model, uq_test_configs): + calculator = CalculatorTorch(model) + _ = calculator.create(uq_test_configs, use_energy=False, use_forces=True) + return calculator -orig_state_filename = FILE_DIR / "orig_model.pkl" -BS = Bootstrap(loss, orig_state_filename=orig_state_filename) -nsamples = np.random.randint(1, 5) + +@pytest.fixture(scope="session") +def fingerprints(calc): + fp = calc.get_fingerprints() + nfp = len(fp) + return fp, nfp + + +@pytest.fixture(scope="session") +def BS(calc, uq_nn_orig_state_filename): + loss = Loss(calc) + _ = loss.minimize(**min_kwargs) + return Bootstrap(loss, orig_state_filename=uq_nn_orig_state_filename) -def test_wrapper(): +def test_wrapper(BS): """Test if the Bootstrap class wrapper instantiate the correct class.""" assert isinstance( BS, BootstrapNeuralNetworkModel ), "Wrapper should instantiate BootstrapNeuralNetworkModel" -def test_error(): +def test_error(BS): """Test if BootstrapError is raised when we try to call run before generating bootstrap compute arguments. """ @@ -74,12 +76,12 @@ def test_error(): BS.run(min_kwargs=min_kwargs) -def test_original_state(): +def test_original_state(uq_nn_orig_state_filename): """Test we export the original state when we instantiate the class.""" - assert orig_state_filename.exists(), "Original state is not exported" + assert uq_nn_orig_state_filename.exists(), "Original state is not exported" -def test_bootstrap_cas_generator(): +def test_bootstrap_cas_generator(BS, fingerprints): """Test the generator function generate the same number of bootstrap compute arguments samples as requested. """ @@ -90,23 +92,23 @@ def test_bootstrap_cas_generator(): len(bootstrap_cas) == nsamples ), "The number of generated cas is not the same as requested, check the generator" assert np.all( - [len(bs_cas) == nfingerprints for _, bs_cas in bootstrap_cas.items()] + [len(bs_cas) == fingerprints[1] for _, bs_cas in bootstrap_cas.items()] ), "For each sample, generator should generate the same number of cas as the original" assert ( BS._nsamples_prepared == nsamples ), "`_nsamples_prepared` property doesn't work" # Test the shape of bootstrap cas samples if we specify the number of cas to generate - nfp = nfingerprints - 1 + nfp = fingerprints[1] - 1 bootstrap_cas_2 = bootstrap_cas_generator_neuralnetwork( - nsamples, fingerprints, nfingerprints=nfp + nsamples, fingerprints[0], nfingerprints=nfp ) assert np.all( [len(bs_cas) == nfp for _, bs_cas in bootstrap_cas_2.items()] ), "Generator doesn't generate the same number of cas as requested for each sample" -def test_run(): +def test_run(BS, calc): """Test the method to run the calculation.""" BS.run(min_kwargs=min_kwargs) # Test the shape of ensembles @@ -119,7 +121,7 @@ def test_run(): assert BS._nsamples_done == nsamples -def test_appending_cas(): +def test_appending_cas(BS): """If we call the generate method again, test if it appends the newly generated bootstrap compute arguments to the previously generated list. """ @@ -129,9 +131,9 @@ def test_appending_cas(): ), "The newly generated cas is not appended to the old list" -def test_save_load_cas(): +def test_save_load_cas(BS, uq_test_dir): """Test the function to load bootstrap compute arguments.""" - filename = FILE_DIR / "bootstrap_cas.json" + filename = uq_test_dir / "bootstrap_cas.json" BS.save_bootstrap_compute_arguments(filename) BS.load_bootstrap_compute_arguments(filename) filename.unlink() @@ -140,7 +142,7 @@ def test_save_load_cas(): ), "Problem with function to load bootstrap cas" -def test_reset(): +def test_reset(BS): """Test the reset method.""" BS.reset() # Check reset bootstrap samples diff --git a/tests/uq/test_bootstrap_nn_separate_species.py b/tests/uq/test_bootstrap_nn_separate_species.py index 9b678a8b..306f26c8 100644 --- a/tests/uq/test_bootstrap_nn_separate_species.py +++ b/tests/uq/test_bootstrap_nn_separate_species.py @@ -2,10 +2,10 @@ from pathlib import Path import numpy as np +import pytest from kliff import nn from kliff.calculators.calculator_torch import CalculatorTorchSeparateSpecies -from kliff.dataset import Dataset from kliff.descriptors import SymmetryFunction from kliff.loss import Loss from kliff.models import NeuralNetwork @@ -14,52 +14,68 @@ seed = 1717 np.random.seed(seed) -# descriptor -descriptor = SymmetryFunction( - cut_name="cos", - cut_dists={"Si-Si": 5.0, "C-C": 5.0, "Si-C": 5.0}, - hyperparams="set30", - normalize=True, -) - -# models -# Si -N1 = np.random.randint(5, 10) -model_si = NeuralNetwork(descriptor) -model_si.add_layers( - nn.Linear(descriptor.get_size(), N1), - nn.Tanh(), - nn.Linear(N1, 1), -) - -# C -model_c = NeuralNetwork(descriptor) -model_c.add_layers( - nn.Linear(descriptor.get_size(), N1), - nn.Tanh(), - nn.Linear(N1, 1), -) - -# training set -FILE_DIR = Path(__file__).absolute().parent # Directory of test file -path = FILE_DIR.parent.joinpath("test_data/configs/SiC_4") -data = Dataset(path) -configs = data.get_configs() - -# calculators -calc = CalculatorTorchSeparateSpecies({"Si": model_si, "C": model_c}) -_ = calc.create(configs, use_energy=False, use_forces=True) - -loss = Loss(calc) -min_kwargs = dict(method="Adam", num_epochs=10, batch_size=100, lr=0.001) -result = loss.minimize(**min_kwargs) - -orig_state_filename = FILE_DIR / "orig_model.pkl" -BS = BootstrapNeuralNetworkModel(loss, orig_state_filename=orig_state_filename) +# Number of bootstrap samples nsamples = np.random.randint(1, 5) +# Number of nodes of the model +N = np.random.randint(5, 10) +# Optimizer settings +min_kwargs = dict(method="Adam", num_epochs=10, batch_size=100, lr=0.001) -def test_model(): +@pytest.fixture(scope="session") +def descriptor(): + """Return atomic descriptor.""" + return SymmetryFunction( + cut_name="cos", + cut_dists={"Si-Si": 5.0, "C-C": 5.0, "Si-C": 5.0}, + hyperparams="set30", + normalize=True, + ) + + +@pytest.fixture(scope="session") +def model_si(descriptor): + """Return a model for Si.""" + model = NeuralNetwork(descriptor) + model.add_layers( + nn.Linear(descriptor.get_size(), N), + nn.Tanh(), + nn.Linear(N, 1), + ) + return model + + +@pytest.fixture(scope="session") +def model_c(descriptor): + """Return a model for C.""" + model = NeuralNetwork(descriptor) + model.add_layers( + nn.Linear(descriptor.get_size(), N), + nn.Tanh(), + nn.Linear(N, 1), + ) + return model + + +@pytest.fixture(scope="session") +def calc(uq_test_configs, model_si, model_c): + """Calculator for the test.""" + calculator = CalculatorTorchSeparateSpecies({"Si": model_si, "C": model_c}) + _ = calculator.create(uq_test_configs, use_energy=False, use_forces=True) + return calculator + + +@pytest.fixture(scope="session") +def BS(calc, uq_nn_orig_state_filename): + """Return a Bootstrap object.""" + loss = Loss(calc) + _ = loss.minimize(**min_kwargs) + return BootstrapNeuralNetworkModel( + loss, orig_state_filename=uq_nn_orig_state_filename + ) + + +def test_model(BS): """Test the model property.""" # models assert len(BS.model) == 2, "There should be 2 elements in BS.model" @@ -69,14 +85,14 @@ def test_model(): ), "There are species not recorded" -def test_original_state(): +def test_original_state(BS, uq_nn_orig_state_filename): """Test we export the original states for all models when we instantiate the class.""" # check shape assert ( len(BS.orig_state_filename) == 2 ), "There should be 2 elements in orig_state_filename" # check elements - splitted_path = os.path.splitext(orig_state_filename) + splitted_path = os.path.splitext(uq_nn_orig_state_filename) fnames = [ Path(splitted_path[0] + f"_{el}" + splitted_path[1]) for el in ["Si", "C"] ] @@ -89,7 +105,7 @@ def test_original_state(): ), "Not all original state is not exported" -def test_run(): +def test_run(BS, calc): """Test the method to run the calculation.""" BS.generate_bootstrap_compute_arguments(nsamples) BS.run(min_kwargs=min_kwargs) diff --git a/tests/uq/test_mcmc.py b/tests/uq/test_mcmc.py index aa0d7fff..f0fbef55 100644 --- a/tests/uq/test_mcmc.py +++ b/tests/uq/test_mcmc.py @@ -1,76 +1,59 @@ from multiprocessing import Pool -from pathlib import Path import numpy as np import pytest from kliff.calculators import Calculator -from kliff.dataset import Dataset from kliff.loss import Loss -from kliff.models import KIMModel from kliff.uq import MCMC, get_T0 try: from kliff.uq.mcmc import PtemceeSampler - - ptemcee_avail = True except ImportError: - ptemcee_avail = False + print("ptemcee is not found") try: from kliff.uq.mcmc import EmceeSampler - - emcee_avail = True except ImportError: - emcee_avail = False + print("emcee is not found") seed = 1717 np.random.seed(seed) -# model -modelname = "SW_StillingerWeber_1985_Si__MO_405512056662_006" -model = KIMModel(modelname) -model.set_opt_params(A=[["default"]]) - -# training set -path = Path(__file__).absolute().parents[1].joinpath("test_data/configs/Si_4") -data = Dataset(path) -configs = data.get_configs() - -# calculator -calc = Calculator(model) -ca = calc.create(configs) -# loss -loss = Loss(calc) -loss.minimize(method="lm") - -# dimensionality -ndim = calc.get_num_opt_params() +# dimensionality --- Number of parameters is defined when creating the calculator nwalkers = 2 * np.random.randint(1, 3) ntemps = np.random.randint(2, 4) nsteps = np.random.randint(5, 10) -# samplers -prior_bounds = np.tile([0, 10], (ndim, 1)) -if ptemcee_avail: - ptsampler = MCMC( - loss, - ntemps=ntemps, - nwalkers=nwalkers, - logprior_args=(prior_bounds,), - random=np.random.RandomState(seed), - ) -if emcee_avail: - sampler = MCMC( - loss, nwalkers=nwalkers, logprior_args=(prior_bounds,), sampler="emcee" - ) +@pytest.fixture(scope="module") +def calc(uq_kim_model, uq_test_configs): + """Create calculator instance.""" + model = uq_kim_model + # calculator + calculator = Calculator(model) + ca = calculator.create(uq_test_configs) + return calculator + + +@pytest.fixture(scope="module") +def ndim(calc): + return calc.get_num_opt_params() -@pytest.fixture(scope="session") +@pytest.fixture(scope="module") +def loss(calc): + """Create loss instance.""" + L = Loss(calc) + L.minimize(method="lm") + return L + + +@pytest.fixture(scope="module") def ptemcee_avail(): + """Check if ptemcee is available.""" try: from kliff.uq.mcmc import PtemceeSampler @@ -79,8 +62,9 @@ def ptemcee_avail(): return False -@pytest.fixture(scope="session") +@pytest.fixture(scope="module") def emcee_avail(): + """Check if emcee is available.""" try: from kliff.uq.mcmc import EmceeSampler @@ -89,7 +73,33 @@ def emcee_avail(): return False -def test_T0(): +@pytest.fixture(scope="module") +def prior_bounds(ndim): + """Return prior bounds.""" + return np.tile([0, 10], (ndim, 1)) + + +@pytest.fixture(scope="module") +def ptsampler(ptemcee_avail, loss, prior_bounds): + """Instantiate PTSampler.""" + print("Instantiating PTSampler") + return MCMC( + loss, + ntemps=ntemps, + nwalkers=nwalkers, + logprior_args=(prior_bounds,), + random=np.random.RandomState(seed), + ) + + +@pytest.fixture(scope="module") +def sampler(emcee_avail, loss, prior_bounds): + """Instantiate Emcee Sampler.""" + print("Instantiating Emcee Sampler") + return MCMC(loss, nwalkers=nwalkers, logprior_args=(prior_bounds,), sampler="emcee") + + +def test_T0(calc, loss): """Test if the function to compute T0 works properly. This is done by comparing T0 computed using the internal function and computed manually. """ @@ -103,7 +113,7 @@ def test_T0(): @pytest.mark.skipif(not ptemcee_avail, reason="ptemcee is not found") -def test_MCMC_wrapper1(): +def test_MCMC_wrapper1(ptsampler): """Test if the MCMC wrapper class returns the correct sampler instance.""" assert ( type(ptsampler) == PtemceeSampler @@ -111,12 +121,12 @@ def test_MCMC_wrapper1(): @pytest.mark.skipif(not emcee_avail, reason="emcee is not found") -def test_MCMC_wrapper2(): +def test_MCMC_wrapper2(sampler): assert type(sampler) == EmceeSampler, "MCMC should return ``EmceeSampler`` instance" @pytest.mark.skipif(not ptemcee_avail, reason="ptemcee is not found") -def test_dimensionality1(): +def test_dimensionality1(ptsampler, ndim): """Test the number of temperatures, walkers, steps, and parameters. This is done by comparing the shape of the resulting MCMC chains and the variables used to set these dimensions. @@ -134,7 +144,7 @@ def test_dimensionality1(): @pytest.mark.skipif(not emcee_avail, reason="emcee is not found") -def test_dimensionality2(): +def test_dimensionality2(sampler, ndim): # Test for emcee wrapper p0 = np.random.uniform(0, 10, (nwalkers, ndim)) sampler.run_mcmc(initial_state=p0, nsteps=nsteps) @@ -146,7 +156,7 @@ def test_dimensionality2(): @pytest.mark.skipif(not ptemcee_avail, reason="ptemcee is not found") -def test_pool_exception(): +def test_pool_exception(loss, prior_bounds): """Test if an exception is raised when declaring the pool prior to instantiating ``kliff.uq.MCMC``. """ @@ -160,7 +170,7 @@ def test_pool_exception(): ) -def test_sampler_exception(): +def test_sampler_exception(loss, prior_bounds): """Test if an exception is raised when specifying sampler string other than the built-in ones. """