Run tests with pytest

Scikit-learn developers use the pytest command line tool to collect all the tests written in the scikit-learn source code, execute them, and report any failures.

You can install pytest with conda:

$ conda install pytest

A test is typically written as a Python function containing one or more assert statements, for instance:

def test_sum_of_two_integers():
    a = 40
    b = 2
    expected_result = 42
    assert a + b == expected_result
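To make the role of assert concrete, here is a minimal sketch of what a test runner does with such a function: it calls the function and treats an AssertionError as a failure. The run_test helper and the failing_example function are hypothetical names introduced for illustration. pytest itself does much more (test collection, detailed failure reports), so this is only a simplified model.

```python
def test_sum_of_two_integers():
    a = 40
    b = 2
    expected_result = 42
    assert a + b == expected_result


def failing_example():
    # Deliberately wrong expectation, to show what a failure looks like.
    assert 40 + 2 == 43


def run_test(test_function):
    """Hypothetical helper: return "passed" or "failed" for one test function."""
    try:
        test_function()
    except AssertionError:
        # A failed assert raises AssertionError, which marks the test as failed.
        return "failed"
    return "passed"


if __name__ == "__main__":
    for test in (test_sum_of_two_integers, failing_example):
        print(test.__name__, run_test(test))
```

A test passes when the function returns without raising, which is why a test body can contain several assert statements in a row.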

Scikit-learn comes with two types of tests:

  • common tests, generic tests that every estimator should pass;
  • tests specific to each scikit-learn module.

Common tests are stored in the sklearn/tests folder. You can run them using:

$ pytest sklearn/tests/

If you want to run the common tests for one specific estimator, e.g. RandomForestClassifier, you can use:

$ pytest sklearn/tests -k RandomForestClassifier

Each sklearn subpackage comes with a tests folder that gathers the test files specifically related to it. For instance, the folder sklearn/cluster/tests holds all the test files written specifically to test clustering algorithms. In particular, sklearn/cluster/tests/test_k_means.py is the test file for the K-Means clustering algorithm.

Use VSCode to open the source code for the RandomForestClassifier class.

  • What is the path of this file?
  • Can you find the folder that holds the tests for RandomForestClassifier in the VSCode file explorer?
  • Can you find the test folder and list its content using the ls command?
  • Locate the file test_forest.py in that folder.

Again, use the pwd command if you are lost.

To run the tests in this file:

$ pytest --verbose ./<path_to_test_folder>/test_forest.py

Edit the source code of RandomForestClassifier so that the predict method always returns 0. Rerun the tests: what do you observe?

Alternatively, you can use the -vlx flags as follows:

$ pytest -vlx ./<path_to_test_folder>/test_forest.py

Find out about the meaning of those flags with

$ pytest --help

See also the section Useful pytest aliases and flags in the scikit-learn documentation.

Write a new test of your own

We will now do an exercise to add a new test function named test_rf_regressor_prediction_range in the test_forest.py file.

The goal of this new test will be to check that a RandomForestRegressor never predicts numerical values outside of the range of values observed in the training set.
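The reason this property is expected to hold can be sketched in plain Python. Assuming the default settings, each tree predicts the mean of the training targets that fall in one of its leaves, and the forest averages the predictions of its trees. An average of averages of training targets cannot leave the range of those targets. The toy_forest_prediction function below is a hypothetical stand-in: it picks random subsets of y_train as "leaves" and does not grow real trees.

```python
import random


def leaf_mean(y_values):
    """Prediction of one leaf: the mean of the training targets it contains."""
    return sum(y_values) / len(y_values)


def toy_forest_prediction(y_train, n_trees, rng):
    """Hypothetical stand-in for a forest: average n_trees leaf means.

    Each "tree" here picks a random non-empty subset of y_train as its
    leaf, which is NOT how real decision trees are grown.
    """
    tree_predictions = []
    for _ in range(n_trees):
        leaf_size = rng.randint(1, len(y_train))
        leaf = rng.sample(y_train, leaf_size)
        tree_predictions.append(leaf_mean(leaf))
    return sum(tree_predictions) / n_trees


rng = random.Random(42)
y_train = [rng.gauss(0.0, 1.0) for _ in range(100)]
predictions = [
    toy_forest_prediction(y_train, n_trees=10, rng=rng) for _ in range(50)
]

# Every toy prediction stays within the range of the training targets.
assert all(min(y_train) <= p <= max(y_train) for p in predictions)
```

The test you will write checks the same inequality, but on the predictions of an actual RandomForestRegressor.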

The test will use a training set generated at random by sampling from a standard normal distribution.

For a test we can use a small dataset, for instance 100 samples and 10 features.

Here is a code template to get started:

def test_rf_regressor_prediction_range():
    # Create a Random Number Generator with 42 as a fixed seed.
    rng = np.random.RandomState(42)

    # TODO: define the n_samples and n_features variables
    X_train = rng.normal(size=(n_samples, n_features))
    y_train = rng.normal(size=n_samples)

    rfr = RandomForestRegressor(random_state=42)
    rfr.fit(X_train, y_train)

    # TODO: use np.min() and np.max() to measure the minimum and
    # maximum values of the training set target values y_train.

    # TODO: generate some random data X_test with the same number
    # of features.

    # TODO: compute the model predictions on the test data:
    # rfr.predict(X_test) and store the results in a variable y_preds.

    # TODO: check that all the values in y_preds lie between the minimum
    # and maximum values of y_train.

After completing each TODO, check that your code works as expected by running only your test function with the following command:

$ pytest -vl -k test_rf_regressor_prediction_range \
     ./<path_to_test_folder>/test_forest.py

Bonus exercise: try to modify the code of scikit-learn so that the regressor predicts very large or very small values, and check that your new test fails.

Note: do not commit those modifications with git.

Automatically formatting the code with black

It is recommended to format your code with a specific version of black:

$ conda install black==22.1.0

Then you can run:

$ black path/to/some_file.py

or, to format a full source folder:

$ black sklearn/
$ black examples/

and so on.

It's also convenient to run black directly from within VS Code:

Ctrl+Shift+P and then "> Format Current Document"

and select the black formatter the first time you use this command.

Reverting your changes

Once the exercise is done, feel free to undo all your changes with the following command. Note that it discards every uncommitted change to tracked files:

$ git reset --hard

Testing docstrings

pytest can also run the examples embedded in docstrings, such as those describing an estimator directly in the source code. Assuming the RandomForestClassifier docstring has been modified, the following command checks that its examples still run and produce the documented output:

$ pytest --doctest-modules sklearn/ensemble/_forest.py -k RandomForestClassifier
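To see what a docstring test looks like on a small scale, here is a sketch that uses Python's standard doctest module, whose example format (lines starting with >>> followed by the expected output) is the one pytest checks. The sum_of_two_integers function and the run_docstring_examples helper are hypothetical names chosen for illustration. They are not part of scikit-learn.

```python
import doctest


def sum_of_two_integers(a, b):
    """Return the sum of two integers.

    >>> sum_of_two_integers(40, 2)
    42
    """
    return a + b


def run_docstring_examples(func):
    """Hypothetical helper: run the examples found in func's docstring.

    Returns a result object with `failed` and `attempted` counts.
    """
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Make the function available by name inside its own examples.
    globs = {func.__name__: func}
    for test in finder.find(func, name=func.__name__, globs=globs):
        runner.run(test)
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    result = run_docstring_examples(sum_of_two_integers)
    print(f"{result.attempted} example(s) run, {result.failed} failure(s)")
```

If the documented output no longer matches what the code returns, the example is reported as a failure, which is how an outdated docstring gets caught.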