Conversation
The speed issue has now been fixed; conditional parameters are only searched when relevant. Speed test results between the original method and this PR cover two cases: a simple test (no conditions) and a large test (multiple conditionals).
Pull request overview
This PR revamps ConfigSpace’s grid generation to produce configurations lazily via a generator, aligning with #207’s goal of avoiding in-memory grid materialization and delegating value generation to hyperparameter logic.
Changes:
- Replace the previous in-memory `generate_grid` implementation with a generator-based `grid_generator`.
- Update unit tests to consume the generator and adjust assertions for the new generation order/behavior.
- Update documentation references and fix a minor typo in `Configuration`.
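As a rough sketch of the lazy approach these changes describe (hypothetical names only, not ConfigSpace's actual API), a generator-based grid yields one configuration at a time instead of materializing the whole grid:

```python
from itertools import product
from typing import Any, Iterator

def lazy_grid(value_lists: dict[str, list[Any]]) -> Iterator[dict[str, Any]]:
    """Yield one configuration dict at a time instead of building the full grid."""
    names = list(value_lists)
    # itertools.product is itself lazy, so only one combination is in memory at once.
    for combo in product(*value_lists.values()):
        yield dict(zip(names, combo))

grid = lazy_grid({"lr": [0.1, 0.01], "depth": [2, 3, 4]})
first = next(grid)  # only one configuration has been constructed at this point
```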
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `src/ConfigSpace/util.py` | Introduces `grid_generator` and modifies related logic for lazy grid construction and conditional handling. |
| `test/test_util.py` | Updates grid generation tests to use `grid_generator` and adjusts expectations. |
| `src/ConfigSpace/configuration_space.py` | Updates docstring reference from `generate_grid()` to `grid_generator()`. |
| `src/ConfigSpace/configuration.py` | Fixes a spelling typo in a comment. |
`src/ConfigSpace/util.py` (outdated)
```
@@ -571,14 +574,19 @@ def check_configuration(  # noqa: D103
    space: ConfigurationSpace,
    vector: np.ndarray,
    allow_inactive_with_values: bool = False,
    #yield_all_unset_active_hyperparameters: bool = False,
) -> None:
    activated = np.isfinite(vector)
    #unset_active_hps: list[Hyperparameter] = []
```
There’s a lot of commented-out/experimental code left in `check_configuration` (a commented parameter and branches). This makes the function harder to read and maintain; please remove the dead code or implement the feature behind a real, documented flag.
```python
# cat1 2
# const1 1
# float1 11
# int1 7
```
The comment block listing HP cardinalities says `# int1 7`, but `num_steps_dict` sets `int1` to 6 and the test later asserts `2 * 1 * 11 * 6 * 3` configurations. This looks like a misleading/outdated comment and should be corrected to avoid confusion when debugging failures.
```diff
- # int1 7
+ # int1 6
```
```python
# For the case of no hyperparameters, in get_cartesian_product, itertools.product() returns
# a single empty tuple element which leads to a single empty Configuration.
```
The comment about `itertools.product()` returning a single empty tuple (and thus producing an empty `Configuration`) is outdated relative to the current behavior: `grid_generator` now returns an empty list for an empty `ConfigurationSpace`, and the test asserts `len(...) == 0`. Please update/remove this comment to reflect the current implementation.
```diff
- # For the case of no hyperparameters, in get_cartesian_product, itertools.product() returns
- # a single empty tuple element which leads to a single empty Configuration.
+ # For the case of no hyperparameters, grid_generator returns an empty list
+ # because there are no hyperparameters to form configurations from.
```
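For reference, a quick check of the old behavior the outdated comment described — `itertools.product()` with zero input iterables yields exactly one empty tuple:

```python
from itertools import product

# With no input iterables, product() yields a single empty tuple, which is
# why a product-based grid emitted one empty configuration for an empty space.
empty_grid = list(product())
print(empty_grid)  # [()]
```

The generator-based implementation instead yields nothing for an empty space, matching the updated test's `len(...) == 0` assertion.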
```python
last_config = generated_grid[-1]
for expected_value in last_config.values():
    generated_value = expected_value
    if isinstance(generated_value, float):
        assert generated_value == approx(expected_value)
    else:
        assert generated_value == expected_value
```
This loop doesn’t validate anything: it iterates over `last_config.values()` and then compares each value to itself (`generated_value = expected_value`). If the intent is to validate the last configuration, iterate over `last_expected_dict` and compare `last_config[name]` to the expected value (with `approx` for floats).
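A sketch of the suggested fix, with hypothetical stand-in data for `last_expected_dict` and `last_config` (the real test would use `pytest.approx`; `math.isclose` keeps this sketch dependency-free):

```python
from math import isclose

# Hypothetical stand-ins for the test's expected and generated configurations.
last_expected_dict = {"float1": 1.0, "int1": 6, "cat1": "a"}
last_config = {"float1": 1.0 + 1e-12, "int1": 6, "cat1": "a"}

# Compare the generated value against the *expected* one, rather than
# comparing each value to itself as the original loop did.
for name, expected_value in last_expected_dict.items():
    generated_value = last_config[name]
    if isinstance(generated_value, float):
        assert isclose(generated_value, expected_value)
    else:
        assert generated_value == expected_value
```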
```python
while len(unchecked_grid_pts) > 0:
    for configuration in _cartesian_product_generator(regular_hyperparameters):
        configuration_dict = {key: value for key, value in zip(hyperparameter_names, configuration)}
```
`configuration_dict` is built by zipping `hyperparameter_names` (all hyperparameters) with `configuration` (values only for `regular_hyperparameters`). If any conditional HP appears before a regular HP in insertion order, this will assign values to the wrong names and can produce invalid/misaligned configurations. Build the dict using the same hyperparameter list you used to generate `configuration` (e.g., zip `[hp.name for hp in regular_hyperparameters]` with `configuration`).
```diff
- configuration_dict = {key: value for key, value in zip(hyperparameter_names, configuration)}
+ configuration_dict = {
+     hp.name: value
+     for hp, value in zip(regular_hyperparameters, configuration)
+ }
```
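A tiny illustration of the misalignment this comment describes, using hypothetical names (`child` is a conditional HP listed before the regular HP `int1`):

```python
# All hyperparameter names in insertion order, with a conditional HP in the middle.
hyperparameter_names = ["cat1", "child", "int1"]
# Values generated only for the regular hyperparameters cat1 and int1.
regular_values = ("a", 5)

# Zipping against *all* names silently assigns int1's value to "child"
# and drops int1 entirely (zip stops at the shorter iterable):
wrong = dict(zip(hyperparameter_names, regular_values))
print(wrong)  # {'cat1': 'a', 'child': 5}

# Zipping against the same list used to generate the values stays aligned:
regular_names = ["cat1", "int1"]
right = dict(zip(regular_names, regular_values))
print(right)  # {'cat1': 'a', 'int1': 5}
```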
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Based on the description of #207, I have revamped the grid generator to work entirely with generators instead of in-memory grids.
Biggest issue: I ran a timing test based on the pytests, measuring simply with `time.time()`.

The first test yielded:

- New: 0.008675813674926758 s (a slight improvement)
- Old: 0.00872802734375 s

The second, with some conditionals:

- New: 0.017199277877807617 s (roughly 2× slower)
- Old: 0.008735895156860352 s
This could be terrible for large search spaces. One thing I (over)simplified: the generator produces the entire Cartesian grid and discards the invalid configurations (i.e. no 'smart' checking for active/inactive parameters). This means a lot of invalid configurations are generated in the second case: 558 / 576 are invalid (96.9%).
I hoped that the validity check would be fast enough to mitigate this slowdown, but perhaps that is not the case.
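The generate-and-filter strategy described above can be sketched as follows (a hypothetical `is_valid` predicate stands in for ConfigSpace's real active/inactive check):

```python
from itertools import product
from typing import Any, Callable, Iterator

def filtered_grid(
    value_lists: dict[str, list[Any]],
    is_valid: Callable[[dict[str, Any]], bool],
) -> Iterator[dict[str, Any]]:
    """Enumerate the full Cartesian product, discarding invalid candidates."""
    names = list(value_lists)
    for combo in product(*value_lists.values()):
        candidate = dict(zip(names, combo))
        # Every candidate is generated and checked, even ones that will be
        # thrown away -- the source of the slowdown with many conditionals.
        if is_valid(candidate):
            yield candidate

# Toy example: "child" is only meaningful when "parent" == "on".
values = {"parent": ["on", "off"], "child": [1, 2, 3]}
kept = list(filtered_grid(values, lambda c: c["parent"] == "on"))
print(len(kept))  # 3 of the 6 generated candidates survive
```

With deeply conditional spaces the ratio of discarded candidates grows multiplicatively, which is consistent with the 96.9% invalid rate reported above.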