Improve grid generation for #207 (#426)

Open
thijssnelleman wants to merge 13 commits into main from improve-grid-generation

Conversation

@thijssnelleman
Collaborator

@thijssnelleman thijssnelleman commented Feb 20, 2026

Based on the description of #207, I have revamped the grid generator to work completely with generators instead of in-memory grids.

  1. Each hyperparameter gets a constructed tuple or generator (categorical/ordinal/constant values cannot be reduced in memory, but their impact should be very limited).
  2. The generators are fed into itertools.product to form the Cartesian product.
  3. Conditional parameters get a "None" value added to their possible values to represent inactivity; these sentinels are filtered out when constructing the configuration from the sample.
  4. Samples are validated and discarded when invalid.
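The four steps above could be sketched roughly as follows. This is a minimal illustration with invented helper names (`grid_generator`'s signature, the dict-based hyperparameters, and `is_valid` are all assumptions), not the actual ConfigSpace implementation:

```python
import itertools

def grid_generator(hyperparameters, is_valid):
    """Lazily yield valid grid configurations (illustrative sketch only).

    `hyperparameters` is assumed to be a list of dicts with "name", "values",
    and an optional "conditional" flag; `is_valid` stands in for the real
    configuration check.
    """
    per_hp_values = []
    for hp in hyperparameters:
        # Step 1: a concrete value tuple (or generator) per hyperparameter.
        values = tuple(hp["values"])
        # Step 3: conditionals get an extra None to represent inactivity.
        if hp.get("conditional"):
            values = values + (None,)
        per_hp_values.append(values)
    names = [hp["name"] for hp in hyperparameters]
    # Step 2: lazy Cartesian product over the per-hyperparameter values.
    for sample in itertools.product(*per_hp_values):
        # Filter out the None sentinels before building the configuration.
        config = {n: v for n, v in zip(names, sample) if v is not None}
        # Step 4: discard invalid samples.
        if is_valid(config):
            yield config
```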

Biggest issue: I ran a timing test based on the pytests, measuring simply with time.time().

The first test yielded these results:
New: 0.008675813674926758 (Slight improvement)
Old: 0.00872802734375

The second, with some conditionals:
New: 0.017199277877807617 (roughly 2× slower)
Old: 0.008735895156860352

This could be terrible for large search spaces. One thing I (over)simplified was to generate the entire grid with the generator and discard the invalid configurations (i.e. no 'smart' checking for active/inactive parameters). This means a lot of invalid configurations are generated in the second case: 558 / 576 are invalid (96.9%).

I had hoped that the validity check would be fast enough to mitigate this slowdown, but perhaps this is not the case.

@thijssnelleman
Collaborator Author

The speed issue has now been fixed; conditional parameters are only searched when relevant. Speed test results between the original method and this PR:

Simple test (no conditions):
New Time: 0.008033990859985352 (slight improvement)
Old time: 0.008356094360351562

Large test (multiple conditionals):
New Time: 0.0005440711975097656 (about 3× faster than the old method)
Old time: 0.001583099365234375
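One way the described fix could look (a hedged sketch: `expand_conditionals`, `is_active`, and the dict-based hyperparameters are invented for illustration). Instead of enumerating every conditional value up front, conditional grids are only expanded for base configurations that actually activate them:

```python
import itertools

def expand_conditionals(base_config, conditionals, is_active):
    """Yield configurations, expanding only conditionals active under base_config.

    Handles one level of conditions for simplicity; the real implementation
    would need to handle nested conditions as well.
    """
    active = [hp for hp in conditionals if is_active(hp, base_config)]
    if not active:
        # No conditional is active: the base configuration is complete.
        yield dict(base_config)
        return
    names = [hp["name"] for hp in active]
    # Only the active conditionals contribute to the Cartesian product.
    for combo in itertools.product(*(tuple(hp["values"]) for hp in active)):
        config = dict(base_config)
        config.update(zip(names, combo))
        yield config
```

Because inactive conditionals never enter the product, no invalid samples are generated and then thrown away, which would explain the speedup on the conditional-heavy test.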

Contributor

Copilot AI left a comment


Pull request overview

This PR revamps ConfigSpace’s grid generation to produce configurations lazily via a generator, aligning with #207’s goal of avoiding in-memory grid materialization and delegating value generation to hyperparameter logic.

Changes:

  • Replace the previous in-memory generate_grid implementation with a generator-based grid_generator.
  • Update unit tests to consume the generator and adjust assertions for the new generation order/behavior.
  • Update documentation references and fix a minor typo in Configuration.
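A small illustration of why the lazy, generator-based design matters. itertools.islice is standard library; the generator expression below is a stand-in for the PR's grid_generator, whose exact API is not shown here:

```python
import itertools

def take(generator, n):
    """Consume only the first n configurations without materializing the grid."""
    return list(itertools.islice(generator, n))

# With a generator, even a combinatorially huge grid costs nothing until
# iterated; a fake million-element "grid" stands in for real configurations.
fake_grid = ({"x": i} for i in range(1_000_000))
first_three = take(fake_grid, 3)
```

An in-memory implementation would have to build all one million entries before returning; the generator yields them on demand.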

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

File Description
src/ConfigSpace/util.py Introduces grid_generator and modifies related logic for lazy grid construction and conditional handling.
test/test_util.py Updates grid generation tests to use grid_generator and adjusts expectations.
src/ConfigSpace/configuration_space.py Updates docstring reference from generate_grid() to grid_generator().
src/ConfigSpace/configuration.py Fixes a spelling typo in a comment.


Comment on lines +573 to +580
@@ -571,14 +574,19 @@ def check_configuration( # noqa: D103
space: ConfigurationSpace,
vector: np.ndarray,
allow_inactive_with_values: bool = False,
#yield_all_unset_active_hyperparameters: bool = False,
) -> None:
activated = np.isfinite(vector)
#unset_active_hps: list[Hyperparameter] = []

Copilot AI Mar 11, 2026


There’s a lot of commented-out/experimental code left in check_configuration (commented parameter and branches). This makes the function harder to read/maintain; please remove the dead code or implement the feature behind a real, documented flag.

# cat1 2
# const1 1
# float1 11
# int1 7

Copilot AI Mar 11, 2026


The comment block listing HP cardinalities says # int1 7, but num_steps_dict sets int1 to 6 and the test later asserts 2 * 1 * 11 * 6 * 3 configurations. This looks like a misleading/outdated comment and should be corrected to avoid confusion when debugging failures.

Suggested change
# int1 7
# int1 6

Comment on lines 549 to 550
# For the case of no hyperparameters, in get_cartesian_product, itertools.product() returns
# a single empty tuple element which leads to a single empty Configuration.

Copilot AI Mar 11, 2026


The comment about itertools.product() returning a single empty tuple (and thus producing an empty Configuration) is outdated relative to the current behavior: grid_generator now returns an empty list for an empty ConfigurationSpace and the test asserts len(...) == 0. Please update/remove this comment to reflect the current implementation.

Suggested change
# For the case of no hyperparameters, in get_cartesian_product, itertools.product() returns
# a single empty tuple element which leads to a single empty Configuration.
# For the case of no hyperparameters, grid_generator returns an empty list
# because there are no hyperparameters to form configurations from.

Comment on lines 609 to 615
last_config = generated_grid[-1]
for expected_value in last_config.values():
generated_value = expected_value
if isinstance(generated_value, float):
assert generated_value == approx(expected_value)
else:
assert generated_value == expected_value

Copilot AI Mar 11, 2026


This loop doesn’t validate anything: it iterates over last_config.values() and then compares each value to itself (generated_value = expected_value). If the intent is to validate the last configuration, iterate over last_expected_dict and compare last_config[name] to the expected value (with approx for floats).
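Following the review comment, the loop could be rewritten along these lines. math.isclose stands in for pytest's approx so the sketch is self-contained; `last_expected_dict` is the name suggested in the comment:

```python
import math

def assert_last_config_matches(last_config, last_expected_dict):
    """Compare the generated last configuration against the expected values."""
    for name, expected_value in last_expected_dict.items():
        # Look up the generated value by name instead of comparing a value
        # to itself, which is what the original loop did.
        generated_value = last_config[name]
        if isinstance(generated_value, float):
            assert math.isclose(generated_value, expected_value, rel_tol=1e-9)
        else:
            assert generated_value == expected_value
```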


while len(unchecked_grid_pts) > 0:
for configuration in _cartesian_product_generator(regular_hyperparameters):
configuration_dict = {key: value for key, value in zip(hyperparameter_names, configuration)}

Copilot AI Mar 11, 2026


configuration_dict is built by zipping hyperparameter_names (all hyperparameters) with configuration (values only for regular_hyperparameters). If any conditional HP appears before a regular HP in insertion order, this will assign values to the wrong names and can produce invalid/misaligned configurations. Build the dict using the same hyperparameter list you used to generate configuration (e.g., zip [hp.name for hp in regular_hyperparameters] with configuration).

Suggested change
configuration_dict = {key: value for key, value in zip(hyperparameter_names, configuration)}
configuration_dict = {
hp.name: value
for hp, value in zip(regular_hyperparameters, configuration)
}

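The misalignment the review describes is easy to reproduce in isolation (the names here are invented; zip silently stops at the shorter iterable, so values land on the wrong keys):

```python
# All hyperparameter names in insertion order, with a conditional listed first:
hyperparameter_names = ["cond_hp", "reg_a", "reg_b"]
# Values generated only for the regular hyperparameters:
configuration = (1, 2)

# Zipping all names against regular-only values misassigns silently:
wrong = dict(zip(hyperparameter_names, configuration))
# wrong == {"cond_hp": 1, "reg_a": 2}: reg_a received reg_b's value

# Zipping against the same list used to generate the values is safe:
regular_names = ["reg_a", "reg_b"]
right = dict(zip(regular_names, configuration))
# right == {"reg_a": 1, "reg_b": 2}
```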
thijssnelleman and others added 3 commits March 16, 2026 13:15
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>