Changed datasets/ad_hoc.py to work for n>3 #896

RishiNandha · 2025-03-01T00:44:05Z

Summary

This is related to #330

I have rewritten the ad_hoc dataset to work for n>3. I have also made some of the operations more efficient.

I ran "make lint" and "make style" as suggested in the contributing guidelines, but I got an error I was not able to resolve. This is my first contribution, if there are improvements to be made in the PR, please let me know.

I'm also working on bringing more dataset generators to the repository, so I hope ad_hoc.py can still be kept around as the simpler one in the family not that it's been rewritten

Details and comments

The tensor products such as H^{(x)n} were written in n step loops, I've made them logn step loops. And similar other small optimizations such as precomputing Zi*Zj.
The old implementation did an n-dimensional grid for generating random phase vectors. While this brought uniformity with small number of qubits, it was too much computational overhead for larger numbers. I've have swapped this out for 2 different options of stratified sampling
The old implementation had labelling done with parity() and majority() in n=2 and n=3 respectively. I've used an n-qubit Z matrix directly, which aligns better with the reference paper attached in the documentation and also relieves the constraint an n being 2 or 3
Vectorized the labelling loop. This has given some performance gain

CLAassistant · 2025-03-01T00:44:11Z

All committers have signed the CLA.

qiskit_machine_learning/datasets/ad_hoc.py

woodsp-ibm · 2025-03-03T20:27:49Z

qiskit_machine_learning/datasets/ad_hoc.py

+    train_size: int,
    test_size: int,
    n: int,
    gap: int,
+    divisions: int = 0,
    plot_data: bool = False,
-    one_hot: bool = True,
+    one_hot: bool = False,


In terms of the method signature any changes should be done in a backwards compatible way, i.e. if you already have code that uses this it should still work. To that end the change in name of training_size is problematic, as is the change in the one_hot default - they should really stay as they are. We do/can change things so they would break existing code but such changes are normally done via a deprecation. Having said that I have no idea if the data set produced by this new method, for values that were supported before, ends up the same or not with this new algorithm. Then the additional parameter, divisions, to maintain compatibility that should go at the end.

I guess another way to do this would be to keep the original method there as-is and add a complete new method, not quite sure of name, maybe ad_hoc_dataset, ad_hoc_data_sobol or something, that can be used instead. The current method could then be deprecated and removed later leaving only the new one. The default values and order of the parameters in a new method would be a free choice not constrained by compatibility.

As you can see from CI there are failures due to these changes being incompatible with existing code - in this case unit tests we have here for example where they should still work/pass unchanged,

Okay, Ill fix the parameters to be backward compatible. Quick fix

About if it'll produce the same values as before. It will match in shape and format, but for the same seed it won't be the same values. Because the sampling method is different

So shall I fix the default parameters and order of arguments to support backward compatibility and edit the unit tests to assert the new expected values?

About the depreciation of ad_hoc.py and the renaming of this new generator: I think we can do whatever you'd and the other reviewers would suggest.

If we make this a separate dataset and won't have constraints to work backwards with ad_hoc codes, I can also further explore including hyperparameters such as:

Linear vs Circular vs Full Entanglements

Labelling based on Expected Value vs Labelling based on Measurement

Reasons being:

When we say "Separable by ZZFeaturemap" the most readily available example of ZZFeaturemap is linear entangled while ad_hoc by default is supporting only full entanglement. So giving that option might make the inner workings of this generator more transparent

I'm thinking measurement would make it harder for a classical ML method to work well with while QML or VQA methods won't have a problem. If that is the case, we can demonstrate that Quantum Methods have an advantage in performing better in this dataset. But I'm not sure, I'm very new to this field of study. But I think including this as an option with one of hyperparameters can leave the choice up to the user

I made a commit with the parameter order and default values fixed. Like speculated by you, the unit tests still continue to fail because the arrays generated by the new sampling technique are numerically different from the old. So I wanted to ask what you would suggest.

Shall we proceed with modifying the unit tests? Or shall we proceed with making this an independent dataset generator?

@edoaltamura Do you guys have a take on this as the prime maintainers now? I chipped in based on what I saw and for me I think a new method with this functionality, since it ends up different than the existing one, for the compatible inputs overlap, would be "better". Existing users code would not suddenly end up having a different outcome due to it giving back different data than before - such as we see here with the unit test failures. If we wanted the existing method could be deprecated and removed later giving users a chance to switch over to the new method and sort out whatever they need on their end in using it and the outcome. Also the author has suggested they can look at further hyperparameters more flexibly this way without being constrained.

Thanks for your input @woodsp-ibm! I'll add my comments for @RishiNandha in a separate review in a moment.

TLDR; I agree with @woodsp-ibm: both methods I think should be kept. But we could structure it nicely to make it obvious when to use Numpy random choice and when Sobol. See the other review.

Co-authored-by: Steve Wood <40241007+woodsp-ibm@users.noreply.github.com>

edoaltamura

Finally, you can add a release note (reno), as in the contributing guide, describing the extension of the dataset to higher n. This description will then be included in the v.0.8.3 release notes.

qiskit_machine_learning/datasets/ad_hoc.py

edoaltamura · 2025-03-05T13:53:05Z

qiskit_machine_learning/datasets/ad_hoc.py

+    train_size: int,
    test_size: int,
    n: int,
    gap: int,
+    divisions: int = 0,
    plot_data: bool = False,
-    one_hot: bool = True,
+    one_hot: bool = False,


TLDR; I agree with @woodsp-ibm: both methods I think should be kept. But we could structure it nicely to make it obvious when to use Numpy random choice and when Sobol. See the other review.

qiskit_machine_learning/datasets/ad_hoc.py

RishiNandha · 2025-03-13T21:18:40Z

I've made the edits to include both the old and new sampling methods. I've also included measurement based labelling and linear/circular/full entanglement as options too. I'm yet to add the whitelists to pylint, I'll do it once the unittests are fixed too.

The unittests will still fail because there is a unit test that checked if giving an n>3 raises a value error. @rogue-infinity is collaborating with me on the unittests. I think once that is done, it'll be good to go

Unit tests for the modified Ad_hoc dataset

…learning

This reverts commit eb069cc.

RishiNandha added 3 commits March 1, 2025 05:26

Refactored ad_hoc.py for n>3

7218ef5

PEP8 Style Fixes

9f95761

Update ad_hoc.py

c395e05

RishiNandha requested review from woodsp-ibm, adekusar-drl, smens, edoaltamura, oscar-wallis, OkuyanBoga and Benjamin-Symons as code owners March 1, 2025 00:44

RishiNandha and others added 2 commits March 1, 2025 06:22

Typographic Fixes

2ff7a7a

Merge branch 'main' into main

9cff7d9

woodsp-ibm reviewed Mar 3, 2025

View reviewed changes

edoaltamura and others added 2 commits March 4, 2025 00:41

Update qiskit_machine_learning/datasets/ad_hoc.py

61817af

Co-authored-by: Steve Wood <40241007+woodsp-ibm@users.noreply.github.com>

Fixed Parameters Backward Compatibility

8ae6a93

edoaltamura requested changes Mar 5, 2025

View reviewed changes

Merged Old and New methods

2c2d2fd

RishiNandha and others added 5 commits March 15, 2025 21:12

Bug fixes in _measure()

e3adac3

Add files via upload

3f21e8d

Unit tests for the modified Ad_hoc dataset

Added φ_i, Δ and φ_ij to pylint whitelist

5e36818

Merge branch 'main' of https://github.com/RishiNandha/qiskit-machine-…

183d964

…learning

Fixed make black, lint, html and spell. Added reno

e78829c

RishiNandha requested a review from edoaltamura March 15, 2025 17:08

RishiNandha added 5 commits March 15, 2025 22:43

Updated Nisanth's progress too

eb069cc

Revert "Updated Nisanth's progress too"

e16501c

This reverts commit eb069cc.

Fixed reno

cbe7f78

QSVR tests

26537e0

Made black

68eb826

RishiNandha closed this Mar 16, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Changed datasets/ad_hoc.py to work for n>3 #896

Changed datasets/ad_hoc.py to work for n>3 #896

Uh oh!

RishiNandha commented Mar 1, 2025

Uh oh!

CLAassistant commented Mar 1, 2025 •

edited

Loading

Uh oh!

Uh oh!

woodsp-ibm Mar 3, 2025

Uh oh!

woodsp-ibm Mar 3, 2025

Uh oh!

RishiNandha Mar 4, 2025

Uh oh!

RishiNandha Mar 4, 2025

Uh oh!

RishiNandha Mar 4, 2025

Uh oh!

woodsp-ibm Mar 4, 2025

Uh oh!

edoaltamura Mar 5, 2025

Uh oh!

edoaltamura Mar 5, 2025

Uh oh!

edoaltamura left a comment

Uh oh!

Uh oh!

edoaltamura Mar 5, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

RishiNandha commented Mar 13, 2025

Uh oh!

Uh oh!

Changed datasets/ad_hoc.py to work for n>3 #896

Changed datasets/ad_hoc.py to work for n>3 #896

Uh oh!

Conversation

RishiNandha commented Mar 1, 2025

Summary

Details and comments

Uh oh!

CLAassistant commented Mar 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

edoaltamura left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

RishiNandha commented Mar 13, 2025

Uh oh!

Uh oh!

CLAassistant commented Mar 1, 2025 •

edited

Loading