Add a random seed command line argument #36

Open
dwysocki opened this issue Aug 28, 2014 · 0 comments

@dwysocki · Member

Currently we have no control over the random seed used by scikit-learn, and we don't even know how it is chosen. We need to add a command line parameter that lets you set the seed yourself, and by default draw one with numpy.random.randint(min_seed_value, max_seed_value).
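
A minimal sketch of the proposed option, assuming an `argparse`-based command line. The `--seed` flag name and the `MIN_SEED_VALUE`/`MAX_SEED_VALUE` bounds are hypothetical placeholders, not existing options of this project. When no seed is given, one is drawn with `numpy.random.randint` and stored on the parsed arguments so that it can be reported alongside the results.

```python
import argparse

import numpy as np

# Hypothetical bounds: the issue leaves min_seed_value/max_seed_value open.
MIN_SEED_VALUE = 0
MAX_SEED_VALUE = 2**31 - 1


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Seed handling sketch")
    parser.add_argument(
        "--seed", type=int, default=None,
        help="random seed; if omitted, one is drawn at random and reported")
    args = parser.parse_args(argv)
    if args.seed is None:
        # Draw a seed (high bound is exclusive) so the run can still be
        # reproduced later by passing it back in with --seed.
        args.seed = int(np.random.randint(MIN_SEED_VALUE, MAX_SEED_VALUE))
    return args
```

The resulting `args.seed` would then be passed to any scikit-learn estimator that accepts a `random_state` parameter, and printed or saved with the output so the run can be repeated.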

The only problem I can see with this is the use of pmap. Once we introduce parallelism, we no longer guarantee that the stars are processed in any particular order. With the way pmap currently works, it would need processes=1 to produce exactly the same results every time. Although the same sequence of random numbers is used, the numbers are assigned to different stars depending on race conditions, so the same seed may produce different results.

A solution would be to impose a strict mapping of random numbers to stars, i.e. to associate random numbers with stars in a repeatable way (for example, by deriving each star's seed from the global seed and something fixed about the star, such as its index). That way, it will be possible to publish a set of results along with the seed used, and others will be able to reproduce exactly the same results, guaranteed.
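
One possible way to make the results independent of processing order, sketched under the assumption that each star can be identified by a stable integer index (a hypothetical choice; a name or ID mapped to an integer would work the same way). It uses NumPy's `SeedSequence` (NumPy 1.17 or later) to derive a per-star seed from the global seed and the star index, so a star receives the same random numbers no matter which worker handles it or when. `process_star` and `run` are hypothetical stand-ins for the real per-star work and for the pmap call; `run_shared_stream` reproduces the current order-dependent behavior for contrast.

```python
import numpy as np


def star_seed(master_seed, star_index):
    # Mix the global seed with the star's index. The result depends only on
    # these two values, not on scheduling order.
    seq = np.random.SeedSequence([master_seed, star_index])
    return int(seq.generate_state(1)[0])


def process_star(master_seed, star_index):
    # Hypothetical stand-in for the real per-star work (e.g. fitting a model
    # with random_state=star_seed(...)); here it just draws three numbers.
    rng = np.random.default_rng(star_seed(master_seed, star_index))
    return rng.random(3)


def run(master_seed, star_indices):
    # Results are keyed by star index, so the order in which stars are
    # visited (as with an unordered parallel map) does not change them.
    return {i: process_star(master_seed, i) for i in star_indices}


def run_shared_stream(master_seed, star_indices):
    # Order-dependent: each star gets whatever numbers come next in a single
    # shared stream, which is the problem described above.
    rng = np.random.default_rng(master_seed)
    return {i: rng.random(3) for i in star_indices}
```

Because each star's generator depends only on `(master_seed, star_index)`, mapping `process_star` over the stars with any number of processes should give the same per-star output for a given seed. The integer returned by `star_seed` lies in the range scikit-learn accepts for an integer `random_state`.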
