Add a random seed command line argument #36

dwysocki · 2014-08-28T02:05:26Z

Currently we have no control over the random seed used by scikit-learn, and we don't know how it is chosen. We need to add a command line parameter to set the seed explicitly, and when none is given, default to numpy.random.randint(min_seed_value, max_seed_value).
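A minimal sketch of what the option could look like, assuming argparse is used for the command line. The flag name `--seed`, the helper `get_seed`, and the bounds below are hypothetical; the issue does not fix values for min_seed_value and max_seed_value.

```python
import argparse

import numpy as np

# Hypothetical bounds: the issue does not specify them. The upper bound is
# kept below 2**31 so it fits NumPy's default integer type on every platform.
MIN_SEED_VALUE = 0
MAX_SEED_VALUE = 2**31 - 1


def get_seed(argv=None):
    """Return the seed given on the command line, or a randomly drawn one."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--seed", type=int, default=None,
        help="random seed; drawn at random if omitted")
    args = parser.parse_args(argv)
    if args.seed is None:
        # randint draws from [MIN_SEED_VALUE, MAX_SEED_VALUE); the upper
        # bound is exclusive.
        return int(np.random.randint(MIN_SEED_VALUE, MAX_SEED_VALUE))
    return args.seed
```

Whichever seed ends up being used should be logged or written alongside the output, since a randomly drawn default is only reproducible if it is recorded.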

The only problem I can see with this is the use of pmap. Once we introduce parallelism, we no longer guarantee that the stars are processed in any particular order. With the way pmap currently works, it would need processes=1 to produce exactly the same results every time. Although the same set of random numbers is used, the numbers are mapped to different stars depending on the order in which the processes happen to run, so the same seed may produce different results. A solution would be to impose a strict mapping of random numbers to stars, associating random numbers with stars in a repeatable way. That would make it possible to publish a set of results along with the seed used, and others could then reproduce exactly the same results.
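One way to get such a repeatable mapping, sketched under the assumption that every star has a unique name: derive a per-star seed by hashing the global seed together with the star's name, and give each star its own generator. The functions `star_seed` and `process_star` are hypothetical and only stand in for the real per-star work; this is not how pmap currently behaves.

```python
import hashlib

import numpy as np


def star_seed(global_seed, star_name):
    """Derive a per-star seed from the global seed and the star's name."""
    key = "{}:{}".format(global_seed, star_name).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep the first 4 bytes so the value lies in [0, 2**32 - 1], the range
    # that numpy.random.RandomState accepts as an integer seed.
    return int.from_bytes(digest[:4], "big")


def process_star(global_seed, star_name):
    """Stand-in for the real work: draw numbers from the star's own generator."""
    rng = np.random.RandomState(star_seed(global_seed, star_name))
    return rng.uniform(size=3).tolist()
```

Because each star's random numbers depend only on the global seed and the star's name, the results do not change with the order in which stars are processed or with the number of processes.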

dwysocki added enhancement labels Aug 28, 2014

dwysocki added this to the Future goals milestone Aug 28, 2014

earlbellinger modified the milestones: Future goals, Engineering goals Sep 22, 2014
