Implementations of the parallel and sequential cube sampling algorithms presented in the paper "A Scalable Parallel Algorithm for Balanced Sampling" (Alexander Lee, Stefan Walzer-Goldfeld, Shukry Zablah, Matteo Riondato, AAAI'22 Student Abstract).
The student abstract and supplement are linked below:
To use our various algorithms, do the following:
To create the virtual environment, activate it, and install the dependencies into it, run the following commands:
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -r cube/requirements.txt
The virtual environment must be activated in order to run any of the algorithms.
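import numpy as np
import cube

if __name__ == "__main__":
    pop_size, num_aux_var = 100, 10
    # Synthetic population: one row per unit, one column per auxiliary variable.
    data = np.random.random((pop_size, num_aux_var))
    # Initial inclusion probability of each unit.
    init_probs = np.random.random(pop_size)
    sample = cube.sample_cube(data, init_probs)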
sample_cube() also has the following optional arguments:
- is_pop_size_fixed: True to include an auxiliary variable that fixes the population size when sampling
- is_sample_size_fixed: True to include an auxiliary variable that fixes the sample size
- seed: any integer for replicating samples
- use_internal_timing: True to return timings of various stages of the algorithm in addition to the sample
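In the balanced sampling literature, a sample size can be fixed exactly only when the inclusion probabilities sum to an integer, which uniformly random probabilities like those in the example above generally do not. The sketch below shows one common way to build probabilities that sum to a target sample size. The helper name scale_inclusion_probs and the idea of a per-unit "size" measure are illustrative assumptions and not part of the cube package; whether sample_cube() expects such input is not stated here.

```python
def scale_inclusion_probs(sizes, sample_size):
    """Return probabilities proportional to `sizes` that sum to `sample_size`.

    Hypothetical helper, not part of the cube package. Units whose scaled
    probability would reach 1 are capped at 1, and the remaining probability
    mass is redistributed over the other units.
    """
    if not 0 < sample_size <= len(sizes):
        raise ValueError("sample_size must be between 1 and the population size")
    if any(size <= 0 for size in sizes):
        raise ValueError("all sizes must be positive")

    probs = [0.0] * len(sizes)
    free = list(range(len(sizes)))  # units not yet capped at 1
    remaining = float(sample_size)  # probability mass left to distribute

    while True:
        total = sum(sizes[i] for i in free)
        capped = {i for i in free if remaining * sizes[i] / total >= 1.0}
        if not capped:
            # No unit exceeds 1 any more: assign proportional probabilities.
            for i in free:
                probs[i] = remaining * sizes[i] / total
            return probs
        for i in capped:
            probs[i] = 1.0
        remaining -= len(capped)
        free = [i for i in free if i not in capped]


if __name__ == "__main__":
    probs = scale_inclusion_probs([1, 2, 3, 4], sample_size=2)
    print(probs, sum(probs))
```

The resulting list can be converted with np.array(probs) before being passed as init_probs.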
import numpy as np
import cube

if __name__ == "__main__":
    pop_size, num_aux_var = 100, 10
    # Number of processes used by the parallel algorithm.
    num_proc = 4
    data = np.random.random((pop_size, num_aux_var))
    init_probs = np.random.random(pop_size)
    sample = cube.sample_cube_parallel(data, init_probs, num_proc)
sample_cube_parallel() also has the following optional arguments:
- num_strata: any integer to specify the number of strata; defaults to num_proc
- is_pop_size_fixed: True to include an auxiliary variable that fixes the population size when sampling
- is_sample_size_fixed: True to include an auxiliary variable that fixes the sample size
- seed: any integer for replicating samples
- use_internal_timing: True to return timings of various stages of the algorithm in addition to the sample
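A balanced sample is one whose Horvitz-Thompson estimates of the auxiliary totals are close to the true population totals. The sketch below measures that gap for any sample. It assumes the sample is available as a 0/1 indicator per unit; the return format of the cube functions is not described here, so that representation, the function names, and the Poisson stand-in sample are all illustrative assumptions.

```python
import random


def horvitz_thompson_totals(data, init_probs, indicators):
    """Estimate each auxiliary total by summing value / probability over sampled units."""
    totals = [0.0] * len(data[0])
    for row, prob, chosen in zip(data, init_probs, indicators):
        if chosen:
            for j, value in enumerate(row):
                totals[j] += value / prob
    return totals


def relative_imbalance(data, init_probs, indicators):
    """Relative gap between estimated and true totals, one entry per auxiliary variable."""
    estimated = horvitz_thompson_totals(data, init_probs, indicators)
    true_totals = [sum(column) for column in zip(*data)]
    return [
        abs(est - true) / abs(true) if true != 0 else abs(est)
        for est, true in zip(estimated, true_totals)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    pop_size, num_aux_var = 100, 10
    data = [[rng.random() for _ in range(num_aux_var)] for _ in range(pop_size)]
    init_probs = [rng.uniform(0.1, 0.9) for _ in range(pop_size)]
    # Stand-in sample (independent Poisson draws), used only so the sketch runs
    # without the cube package; replace it with indicators from a cube sample.
    indicators = [1 if rng.random() < p else 0 for p in init_probs]
    print(relative_imbalance(data, init_probs, indicators))
```

Smaller values indicate better balance; a sample drawn with a cube algorithm would be expected to show smaller gaps than the Poisson stand-in.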
import numpy as np
import cube

if __name__ == "__main__":
    pop_size, num_aux_var = 100, 10
    num_strata = 7
    data = np.random.random((pop_size, num_aux_var))
    init_probs = np.random.random(pop_size)
    # Assign each unit a stratum label in round-robin order.
    strata = np.array([i % num_strata for i in range(pop_size)])
    sample = cube.stratified_sample_cube(data, init_probs, strata)
stratified_sample_cube() also has the following optional arguments:
- is_pop_size_fixed: True to include an auxiliary variable that fixes the population size when sampling
- seed: any integer for replicating samples
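When preparing stratified input, it can help to check how many units each stratum is expected to contribute: the expected number of sampled units in a stratum equals the sum of the inclusion probabilities of its units. The sketch below computes that from a strata label vector like the one in the example above. The function name expected_stratum_sizes is a hypothetical helper, not part of the cube package.

```python
from collections import defaultdict


def expected_stratum_sizes(init_probs, strata):
    """Sum the inclusion probabilities within each stratum label."""
    expected = defaultdict(float)
    for prob, label in zip(init_probs, strata):
        expected[label] += prob
    return dict(expected)


if __name__ == "__main__":
    pop_size, num_strata = 100, 7
    init_probs = [0.3] * pop_size
    # Same round-robin labelling as in the stratified example.
    strata = [i % num_strata for i in range(pop_size)]
    for label, size in sorted(expected_stratum_sizes(init_probs, strata).items()):
        print(label, round(size, 2))
```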
import numpy as np
import cube

if __name__ == "__main__":
    pop_size, num_aux_var = 100, 10
    num_strata = 7
    # Number of processes used by the parallel algorithm.
    num_proc = 4
    data = np.random.random((pop_size, num_aux_var))
    init_probs = np.random.random(pop_size)
    # Assign each unit a stratum label in round-robin order.
    strata = np.array([i % num_strata for i in range(pop_size)])
    sample = cube.parallel_stratified_sample_cube(data, init_probs, strata, num_proc)
parallel_stratified_sample_cube() also has the following optional arguments:
- is_pop_size_fixed: True to include an auxiliary variable that fixes the population size when sampling
- seed: any integer for replicating samples
Our test suite can be run with the following command (the virtual environment must be activated):
python3 src/test_cube.py