I'm reading the code and there is a detail I couldn't understand.
When selecting data from each cluster, the corresponding code is
```python
size = (len(dataset) * portion) / K / round
exp_reward_diff = merged_df["exp_reward_diff"]
# random select from K clusters, with p as the weight.
select_new_iter = np.random.choice(
    K, size=int(size), p=exp_reward_diff, replace=True
)
# Count how many times a cluster is chosen
selected_clusters_size = Counter(select_new_iter)
remaining_dataset = dataset.select(set(range(len(dataset))) - set(selected_indices))
remaining_dataset_df = remaining_dataset.to_pandas()
new_indices = []
for i in range(K):
    # get current indices in the remaining dataset
    indices = remaining_dataset_df[remaining_dataset_df["cluster"] == i]["index"]
    # adjust size if the selected size exceeds the remaining size
    size = min(selected_clusters_size[i], len(indices))
    # pick real samples from each cluster
    indices = np.random.choice(indices, size=size, replace=False)
    new_indices.extend(indices)
new_indices = np.array(new_indices)
# update the selected samples
new_indices = np.concatenate([selected_indices, new_indices])
```
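To double-check the count, here is a minimal, self-contained sketch that mirrors the quoted step on toy data. It is not the repository's code: `sample_one_iteration`, `cluster_of`, `n_rounds` (renamed from `round` so it does not shadow the builtin), the uniform weights and the toy sizes are all hypothetical, and plain NumPy arrays stand in for the `dataset` / `merged_df` objects.

```python
from collections import Counter

import numpy as np


def sample_one_iteration(cluster_of, selected, weights, dataset_len, portion, n_rounds, rng):
    """Hypothetical re-implementation of the quoted per-iteration step.

    cluster_of: array mapping each dataset index to its cluster id (0..K-1)
    selected:   indices already chosen in earlier iterations
    weights:    per-cluster sampling probabilities (must sum to 1)
    """
    K = len(weights)
    # Same budget formula as the quoted code, including the division by K.
    size = int(dataset_len * portion / K / n_rounds)
    # Draw a cluster label `size` times, with replacement, weighted by `weights`.
    draws = rng.choice(K, size=size, p=weights, replace=True)
    per_cluster = Counter(draws.tolist())
    # Indices not yet selected in earlier iterations.
    remaining = np.setdiff1d(np.arange(dataset_len), selected)
    new = []
    for i in range(K):
        candidates = remaining[cluster_of[remaining] == i]
        # Cap at what is left in the cluster, as the quoted code does with min().
        take = min(per_cluster[i], len(candidates))
        if take:
            new.extend(rng.choice(candidates, size=take, replace=False).tolist())
    new = np.array(new, dtype=int)
    return np.concatenate([np.asarray(selected, dtype=int), new]), size


rng = np.random.default_rng(0)
cluster_of = np.arange(1000) % 5  # toy labels: 5 clusters of 200 points each
weights = np.full(5, 0.2)         # hypothetical uniform cluster weights
selected, size = sample_one_iteration(
    cluster_of, np.array([], dtype=int), weights,
    dataset_len=1000, portion=0.1, n_rounds=4, rng=rng,
)
print(size, len(selected))  # 5 5
```

With 1,000 points, `portion = 0.1`, K = 5 and 4 rounds, each iteration adds exactly 5 new indices (no cluster runs out, so `min` never caps), i.e. `len(dataset) * portion / K / round`.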
If I understand correctly, in this iteration the chosen size is `(len(dataset) * portion) / K / round`. The code then samples cluster labels according to the weights, `Counter` counts how many samples are drawn from each cluster, and the subsequent `for` loop picks that many samples from each of the K clusters. This yields (at most) `(len(dataset) * portion) / K / round` samples in total. But according to the paper, the size for each iteration should be $b_{it} = \frac{b}{N}$, so I would expect no division by K?
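Writing the two budgets side by side, assuming the paper's $b$ corresponds to `len(dataset) * portion` and $N$ to `round` (my reading, not confirmed by the paper's notation):

$$
b_{it}^{\text{code}} \le \left\lfloor \frac{|\mathcal{D}| \cdot \text{portion}}{K \cdot N} \right\rfloor = \left\lfloor \frac{b}{K N} \right\rfloor,
\qquad
b_{it}^{\text{paper}} = \frac{b}{N}
$$

Under that mapping, the code's per-iteration budget is about $K$ times smaller than $b/N$ (e.g. 5 instead of 25 for 1,000 samples, `portion = 0.1`, $K = 5$, $N = 4$).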
It would be great if you could help me understand this detail.