Description
TL;DR: the implementation includes BN statistics in FLAME's clustering step.
Hi there,
First of all, thank you very much for sharing this implementation of the FLAME defense! It has been very helpful for studying robust aggregation in Federated Learning.
While reading through the code in `defense.py`, I noticed a small detail in the `parameters_dict_to_vector_flt` function that I wanted to discuss with you. Currently, the function converts the `state_dict` into a flat vector for HDBSCAN clustering, excluding only `num_batches_tracked`:
```python
import torch

def parameters_dict_to_vector_flt(net_dict) -> torch.Tensor:
    """Flatten a state_dict into a single vector for HDBSCAN clustering."""
    vec = []
    for key, param in net_dict.items():
        # Only the integer BN counter is skipped here; running_mean and
        # running_var are still appended to the clustering vector.
        if key.split('.')[-1] == 'num_batches_tracked':
            continue
        vec.append(param.view(-1))
    return torch.cat(vec)
```
It seems that this approach includes the Batch Normalization (BN) buffers, such as `running_mean` and `running_var`, in the clustering vector.
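A quick way to confirm which keys are involved (a standalone check, not code from this repo):

```python
import torch.nn as nn

# A single BN layer contributes five entries to the state_dict:
bn = nn.BatchNorm2d(4)
print(list(bn.state_dict().keys()))
# ['weight', 'bias', 'running_mean', 'running_var', 'num_batches_tracked']
```

Only the last key is filtered out above, so the two running statistics still flow into the vector that HDBSCAN clusters.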
In the FLAME paper and in standard practice, clustering focuses on the "direction of learning", i.e., the weights and biases:
**Section 3 (Backdoor Characterization, page 4):**

> “...we represent Neural Networks (NNs) using their weight vectors, in which the extraction of weights is done identically for all models by flattening/serializing the weight/bias matrices in a predetermined order.”

**Footnote 1 (page 2)**, clarifying what they mean by "weights":

> “Parameters of neural network models typically consist of 'weights' and 'biases'. For the purposes of this paper, however, these parameters can be treated identically and we will refer to them as 'weights' for brevity.”
In non-IID scenarios, the BN statistics can vary significantly across benign clients because they reflect local data distributions rather than adversarial behavior. Including them might lead HDBSCAN to incorrectly flag benign clients as outliers (false positives), which could degrade the Main Task Accuracy (MA), as the toy example below suggests.
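Here is a toy illustration (my own snippet, not from the repo; the +5 shift is arbitrary): two benign clients that train on differently distributed local data end up with very different BN buffers even when their learned weights agree.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn_a, bn_b = nn.BatchNorm1d(3), nn.BatchNorm1d(3)  # two benign "clients"
for _ in range(100):
    bn_a(torch.randn(32, 3))        # client A: zero-mean local data
    bn_b(torch.randn(32, 3) + 5.0)  # client B: shifted local data
print(bn_a.running_mean)  # close to 0
print(bn_b.running_mean)  # close to 5
```

Both models are benign, yet their flattened vectors would differ substantially in the buffer coordinates, which is exactly the kind of distance HDBSCAN picks up on.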
Would you consider filtering the input to only include trainable parameters (those with `requires_grad=True`), or explicitly excluding the BN buffers? Based on my understanding of the original paper, this might make the defense even more robust against data heterogeneity. A minimal sketch of what I have in mind follows.
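For concreteness, here is a sketch of the second option (my own variant, not the repo's code; the suffix set is my assumption about which buffer names to drop):

```python
import torch

# Hypothetical variant: skip all BN buffers, not only the integer counter.
BN_BUFFER_SUFFIXES = {'num_batches_tracked', 'running_mean', 'running_var'}

def parameters_dict_to_vector_flt(net_dict) -> torch.Tensor:
    vec = []
    for key, param in net_dict.items():
        if key.split('.')[-1] in BN_BUFFER_SUFFIXES:
            continue  # cluster only on learned weights/biases
        vec.append(param.view(-1))
    return torch.cat(vec)
```

For the `requires_grad` route, note that `state_dict()` returns detached tensors, so the flag would have to be read from `model.named_parameters()` rather than from the `state_dict` entries themselves.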
I might have misunderstood some parts of your design, so I would love to hear your thoughts on this!
Thank you again for your contribution to the community!