Thanks for starting this discussion, @tanwarsh! Federated Analytics is a promising field, but it presents a new set of challenges, especially given the wide spectrum of possible analytics queries. I have a couple of questions from my first read-through of this proposal:
Regarding supplementing FA with SecureAggregation, keep in mind that SecAgg mainly works when the local query results are summed arithmetically (as in FedAvg). It will likely not apply to queries that use other kinds of aggregation, such as concatenation, finding a maximum or minimum, or any other non-linear operation. I would therefore suggest adding a section to the design that illustrates the types of federated queries that will be supported and, more importantly, how each of them will be aggregated.
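To make the point about linearity concrete, here is a minimal sketch of pairwise additive masking, the core idea behind SecAgg-style protocols. It is a simplification: masks come from one seeded RNG rather than from per-pair key agreement, and client dropout is ignored. The names `pairwise_masks` and `mask_values` are hypothetical. The masks cancel under summation, so the aggregator recovers the exact total, but the same masked values say nothing useful about the maximum.

```python
import random

MODULUS = 2**32  # all arithmetic happens in a fixed ring, as in masking-based SecAgg


def pairwise_masks(n_clients, seed=0):
    """One mask per client. For each pair (i, j) a shared random r is added by
    client i and subtracted by client j, so all masks sum to 0 mod MODULUS.
    (Simplification: a real protocol derives r from a key agreed by i and j.)"""
    rng = random.Random(seed)
    masks = [0] * n_clients
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            r = rng.randrange(MODULUS)
            masks[i] = (masks[i] + r) % MODULUS
            masks[j] = (masks[j] - r) % MODULUS
    return masks


def mask_values(values, masks):
    """What each client actually sends: its local result plus its mask."""
    return [(v + m) % MODULUS for v, m in zip(values, masks)]


local_counts = [12, 7, 30]  # e.g. per-client row counts for a counting query
masks = pairwise_masks(len(local_counts))
masked = mask_values(local_counts, masks)

# Sum: the masks cancel, so the true total is recovered without exposing any single value.
secure_sum = sum(masked) % MODULUS
print(secure_sum)  # 49

# Max: the masks do not cancel, so max(masked) is unrelated to max(local_counts).
print(max(masked), max(local_counts))
```

The same reasoning applies to any aggregation that is not a sum in the ring: minimum, concatenation, and set operations would need a different protection mechanism.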
I would like to raise an additional question/suggestion. In the current proposal, the query is defined as part of the FL plan. This means that for each new query, the FL plan would have to be redistributed to, and re-approved by, the participants. For a more streamlined experience, could we consider an approach where the query is passed as an aggregator CLI parameter? This could follow a pattern similar to how we currently switch the aggregator's mode, for example:

    fx aggregator start --task_group evaluation

In the case of FA, the query could be removed from the FL plan entirely and passed via the command line. For example:

    fx aggregator start --task_group analytics --query query.json  # ... or directly inline

Although convenient, this means that the aggregator would be able to run practically any query without prior approval by the collaborators, so such queries may need to be additionally restricted. @ishaileshpant, what would be your thoughts and suggestions in this regard?
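One possible way to restrict CLI-supplied queries, sketched here purely as an assumption, is for each collaborator to keep an allowlist of query fingerprints it has approved in advance and to refuse anything else. The `--query` flag above is only a proposal, and the query schema, `query_fingerprint`, and `collaborator_accepts` below are hypothetical. Hashing a canonical JSON encoding makes the check insensitive to key order and whitespace.

```python
import hashlib
import json


def query_fingerprint(query: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical query format: the proposal does not define the schema of query.json.
approved_query = {"type": "histogram", "columns": ["sepal length (cm)"], "bins": 10}
APPROVED = {query_fingerprint(approved_query)}  # stored locally by the collaborator


def collaborator_accepts(query: dict) -> bool:
    """Run only queries whose fingerprint was approved beforehand."""
    return query_fingerprint(query) in APPROVED


incoming = json.loads('{"bins": 10, "columns": ["sepal length (cm)"], "type": "histogram"}')
print(collaborator_accepts(incoming))  # True: same query, different key order
print(collaborator_accepts({"type": "raw_rows", "columns": ["*"]}))  # False
```

Exact-match fingerprints keep approval strict but force re-approval whenever any parameter (such as `bins`) changes; a looser design could approve query templates with bounded parameters instead.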
Summary
Federated Analytics is a data analysis approach where a querier answers a query through collaboration with multiple data owners (clients) who retain their local raw data. Instead of exchanging raw data, intermediate query responses are transferred and aggregated by the querier to answer the query. This document provides a high-level design for implementing Federated Analytics using OpenFL.
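As a small illustration of "intermediate query responses" (not taken from the proposal), consider a global mean: each client returns only its local `(sum, count)` pair, and the querier combines those pairs. The function names and data are hypothetical.

```python
from typing import List, Tuple


def local_response(values: List[float]) -> Tuple[float, int]:
    """Runs on each client: return an intermediate result, never the raw values."""
    return sum(values), len(values)


def aggregate_mean(responses: List[Tuple[float, int]]) -> float:
    """Runs on the querier: combine per-client (sum, count) pairs into a global mean."""
    total = sum(s for s, _ in responses)
    count = sum(n for _, n in responses)
    return total / count


client_data = [[5.1, 4.9, 4.7], [6.4, 6.9], [6.3, 5.8, 7.1, 6.3]]  # illustrative values
responses = [local_response(d) for d in client_data]
print(aggregate_mean(responses))
```

Sending `(sum, count)` instead of each client's local mean matters: averaging the local means would weight small and large clients equally and give the wrong answer when client sizes differ.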
Motivation
FA is a distributed approach to data analysis that enables multiple parties to collaboratively analyze data without sharing the actual data. This approach is particularly valuable in scenarios where data privacy and security are critical, such as in healthcare, finance, and other sensitive fields. Key points about FA include:
High-Level Design
Scope
Technical Details
TaskRunner API Workspace
data:image/s3,"s3://crabby-images/fc311/fc3115b364300d77ce9d1ab2ad6f87a71da7e0e8" alt="FA_workspace (1)"
LocalTensor(col_name='collaborator1_sepal length (cm)', tensor=array([2.2 , 2.42, 2.64, 2.86, 3.08, 3.3 , 3.52, 3.74, 3.96, 4.18, 4.4 ], dtype=float32))
Dataset
Calculate a histogram (an estimate of the probability distribution of a continuous variable) on specific columns of the Iris dataset.
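A federated histogram can be sketched as follows, assuming all collaborators use the same bin edges agreed before the query runs. Without shared edges, per-client counts cannot be added bin by bin. The edge range, function names, and the synthetic data standing in for each collaborator's Iris column are assumptions for illustration; this is not the proposal's implementation.

```python
import numpy as np

# Shared bin edges, fixed in advance (hypothetical range for "sepal length (cm)").
BIN_EDGES = np.linspace(4.0, 8.0, 11)  # 11 edges -> 10 bins


def local_histogram(column: np.ndarray) -> np.ndarray:
    """Runs on each collaborator: only per-bin counts leave the site."""
    counts, _ = np.histogram(column, bins=BIN_EDGES)
    return counts


def aggregate_histograms(local_counts) -> np.ndarray:
    """Runs on the aggregator: bin-wise sum of the local counts."""
    return np.sum(local_counts, axis=0)


# Synthetic stand-in for three collaborators' column values.
rng = np.random.default_rng(0)
client_columns = [rng.uniform(4.3, 7.9, size=n) for n in (50, 40, 60)]

global_counts = aggregate_histograms([local_histogram(c) for c in client_columns])
print(global_counts, global_counts.sum())
```

Because histogram aggregation is a plain sum, it is also the kind of query that SecAgg-style masking could protect.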
Types of Supported Queries
1. Statistical queries
2. Set-based queries
3. Matrix transformation queries
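The list above names the query families but not how each one is aggregated. The sketch below shows common choices, assumed for illustration rather than taken from the design: element-wise addition for statistical queries, set union for set-based queries, and, as one example of a matrix query on row-partitioned data, summing local Gram matrices, since the sum of the X_i^T X_i equals X^T X. All function names are hypothetical.

```python
import numpy as np


def aggregate_statistical(local_sums):
    """Statistical queries (counts, sums, histograms): element-wise addition."""
    return np.sum(local_sums, axis=0)


def aggregate_set_union(local_sets):
    """Set-based queries, e.g. distinct values across clients: set union."""
    result = set()
    for s in local_sets:
        result |= s
    return result


def aggregate_gram(local_matrices):
    """Matrix query on row-partitioned data: sum of local X_i^T X_i equals X^T X."""
    return sum(x.T @ x for x in local_matrices)


rng = np.random.default_rng(1)
parts = [rng.normal(size=(n, 3)) for n in (4, 5, 6)]  # three clients' row blocks
full = np.vstack(parts)
assert np.allclose(aggregate_gram(parts), full.T @ full)

print(aggregate_set_union([{"setosa"}, {"setosa", "versicolor"}, {"virginica"}]))
```

Only the first and third aggregations are plain sums and therefore compatible with additive SecAgg; set union is not, and would need a different privacy mechanism.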
Documentation
Open Questions
Phase 2