- The first part of Episode 4 was moved to this episode (and further refined) as an introduction to the UI.
- The ER model as presented may be too complex for participants to grasp. I created a simplified version focused on what matters for this episode, including two concepts that are important despite not being part of the v6 data model (algorithm and algorithm store).
- The steps were moved to the solutions (with the exception of the last challenge), and the questions were reworded to make the exercises more challenging, as suggested.
episodes/chapter3.md: 60 additions & 28 deletions
@@ -14,7 +14,7 @@ exercises: 3
 
 ::::::::::::::::::::::::::::::::::::: objectives
 
-- Explore specific data analysis scenarios that further illustrates the concept of collaboration
+- Explore specific data analysis scenarios that further illustrate the concept of collaboration
 - Understand the concept of 'algorithm trustworthiness' in the context of a vantage6 collaboration
 - Understand v6's algorithm-store current and envisioned features
 - Understand the UI-based approach for performing a data analysis through the given scenarios
@@ -26,7 +26,7 @@ exercises: 3
 
 To navigate vantage6's UI seamlessly, it's essential to grasp the platform's fundamental concepts and their interconnections, as the UI design reflects these relationships. The following is a simplified model of vantage6 concepts, where a `1-n` relationship means that the entity on the left side of the relationship is related to one or more entities on the right side. For instance, a **collaboration** involves one or more **nodes**, but each **node** can only be linked to exactly one **collaboration**. An `n-n` relationship is a many-to-many relationship: for instance, a **collaboration** can involve multiple **organizations**, and at the same time, each **organization** can participate in multiple **collaborations**.
 
-
 Given the above, the following are the most important concepts to be considered for this episode:
 
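The `1-n` and `n-n` relationships described in the episode text can be sketched as a tiny in-memory model. This is a hypothetical illustration, not vantage6's data model or API: the class names, fields, and the `join` helper are assumptions made for this sketch. The one relation added beyond the text, that each node is also run by a single organization, reflects how vantage6 nodes are registered.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality avoids recursing through cyclic links
class Organization:
    name: str
    # n-n: an organization can take part in several collaborations
    collaborations: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Collaboration:
    name: str
    # n-n: a collaboration involves several organizations
    organizations: list = field(default_factory=list, repr=False)
    # 1-n: a collaboration involves one or more nodes
    nodes: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Node:
    organization: Organization    # the organization that runs this node
    collaboration: Collaboration  # exactly one collaboration per node


def join(org, collab):
    """Hypothetical helper: add org to collab and create the node it runs there."""
    collab.organizations.append(org)
    org.collaborations.append(collab)
    node = Node(org, collab)
    collab.nodes.append(node)
    return node


# Hypothetical names, not the GHT/PhY24 setup used in the exercises.
study_a, study_b = Collaboration("study_a"), Collaboration("study_b")
org_1 = Organization("org_1")
node_a = join(org_1, study_a)
node_b = join(org_1, study_b)

print([c.name for c in org_1.collaborations])  # ['study_a', 'study_b']
print(node_a.collaboration.name, node_b.collaboration.name)  # study_a study_b
```

Here one organization takes part in both collaborations (`n-n`), yet it ends up with a separate node in each, because a node is bound to a single collaboration (`1-n`).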
@@ -50,19 +50,19 @@ Given the above, the following are the most important concepts to be considered
 -**Result**: the output generated by the execution of an **algorithm** as part of a **task**.
 -**Algorithm**: computational models or processes that are executed on data. Compatible algorithms are those that adhere to the Vantage6 framework, enabling them to be securely distributed to **nodes** for execution.
 
-### Where are the concepts in the UI?
+### Where are these concepts in the UI?
 
 After logging in to the vantage6 UI, you will see the start page.
 The start page also contains a button `Administration` in the top right corner. Clicking on this button will redirect you to the administration page.
 
-In the administration page, you can manage the entities of vantage6. The entities are divided into tabs: `Organizations`, `Collaborations`, `Roles`, `Users`, and `Nodes`. You can click on an entity to see more details or to edit the entity. We will get back to this later in more detail.
+On the administration page, you can manage the entities of vantage6. The entities are divided into tabs: `Organizations`, `Collaborations`, `Roles`, `Users`, and `Nodes`. You can click on an entity to see more details or to edit the entity. We will get back to this later in more detail.
@@ -80,9 +80,9 @@ Can you find the `Organizations`, `Collaborations`, `Roles`, `Users`, and `Nodes
 
 
 
-## A hypothetical case study using vantage6 collaborations
+## From theory to practice: a hypothetical case study using vantage6 collaborations
 
-In the context of vantage6, a collaboration refers to an agreement between two or more parties to participate in a study or to answer a research question together. This concept is central to the Privacy Enhancing Technologies (PETs) that vantage6 supports. Each party involved in a collaboration remains autonomous, meaning they retain control over their data and can decide how much of their data to contribute to the collaboration's global model and which algorithms are allowed for execution.
+As previously discussed, in vantage6 a collaboration refers to an agreement between two or more parties to participate in a study or to answer a research question together. This concept is central to the Privacy Enhancing Technologies (PETs) that vantage6 supports. Each party involved in a collaboration remains autonomous, meaning they retain control over their data and can decide how much of their data to contribute to the collaboration's global model and which algorithms are allowed for execution.
 
 To illustrate this, let's analyze a hypothetical scenario: two international research projects relying on vantage6 technology on the same server:
 
@@ -96,32 +96,37 @@ Following vantage6's concepts, this scenario would involve two collaborations, o
-## Algorithms trustworthiness on a federated setting
+### Algorithm trustworthiness in a federated setting
 
 While a vantage6-supported research infrastructure like the one described above offers a strong defense against many data privacy risks, there remains one crucial security aspect that falls outside the platform's scope: the validation of the code that will run on this infrastructure. For instance, the administrators of the nodes running within each organization are responsible for defining which algorithms (i.e., [which container images](https://docs.vantage6.ai/en/main/node/configure.html#all-configuration-options)) will be allowed for execution on the respective collaborations. As this is a critical and complex task that entails activities like code analysis and verification, working with algorithms from trusted sources is the primary line of defense against potential threats.
 
 Vantage6's algorithm store feature aims to enhance trustworthiness by offering a centralized platform for managing pre-registered algorithms. This serves as an alternative to using algorithms from unknown authors or those lacking transparency regarding their development process and status. The algorithm store currently allows researchers to explore which algorithms are available and how to run them. This, along with its integration with vantage6's UI, streamlines task execution requests within collaborations.
 
-As of the time of writing this tutorial, efforts are underway to integrate additional information to the algorithms metadata such as creators and code reviewers. Moreover, plans are in place to incorporate the algorithm review process into the publication procedure for any algorithms in the store.
+As of the time of writing this tutorial, efforts are underway to add further information, such as creators and code reviewers, to the algorithm metadata. Moreover, plans are in place to incorporate the algorithm review process into the publication procedure for every algorithm in the store.
 
-## Running a PET (privacy-enhancing technology) analysis without programming!
+### Running a PET (privacy-enhancing technology) analysis without programming!
 
-In this episode, you will perform a PET analysis on an existing vantage6 collaborations (based on 'dummy' nodes) that resemble the two described above. For reference, the datasets of each organization can be seen here (TODO).
+In this episode, you will perform a PET analysis on existing vantage6 collaborations (based on 'dummy' nodes) that resemble the two described above. For reference, the datasets of each organization can be seen here (TODO).
 
 ::::::::::::::::::::::::::::::::::::: challenge
 
-## Challenge 1: understanding a simple federated algorithm
+## Challenge 2: understanding a simple federated algorithm
 
-First, let's take a look at one of the federated algorithms -available on the vantage6's community store- that will be used in this episode: [a federated average](https://github.com/IKNL/v6-average-py/blob/master/v6-average-py/__init__.py). Based on the code and its comments:
+First, let's take a look at one of the federated algorithms (available on the vantage6 community store) that will be used in this episode: [a federated average](https://github.com/IKNL/v6-average-py/blob/master/v6-average-py/__init__.py).
 
-- What is the difference between the 'central_average' and the 'partial_average' functions?
-- What would happen if this analysis is started in a collaboration that has one of its nodes 'offline'?
+Analyze the algorithm based on the code and its comments, and answer the following questions:
+
+- How are the `central_average` and `partial_average` functions related?
+- Why does the `central_average` function, unlike `partial_average`, not take any data as input?
+- Analyze and discuss the potential outcomes if a Task to execute `central_average` is initiated within a collaboration where one of the nodes is offline.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
+
+
 ::::::::::::::::::::::::::::::::::::: challenge
 
-## Challenge 2: exploring the status of existing collaborations configured on a vantage6 server
+## Challenge 3: exploring the status of existing collaborations configured on a vantage6 server
 
 Below are the administrator credential of GHT and PhY24 collaborations (passwords will be given by the instructors).
 
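The relationship between `central_average` and `partial_average` asked about in Challenge 2 can be illustrated with a self-contained sketch. It is not the v6-average-py code: there is no vantage6 client, `DummyNode` and its `run` method are hypothetical stand-ins for nodes executing a subtask, and the numbers are made up. As we read the linked algorithm, the central function receives a client rather than data and uses it to create `partial_average` subtasks on the nodes. Here, a list of dummy nodes plays that role.

```python
from dataclasses import dataclass, field


def partial_average(values):
    """Node-side step: only the local sum and count leave the node."""
    return {"sum": sum(values), "count": len(values)}


@dataclass
class DummyNode:
    """Hypothetical stand-in for an organization's node and its private data."""
    organization: str
    data: list = field(repr=False)
    online: bool = True

    def run(self, method):
        # The method runs "inside" the node; the caller only sees its return value.
        if not self.online:
            raise ConnectionError(f"node of {self.organization} is offline")
        return method(self.data)


def central_average(nodes):
    """Central step: receives no data, only dispatches partial_average and combines."""
    partials = [node.run(partial_average) for node in nodes]
    global_sum = sum(p["sum"] for p in partials)
    global_count = sum(p["count"] for p in partials)
    return global_sum / global_count


nodes = [DummyNode("org_a", [1.0, 2.0, 3.0]), DummyNode("org_b", [10.0])]
print(central_average(nodes))  # 4.0

nodes[1].online = False
try:
    central_average(nodes)
except ConnectionError as err:
    print(f"task cannot complete: {err}")
```

Pooling sums and counts yields the average over all records (16 / 4 = 4.0), whereas averaging the two local means would give 6.0. In the offline case, the sketch raises immediately to make the dependency on every node explicit. On a real server, the subtask for an offline node may instead remain pending, which leaves the central task waiting.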
@@ -130,7 +135,18 @@ Below are the administrator credential of GHT and PhY24 collaborations (password
 | PhY24-admin | Collaboration Admin | PhY24 |
 | GHT-admin | Collaboration Admin | GHT |
 
-Check the status of the nodes of each collaboration:
+Using these credentials, check the status of both collaborations. Based on this and on your analysis of the algorithm in Challenge #2, answer the following:
+
+1. Which collaborations are ready to run a Task for the __federated average__ algorithm?
+2. If one of the collaborations is not ready, which organization would you need to contact to make it ready to execute the algorithm as well?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::: solution
+
+## Solution steps
+
+To check the status of the nodes of each collaboration:
 
 1. Log in to each one with the given credentials
 2. Click on 'Administration' on the top of the UI
@@ -139,8 +155,19 @@ Check the status of the nodes of each collaboration:
-- Based on what you see on Challange #1, which collaboration would be ready to request the 'Average' algorithm on it?
-- For the other collaboration, which organization you would need to reach in order to fix the issue?
+:::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Challenge 4: adding an algorithm store to a collaboration
+
+Given the previous discussion on algorithm trustworthiness, before executing the __average algorithm__ on a collaboration you first need to register an algorithm store for it. Use the administrator credentials from Challenge #3 to register the 'community store', which contains this algorithm: `https://store.cotopaxi.vantage6.ai`
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::: solution
+
+## Solution steps
 
 You will now link the 'community-store' to the collaboration whose nodes are ready for it.
 
@@ -151,19 +178,20 @@ You will now link the 'community-store' to the collaboration whose nodes are rea
 5. Make sure the store is now shown on the collaboration details:
 
 
-::::::::::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::
 
 ::::::::::::::::::::::::::::::::::::: challenge
-## Challenge 3: your first algorithm execution as a researcher
+
+## Challenge 5: your first algorithm execution as a researcher
 
 Now, you'll take on the role of the researcher within the collaboration for which you've just established the algorithm store. With this role, you will finally request the execution of the algorithm.
 
 1. log in as a researcher using the corresponding credentials below:
 
-| User | Roles | Collaboration |
-|----|-----|-----|
-|PhY24-rs1 | Researcher |PhY24 |
-|GHT-rs1 | Researcher |GHT |
+| User      | Roles      | Collaboration |
+|-----------|------------|---------------|
+| PhY24-rs1 | Researcher | PhY24         |
+| GHT-rs1   | Researcher | GHT           |
 
 2. Select the collaboration given on the front page, and select 'Tasks' from the panel on the left.
-4. Now the UI will let you choose between the two functions you explored in Challenge #1. First, try to run the 'partial_average' on all the nodes individually.
+4. Now the UI will let you choose between the two functions you explored in Challenge #2. First, try to run `partial_average` on all the nodes individually.
 
 
 
@@ -182,8 +210,12 @@ Now, you'll take on the role of the researcher within the collaboration for whic
 
 
 
-- Based on your understanding of the 'central_average' function, if you create one a new task, which organization nodes should you choose this time in order to actually calculate the overall (across all the datasets) average? Experiment with this and discuss the results with the instructors.
-- What would happen if you select an alpha-numerical column (e.g., 'participant_pseudo_id')? Do this experiment and explore the generated error logs. Discuss with the instructors how these logs can be used to diagnose any task execution issues.
+Based on these results, answer the following:
+
+
+1. If you repeat the same exercise but with the `central_average` function (refer to Challenge #2 if needed), which organizations' nodes should you select this time to calculate the overall average (across all the datasets)? Experiment with this and discuss the results with the instructors.
+
+2. What would happen if you select an alphanumeric column (e.g., 'participant_pseudo_id')? Run this experiment and explore the generated error logs. Discuss with the instructors how these logs can be used to diagnose task execution issues.
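To make the alphanumeric-column question more concrete, the sketch below mimics what could happen on a node when an average is requested over a column such as 'participant_pseudo_id', and how the failure ends up in a log. It is a hypothetical stand-in. The rows are plain dictionaries rather than a node's real dataset, the `age` column and `run_task` helper are invented for illustration, and the explicit type check is a design choice of this sketch, not necessarily how v6-average-py behaves. The actual messages shown in the UI logs may differ.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("node")

# Hypothetical local dataset; the 'age' column is an assumption for this sketch.
rows = [
    {"participant_pseudo_id": "p-001", "age": 34},
    {"participant_pseudo_id": "p-002", "age": 51},
]


def partial_average(rows, column_name):
    """Node-side step that fails with a readable message on non-numeric columns."""
    values = [row[column_name] for row in rows]
    non_numeric = [v for v in values if not isinstance(v, (int, float))]
    if non_numeric:
        raise TypeError(
            f"column '{column_name}' is not numeric (e.g. {non_numeric[0]!r})"
        )
    return {"sum": sum(values), "count": len(values)}


def run_task(rows, column_name):
    """Mimic a node executing the task: log the traceback instead of failing silently."""
    try:
        result = partial_average(rows, column_name)
        log.info("task finished: %s", result)
        return result
    except Exception:
        log.exception("task failed for column %r", column_name)
        return None


run_task(rows, "age")                    # logs the sum and count
run_task(rows, "participant_pseudo_id")  # logs a TypeError with its traceback
```

The design choice here is to fail early with a message that names the offending column, so that whoever reads the task log can tell a data-type problem apart from, for example, an offline node.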