Commit 654df39

- The first part of Episode 4 was moved (and further refined) to this one (as an introduction to the UI)
- The ER model as presented could be way too complex to be 'conceptualized' by the participants. I created a simplified version focused on what is key for this episode, including other key concepts that are important despite not being part of the v6 data model (algorithm and algorithm store)
- The steps were moved to the solution (with the exception of the last challenge), and the questions were reworded to make the exercise more challenging, as suggested.
1 parent c38a1ad commit 654df39

4 files changed: +124 -44 lines changed

episodes/chapter3.md

Lines changed: 60 additions & 28 deletions
@@ -14,7 +14,7 @@ exercises: 3

 ::::::::::::::::::::::::::::::::::::: objectives

-- Explore specific data analysis scenarios that further illustrates the concept of collaboration
+- Explore specific data analysis scenarios that further illustrate the concept of collaboration
 - Understand the concept of 'algorithm trustworthiness' in the context of a vantage6 collaboration
 - Understand v6's algorithm-store current and envisioned features
 - Understand the UI-based approach for performing a data analysis through the given scenarios
@@ -26,7 +26,7 @@ exercises: 3

 To navigate vantage6's UI seamlessly, it's essential to grasp the platform's fundamental concepts and their interconnections, as the UI design reflects these relationships. The following is a simplified model of vantage6 concepts, where a `1-n` relationship means that the entity on the left side of the relationship is related to one or more entities on the right side. For instance, a **collaboration** involves one or more **nodes**, but each **node** can only be linked to exactly one **collaboration**. An `n-n` relationship is a many-to-many relationship: for instance, a **collaboration** can involve multiple **organizations**, and at the same time, each **organization** can participate in multiple **collaborations**.

-![vantage6 relations between entities](fig/chapter3/v6_entitites_simplified.png)
+![Vantage6 core concepts](fig/chapter3/v6_entitites_simplified.png)

 Given the above, the following are the most important concepts to be considered for this episode:

@@ -50,19 +50,19 @@ Given the above, the following are the most important concepts to be considered
 - **Result**: the output generated by the execution of an **algorithm** as part of a **task**.
 - **Algorithm**: computational models or processes that are executed on data. Compatible algorithms are those that adhere to the Vantage6 framework, enabling them to be securely distributed to **nodes** for execution.

-### Where are the concepts in the UI?
+### Where are these concepts in the UI?

 After logging in to the vantage6 UI, you will see the start page.

 ![vantage6 UI start page](fig/chapter3/ui_start_page.png)

-There are some collbarations displayed on the start page. Clicking one of the collaborations will show the tasks of that collaboration.
+There are some collaborations displayed on the start page. Clicking one of the collaborations will show the tasks of that collaboration.

 ![vantage6 UI tasks page](fig/chapter3/ui_task_page.png)

 The start page also contains a button `Administration` in the top right corner. Clicking on this button will redirect you to the administration page.

-In the administration page, you can manage the entities of vantage6. The entities are divided into tabs: `Organizations`, `Collaborations`, `Roles`, `Users`, and `Nodes`. You can click on an entity to see more details or to edit the entity. We will get back to this later in more detail.
+On the administration page, you can manage the entities of vantage6. The entities are divided into tabs: `Organizations`, `Collaborations`, `Roles`, `Users`, and `Nodes`. You can click on an entity to see more details or to edit the entity. We will get back to this later in more detail.

 ![vantage6 UI administration page](fig/chapter3/ui_admin_page.png)

@@ -80,9 +80,9 @@ Can you find the `Organizations`, `Collaborations`, `Roles`, `Users`, and `Nodes



-## A hypothetical case study using vantage6 collaborations
+## From theory to practice: a hypothetical case study using vantage6 collaborations

-In the context of vantage6, a collaboration refers to an agreement between two or more parties to participate in a study or to answer a research question together. This concept is central to the Privacy Enhancing Technologies (PETs) that vantage6 supports. Each party involved in a collaboration remains autonomous, meaning they retain control over their data and can decide how much of their data to contribute to the collaboration's global model and which algorithms are allowed for execution.
+As previously discussed, in vantage6 a collaboration refers to an agreement between two or more parties to participate in a study or to answer a research question together. This concept is central to the Privacy Enhancing Technologies (PETs) that vantage6 supports. Each party involved in a collaboration remains autonomous, meaning they retain control over their data and can decide how much of their data to contribute to the collaboration's global model and which algorithms are allowed for execution.

 To illustrate this, let's analyze a hypothetical scenario: two international research projects relying on vantage6 technology on the same server:

@@ -96,32 +96,37 @@ Following vantage6's concepts, this scenario would involve two collaborations, o
 ![Hypothetical collaborations scenario](fig/chapter3/orgs_n_collabs_scenario.png)


-## Algorithms trustworthiness on a federated setting
+### Algorithm trustworthiness in a federated setting

 While a vantage6-supported research infrastructure like the one described above offers a strong defense against many data privacy risks, there remains one crucial security aspect that falls outside the platform's scope: the validation of the code that will run on this infrastructure. For instance, the administrators of the nodes running within each organization are responsible for defining which algorithms (i.e., [which container images](https://docs.vantage6.ai/en/main/node/configure.html#all-configuration-options)) will be allowed for execution on the respective collaborations. As this is a critical and complex task that entails activities like code analysis and verification, working with algorithms from trusted sources is the primary line of defense against potential threats.

 Vantage6's algorithm store feature aims to enhance trustworthiness by offering a centralized platform for managing pre-registered algorithms. This serves as an alternative to using algorithms from unknown authors or those lacking transparency regarding their development process and status. The algorithm store currently allows researchers to explore which algorithms are available and how to run them. This, along with its integration with vantage6's UI, streamlines task execution requests within collaborations.

-As of the time of writing this tutorial, efforts are underway to integrate additional information to the algorithms metadata such as creators and code reviewers. Moreover, plans are in place to incorporate the algorithm review process into the publication procedure for any algorithms in the store.
+As of the time of writing this tutorial, efforts are underway to integrate additional information into the algorithm metadata, such as creators and code reviewers. Moreover, plans are in place to incorporate the algorithm review process into the publication procedure for any algorithms in the store.

-## Running a PET (privacy-enhancing technology) analysis without programming!
+### Running a PET (privacy-enhancing technology) analysis without programming!

-In this episode, you will perform a PET analysis on an existing vantage6 collaborations (based on 'dummy' nodes) that resemble the two described above. For reference, the datasets of each organization can be seen here (TODO).
+In this episode, you will perform a PET analysis on an existing vantage6 collaboration (based on 'dummy' nodes) that resembles the two described above. For reference, the datasets of each organization can be seen here (TODO).

 ::::::::::::::::::::::::::::::::::::: challenge

-## Challenge 1: understanding a simple federated algorithm
+## Challenge 2: understanding a simple federated algorithm

-First, let's take a look at one of the federated algorithms -available on the vantage6's community store- that will be used in this episode: [a federated average](https://github.com/IKNL/v6-average-py/blob/master/v6-average-py/__init__.py). Based on the code and its comments:
+First, let's take a look at one of the federated algorithms (available in vantage6's community store) that will be used in this episode: [a federated average](https://github.com/IKNL/v6-average-py/blob/master/v6-average-py/__init__.py).

-- What is the difference between the 'central_average' and the 'partial_average' functions?
-- What would happen if this analysis is started in a collaboration that has one of its nodes 'offline'?
+Analyze the algorithm based on the code and its comments and answer the following questions:
+
+- How are the `central_average` and `partial_average` functions related?
+- Why doesn't the `central_average` function, unlike `partial_average`, take any data as input?
+- Analyze and discuss the potential outcomes if a Task to execute `central_average` is initiated within a collaboration where one of the nodes is offline.

 ::::::::::::::::::::::::::::::::::::::::::::::::

+
+
 ::::::::::::::::::::::::::::::::::::: challenge

-## Challenge 2: exploring the status of existing collaborations configured on a vantage6 server
+## Challenge 3: exploring the status of existing collaborations configured on a vantage6 server

 Below are the administrator credentials of the GHT and PhY24 collaborations (passwords will be given by the instructors).

@@ -130,7 +135,18 @@ Below are the administrator credentials of the GHT and PhY24 collaborations (password
 | PhY24-admin | Collaboration Admin | PhY24 |
 | GHT-admin | Collaboration Admin | GHT |

-Check the status of the nodes of each collaboration:
+Using these credentials, check the status of both collaborations. Given this and your algorithm analysis from Challenge #2, answer the following:
+
+1. Which collaborations are ready for creating a Task for the __federated average__ algorithm?
+2. If one of the collaborations is not ready, which organization would you need to contact to make it ready for executing the algorithm too?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::: solution
+
+## Solution steps
+
+To check the status of the nodes of each collaboration:

 1. Log in to each one with the given credentials
 2. Click on 'Administration' on the top of the UI
@@ -139,8 +155,19 @@ Check the status of the nodes of each collaboration:

 ![Collaboration status](fig/chapter3/collab-status-offline.png)

-- Based on what you see on Challange #1, which collaboration would be ready to request the 'Average' algorithm on it?
-- For the other collaboration, which organization you would need to reach in order to fix the issue?
+:::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Challenge 4: adding an algorithm store to an organization
+
+In order to execute the __average algorithm__ on a given collaboration, considering the previous discussion on algorithm trustworthiness, you first need to register an algorithm store on it. Use the credentials given for Challenge #4 to register the 'community store', which contains the said algorithm: `https://store.cotopaxi.vantage6.ai`
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::: solution
+
+## Solution steps

 You will now link the 'community-store' to the collaboration whose nodes are ready for it.

@@ -151,19 +178,20 @@ You will now link the 'community-store' to the collaboration whose nodes are rea
 5. Make sure the store is now shown on the collaboration details:
 ![Community store entry on the collaboration details](fig/chapter3/community-store-entry.png)

-::::::::::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::

 ::::::::::::::::::::::::::::::::::::: challenge
-## Challenge 3: your first algorithm execution as a researcher
+
+## Challenge 5: your first algorithm execution as a researcher

 Now, you'll take on the role of the researcher within the collaboration for which you've just established the algorithm store. With this role, you will finally request the execution of the algorithm.

 1. Log in as a researcher using the corresponding credentials below:

-| User | Roles | Collaboration |
-|----|-----|-----|
-|PhY24-rs1 | Researcher |PhY24 |
-|GHT-rs1 | Researcher |GHT |
+| User      | Roles         | Collaboration    |
+|-----------|---------------|------------------|
+|PhY24-rs1  | Researcher    |PhY24             |
+|GHT-rs1    | Researcher    |GHT               |

 2. Select the collaboration given on the front page, and select 'Tasks' from the panel on the left.
 ![Collaboration researcher view](fig/chapter3/collab-researcher-view.png)
@@ -172,7 +200,7 @@ Now, you'll take on the role of the researcher within the collaboration for whic

 ![Algorithm selection](fig/chapter3/task-alg-selection.png)

-4. Now the UI will let you choose between the two functions you explored in Challenge #1. First, try to run the 'partial_average' on all the nodes individually.
+4. Now the UI will let you choose between the two functions you explored in Challenge #2. First, try to run the `partial_average` on all the nodes individually.

 ![Running a function on all nodes](fig/chapter3/task-partial-on-individial-orgs.png)

@@ -182,8 +210,12 @@ Now, you'll take on the role of the researcher within the collaboration for whic
 ![alt text](fig/chapter3/task-results.png)


-- Based on your understanding of the 'central_average' function, if you create one a new task, which organization nodes should you choose this time in order to actually calculate the overall (across all the datasets) average? Experiment with this and discuss the results with the instructors.
-- What would happen if you select an alpha-numerical column (e.g., 'participant_pseudo_id')? Do this experiment and explore the generated error logs. Discuss with the instructors how these logs can be used to diagnose any task execution issues.
+Based on these results, answer the following:
+
+
+1. If you repeat the same exercise but with the `central_average` function (refer to Challenge #2 if needed), which organization nodes should you choose this time to actually calculate the overall (across all the datasets) average? Experiment with this and discuss the results with the instructors.
+
+2. What would happen if you select an alpha-numerical column (e.g., 'participant_pseudo_id')? Do this experiment and explore the generated error logs. Discuss with the instructors how these logs can be used to diagnose any task execution issues.

 ::::::::::::::::::::::::::::::::::::::::::::::::

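
For readers of this diff who want a concrete picture of how the `central_average` and `partial_average` functions named in the federated-average challenge might relate, the idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the linked `v6-average-py` repository: the function bodies, the `node_data` dictionary, and the organization names `org_a` and `org_b` are hypothetical, and no vantage6 API is used.

```python
from typing import Dict, List


def partial_average(values: List[float]) -> Dict[str, float]:
    """Hypothetical node-side step: return only the local sum and count,
    so the raw values never leave the node."""
    return {"sum": float(sum(values)), "count": len(values)}


def central_average(partials: List[Dict[str, float]]) -> float:
    """Hypothetical central step: combine the per-node partial results.
    It receives aggregates only, never the underlying data."""
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    if count == 0:
        raise ValueError("no data points were reported by any node")
    return total / count


# Hypothetical per-node datasets (assumed values, for illustration only)
node_data = {"org_a": [10.0, 20.0, 30.0], "org_b": [40.0, 50.0]}

# Each node computes its own partial result locally
partials = [partial_average(values) for values in node_data.values()]

# The central step combines the partials into the overall average
global_average = central_average(partials)
print(global_average)  # prints 30.0
```

In this sketch the central step works only with sums and counts. How a real vantage6 task behaves when a node is offline is left out on purpose, since that is what the challenge asks participants to reason about.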
