Commit

- The first part of Episode 4 was moved (and further refined) to this one (as an introduction to the UI)

- The ER model as presented could be way too complex to be 'conceptualized' by the participants. I created a simplified version focused on what is key for this episode, including other key concepts that are important despite not being part of the v6 data model (algorithm and algorithm store)
- The steps were moved to the solution (with the exception of the last challenge), and the questions were reworded to make the exercise more challenging, as suggested.
hcadavid authored and dsmits committed Jun 4, 2024
1 parent e4917c9 commit 8ecd0bc
Showing 4 changed files with 124 additions and 44 deletions.
88 changes: 60 additions & 28 deletions episodes/chapter3.md
@@ -14,7 +14,7 @@ exercises: 3

::::::::::::::::::::::::::::::::::::: objectives

- Explore specific data analysis scenarios that further illustrates the concept of collaboration
- Explore specific data analysis scenarios that further illustrate the concept of collaboration
- Understand the concept of 'algorithm trustworthiness' in the context of a vantage6 collaboration
- Understand v6's algorithm-store current and envisioned features
- Understand the UI-based approach for performing a data analysis through the given scenarios
@@ -26,7 +26,7 @@ exercises: 3

To navigate vantage6's UI seamlessly, it's essential to grasp the platform's fundamental concepts and their interconnections, as the UI design reflects these relationships. The following is a simplified model of vantage6 concepts, where a `1-n` relationship means that the entity on the left side of the relationship is related to one or more entities on the right side. For instance, a **collaboration** involves one or more **nodes**, but each **node** can only be linked to exactly one **collaboration**. An `n-n` relationship is a many-to-many relationship: for instance, a **collaboration** can involve multiple **organizations**, and at the same time, each **organization** can participate in multiple **collaborations**.

![vantage6 relations between entities](fig/chapter3/v6_entitites_simplified.png)
![Vantage6 core concepts](fig/chapter3/v6_entitites_simplified.png)
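
The `1-n` and `n-n` relationships described above can also be sketched in code. The following is a minimal illustration only, not vantage6's actual data model: the class layout, the `add_organization` and `add_node` helpers, and the names `org_a` and `org_a-ght-node` are all hypothetical and chosen just for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Organization:
    name: str
    # n-n: an organization can participate in many collaborations
    collaborations: list = field(default_factory=list)


@dataclass
class Node:
    name: str
    # 1-n: a node is linked to exactly one collaboration
    collaboration: "Collaboration"


@dataclass
class Collaboration:
    name: str
    organizations: list = field(default_factory=list)
    nodes: list = field(default_factory=list)

    def add_organization(self, org: Organization) -> None:
        """Record the many-to-many link on both sides."""
        self.organizations.append(org)
        org.collaborations.append(self)

    def add_node(self, name: str) -> Node:
        """Create a node that belongs to this collaboration only."""
        node = Node(name, self)
        self.nodes.append(node)
        return node


# Hypothetical usage: one organization taking part in two collaborations
ght = Collaboration("GHT")
phy24 = Collaboration("PhY24")
org_a = Organization("org_a")
ght.add_organization(org_a)
phy24.add_organization(org_a)
ght_node = ght.add_node("org_a-ght-node")
```

In this sketch, `org_a` ends up listed in both collaborations, while `ght_node` points to a single collaboration, which mirrors the two relationship types in the simplified model.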

Given the above, the following are the most important concepts to be considered for this episode:

@@ -50,19 +50,19 @@ Given the above, the following are the most important concepts to be considered
- **Result**: the output generated by the execution of an **algorithm** as part of a **task**.
- **Algorithm**: computational models or processes that are executed on data. Compatible algorithms are those that adhere to the Vantage6 framework, enabling them to be securely distributed to **nodes** for execution.

### Where are the concepts in the UI?
### Where are these concepts in the UI?

After logging in to the vantage6 UI, you will see the start page.

![vantage6 UI start page](fig/chapter3/ui_start_page.png)

There are some collbarations displayed on the start page. Clicking one of the collaborations will show the tasks of that collaboration.
There are some collaborations displayed on the start page. Clicking one of the collaborations will show the tasks of that collaboration.

![vantage6 UI tasks page](fig/chapter3/ui_task_page.png)

The start page also contains a button `Administration` in the top right corner. Clicking on this button will redirect you to the administration page.

In the administration page, you can manage the entities of vantage6. The entities are divided into tabs: `Organizations`, `Collaborations`, `Roles`, `Users`, and `Nodes`. You can click on an entity to see more details or to edit the entity. We will get back to this later in more detail.
On the administration page, you can manage the entities of vantage6. The entities are divided into tabs: `Organizations`, `Collaborations`, `Roles`, `Users`, and `Nodes`. You can click on an entity to see more details or to edit the entity. We will get back to this later in more detail.

![vantage6 UI administration page](fig/chapter3/ui_admin_page.png)

@@ -80,9 +80,9 @@ Can you find the `Organizations`, `Collaborations`, `Roles`, `Users`, and `Nodes



## A hypothetical case study using vantage6 collaborations
## From theory to practice: a hypothetical case study using vantage6 collaborations

In the context of vantage6, a collaboration refers to an agreement between two or more parties to participate in a study or to answer a research question together. This concept is central to the Privacy Enhancing Technologies (PETs) that vantage6 supports. Each party involved in a collaboration remains autonomous, meaning they retain control over their data and can decide how much of their data to contribute to the collaboration's global model and which algorithms are allowed for execution.
As previously discussed, in vantage6 a collaboration refers to an agreement between two or more parties to participate in a study or to answer a research question together. This concept is central to the Privacy Enhancing Technologies (PETs) that vantage6 supports. Each party involved in a collaboration remains autonomous, meaning they retain control over their data and can decide how much of their data to contribute to the collaboration's global model and which algorithms are allowed for execution.

To illustrate this, let's analyze a hypothetical scenario: two international research projects relying on vantage6 technology on the same server:

@@ -96,32 +96,37 @@ Following vantage6's concepts, this scenario would involve two collaborations, o
![Hypothetical collaborations scenario](fig/chapter3/orgs_n_collabs_scenario.png)


## Algorithms trustworthiness on a federated setting
### Algorithm trustworthiness in a federated setting

While a vantage6-supported research infrastructure like the one described above offers a strong defense against many data privacy risks, there remains one crucial security aspect that falls outside the platform's scope: the validation of the code that will run on this infrastructure. For instance, the administrators of the nodes running within each organization are responsible for defining which algorithms (i.e., [which container images](https://docs.vantage6.ai/en/main/node/configure.html#all-configuration-options)) will be allowed for execution on the respective collaborations. As this is a critical and complex task that entails activities like code analysis and verification, working with algorithms from trusted sources is the primary line of defense against potential threats.

Vantage6's algorithm store feature aims to enhance trustworthiness by offering a centralized platform for managing pre-registered algorithms. This serves as an alternative to using algorithms from unknown authors or those lacking transparency regarding their development process and status. The algorithm store currently allows researchers to explore which algorithms are available and how to run them. This, along with its integration with vantage6's UI, streamlines task execution requests within collaborations.

As of the time of writing this tutorial, efforts are underway to integrate additional information to the algorithms metadata such as creators and code reviewers. Moreover, plans are in place to incorporate the algorithm review process into the publication procedure for any algorithms in the store.
As of the time of writing this tutorial, efforts are underway to integrate additional information to the algorithm metadata such as creators and code reviewers. Moreover, plans are in place to incorporate the algorithm review process into the publication procedure for any algorithms in the store.
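
To make the idea of restricting which container images a node may run more concrete, here is a minimal sketch of an allow-list check. It is an illustration of the general technique only: it is not vantage6's implementation or configuration format (see the node configuration documentation linked above for that), and the function name `is_image_allowed`, the registry `registry.example.org`, and the image names are hypothetical.

```python
import re


def is_image_allowed(image: str, allowed_patterns: list) -> bool:
    """Return True if the image name fully matches at least one allowed pattern."""
    return any(re.fullmatch(pattern, image) for pattern in allowed_patterns)


# Hypothetical policy: only images under a trusted registry path may run
allowed_patterns = [r"registry\.example\.org/trusted/.*"]

trusted = is_image_allowed("registry.example.org/trusted/average:1.0", allowed_patterns)
untrusted = is_image_allowed("registry.example.org/unknown/average:1.0", allowed_patterns)
```

Using `re.fullmatch` rather than `re.search` is a deliberate choice in this sketch: the whole image name must match a pattern, so a trusted prefix cannot be smuggled into the middle of an otherwise unknown name.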

## Running a PET (privacy-enhancing technology) analysis without programming!
### Running a PET (privacy-enhancing technology) analysis without programming!

In this episode, you will perform a PET analysis on an existing vantage6 collaborations (based on 'dummy' nodes) that resemble the two described above. For reference, the datasets of each organization can be seen here (TODO).
In this episode, you will perform a PET analysis on an existing vantage6 collaboration (based on 'dummy' nodes) that resembles the two described above. For reference, the datasets of each organization can be seen here (TODO).

::::::::::::::::::::::::::::::::::::: challenge

## Challenge 1: understanding a simple federated algorithm
## Challenge 2: understanding a simple federated algorithm

First, let's take a look at one of the federated algorithms -available on the vantage6's community store- that will be used in this episode: [a federated average](https://github.com/IKNL/v6-average-py/blob/master/v6-average-py/__init__.py). Based on the code and its comments:
First, let's take a look at one of the federated algorithms (available on vantage6's community store) that will be used in this episode: [a federated average](https://github.com/IKNL/v6-average-py/blob/master/v6-average-py/__init__.py).

- What is the difference between the 'central_average' and the 'partial_average' functions?
- What would happen if this analysis is started in a collaboration that has one of its nodes 'offline'?
Analyze the algorithm based on the code and its comments and answer the following questions:

- How are the `central_average` and `partial_average` functions related?
- Why doesn't the `central_average` function, unlike `partial_average`, get any data as an input?
- Analyze and discuss the potential outcomes if a Task to execute `central_average` is initiated within a collaboration where one of the nodes is offline.

::::::::::::::::::::::::::::::::::::::::::::::::



::::::::::::::::::::::::::::::::::::: challenge

## Challenge 2: exploring the status of existing collaborations configured on a vantage6 server
## Challenge 3: exploring the status of existing collaborations configured on a vantage6 server

Below are the administrator credentials of the GHT and PhY24 collaborations (passwords will be given by the instructors).

@@ -130,7 +135,18 @@ Below are the administrator credential of GHT and PhY24 collaborations (password
| PhY24-admin | Collaboration Admin | PhY24 |
| GHT-admin | Collaboration Admin | GHT |

Check the status of the nodes of each collaboration:
Using these credentials, check the status of both collaborations. Given this and your algorithm analysis from Challenge #2, answer the following:

1. Which collaborations are ready for creating a Task for the __federated average__ algorithm?
2. If one of the collaborations is not ready, which organization would you need to contact in order to make it ready for executing the algorithm too?

::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::: solution

## Solution steps

To check the status of the nodes of each collaboration:

1. Log in to each one with the given credentials
2. Click on 'Administration' on the top of the UI
@@ -139,8 +155,19 @@ Check the status of the nodes of each collaboration:

![Collaboration status](fig/chapter3/collab-status-offline.png)

- Based on what you see on Challange #1, which collaboration would be ready to request the 'Average' algorithm on it?
- For the other collaboration, which organization you would need to reach in order to fix the issue?
:::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: challenge

## Challenge 4: adding an algorithm store to an organization

In order to execute the __average algorithm__ on a given collaboration, considering the previous discussion on algorithm trustworthiness, you first need to register an algorithm store on it. Use the credentials given for Challenge #4 to register the 'community store', which contains the said algorithm: `https://store.cotopaxi.vantage6.ai`

::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::: solution

## Solution steps

You will now link the 'community-store' to the collaboration whose nodes are ready for it.

@@ -151,19 +178,20 @@ You will now link the 'community-store' to the collaboration whose nodes are rea
5. Make sure the store is now shown on the collaboration details:
![Community store entry on the collaboration details](fig/chapter3/community-store-entry.png)

::::::::::::::::::::::::::::::::::::::::::::::::
:::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: challenge
## Challenge 3: your first algorithm execution as a researcher

## Challenge 5: your first algorithm execution as a researcher

Now, you'll take on the role of the researcher within the collaboration for which you've just established the algorithm store. With this role, you will finally request the execution of the algorithm.

1. log in as a researcher using the corresponding credentials below:

| User | Roles | Collaboration |
|----|-----|-----|
|PhY24-rs1 | Researcher |PhY24 |
|GHT-rs1 | Researcher |GHT |
| User | Roles | Collaboration |
|-----------|---------------|------------------|
|PhY24-rs1 | Researcher |PhY24 |
|GHT-rs1 | Researcher |GHT |

2. Select the collaboration given on the front page, and select 'Tasks' from the panel on the left.
![Collaboration researcher view](fig/chapter3/collab-researcher-view.png)
@@ -172,7 +200,7 @@ Now, you'll take on the role of the researcher within the collaboration for whic

![Algorithm selection](fig/chapter3/task-alg-selection.png)

4. Now the UI will let you choose between the two functions you explored in Challenge #1. First, try to run the 'partial_average' on all the nodes individually.
4. Now the UI will let you choose between the two functions you explored in Challenge #2. First, try to run the `partial_average` on all the nodes individually.

![Running a function on all nodes](fig/chapter3/task-partial-on-individial-orgs.png)

@@ -182,8 +210,12 @@ Now, you'll take on the role of the researcher within the collaboration for whic
![alt text](fig/chapter3/task-results.png)


- Based on your understanding of the 'central_average' function, if you create one a new task, which organization nodes should you choose this time in order to actually calculate the overall (across all the datasets) average? Experiment with this and discuss the results with the instructors.
- What would happen if you select an alpha-numerical column (e.g., 'participant_pseudo_id')? Do this experiment and explore the generated error logs. Discuss with the instructors how these logs can be used to diagnose any task execution issues.
Based on these results, answer the following:


1. If you repeat the same exercise but with the `central_average` function (refer to Challenge #2 if needed), which organization nodes should you choose this time to actually calculate the overall (across all the datasets) average? Experiment with this and discuss the results with the instructors.

2. What would happen if you select an alpha-numerical column (e.g., 'participant_pseudo_id')? Do this experiment and explore the generated error logs. Discuss with the instructors how these logs can be used to diagnose any task execution issues.

::::::::::::::::::::::::::::::::::::::::::::::::
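
As a recap of the challenges above, the two-step pattern behind a federated average can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the code of the `v6-average-py` algorithm: the function bodies, the dictionary keys `sum` and `count`, and the sample values are invented for this example, and the direct function calls stand in for the subtasks that a real central function would request from the nodes.

```python
def partial_average(values):
    """Runs next to the data: returns only aggregates, never the raw records."""
    return {"sum": float(sum(values)), "count": len(values)}


def central_average(partials):
    """Combines the partial results; it never receives any raw data itself."""
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    if count == 0:
        raise ValueError("no records were reported by any node")
    return {"average": total / count}


# Hypothetical local datasets held by two different nodes
node_a_values = [1, 2, 3]
node_b_values = [10, 20]

partials = [partial_average(node_a_values), partial_average(node_b_values)]
result = central_average(partials)
```

Note that averaging the two local averages (2 and 15) would give a different, incorrect answer; sharing sums and counts is what lets the central step compute the true overall average. In this sketch, passing a column of strings to `partial_average` makes `sum` raise a `TypeError`, which is loosely analogous to the kind of failure you can look for in the task error logs.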

Binary file modified episodes/fig/chapter3/v6_entitites_simplified.png
