markdown source builds
Auto-generated via {sandpaper}
Source  : ee8a42a
Branch  : main
Author  : Djura <djura.smits@gmail.com>
Time    : 2024-06-04 09:03:41 +0000
Message : Merge pull request #17 from vantage6/004_episode_2_draft

004 episode 2 draft
actions-user committed Jun 4, 2024
1 parent 386750a commit 988810c
Showing 8 changed files with 265 additions and 79 deletions.
183 changes: 183 additions & 0 deletions 2-understanding-v6.md
@@ -0,0 +1,183 @@
---
title: "vantage6 basics"
---

::: questions
- Why use vantage6?
- How does vantage6 work?
- How do federated algorithms run in vantage6?
- What will be available in vantage6 in the future?
:::

::: objectives
- List the high-level infrastructure components of vantage6 (server, client, node)
- Understand the added value of vantage6
- Understand that there are different actors in the vantage6 network
- Understand that the vantage6 server does not run algorithms
- Explain how a simple analysis runs on vantage6
- Understand the future of vantage6 (policies, etc.)
:::

# Unique selling points of vantage6

vantage6 is a platform to execute privacy enhancing techniques (PETs). Several alternative platforms for PETs are available, but vantage6 provides some unique features:

- Open source.
- Container orchestration for privacy enhancing techniques.
- Easily extensible to different types of data sources.
- Algorithms can be developed in any language.
- Other applications can connect to vantage6 using the API.
- Managing and enforcing collaboration policies.
- Minimal network requirements at data stations.

# The vantage6 infrastructure

In vantage6, a **client** can pose a question to the central **server**. Each organization with sensitive data contributes one **node** to the network. Each node collects the computation request from the server and fetches the **algorithm** to answer it. When the algorithm completes, the node sends the aggregated results back to the server.

![High level overview of the vantage6 infrastructure. Client(s) and Node(s) communicate through the Server. Nodes are able to communicate directly with each other when the optional VPN feature is enabled.](fig/v6_basic_schema.svg)

On a technical level, vantage6 may be seen as a container orchestration tool for privacy preserving analyses. It deploys a network of containerized applications that together ensure insights can be exchanged without sharing record-level data.

Let's explain in more detail what these network actors are responsible for, and which subcomponents they contain.

### Server

The (central) **server** acts as the communication hub between clients and nodes. The [server](https://docs.vantage6.ai/en/main/server/index.html) tracks the status of the computation requests and handles administrative functions such as authentication and authorization. It consists of multiple applications:

- **Vantage6 server**: Contains the users, organizations, collaborations, tasks and their results. It handles authentication and authorization to the system and acts as the communication hub for clients and nodes.

- **Docker registry**: Contains algorithms stored in images which can be used by clients to request a computation. The node will retrieve the algorithm from this registry and execute it. It is possible to use public registries for this purpose, like [Docker Hub](https://hub.docker.com/) or the [GitHub Container Registry](https://ghcr.io). However, it is also possible to host your own registry, for example a [Harbor](https://goharbor.io/) instance.

- **Algorithm store**: A repository for trusted algorithms within a certain project. [Algorithm stores](https://docs.vantage6.ai/en/main/algorithm_store/index.html) can be coupled to specific collaborations or to all collaborations on a given server.

- [**EduVPN instance**](https://docs.vantage6.ai/en/main/server/optional.html#eduvpn): If algorithms need to be able to engage in peer-to-peer communication, a VPN server can be set up to help them do so.

- [**RabbitMQ**](https://docs.vantage6.ai/en/main/server/optional.html#rabbitmq): Synchronizes the messages between multiple vantage6 server instances.

### Data Station

The data station hosts the [node](https://docs.vantage6.ai/en/main/node/index.html) (vantage6-node), which has access to the local data and executes algorithms, and a database.

- **Vantage6 node**: The node is responsible for executing the algorithms on the local data. It protects the data by allowing only specified algorithms to be executed after verifying their origin. The node is responsible for picking up the task, executing the algorithm and sending the results back to the server. The node needs access to local data. For more details see the technical documentation of the node.

- **Database**: The database may be in any format that the algorithms relevant to your use case support. The currently supported database types are listed here.
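The idea that a node only runs specified algorithms can be illustrated with a small allowlist check. This is a minimal sketch, not the node's actual implementation: the pattern list, the registry names and the functions `is_algorithm_allowed` and `pick_up_task` are all hypothetical and only serve to show the principle.

```python
import re

# Hypothetical allowlist: regular expressions for image names this node trusts.
# The registry and organization names are made up for illustration.
ALLOWED_ALGORITHM_PATTERNS = [
    r"harbor\.example\.org/trusted/.+",
    r"ghcr\.io/example-org/average:1\.\d+",
]


def is_algorithm_allowed(image: str, patterns: list[str]) -> bool:
    """Return True if the image name fully matches at least one allowed pattern."""
    return any(re.fullmatch(pattern, image) for pattern in patterns)


def pick_up_task(image: str) -> str:
    """Decide what an illustrative node would do with an incoming task."""
    if not is_algorithm_allowed(image, ALLOWED_ALGORITHM_PATTERNS):
        return "rejected"
    return "accepted"


print(pick_up_task("harbor.example.org/trusted/average"))
print(pick_up_task("docker.io/unknown/miner"))
```

Because the check happens at the node, the organization that owns the data stays in control of what runs on it, regardless of what a client requests.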

### Client

A user or application that interacts with the vantage6-server. They create tasks, retrieve their results, or manage entities at the server (i.e. creating or editing users, organizations and collaborations).

The vantage6 server is an API, which means that there are many ways to interact with it programmatically. There are, however, a number of applications available that make it easier for users to interact with the vantage6 server:

- **User interface** The [user interface](https://docs.vantage6.ai/en/main/user/ui.html) is a web application (hosted at the server) that allows users to interact with the server. It is used to create and manage organizations, collaborations, users, tasks and algorithms. It also allows users to view and download the results of tasks. The user interface is the recommended option for ease of use.

- **Python client** The [vantage6 Python client](https://docs.vantage6.ai/en/main/user/pyclient.html) is a Python package that allows users to interact with the server from a Python environment. This is especially useful for data scientists who want to integrate vantage6 into their existing Python workflow.

- **API** It is also possible to interact with the vantage6-server using the [API](https://docs.vantage6.ai/en/main/user/api.html) directly.


## How algorithms run in vantage6

Federated algorithms can be split into a **federated** and a **central** part:

- **Central**: The central part of the algorithm is responsible for orchestration and aggregation of the partial results.

- **Federated**: The federated part runs as partial tasks that execute computations on the local privacy-sensitive data.

![vantage6 central and federated tasks.](fig/algorithm_central_and_subtasks.png)

Now, let’s see how this works in vantage6. The user creates a task for the central part of the algorithm. This is registered at the server, and leads to the creation of a central algorithm container on one of the nodes. The central algorithm then creates subtasks for the federated parts of the algorithm, which again are registered at the server. All nodes for which the subtask is intended start their work by executing the federated part of the algorithm. The nodes send the results back to the server, from where they are picked up by the central algorithm. The central algorithm then computes the final result and sends it to the server, where the user can retrieve it.
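The sequence above can be mimicked in a few lines of plain Python. This is a toy simulation under stated assumptions, not vantage6 code: `ToyServer`, `federated_part` and `central_part` are hypothetical names, everything runs in one process, and the aggregate (a total record count) was picked only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Stand-in for the server: it stores tasks and results, nothing else."""
    tasks: list = field(default_factory=list)
    results: dict = field(default_factory=dict)

    def register_task(self, name: str, node: str) -> int:
        self.tasks.append((name, node))
        return len(self.tasks) - 1  # the task id is the list index

    def store_result(self, task_id: int, result) -> None:
        self.results[task_id] = result


def federated_part(local_records: list) -> int:
    """Runs at every node: returns an aggregate (a count), not the records."""
    return len(local_records)


def central_part(server: ToyServer, nodes: dict) -> int:
    """Runs at one node: creates subtasks, then aggregates their results."""
    subtask_ids = []
    for node_name, local_records in nodes.items():
        task_id = server.register_task("federated", node_name)
        # Simulation shortcut: `nodes` stands in for the separate data
        # stations. Each node runs its subtask and reports to the server.
        server.store_result(task_id, federated_part(local_records))
        subtask_ids.append(task_id)
    # The central part picks up the partial results from the server.
    return sum(server.results[task_id] for task_id in subtask_ids)


server = ToyServer()
nodes = {"node_a": ["r1", "r2", "r3"], "node_b": ["r4"]}
central_id = server.register_task("central", "node_a")
server.store_result(central_id, central_part(server, nodes))
print(server.results[central_id])  # total number of records across nodes
```

Note that the toy server never executes anything: it only registers tasks and stores results, while both the central and the federated part run "at the nodes".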


::: callout

## vantage6-server vs central part of an algorithm

It is easy to confuse the central server with the central part of the algorithm: the server is the central part of the infrastructure but not the place where the central part of the algorithm is executed. The central part is actually executed at one of the nodes, because it gives more flexibility: for instance, an algorithm may need heavy compute resources to do the aggregation, and it is better to do this at a node that has these resources rather than having to upgrade the server whenever a new algorithm needs more resources.
:::

::: challenge

Two centers $A$ and $B$ have the following data regarding the age of a set of patients:

$a = [34, 42, 28, 49]$

$b = [51, 23, 44]$

Each center has a data station, and we want to compute the overall average age of the patients.

![Architecture.](fig/schema_exercise.png)

Given that the central average can be computed using the following equation:

$\overline{x} =\dfrac{1}{n} \sum_{i=1}^{n} x_i$

It can be rewritten as follows, to make it ready for a federated computation:

$\overline{x} =\dfrac{1}{n_a+n_b} (\sum_{i=1}^{n_a} a_i+\sum_{i=1}^{n_b} b_i)$
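
The step between the two equations can be written out as follows, assuming that the pooled data $x$ is simply the concatenation of $a$ and $b$, so that $n = n_a + n_b$ and the single sum splits into one sum per center:

$$
\overline{x}
  = \dfrac{1}{n} \sum_{i=1}^{n} x_i
  = \dfrac{1}{n_a+n_b} \left( \sum_{i=1}^{n_a} a_i + \sum_{i=1}^{n_b} b_i \right),
  \qquad n = n_a + n_b
$$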

Can you determine which part of the infrastructure will execute each part of the computation, and what result each part returns?

::: solution

The server starts the central task on one of the two nodes (e.g. Data station A).

The central task on node A creates two subtasks, one per node. Node A will run the following computation:

$S_a =\sum_{i=1}^{n_a} a_i$

and it will return the following results to the central task:

$S_a=153$

$n_a=4$

Node B will run the following computation:

$S_b =\sum_{i=1}^{n_b} b_i$

and it will return the following results to the central task:

$S_b=118$

$n_b=3$

The central task receives $S_a$ and $n_a$ from node A and $S_b$ and $n_b$ from node B, and will run the following computation:

$\overline{x} =\dfrac{S_a+S_b}{n_a+n_b}=\dfrac{153+118}{4+3}\approx 38.71$

![vantage6 algorithm workflow.](fig/algorithm_workflow.png)
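
The same computation can be checked with a short script. This is a plain-Python sketch of the arithmetic only, with no vantage6 involved; the function names `partial_sum_and_count` and `combine` are made up for this example.

```python
def partial_sum_and_count(ages: list[int]) -> tuple[int, int]:
    """Federated part: runs at each data station on its local data."""
    return sum(ages), len(ages)


def combine(partials: list[tuple[int, int]]) -> float:
    """Central part: combines the (sum, count) pairs into the overall mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


a = [34, 42, 28, 49]  # data at center A
b = [51, 23, 44]      # data at center B

s_a, n_a = partial_sum_and_count(a)
s_b, n_b = partial_sum_and_count(b)
mean = combine([(s_a, n_a), (s_b, n_b)])
print(s_a, n_a, s_b, n_b, round(mean, 2))  # 153 4 118 3 38.71
```

Only the two `(sum, count)` pairs cross the boundary between the federated and the central part; the individual ages never leave `partial_sum_and_count`.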

:::

:::

# Future developments of vantage6
Back in 2018, when the development of vantage6 started, the focus was on Federated Learning. Since then, vantage6 has been extended to support different types of data sources and different types of algorithms, and its usability has improved. Privacy Enhancing Technologies (PETs) are a rapidly evolving field. To keep up with the latest developments, the vantage6 platform is designed to be flexible and to adapt to new developments in the field.

As the development team, we are working towards making vantage6 the PETOps platform for all your (distributed) analysis needs.

[Image of the PETOps cycle]

We identified a number of areas where we want to improve and extend vantage6 in order to achieve this goal:

## Policies
Currently, vantage6 lets you set several policies, such as the organizations that are allowed to participate in a collaboration, the algorithms that are allowed to run on the nodes, and the data that is allowed to be used in a collaboration. We want to extend this to a more generic policy framework in which any aspect of the vantage6 platform can be controlled by policies. This will maximize the flexibility of the platform and make it easier to adapt to new use cases.

For example, it would be possible to:

* Define the version of vantage6 that is allowed to be used in a collaboration
* Define which users are allowed to run a certain algorithm
* Define which algorithms are allowed in a collaboration/study
* Define privacy guards at the algorithm level

To avoid having to set policies manually at the nodes, we envision a distributed policy system (possibly using blockchain) in which policies are distributed to the nodes by the server.
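
To make the idea of a generic policy framework more concrete, here is a speculative sketch of how such policies could be represented and checked. Nothing here reflects an existing or planned vantage6 API: `CollaborationPolicy`, its fields, the `violations` function and the version string are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CollaborationPolicy:
    """Hypothetical policy record; field names are illustrative only."""
    allowed_versions: frozenset = frozenset()
    allowed_algorithms: frozenset = frozenset()
    # Maps an algorithm name to the set of users allowed to run it.
    allowed_users: dict = field(default_factory=dict)


def violations(policy: CollaborationPolicy, version: str, user: str, algorithm: str) -> list:
    """Return human-readable policy violations (an empty list means allowed)."""
    problems = []
    if version not in policy.allowed_versions:
        problems.append(f"version {version} is not allowed")
    if algorithm not in policy.allowed_algorithms:
        problems.append(f"algorithm {algorithm} is not allowed")
    elif user not in policy.allowed_users.get(algorithm, set()):
        problems.append(f"user {user} may not run {algorithm}")
    return problems


policy = CollaborationPolicy(
    allowed_versions=frozenset({"4.5"}),        # made-up version number
    allowed_algorithms=frozenset({"average"}),
    allowed_users={"average": {"alice"}},
)

print(violations(policy, "4.5", "alice", "average"))
print(violations(policy, "4.5", "bob", "average"))
```

In a distributed setup as envisioned above, the server would hand the same policy record to every node, so each node could run the same check locally.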

## Model Repository
Currently vantage6 is focused on privacy enhancing techniques. Some of these techniques result in a model that can be used to make predictions. We want to extend vantage6 with a model repository in which these models can be stored, shared and used. This will make it easier to reuse models and to compare the performance of different models.


## Build Services
Algorithms in vantage6 are shipped as container images. Currently, this image can be built by the user or by some external process. We want to extend vantage6 with a build service that can build the container image for you. This will make it easier to develop and deploy algorithms in vantage6 but, more importantly, it will enhance the security of the platform because images are built in a controlled environment.


::: keypoints
These are the keypoints
:::
157 changes: 79 additions & 78 deletions config.yaml
@@ -1,78 +1,79 @@
#------------------------------------------------------------
# Values for this lesson.
#------------------------------------------------------------

# Which carpentry is this (swc, dc, lc, or cp)?
# swc: Software Carpentry
# dc: Data Carpentry
# lc: Library Carpentry
# cp: Carpentries (to use for instructor training for instance)
# incubator: The Carpentries Incubator
carpentry: 'incubator'

# Overall title for pages.
title: 'Introduction to vantage6'

# Date the lesson was created (YYYY-MM-DD, this is empty by default)
created: 2024-03-26

# Comma-separated list of keywords for the lesson
keywords: 'federated learning, privacy enhancing technology, python'

# Life cycle stage of the lesson
# possible values: pre-alpha, alpha, beta, stable
life_cycle: 'pre-alpha'

# License of the lesson
license: 'CC-BY 4.0'

# Link to the source repository for this lesson
source: 'https://github.com/vantage6/vantage6-workshop'

# Default branch of your lesson
branch: 'main'

# Who to contact if there are any issues
contact: 'd.smits@esciencecenter.nl'

# Navigation ------------------------------------------------
#
# Use the following menu items to specify the order of
# individual pages in each dropdown section. Leave blank to
# include all pages in the folder.
#
# Example -------------
#
# episodes:
# - introduction.md
# - first-steps.md
#
# learners:
# - setup.md
#
# instructors:
# - instructor-notes.md
#
# profiles:
# - one-learner.md
# - another-learner.md

# Order of episodes in your lesson
episodes:
- introduction.md

# Information for Learners
learners:

# Information for Instructors
instructors:

# Learner Profiles
profiles:

# Customisation ---------------------------------------------
#
# This space below is where custom yaml items (e.g. pinning
# sandpaper and varnish versions) should live


#------------------------------------------------------------
# Values for this lesson.
#------------------------------------------------------------

# Which carpentry is this (swc, dc, lc, or cp)?
# swc: Software Carpentry
# dc: Data Carpentry
# lc: Library Carpentry
# cp: Carpentries (to use for instructor training for instance)
# incubator: The Carpentries Incubator
carpentry: 'incubator'

# Overall title for pages.
title: 'Introduction to vantage6'

# Date the lesson was created (YYYY-MM-DD, this is empty by default)
created: 2024-03-26

# Comma-separated list of keywords for the lesson
keywords: 'federated learning, privacy enhancing technology, python'

# Life cycle stage of the lesson
# possible values: pre-alpha, alpha, beta, stable
life_cycle: 'pre-alpha'

# License of the lesson
license: 'CC-BY 4.0'

# Link to the source repository for this lesson
source: 'https://github.com/vantage6/vantage6-workshop'

# Default branch of your lesson
branch: 'main'

# Who to contact if there are any issues
contact: 'd.smits@esciencecenter.nl'

# Navigation ------------------------------------------------
#
# Use the following menu items to specify the order of
# individual pages in each dropdown section. Leave blank to
# include all pages in the folder.
#
# Example -------------
#
# episodes:
# - introduction.md
# - first-steps.md
#
# learners:
# - setup.md
#
# instructors:
# - instructor-notes.md
#
# profiles:
# - one-learner.md
# - another-learner.md

# Order of episodes in your lesson
episodes:
- introduction.md
- 2-understanding-v6.md

# Information for Learners
learners:

# Information for Instructors
instructors:

# Learner Profiles
profiles:

# Customisation ---------------------------------------------
#
# This space below is where custom yaml items (e.g. pinning
# sandpaper and varnish versions) should live


Binary file added fig/algorithm_central_and_subtasks.png
Binary file added fig/algorithm_workflow.png
Binary file added fig/schema_exercise.png
Binary file added fig/v6_basic_schema.png
1 change: 1 addition & 0 deletions fig/v6_basic_schema.svg