differences for PR #17

vantage6 · May 8, 2024 · 3b42592 · 3b42592
1 parent 7986be7
commit 3b42592

Showing 8 changed files with 229 additions and 79 deletions.
diff --git a/2-understanding-v6.md b/2-understanding-v6.md
@@ -0,0 +1,147 @@
+---
+title: "2-understanding-v6"
+---
+
+::: questions
+-   Why use vantage6?
+-   How does vantage6 work?
+-   How do federated algorithms run in vantage6?
+-   What will be available in vantage6 in the future?
+:::
+
+::: objectives
+-   List the high-level infrastructure components of v6 (server, client, node)
+-   Understand the added value of v6
+-   Understand that there are different actors in algorithms
+-   Understand that the v6 server does not run algorithms
+-   Explain how a simple analysis runs on v6
+-   Understand the future of vantage6 (policies, etc.)
+:::
+
+# Unique selling points of vantage6
+
+vantage6 is a platform for privacy-preserving federated learning. Other platforms for federated learning are currently available, but vantage6 provides some unique features:
+-   Open source.
+-   Container orchestration for privacy enhancing techniques.
+-   Easily extensible to different types of data sources.
+-   Algorithms can be developed in any language.
+-   Other applications can connect to vantage6 using the API.
+
+
+# The vantage6 infrastructure
+
+In vantage6, a **client** can pose a question to the central **server**. Each organization with sensitive data contributes one **node** to the network. Each node collects the research question from the server and fetches the **algorithm** to answer it. When the algorithm completes, the node sends the aggregated results back to the server.
+
+The roles of these vantage6 components are as follows:
+
+-   A (central) **server** coordinates communication with clients and nodes. The server tracks the status of the computation requests and handles administrative functions such as authentication and authorization.
+
+-   **Node(s)** have access to data and execute algorithms.
+
+-   **Clients** (i.e. users or applications) request computations from the nodes via the server.
+
+-   **Algorithms** are scripts that are run on the sensitive data. Each algorithm is packaged in a Docker image; the node pulls the image from a Docker registry and runs it on the local data. Note that the node owner can control which algorithms are allowed to run on their data.
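The last point, that a node owner controls which algorithms may run, can be pictured as an allow-list check on the image name before anything is pulled. The sketch below is illustrative only: `is_allowed`, `ALLOWED_PATTERNS` and the registry host `registry.example.org` are hypothetical names, not part of the vantage6 code base or its configuration format.

```python
import re

# Hypothetical allow-list of image-name patterns chosen by the node owner.
ALLOWED_PATTERNS = (
    r"registry\.example\.org/trusted/.+",
)


def is_allowed(image, patterns=ALLOWED_PATTERNS):
    """Return True if the image name fully matches at least one pattern."""
    return any(re.fullmatch(pattern, image) for pattern in patterns)


# An image from the trusted namespace passes, anything else is refused.
print(is_allowed("registry.example.org/trusted/average:1.0"))
print(is_allowed("registry.example.org/unknown/miner:latest"))
```

Matching the full image name, rather than a prefix, avoids accepting an image whose name merely contains a trusted string.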
+
+![v6 basic schema.](fig/v6_basic_schema.svg)
+
+On a technical level, vantage6 may be seen as a (Docker) container orchestration tool for privacy preserving analyses. It deploys a network of containerized applications that together ensure insights can be exchanged without sharing record-level data.
+
+## The vantage6 infrastructure components
+
+As we saw before, the vantage6 network consists of a central server, a number of nodes and a client. This section explains in some more detail what these network actors are responsible for, and which subcomponents they contain.
+
+### Server
+
+The vantage6 server is the central point of contact for all communication in vantage6. However, it also relies on other infrastructure components to function properly. It consists of:
+
+-   **Vantage6 server**: Contains the users, organizations, collaborations, tasks and their results. It handles authentication and authorization to the system and is the central point of contact for clients and nodes.
+
+-   **Docker registry**: Contains algorithms stored in images which can be used by clients to request a computation. The node will retrieve the algorithm from this registry and execute it.
+
+-   **Algorithm store**: Is intended to be used as a repository for trusted algorithms within a certain project. Algorithm stores can be coupled to specific collaborations or to all collaborations on a given server.
+
+### Data Station
+
+The data station hosts the node (vantage6-node) and a database.
+
+-   **Vantage6 node**: The node is responsible for executing the algorithms on the local data. It protects the data by allowing only specified algorithms to be executed after verifying their origin. The node is responsible for picking up the task, executing the algorithm and sending the results back to the server. The node needs access to local data. For more details see the technical documentation of the node.
+
+-   **Database**: The database may be in any format that the algorithms relevant to your use case support. The currently supported database types are listed here.
+
+### Client
+
+A client is a user or application that interacts with the vantage6 server. Clients create tasks, retrieve their results, or manage entities at the server (i.e. creating or editing users, organizations and collaborations).
+
+The vantage6 server is an API, which means that there are many ways to interact with it programmatically. There are, however, a number of applications available that make it easier for users to interact with the vantage6 server:
+
+-   **User interface**: The user interface is a web application that allows users to interact with the server. It is used to create and manage organizations, collaborations, users, tasks and algorithms. It also allows users to view and download the results of tasks. Use of the user interface is recommended for ease of use.
+
+-   **Python client**: The vantage6 Python client is a Python package that allows users to interact with the server from a Python environment. This is helpful for data scientists who want to integrate vantage6 into their existing Python workflow.
+
+-   **API**: It is also possible to interact with the server using the API directly.
+
+
+## How algorithms run in vantage6
+
+Federated algorithms can be split into a **federated** and a **central** part:
+
+-   **Central**: The central part of the algorithm is responsible for orchestration and aggregation of the partial results. 
+
+-   **Federated**: The federated part runs as partial tasks that execute computations on the local, privacy-sensitive data.
+
+![v6 central and federated tasks.](fig/algorithm_central_and_subtasks.png)
+
+Now, let’s see how this works in vantage6. The user creates a task for the central part of the algorithm. This is registered at the server, and leads to the creation of a central algorithm container on one of the nodes. The central algorithm then creates subtasks for the federated parts of the algorithm, which again are registered at the server. All nodes for which the subtask is intended start their work by executing the federated part of the algorithm. The nodes send the results back to the server, from where they are picked up by the central algorithm. The central algorithm then computes the final result and sends it to the server, where the user can retrieve it.
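The flow described above can be traced with a toy, in-memory simulation. This is a minimal sketch and not how vantage6 is implemented: `ToyServer`, `run_demo`, the task names and the sample records are all hypothetical, and a plain Python list stands in for the server's task queue. The point it illustrates is that the server only stores tasks and results, while all computation happens at the nodes.

```python
class ToyServer:
    """Hypothetical stand-in for the server: it stores tasks and results only."""

    def __init__(self):
        self.queue = []    # (task_id, part, node_name) tuples waiting for pickup
        self.results = {}  # task_id -> {node_name: result}

    def create_task(self, task_id, part, node_names):
        self.results[task_id] = {}
        for name in node_names:
            self.queue.append((task_id, part, name))

    def fetch_tasks(self, node_name):
        mine = [t for t in self.queue if t[2] == node_name]
        self.queue = [t for t in self.queue if t[2] != node_name]
        return mine


def run_demo():
    local_data = {"A": [34, 51, 47], "B": [29, 62]}  # stays at the nodes
    server = ToyServer()

    # 1. The user registers a task for the central part, to run on node A.
    server.create_task("count-central", "central", ["A"])

    # 2. Node A picks it up; the central part registers subtasks for all nodes.
    assert server.fetch_tasks("A") == [("count-central", "central", "A")]
    server.create_task("count-sub", "federated", ["A", "B"])

    # 3. Each node runs the federated part on its own data and reports
    #    only an aggregate (here: the number of records).
    for name, records in local_data.items():
        for task_id, _part, _node in server.fetch_tasks(name):
            server.results[task_id][name] = len(records)

    # 4. The central part (still on node A) collects the partial results
    #    from the server, combines them and sends back the final result.
    total = sum(server.results["count-sub"].values())
    server.results["count-central"]["A"] = total

    # 5. The user retrieves the final result from the server.
    return server.results["count-central"]["A"]


print(run_demo())
```

Note that `ToyServer` has no method that touches `local_data`: only the record counts ever reach it.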
+
+
+::: callout
+
+## Central server vs. central part of an algorithm
+
+It is easy to confuse the central server with the central part of the algorithm: the server is the central part of the infrastructure but not the place where the central part of the algorithm is executed. The central part is actually executed at one of the nodes, because it gives more flexibility: for instance, an algorithm may need heavy compute resources to do the aggregation, and it is better to do this at a node that has these resources rather than having to upgrade the server whenever a new algorithm needs more resources.
+:::
+::: challenge
+
+Two centers $a$ and $b$ have data regarding the age of a set of patients. Each center has a data station, and we want to compute the overall average age of the patients.
+
+![Architecture.](fig/schema_exercise.png)
+
+We know that the central average can be computed using the following equation:
+
+$\overline{x} =\dfrac{1}{n} \sum_{i=1}^{n} x_i$
+
+It can be rewritten as follows to make it ready for a federated computation:
+
+$\overline{x} =\dfrac{1}{n_a+n_b} (\sum_{i=1}^{n_a} a_i+\sum_{i=1}^{n_b} b_i)$
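As a quick check with made-up numbers (not taken from the lesson), suppose center $a$ holds the ages 30 and 40, so $n_a = 2$, and center $b$ holds the ages 50, 60 and 70, so $n_b = 3$. The split form then gives:

$\overline{x} =\dfrac{1}{2+3} \left((30+40)+(50+60+70)\right) = \dfrac{70+180}{5} = 50$

This matches the average over the pooled data, $\dfrac{30+40+50+60+70}{5} = \dfrac{250}{5} = 50$, although neither center had to share individual ages to obtain it.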
+
+Can you determine which part of the infrastructure will execute each part of the computation?
+
+::: solution
+
+The server starts the central task on one of the two nodes (e.g. Data station A).
+
+The central task on node A creates two subtasks, one per node. Node A will run the following computation:
+
+$S_a =\sum_{i=1}^{n_a} a_i$
+
+Node B will run the following computation:
+
+$S_b =\sum_{i=1}^{n_b} b_i$
+
+The central task receives $S_a$ and $n_a$ from node A and $S_b$ and $n_b$ from node B, and will run the following computation:
+
+$\overline{x} =\dfrac{S_a+S_b}{n_a+n_b}$
+
+![v6 algorithm workflow.](fig/algorithm_workflow.png)
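The division of work in this solution can be sketched in plain Python. This is an illustration under stated assumptions rather than a real vantage6 algorithm: the function names `partial_sum_and_count` and `central_average` and the sample ages are hypothetical, and the two parts are called directly instead of being sent as tasks through the server.

```python
def partial_sum_and_count(ages):
    """Federated part: runs at a node and returns aggregates only."""
    return {"sum": sum(ages), "count": len(ages)}


def central_average(partials):
    """Central part: combines the partial results into the overall average."""
    total = sum(p["sum"] for p in partials)
    n = sum(p["count"] for p in partials)
    return total / n


# Made-up local data; in a real setting each list stays at its own node.
ages_a = [30, 40]
ages_b = [50, 60, 70]

partials = [partial_sum_and_count(ages_a), partial_sum_and_count(ages_b)]
print(central_average(partials))  # 50.0
```

Each node returns its count together with its sum, because the central part needs $n_a$ and $n_b$ for the denominator.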
+
+:::
+
+:::
+
+# Future developments of vantage6
+
+TODO
+
+::: keypoints
+These are the keypoints
+:::
diff --git a/config.yaml b/config.yaml
@@ -1,78 +1,79 @@
-#------------------------------------------------------------
-# Values for this lesson.
-#------------------------------------------------------------
-
-# Which carpentry is this (swc, dc, lc, or cp)?
-# swc: Software Carpentry
-# dc: Data Carpentry
-# lc: Library Carpentry
-# cp: Carpentries (to use for instructor training for instance)
-# incubator: The Carpentries Incubator
-carpentry: 'incubator'
-
-# Overall title for pages.
-title: 'Introduction to vantage6'
-
-# Date the lesson was created (YYYY-MM-DD, this is empty by default)
-created: 2024-03-26
-
-# Comma-separated list of keywords for the lesson
-keywords: 'federated learning, privacy enhancing technology, python'
-
-# Life cycle stage of the lesson
-# possible values: pre-alpha, alpha, beta, stable
-life_cycle: 'pre-alpha'
-
-# License of the lesson
-license: 'CC-BY 4.0'
-
-# Link to the source repository for this lesson
-source: 'https://github.com/vantage6/vantage6-workshop'
-
-# Default branch of your lesson
-branch: 'main'
-
-# Who to contact if there are any issues
-contact: 'd.smits@esciencecenter.nl'
-
-# Navigation ------------------------------------------------
-#
-# Use the following menu items to specify the order of
-# individual pages in each dropdown section. Leave blank to
-# include all pages in the folder.
-#
-# Example -------------
-#
-# episodes:
-# - introduction.md
-# - first-steps.md
-#
-# learners:
-# - setup.md
-#
-# instructors:
-# - instructor-notes.md
-#
-# profiles:
-# - one-learner.md
-# - another-learner.md
-
-# Order of episodes in your lesson
-episodes: 
-- introduction.md
-
-# Information for Learners
-learners: 
-
-# Information for Instructors
-instructors: 
-
-# Learner Profiles
-profiles: 
-
-# Customisation ---------------------------------------------
-#
-# This space below is where custom yaml items (e.g. pinning
-# sandpaper and varnish versions) should live
-
-
+#------------------------------------------------------------
+# Values for this lesson.
+#------------------------------------------------------------
+
+# Which carpentry is this (swc, dc, lc, or cp)?
+# swc: Software Carpentry
+# dc: Data Carpentry
+# lc: Library Carpentry
+# cp: Carpentries (to use for instructor training for instance)
+# incubator: The Carpentries Incubator
+carpentry: 'incubator'
+
+# Overall title for pages.
+title: 'Introduction to vantage6'
+
+# Date the lesson was created (YYYY-MM-DD, this is empty by default)
+created: 2024-03-26
+
+# Comma-separated list of keywords for the lesson
+keywords: 'federated learning, privacy enhancing technology, python'
+
+# Life cycle stage of the lesson
+# possible values: pre-alpha, alpha, beta, stable
+life_cycle: 'pre-alpha'
+
+# License of the lesson
+license: 'CC-BY 4.0'
+
+# Link to the source repository for this lesson
+source: 'https://github.com/vantage6/vantage6-workshop'
+
+# Default branch of your lesson materials (recommended CC-BY 4.0)
+branch: 'main'
+
+# Who to contact if there are any issues
+contact: 'd.smits@esciencecenter.nl'
+
+# Navigation ------------------------------------------------
+#
+# Use the following menu items to specify the order of
+# individual pages in each dropdown section. Leave blank to
+# include all pages in the folder.
+#
+# Example -------------
+#
+# episodes:
+# - introduction.md
+# - first-steps.md
+#
+# learners:
+# - setup.md
+#
+# instructors:
+# - instructor-notes.md
+#
+# profiles:
+# - one-learner.md
+# - another-learner.md
+
+# Order of episodes in your lesson
+episodes: 
+- introduction.md
+- 2-understanding-v6.md
+
+# Information for Learners
+learners: 
+
+# Information for Instructors
+instructors: 
+
+# Learner Profiles
+profiles: 
+
+# Customisation ---------------------------------------------
+#
+# This space below is where custom yaml items (e.g. pinning
+# sandpaper and varnish versions) should live
+
+
diff --git a/fig/algorithm_central_and_subtasks.png b/fig/algorithm_central_and_subtasks.png
diff --git a/fig/algorithm_workflow.png b/fig/algorithm_workflow.png
diff --git a/fig/schema_exercise.png b/fig/schema_exercise.png
diff --git a/fig/v6_basic_schema.png b/fig/v6_basic_schema.png
diff --git a/fig/v6_basic_schema.svg b/fig/v6_basic_schema.svg
diff --git a/md5sum.txt b/md5sum.txt
@@ -1,11 +1,12 @@
 "file" "checksum" "built" "date"
 "CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2022-08-05"
 "LICENSE.md" "b24ebbb41b14ca25cf6b8216dda83e5f" "site/built/LICENSE.md" "2023-04-07"
-"config.yaml" "aad678198b3608a7fafced2a32ce155d" "site/built/config.yaml" "2024-03-27"
+"config.yaml" "0428addc862b83b97d9fcf4c0cc09b1a" "site/built/config.yaml" "2024-05-08"
 "index.md" "a02c9c785ed98ddd84fe3d34ddb12fcd" "site/built/index.md" "2022-04-22"
 "links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2022-04-22"
 "notes.md" "7f1c9fbd8d8ae2649784ae52350da772" "site/built/notes.md" "2024-03-27"
 "episodes/introduction.md" "6c55d31b41d322729fb3276f8d4371fc" "site/built/introduction.md" "2023-07-24"
+"episodes/2-understanding-v6.md" "9cf5c1846c5375507186d3200d5cee5b" "site/built/2-understanding-v6.md" "2024-05-08"
 "instructors/instructor-notes.md" "cae72b6712578d74a49fea7513099f8c" "site/built/instructor-notes.md" "2023-03-16"
 "learners/reference.md" "1c7cc4e229304d9806a13f69ca1b8ba4" "site/built/reference.md" "2023-03-16"
 "learners/setup.md" "5456593e4a75491955ac4a252c05fbc9" "site/built/setup.md" "2024-01-26"