8. Glossary

The following is a list of definitions used in vantage6.

A

  • Algorithm: a piece of code executed at the node to perform a specific task. The most well-known usage is to compute an analysis on the local data, but algorithm functions are also used to extract data from the node data sources and to preprocess the data.

  • Algorithm store: A repository of algorithms, which can be coupled to specific collaborations or all collaborations on a vantage6 HQ.

  • API: Application Programming Interface, a set of routines, protocols, and tools for building software applications.

  • Authentication: the process of verifying the identity of a user.

  • Authorization: the process of verifying the permissions of a user.

  • Autonomy: the ability of a party to be in charge of the control and management of its own data.

C

  • Central function: The orchestration part of an algorithm that coordinates and aggregates results from partial functions.

  • Child container: A container created by an algorithm container to perform subtasks, typically a federated function.

  • Client: A vantage6 user or application that communicates with the vantage6 hub.

  • Collaboration: an agreement between two or more parties to participate in a project (i.e., to answer a research question).

  • Computation task: A task that performs a computation on a dataframe.

  • Container: A lightweight, standalone, executable package of software that includes everything needed to run it.

D

  • Dataframe: A standardized representation of data in a session that can be used for computation tasks.

  • Data extraction task: A task that extracts data from a database. The extracted data can then be used by subsequent tasks (e.g., a preprocessing task or a computation task).

  • Data Station: A vantage6 node that has access to the local data.

  • Distributed learning: see Federated learning and Federated analytics.

  • Docker: a platform that uses operating system virtualization to deliver software in packages called containers. It is worth noting that although they are often confused, Docker containers are not virtual machines.

  • Docker registry: A repository of images. In vantage6, both algorithms and the infrastructure itself are stored as images in the Docker registry. Images are used to create containers.

E

  • End-to-end encryption: A method of encoding data so that it can only be decoded by the intended recipient. In vantage6, end-to-end encryption is used to encrypt data in transit between two nodes or between a node and a client.

F

  • FAIR data: data that are Findable, Accessible, Interoperable, and Reusable. For more information, see the original paper.

  • Federated analytics: an approach for analyzing data that are spread across different parties using traditional statistical methods. The main idea is that parties run computations on their local data, yielding aggregated parameters. These are then shared to generate a global (statistical) model.

  • Federated learning: an approach for analyzing data that are spread across different parties using machine learning methods. The main idea is that parties run computations on their local data, yielding aggregated parameters. These are then shared to generate a global (statistical) model.

  • Federated function: A function that is executed on the local data of a party. It is a part of a federated analytics or federated learning algorithm.
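
The federated analytics entry above can be illustrated with a minimal sketch of a federated mean. This is plain Python rather than the vantage6 API: the function names `partial_mean` and `central_mean` and the sample values are hypothetical, chosen only to show that each party shares aggregates (a sum and a count) instead of raw records.

```python
from typing import Dict, List


def partial_mean(local_values: List[float]) -> Dict[str, float]:
    """Runs at one party: returns only aggregates, never the raw rows."""
    return {"sum": float(sum(local_values)), "count": len(local_values)}


def central_mean(partials: List[Dict[str, float]]) -> float:
    """Combines the per-party aggregates into one global statistic."""
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    if count == 0:
        raise ValueError("no observations were reported by any party")
    return total / count


# Hypothetical local data held by two parties.
party_a = [50.0, 60.0, 70.0]
party_b = [80.0]

global_mean = central_mean([partial_mean(party_a), partial_mean(party_b)])
```

Note that the result is weighted by each party's count, so it equals the mean of the pooled data rather than the mean of the two local means.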

H

  • Heterogeneity: the condition in which parties in a federated learning scenario are allowed to differ in hardware and software (e.g., operating systems).

  • Horizontally-partitioned data: data spread across different parties, where each party holds the same features for different instances (e.g., patients). See also vertically-partitioned data.

Fig. 8.1 Horizontally-partitioned data

  • Horizontal scaling: the ability of a system to handle an increasing amount of requests by creating more instances of itself.

  • HQ: Public access point of the vantage6 infrastructure. Contains at least the vantage6-hq application. Optional components, such as a RabbitMQ message broker and Prometheus monitoring, can be made available in the same deployment.

  • Headquarters: see HQ.

  • Hub: All central components of the vantage6 infrastructure. Together with the nodes, it forms the vantage6 network. The Hub always contains the vantage6 HQ application and an authentication service. It may also contain other components, such as the algorithm store and the user interface.

I

  • Image (as in container image): A blueprint for a container, which can be stored in an image registry (e.g., a Docker registry).

J

  • JWT: A JSON Web Token, used for authentication and authorization in the system.
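
To make the structure of a JSON Web Token concrete, the sketch below builds an unsigned example token and reads its claims. A JWT consists of three base64url-encoded segments separated by dots: header, payload, and signature. The claim values and the signature placeholder are made up for illustration, and the helper names are hypothetical. The code does not verify the signature, so it must not be used to make trust decisions.

```python
import base64
import json


def b64url_encode(raw: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are encoded."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding before decoding a JWT segment."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def read_jwt_claims(token: str) -> dict:
    """Return the payload claims WITHOUT verifying the signature."""
    _header_b64, payload_b64, _signature = token.split(".")
    return json.loads(b64url_decode(payload_b64))


# Build an illustrative token; the signature part is a placeholder only.
header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps({"sub": "user-1", "exp": 1700000000}).encode())
example_token = f"{header}.{payload}.signature-placeholder"

claims = read_jwt_claims(example_token)
```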

K

  • Kubernetes: An open-source system for automating deployment, scaling, and operations of application containers across clusters of hosts. Vantage6 uses Kubernetes’ container orchestration capabilities to deploy the hub and the nodes.

M

  • Multi-party computation: an approach to perform analyses across different parties by performing operations on encrypted data.

N

  • Node: vantage6 node application that runs at a Data Station which has access to the local data.

P

  • Partial function: The federated part of an algorithm that runs on local data at nodes.

  • Party: an entity that takes part in one (or more) collaborations. In vantage6 a party is an organization.

  • Permission scope: The level of access granted to users for viewing and modifying dataframes (personal, organization, or collaboration level).

  • Preprocessing task: A task that modifies dataframes by adding or removing columns, or filtering rows.

  • Privacy-enhancing Technology (PET): Technologies that enable privacy-preserving analyses on federated data. This includes technologies such as differential privacy, secure multi-party computation, and federated analytics/learning.

  • Python: a high-level general purpose programming language. It aims to help programmers write clear, logical code. Vantage6 is mostly written in Python.

R

  • RSA keys: Cryptographic keys used for encryption and decryption of data between organizations.

S

  • Secure multi-party computation: see Multi-party computation.

  • Session: A way to prepare a dataset that can be reused in many computation tasks, especially useful for large datasets and flexible preprocessing.

  • Study: A study is a subgroup of organizations within a collaboration.

T

  • Task: A task is a request from a client to the vantage6 HQ to execute an algorithm. It is the main unit of work in vantage6.

  • TOTP (Time-based One-Time Password): A form of two-factor authentication where users generate time-based codes using an authenticator app.

  • Two-factor authentication: A method of authentication that requires two forms of identification.
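
The TOTP entry above refers to the algorithm standardized in RFC 6238. A minimal sketch using only the Python standard library is shown below. The 30-second time step, 6 digits, and HMAC-SHA1 are the common defaults used by authenticator apps; whether a given vantage6 deployment uses exactly these parameters is an assumption here. The function name `totp` is hypothetical.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password (RFC 6238, HMAC-SHA1)."""
    if at_time is None:
        at_time = time.time()
    # The moving factor is the number of time steps since the Unix epoch.
    counter = int(at_time) // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte
    # select a 4-byte window, whose top bit is masked off.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Test vector from RFC 6238 (shared secret is the ASCII string below).
rfc_secret = b"12345678901234567890"
rfc_code = totp(rfc_secret, at_time=59, digits=8)
```

Both the server and the authenticator app derive the code from the shared secret and the current time, so no code needs to be transmitted in advance.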

U

  • UI: User Interface, the web application that allows users to interact with the vantage6 hub.

V

  • vantage6: an infrastructure for executing federated learning analyses. It can also be used as a FAIR data station and as a model repository.

  • Vertically-partitioned data: data spread across different parties, where each party holds different features for the same instances (e.g., patients). See also horizontally-partitioned data.

Fig. 8.2 Vertically-partitioned data
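
The difference between horizontally- and vertically-partitioned data can be sketched with a small made-up table. The party names (`hospital_a`, `hospital_b`, `clinic`, `lab`) and all values are hypothetical and serve only to show how the same records would be split in each case.

```python
# A hypothetical complete dataset: one row per patient.
patients = [
    {"id": 1, "age": 54, "bmi": 27.1},
    {"id": 2, "age": 61, "bmi": 24.8},
    {"id": 3, "age": 47, "bmi": 30.2},
]

# Horizontal partitioning: every party holds the same features (columns),
# but for different instances (rows).
hospital_a = patients[:2]
hospital_b = patients[2:]

# Vertical partitioning: every party holds the same instances (rows),
# but different features (columns). A shared identifier links the records.
clinic = [{"id": p["id"], "age": p["age"]} for p in patients]
lab = [{"id": p["id"], "bmi": p["bmi"]} for p in patients]
```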

W

  • Wrapper: A library that simplifies and standardizes the interaction between the node and algorithm container, handling data reading and writing operations.

  • Whitelist: A list of allowed domains, ports, and IP addresses that algorithms can access.
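
As an illustration of the whitelist entry above, the sketch below checks an outgoing connection against a list of allowed domains, IP addresses, and ports. The `Whitelist` class, its field names, and the rule that both the host and the port must be listed are assumptions made for this example. They do not reflect the actual vantage6 configuration format or enforcement logic.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Whitelist:
    """Hypothetical whitelist: allowed domains, IP addresses, and ports."""
    domains: Set[str] = field(default_factory=set)
    ips: Set[str] = field(default_factory=set)
    ports: Set[int] = field(default_factory=set)

    def allows(self, host: str, port: int) -> bool:
        """Allow a connection only if both host and port are listed."""
        if port not in self.ports:
            return False
        try:
            address = ipaddress.ip_address(host)
        except ValueError:
            # Not an IP literal, so treat the host as a domain name.
            return host.lower() in self.domains
        return any(address == ipaddress.ip_address(ip) for ip in self.ips)


wl = Whitelist(domains={"example.org"}, ips={"10.0.0.5"}, ports={443})
```

For instance, `wl.allows("example.org", 443)` is accepted, while the same host on port 80 is rejected because the port is not listed.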