Glossary
========

The following is a list of definitions used in vantage6.

**A**

- **Algorithm**: a piece of code that performs a specific task.
- **Algorithm store**: A repository of algorithms, which can be coupled to specific
  collaborations or all collaborations on a server.
- **API**: Application Programming Interface, a set of routines, protocols, and tools
  for building software applications.
- **Authentication**: the process of verifying the identity of a user.
- **Authorization**: the process of verifying the permissions of a user.
- **Autonomy**: the ability of a party to be in charge of the control and management of
  its own data.

**C**

- **Central function**: The orchestration part of an algorithm that coordinates and
  aggregates results from partial functions.
- **Child container**: A container created by an algorithm container to perform
  subtasks, typically running a *federated function*.
- **Client**: A vantage6 user or application that communicates with the
  vantage6-server.
- **Collaboration**: an agreement between two or more parties to participate in a study
  (i.e., to answer a research question).
- **Computation task**: A (vantage6) task that performs a computation on a DataFrame.
- **Container**: A lightweight, standalone, executable package of software that
  includes everything needed to run it.
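
The split between a *central function* and its *partial functions* can be illustrated with a minimal sketch that runs entirely in one Python process. The names ``partial_mean_stats`` and ``central_mean`` are hypothetical and not part of the vantage6 API; in a real deployment each partial function would run in a child container at a node, and the central function would request those subtasks through vantage6 instead of calling them directly.

.. code-block:: python

   from typing import Dict, List


   def partial_mean_stats(local_values: List[float]) -> Dict[str, float]:
       """Hypothetical partial function: runs on one party's local data and
       returns only aggregated statistics, never the individual records."""
       return {"sum": float(sum(local_values)), "count": float(len(local_values))}


   def central_mean(partial_results: List[Dict[str, float]]) -> float:
       """Hypothetical central function: combines the aggregated statistics
       returned by the partial functions into a global mean."""
       total = sum(r["sum"] for r in partial_results)
       count = sum(r["count"] for r in partial_results)
       if count == 0:
           raise ValueError("no records at any party")
       return total / count


   if __name__ == "__main__":
       # Local data of three parties; in vantage6 this data stays at its node.
       party_data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
       results = [partial_mean_stats(values) for values in party_data]
       print(central_mean(results))  # 3.5

Only the per-party sums and counts reach the central function, which is the point of splitting an algorithm this way.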

**D**

- **DataFrame**: A standardized representation of data in a session that can be used
  for computation tasks.
- **Data extraction task**: A (vantage6) task that extracts data from a database.
  The extracted data can then be used by subsequent tasks (e.g., a preprocessing task
  or a computation task).
- **Data Station**: A vantage6 *node* that has access to the local data.
- **Distributed learning**: see *Federated Learning* and *Federated Analytics*.
- **Docker**: a platform that uses operating system virtualization to deliver software
  in packages called *containers*. It is worth noting that although they are often
  confused, `Docker containers are not virtual machines `__.
- **Docker registry**: A repository of *images*. In vantage6, both algorithms and
  the infrastructure itself are stored as *images* in the *Docker registry*. Images
  are used to create *containers*.

**E**

- **End-to-end encryption**: A method of encoding data so that it can only be decoded
  by the intended recipient. In vantage6, end-to-end encryption is used to encrypt
  data in transit between two nodes or between a node and a client.

**F**

- **FAIR data**: data that are Findable, Accessible, Interoperable, and
  Reusable. For more information, see `the original
  paper `__.
- **Federated Analytics**: an approach for analyzing data that are
  spread across different parties using traditional statistical methods. The main
  idea is that parties run computations on their local data, yielding aggregated
  parameters. These are then shared to generate a global (statistical) model.
- **Federated function**: A function that is executed on the local data of a party.
  It is part of a *federated analytics* or *federated learning* algorithm.
- **Federated learning**: an approach for analyzing data that are
  spread across different parties using machine learning methods. The main
  idea is that parties run computations on their local data, yielding
  aggregated parameters. These are then shared to generate a global (statistical)
  model.
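
To make the *federated learning* entry concrete, the sketch below shows federated averaging (FedAvg), one common way to turn locally trained parameters into a global model: each party's parameter vector is weighted by its number of local records. This is only an illustrative choice of aggregation rule, and the function name ``federated_average`` is hypothetical.

.. code-block:: python

   from typing import List, Sequence, Tuple


   def federated_average(
       local_updates: Sequence[Tuple[List[float], int]],
   ) -> List[float]:
       """Weighted average of model parameters (FedAvg-style aggregation).

       Each element of ``local_updates`` is a (parameters, n_records) pair
       produced by one party after training on its local data.
       """
       total_records = sum(n for _, n in local_updates)
       if total_records == 0:
           raise ValueError("no training records at any party")
       n_params = len(local_updates[0][0])
       global_params = [0.0] * n_params
       for params, n in local_updates:
           if len(params) != n_params:
               raise ValueError("all parties must share the same model shape")
           for i, value in enumerate(params):
               # Parties with more records contribute proportionally more.
               global_params[i] += value * n / total_records
       return global_params


   if __name__ == "__main__":
       updates = [([1.0, 0.0], 100), ([3.0, 2.0], 300)]
       print(federated_average(updates))  # [2.5, 1.5]

Weighting by record count keeps a small party from influencing the global model as much as a large one; an unweighted mean is the simpler alternative.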

**H**

- **Heterogeneity**: the condition in which, in a federated learning scenario, parties
  are allowed to differ in hardware and software (e.g., operating systems).
- **Horizontally-partitioned data**: data spread across different parties, where each
  party holds the same features for different instances (e.g., patients). See also
  *vertically-partitioned data*.

  .. figure:: /images/horizontal_partition.png
     :alt: Horizontally partitioned data
     :align: center

     Horizontally-partitioned data

- **Horizontal scaling**: the ability of a system to handle an increasing number of
  requests by creating more instances of itself.
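
With *horizontally-partitioned data*, every party holds the same columns (features) for different rows (instances), so conceptually the complete dataset is the parties' rows stacked together. The minimal sketch below checks that property on made-up records; the column names, values and function names are hypothetical, and the pooled list is built only to illustrate the concept, since a federated algorithm computes on each party's rows where they are.

.. code-block:: python

   from typing import Dict, List

   Row = Dict[str, float]

   # Hypothetical local datasets: same features ("age", "bmi"), different patients.
   party_a: List[Row] = [{"age": 54.0, "bmi": 27.1}, {"age": 61.0, "bmi": 24.3}]
   party_b: List[Row] = [{"age": 47.0, "bmi": 31.0}]


   def same_features(*partitions: List[Row]) -> bool:
       """Check that every row in every partition has the same set of columns."""
       column_sets = {frozenset(row) for part in partitions for row in part}
       return len(column_sets) <= 1


   def virtual_union(*partitions: List[Row]) -> List[Row]:
       """The dataset that horizontal partitioning splits up: all rows stacked.

       Built here only to illustrate the concept; a federated algorithm
       computes on each partition locally instead of creating this list.
       """
       return [row for part in partitions for row in part]


   if __name__ == "__main__":
       print(same_features(party_a, party_b))       # True
       print(len(virtual_union(party_a, party_b)))  # 3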

**I**

- **Image**: A blueprint for a *container*, which can be stored in a *Docker registry*.

**J**

- **JWT Token**: A JSON Web Token used for authentication and authorization in the
  system.
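
A JWT consists of three base64url-encoded parts (header, payload and signature) separated by dots, as specified in RFC 7519. The sketch below builds and verifies an HS256-signed token using only the Python standard library. The claims, the secret and the helper names are hypothetical, and the sketch does not show which claims or signing algorithm vantage6 actually uses; in practice a maintained library such as PyJWT is the better choice.

.. code-block:: python

   import base64
   import hashlib
   import hmac
   import json


   def _b64url_encode(data: bytes) -> str:
       # JWTs use base64url encoding without "=" padding.
       return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


   def _b64url_decode(text: str) -> bytes:
       # Restore the padding that was stripped during encoding.
       return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


   def sign_hs256(claims: dict, secret: bytes) -> str:
       """Create a compact JWT: header.payload.signature (HMAC-SHA256)."""
       header = {"alg": "HS256", "typ": "JWT"}
       signing_input = ".".join(
           _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
           for part in (header, claims)
       )
       signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
       return f"{signing_input}.{_b64url_encode(signature)}"


   def verify_hs256(token: str, secret: bytes) -> dict:
       """Return the claims if the signature is valid; raise ValueError otherwise.

       Sketch only: it does not check the header's "alg" or the "exp" claim,
       which a real verifier must do.
       """
       header_b64, payload_b64, signature_b64 = token.split(".")
       signing_input = f"{header_b64}.{payload_b64}".encode()
       expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
       # Constant-time comparison avoids leaking information through timing.
       if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
           raise ValueError("invalid signature")
       return json.loads(_b64url_decode(payload_b64))


   if __name__ == "__main__":
       secret = b"hypothetical-server-secret"
       token = sign_hs256({"sub": "user-42", "exp": 1700000000}, secret)
       print(verify_hs256(token, secret))  # {'sub': 'user-42', 'exp': 1700000000}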

**K**

- **Kubernetes**: An open-source system for automating deployment, scaling, and
  operations of application containers across clusters of hosts. Vantage6 is
  built on top of Kubernetes to run the vantage6-server, nodes and algorithm
  containers.

**M**

- **Multi-party computation**: an approach to perform analyses across
  different parties by performing operations on encrypted data.

**N**

- **Node**: vantage6 node application that runs at a **Data Station**, which has access
  to the local data.

**P**

- **Partial function**: The federated part of an algorithm that runs on local data
  at nodes.
- **Party**: an entity that takes part in one (or more) collaborations. In vantage6,
  a party is an organization.
- **Permission scope**: The level of access granted to users for viewing and modifying
  dataframes (personal, organization, or collaboration level).
- **Pre-processing task**: A task that modifies dataframes by adding or removing
  columns, or filtering rows.
- **Privacy-enhancing Technology (PET)**: Technologies that enable privacy-preserving
  analyses on federated data. This includes technologies such as differential
  privacy, secure multi-party computation, and federated analytics/learning.
- **Python**: a high-level, general-purpose programming language. It
  aims to help programmers write clear, logical code. vantage6 is
  `written in Python `__.

**R**

- **RSA keys**: Cryptographic keys used for encryption and decryption of data between
  organizations.
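
The idea behind *RSA keys* can be shown with textbook RSA on deliberately tiny numbers: anyone holding the public key ``(n, e)`` can encrypt, but only the holder of the private exponent ``d`` can decrypt. This toy sketch is insecure (no padding, tiny primes), its function names are hypothetical, and it does not show how vantage6 itself uses its RSA keys; real code should rely on a vetted library such as ``cryptography``.

.. code-block:: python

   from math import gcd
   from typing import Tuple


   def make_toy_keypair(p: int, q: int, e: int) -> Tuple[Tuple[int, int], int]:
       """Return ((n, e), d) for textbook RSA. Insecure: illustration only."""
       n = p * q
       phi = (p - 1) * (q - 1)
       if gcd(e, phi) != 1:
           raise ValueError("e must be coprime with (p - 1)(q - 1)")
       d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
       return (n, e), d


   def encrypt(message: int, public_key: Tuple[int, int]) -> int:
       n, e = public_key
       if not 0 <= message < n:
           raise ValueError("message must be an integer in [0, n)")
       return pow(message, e, n)


   def decrypt(ciphertext: int, public_key: Tuple[int, int], d: int) -> int:
       n, _ = public_key
       return pow(ciphertext, d, n)


   if __name__ == "__main__":
       public_key, private_exponent = make_toy_keypair(p=61, q=53, e=17)
       c = encrypt(65, public_key)
       print(public_key, private_exponent)  # (3233, 17) 2753
       print(c, decrypt(c, public_key, private_exponent))  # 2790 65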

**S**

- **Secure multi-party computation**: see *Multi-party computation*.
- **Server**: Public access point of the vantage6 infrastructure. Contains at
  least the **vantage6-server** application but can also host the optional
  components: Docker registry, VPN server and RabbitMQ. In this documentation
  we try to be explicit when we talk about *server* and *vantage6-server*;
  however, you might encounter *server* where *vantage6-server* should have been
  used.
- **Session**: A way to prepare a dataset that can be reused in many computation
  tasks, especially useful for large datasets and flexible preprocessing.
- **Study**: A study is a subgroup of organizations within a collaboration.
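
*Secure multi-party computation* can be illustrated with additive secret sharing, one of several MPC building blocks (the choice of scheme is ours for illustration, not something this glossary attributes to vantage6). Each party splits its private value into random shares that sum to the value modulo a large number, so the parties can compute the total of all values without anyone seeing another party's input. The modulus and the function names below are assumptions.

.. code-block:: python

   import secrets
   from typing import List

   MODULUS = 2**61 - 1  # assumed modulus; must exceed any possible total


   def share(value: int, n_parties: int) -> List[int]:
       """Split ``value`` into ``n_parties`` random shares summing to it mod MODULUS."""
       shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
       shares.append((value - sum(shares)) % MODULUS)
       return shares


   def secure_sum(private_values: List[int]) -> int:
       """Simulate n parties computing the sum of their inputs via additive sharing."""
       n = len(private_values)
       # Row i holds the shares that party i sends out; party j receives column j.
       all_shares = [share(v, n) for v in private_values]
       # Each party adds up the shares it received; a single share on its own
       # reveals nothing about the input it came from.
       partial_sums = [sum(row[j] for row in all_shares) % MODULUS for j in range(n)]
       # Combining the partial sums reveals only the total.
       return sum(partial_sums) % MODULUS


   if __name__ == "__main__":
       print(secure_sum([12, 30, 7]))  # 49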

**T**

- **Task**: A task is a request from a client to the vantage6-server to execute an
  algorithm. It is the main unit of work in vantage6.
- **TOTP (Time-based One-Time Password)**: A form of two-factor authentication where
  users generate time-based codes using an authenticator app.
- **Two-factor authentication**: A method of authentication that requires two
  forms of identification.
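
TOTP is standardized in RFC 6238, which builds on HOTP (RFC 4226): both the server and the authenticator app derive the same short code from a shared secret and the current 30-second time step. The sketch below uses only the Python standard library and the common defaults (HMAC-SHA1, 30-second step, six digits), which are assumptions here; it does not reflect how the vantage6-server verifies codes internally.

.. code-block:: python

   import hashlib
   import hmac
   import struct
   import time
   from typing import Optional


   def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
       """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
       digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
       # Dynamic truncation: the low 4 bits of the last byte select an offset.
       offset = digest[-1] & 0x0F
       code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
       return str(code % 10**digits).zfill(digits)


   def totp(secret: bytes, now: Optional[float] = None, step: int = 30,
            digits: int = 6) -> str:
       """Time-based one-time password (RFC 6238): HOTP over the time-step counter."""
       if now is None:
           now = time.time()
       return hotp(secret, int(now) // step, digits)


   if __name__ == "__main__":
       # Test vector from RFC 6238, Appendix B (SHA-1, T = 59 s, 8 digits).
       print(totp(b"12345678901234567890", now=59, digits=8))  # 94287082

Because the code depends only on the secret and the clock, no network round trip is needed to generate it; servers usually also accept a code from an adjacent time step to tolerate clock drift.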

**V**

- **vantage6**: priVAcy preserviNg federaTed leArninG infrastructurE
  for Secure Insight eXchange. In short, vantage6 is an infrastructure
  for executing federated learning analyses. However, it can also be
  used as a FAIR data station and as a model repository.
- **Vertically-partitioned data**: data spread across different parties,
  where each party holds different features of the same instances (e.g.,
  patients). See also *horizontally-partitioned data*.

  .. figure:: /images/vertical_partition.png
     :alt: Vertically partitioned data
     :align: center

     Vertically-partitioned data
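
With *vertically-partitioned data*, each party holds different columns for the same instances, so records must be linked through a shared identifier before their features belong together. The sketch below joins two hypothetical parties' records on an assumed patient identifier purely to show the structure; the data, keys and function name are made up, and federated algorithms for vertically-partitioned data aim to avoid combining the records in one place like this.

.. code-block:: python

   from typing import Dict

   Records = Dict[str, Dict[str, float]]

   # Hypothetical party A (e.g. a hospital): clinical features per patient ID.
   party_a: Records = {
       "p1": {"age": 54.0, "bmi": 27.1},
       "p2": {"age": 61.0, "bmi": 24.3},
   }
   # Hypothetical party B (e.g. a lab): different features for the same patients.
   party_b: Records = {
       "p1": {"glucose": 5.8},
       "p2": {"glucose": 7.2},
       "p3": {"glucose": 6.1},  # not present at party A
   }


   def virtual_join(left: Records, right: Records) -> Records:
       """The dataset that vertical partitioning splits up: features merged per
       instance, keeping only identifiers present at both parties (inner join)."""
       shared_ids = sorted(left.keys() & right.keys())
       return {pid: {**left[pid], **right[pid]} for pid in shared_ids}


   if __name__ == "__main__":
       joined = virtual_join(party_a, party_b)
       print(sorted(joined))  # ['p1', 'p2']
       print(joined["p1"])    # {'age': 54.0, 'bmi': 27.1, 'glucose': 5.8}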

**W**

- **Whitelist**: A list of allowed domains, ports, and IP addresses that algorithms
  can access.
- **Wrapper**: A library that simplifies and standardizes the interaction between the
  node and algorithm container, handling data reading and writing operations.

.. todo Add references to sections of the docs where to find info on them
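
A *whitelist* check can be sketched with Python's standard ``ipaddress`` module. The ``Whitelist`` class and its matching rules below (the port must be listed, and the host must be an allowed domain, a subdomain of one, or an IP address inside an allowed network) are hypothetical illustrations of the concept, not vantage6's configuration format or its enforcement mechanism.

.. code-block:: python

   import ipaddress
   from dataclasses import dataclass
   from typing import FrozenSet, Tuple, Union

   IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


   @dataclass(frozen=True)
   class Whitelist:
       """Hypothetical whitelist of allowed domains, IP networks and ports."""
       domains: FrozenSet[str] = frozenset()
       networks: Tuple[IPNetwork, ...] = ()
       ports: FrozenSet[int] = frozenset()

       def allows(self, host: str, port: int) -> bool:
           """Allow a destination only if its port is listed and its host is an
           allowed domain (or a subdomain of one) or an IP in an allowed network."""
           if port not in self.ports:
               return False
           try:
               address = ipaddress.ip_address(host)
           except ValueError:
               # Not an IP literal: treat it as a domain name.
               name = host.lower().rstrip(".")
               return any(name == d or name.endswith("." + d) for d in self.domains)
           return any(address in network for network in self.networks)


   if __name__ == "__main__":
       wl = Whitelist(
           domains=frozenset({"example.org"}),
           networks=(ipaddress.ip_network("10.0.0.0/24"),),
           ports=frozenset({443}),
       )
       print(wl.allows("api.example.org", 443))  # True
       print(wl.allows("10.0.0.17", 443))        # True
       print(wl.allows("10.0.1.17", 443))        # False
       print(wl.allows("example.org", 80))       # False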