9. Glossary

The following is a list of definitions of terms used in vantage6.

A

  • Autonomy: the ability of a party to be in charge of the control and management of its own data.

C

  • Collaboration: an agreement between two or more parties to participate in a study (i.e., to answer a research question).

D

  • Distributed learning: see Federated Learning

  • Docker: a platform that uses operating system virtualization to deliver software in packages called containers. Although the two are often confused, Docker containers are not virtual machines.

  • Data Station: a virtual machine containing the vantage6-node application and a database.

F

  • FAIR data: data that are Findable, Accessible, Interoperable, and Reusable. For more information, see the original paper.

  • Federated learning: an approach for analyzing data that are spread across different parties. Its main idea is that parties run computations on their local data, yielding either aggregated parameters or encrypted values. These are then shared to generate a global (statistical) model. In other words, instead of bringing the data to the algorithms, federated learning brings the algorithms to the data. This way, patient-sensitive information is not disclosed. Federated learning is sometimes known as distributed learning. However, we try to avoid this term, since it can be confused with distributed computing, where different computers share their processing power to solve very complex calculations.
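As a minimal sketch of the idea described in the federated learning entry, the example below computes a global mean without moving any raw values: each party only shares a local sum and a count. The names `PartialResult`, `local_step`, and `aggregate` are hypothetical and chosen for illustration; they are not part of the vantage6 API.

```python
from dataclasses import dataclass


@dataclass
class PartialResult:
    """Aggregated parameters that a party is willing to share."""
    total: float
    count: int


def local_step(values):
    """Runs at a party, on its local data. Only the sum and count leave the site."""
    return PartialResult(total=sum(values), count=len(values))


def aggregate(partials):
    """Runs centrally. Combines the partial results into a global mean."""
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    return total / count


# Hypothetical local datasets; in practice these never leave their party.
party_a = [60.0, 70.0, 80.0]
party_b = [50.0, 90.0]

partials = [local_step(party_a), local_step(party_b)]
global_mean = aggregate(partials)
print(global_mean)
```

The central step sees only two numbers per party, yet the result equals the mean that would have been obtained by pooling all records.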

H

  • Heterogeneity: the condition, in a federated learning scenario, in which parties are allowed to differ in hardware and software (e.g., operating systems).

  • Horizontally-partitioned data: data spread across different parties, where the parties hold the same features for different instances (e.g., patients). See also vertically-partitioned data.

Fig. 9.1 Horizontally-partitioned data
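To make horizontal partitioning concrete, the sketch below uses two hypothetical parties (`hospital_a` and `hospital_b`) with invented records. Both hold the same features, but for different patients, so the combined dataset would be obtained by stacking rows.

```python
# Each party holds the same columns (features) for different patients.
hospital_a = [
    {"patient_id": 1, "age": 54, "weight": 70.5},
    {"patient_id": 2, "age": 61, "weight": 82.0},
]
hospital_b = [
    {"patient_id": 3, "age": 47, "weight": 65.2},
]

# Same features at both parties.
features_a = set(hospital_a[0])
features_b = set(hospital_b[0])
same_features = features_a == features_b

# Different instances: no patient appears at both parties.
ids_a = {row["patient_id"] for row in hospital_a}
ids_b = {row["patient_id"] for row in hospital_b}
disjoint_patients = ids_a.isdisjoint(ids_b)

# Conceptually, the full dataset is the row-wise concatenation.
# In a federated setting this pooling is exactly what is avoided.
combined = hospital_a + hospital_b
print(same_features, disjoint_patients, len(combined))
```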

N

  • Node: vantage6 node application that runs at a Data Station which has access to the local data.

M

  • Multi-party computation: an approach to perform analyses across different parties by performing operations on encrypted data.
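One common building block for multi-party computation is additive secret sharing, shown below as an illustrative sketch. It is an assumption made for this example, not necessarily the technique used by any vantage6 algorithm. The modulus and the helper names `make_shares` and `reconstruct` are hypothetical.

```python
import secrets

# Hypothetical modulus; all arithmetic on shares is done modulo this value.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split a private value into n random-looking shares that sum to it (mod MODULUS)."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recover the hidden value by adding all shares (mod MODULUS)."""
    return sum(shares) % MODULUS


# Three parties, each with a private input it does not want to reveal.
private_inputs = [12, 30, 8]
n = len(private_inputs)

# Each party splits its input and would send share j to party j.
all_shares = [make_shares(value, n) for value in private_inputs]

# Party j adds up the shares it received; a single share reveals nothing on its own.
partial_sums = [
    sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
]

# Combining the partial sums yields the total without exposing any single input.
total = reconstruct(partial_sums)
print(total)
```

The parties learn the sum of the inputs, but no party ever sees another party's raw value.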

P

  • Party: an entity that takes part in one (or more) collaborations.

  • Python: a high-level general purpose programming language. It aims to help programmers write clear, logical code. vantage6 is written in Python.

S

  • Secure multi-party computation: see Multi-party computation

  • Server: the public access point of the vantage6 infrastructure. It contains at least the vantage6-server application, but can also host the optional components: Docker registry, VPN server, and RabbitMQ. In this documentation we try to be explicit about whether we mean the server or the vantage6 server; however, you might encounter "server" where "vantage6 server" was meant.

V

  • vantage6: priVAcy preserviNg federaTed leArninG infrastructurE for Secure Insight eXchange. In short, vantage6 is an infrastructure for executing federated learning analyses. However, it can also be used as a FAIR data station and as a model repository.

  • Vertically-partitioned data: data spread across different parties, where the parties hold different features for the same instances (e.g., patients). See also horizontally-partitioned data.

Fig. 9.2 Vertically partitioned data
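To make vertical partitioning concrete, the sketch below uses two hypothetical parties (`hospital` and `pharmacy`) with invented records keyed by patient identifier. Both know the same patients, but each holds different features, so the combined dataset would be obtained by joining columns on the shared identifier.

```python
# Each party holds different columns (features) for the same patients.
hospital = {
    1: {"age": 54},
    2: {"age": 61},
}
pharmacy = {
    1: {"medication": "A"},
    2: {"medication": "B"},
}

# Same instances at both parties.
same_patients = hospital.keys() == pharmacy.keys()

# Different features: no column is held by both parties.
features_hospital = set().union(*(row.keys() for row in hospital.values()))
features_pharmacy = set().union(*(row.keys() for row in pharmacy.values()))
different_features = features_hospital.isdisjoint(features_pharmacy)

# Conceptually, the full dataset is the column-wise join on patient id.
# In a federated setting this pooling is exactly what is avoided.
combined = {pid: {**hospital[pid], **pharmacy[pid]} for pid in hospital}
print(same_patients, different_features, combined[1])
```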