Store¶
The vantage6 store is a component that stores the algorithms that can be used by the nodes. This allows the user to easily find the algorithm they need, and know how to use it.
There is a community algorithm store hosted at https://store.uluru.vantage6.ai. This store is maintained by the vantage6 community and allows you to easily reuse algorithms developed by others. You can also create your own algorithm store. This allows you to create a private algorithm store, which is only available to your own collaborations.
If you would like to contribute to the community store, you should first check the production-ready algorithm guidelines to see if you meet the requirements. If you do, you can send an email to Frank Martin or Bart van Beusekom to acquire an account and upload your algorithm. The algorithm will go through a review process before it is added to the community store.
Linking algorithm store to HQ¶
Algorithm stores can be linked to a vantage6 HQ or to a specific collaboration on an HQ. That way, the algorithms in the store become easily available to the users - either to all users registered on that HQ, or to all users in the specific collaboration.
Users can link algorithm stores to a collaboration if they have permission to modify
that collaboration. Algorithm stores can only be linked to an HQ by users that have
permission to modify all collaborations on the HQ. To link an algorithm store, go to
the collaboration settings page on the UI or use the Python client function
client.store.create().
Configuration options¶
The algorithm store requires a configuration file to run. This is a yaml file with
a specific format.
The next sections describe how to configure the algorithm store. They first provide a few quick answers on setting up your store, then show an example of all configuration file options, and finally explain where your configuration files are stored.
How to create a configuration file¶
The easiest way to create an initial configuration file is via:
v6 algorithm-store new. This allows you to configure the
basic settings. For more advanced configuration options, which are listed below,
you can view the example configuration file.
Where is my configuration file?¶
To see where your configuration file is located, you can use the following command:
v6 algorithm-store files
Warning
This command will only work if the algorithm store has been deployed
using the v6 commands.
Also, note that on local deployments you may need to specify the
--user flag if you put your configuration file in the
user folder.
You can also create and edit this file manually.
All configuration options¶
The following configuration file is an example that intends to list all possible configuration options.
You can download this file here: algorithm_store_config.yaml
# The namespace in which the algorithm store is deployed. By default, the namespace is
# set to Release.Namespace (which is vantage6 by default).
namespace: vantage6-store
store:
# Whether the algorithm store is deployed standalone or as part of a vantage6 hub.
standalone: false
# Human readable description of the algorithm store instance. This is to help
# your peers to identify the store
description: Vantage algorithm store version 5
# The number of replicas of the algorithm store. By default, the number of replicas is
# set to 1.
replications: 1
# API path prefix. Store may be reached at https://mystore.org/<api_path>/<endpoint>.
api_path: /store
# The image used to start the algorithm store.
image: ghcr.io/vantage6/infrastructure/algorithm-store:latest
# Node image pull policy. Possible values are: Always (default), IfNotPresent, Never
imagePullPolicy: Always
# The internal port on the container pod that the store runs on
internal:
port: 7602
# The port to expose the store on in the cluster
port: 7602
# Keycloak configuration. Values here override the global Keycloak settings
# in the hub configuration (`global.keycloak`) for the algorithm store.
keycloak:
url: https://my-vantage6-auth.org
realm: vantage6
adminUsername: admin
adminPassword: admin
adminClient: backend-admin-client
adminClientSecret: myadminclientsecret
# The URI of the vantage6 HQ that provides the admin user that will be admin
# user in the store on startup. Note that if set, this overrides the global HQ address
# in the hub configuration
vantage6HQUri: https://my_hq.org/my_api_path
policies:
# Set who is allowed to view the algorithms in the store. Possible values are:
# - "public": everyone can view the algorithms
# - "authenticated": users with a token valid for this store can view the algorithms
# - "private": only users with explicit permission in the algorithm store can view
# the algorithms
algorithmView: public
# Set the minimum number of reviewers that need to approve an algorithm before it
# is available in the store. In case this number is lower than
# minReviewingOrganizations, the minReviewingOrganizations still has to be met
# to proceed with the review process.
minReviewers: 2
# Define whether or not developers are able to assign reviewers to their own
# algorithms.
assignReviewOwnAlgorithm: false
# Define the minimum amount of organizations that must be involved in the review
# process.
minReviewingOrganizations: 2
# Specify the users that are allowed to review algorithms. This is a list of
# usernames that identify unique users. This policy works in combination with the
# permission system. If this policy is not set, all users with the right permissions
# are allowed to review algorithms.
allowedReviewers:
- some_username
# Specify the users that are allowed to assign reviews. This is a list of usernames
# that identify unique users. This policy works in combination with the permission
# system. If this policy is not set, all users with the right permissions are
# allowed to assign reviews.
allowedReviewAssigners:
- some_username
# set up with which origins the store should allow CORS requests. The default
# is to allow all origins. If you want to restrict this, you can specify a list
# of origins here. Below are examples to allow requests from the Uluru UI, and
# port 7600 on localhost. Usually, only the UI needs to access the store.
cors_allowed_origins:
- https://portal.uluru.vantage6.ai
- http://localhost:7600
# Credentials used to login to private Docker registries. These credentials are used
# to e.g. find the digests of the algorithm.
private_docker_registries:
- registry: docker-registry.org
username: docker-registry-user
password: docker-registry-password
# development mode settings. Only use when running both the algorithm store and
# the HQ that it communicates with locally
dev:
# Specify the URI to the host. This is used to generate the correct URIs to
# communicate with HQ. On Windows and Mac, when using Docker Desktop, you
# can use the special hostname `host.docker.internal` to refer to the host machine.
# On Linux, you should normally use http://172.17.0.1.
host_uri: http://host.docker.internal
# disable review process - all submitted algorithms are automatically accepted, which
# can be useful while developing algorithms locally. By default, the review process
# is enabled.
disable_review: false
# Define whether or not developers are able to review their own algorithms. For
# production, this is not recommended, but it can facilitate development. By default,
# this is disabled.
review_own_algorithm: false
# Whether to forward ports to host locally using kubernetes or not, using the
# NodePort feature of kubernetes. If set to false, the ports will
# not be forwarded. In production, this should be false and you are responsible
# for forwarding the ports with your own ingress, gateway or load balancer.
# In `v6 dev`, this is also set to false as ports are forwarded by devspace.
forward_ports: true
# If forward_ports is true, this is the port that will be exposed on the host.
local_port_to_expose: 30762
logging:
# Controls the logging output level. Could be one of the following
# levels: CRITICAL, ERROR, WARNING, INFO, DEBUG, NOTSET
level: DEBUG
# Location to the log file
file: store.log
# Size in kb of a single log file
max_size: 1024
use_console: true
# Date format for the log file
datefmt: "%Y-%m-%d %H:%M:%S"
# Format for the log file
format: "%(asctime)s - %(name)-14s - %(levelname)-8s - %(message)s"
# Storage configuration for logs. Storage size is set to 128M by default,
# and storage class is k8s storage class that is used.
storageSize: "128M"
storageClass: "local-storage"
# Host path for storing the logs (required for local-storage PV)
volumeHostPath: "/var/log/vantage6-store"
# Maximum number of log files to keep. Log files are rotated when the size of the
# current log file exceeds `max_size`.
backup_count: 5
# Loggers to include in the log file
loggers:
- level: warning
name: urllib3
- level: warning
name: socketIO-client
- level: warning
name: socketio.server
- level: warning
name: engineio.server
- level: warning
name: sqlalchemy.engine
database:
# whether or not to use an external database
external: false
# the URI of the external database. Only used if external is set to true.
uri: postgres://vantage6:vantage6@localhost:5432/vantage6
# The image of the database. Should only be set if external is set to false.
image:
repository: postgres
tag: 13
# The username of the database. Should only be set if external is set to false.
username: vantage6
# The password of the database. Should only be set if external is set to false.
password: vantage6
# The name of the database. Should only be set if external is set to false.
name: vantage6
# Hostpath of the database mount. Only used if external is set to false.
volumePath: /mnt/data_store
# The name of the k8s node where the database is running
k8sNodeName: docker-desktop
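As a rough illustration of what the `logging` options in the example above express, the sketch below maps them onto Python's standard `logging` module. This is an assumption about their meaning, not the store's actual implementation, and the function name `build_logger` is hypothetical.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(cfg: dict, name: str = "store") -> logging.Logger:
    """Build a logger from a dict shaped like the `logging` section above.

    Hypothetical helper: the real store may interpret these keys differently.
    """
    logger = logging.getLogger(name)
    logger.setLevel(cfg["level"])  # e.g. "DEBUG"

    formatter = logging.Formatter(cfg["format"], datefmt=cfg["datefmt"])

    # max_size is given in kB, while RotatingFileHandler expects bytes.
    file_handler = RotatingFileHandler(
        cfg["file"],
        maxBytes=cfg["max_size"] * 1024,
        backupCount=cfg["backup_count"],
    )
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    if cfg.get("use_console", False):
        console = logging.StreamHandler()
        console.setFormatter(formatter)
        logger.addHandler(console)

    # Per-library levels, e.g. to quieten chatty dependencies.
    for entry in cfg.get("loggers", []):
        logging.getLogger(entry["name"]).setLevel(entry["level"].upper())

    return logger
```

With `max_size: 1024` and `backup_count: 5`, this sketch would rotate the log file once it reaches about 1 MB and keep at most five older files.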
Configuration file location¶
The directory in which the configuration file is stored depends on your
operating system (OS). The configuration file can be stored at
system level or at user level. At the user level, configuration files are only
available for your user. By default, algorithm store configuration files are
stored at system level - except if you have created a sandbox environment using the
v6 sandbox commands.
The default directories per OS are as follows:
| OS | System | User |
|---|---|---|
| Windows | | |
| MacOS | | |
| Linux | | |
Warning
The command v6 algorithm-store looks in certain directories by default. It is
possible to use any directory and specify the location with the --config
flag. However, note that using a different directory requires you to specify
the --config flag every time!
Similarly, you can put your algorithm store configuration file in the user folder
by using the --user flag. Note that in that case, you have to specify
the --user flag for all v6 algorithm-store commands.
Permissions¶
Policies¶
Algorithm store policies are defined by the algorithm store administrator in the configuration file and determine the general permission and access rules for the algorithm store. Arguably, the most important policy is who is allowed to view the algorithms in the store. For the community store, this is set to public, meaning that anyone can view the algorithms. For a private store, this can be set to private, meaning that only authorized users can view the algorithms.
Other examples of algorithm store policies are the number of reviewers required to approve an algorithm, and whether algorithm reviewers must be from a different organization than the algorithm developer.
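The view policy can be sketched as a small decision function. The three policy values are the ones listed in the configuration file example; the function name, its arguments, and the requirement that private access also needs a valid token are assumptions made for illustration only.

```python
def can_view_algorithms(
    policy: str, has_valid_token: bool, has_view_permission: bool
) -> bool:
    """Decide whether a user may view algorithms under a given view policy.

    Hypothetical sketch, not the store's actual implementation.
    """
    if policy == "public":
        # Everyone can view the algorithms.
        return True
    if policy == "authenticated":
        # Any user with a token that is valid for this store.
        return has_valid_token
    if policy == "private":
        # Assumption: explicit permission implies the user is authenticated.
        return has_valid_token and has_view_permission
    raise ValueError(f"Unknown view policy: {policy!r}")
```

For instance, an anonymous visitor would pass the check on a store set to `public`, but not on one set to `authenticated` or `private`.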
Permission management¶
Apart from the policies, there are also access rules at the user level. Rules are used to determine the actions that a user is allowed to take in the algorithm store.
In order to perform operations in the algorithm store, a user must be registered in the algorithm store and must be authenticated. Then, rules can be assigned to the user to give them the necessary permissions.
Just like in the vantage6 HQ, in the algorithm store rules are used to allow or prevent a user from performing an operation. An operation is an action that can be performed on a resource of the algorithm store. The following operations are defined:
Create
Delete
Edit
View
These operations can be performed on the available resources according to the following schema:
| Resource | View | Create | Edit | Delete |
|---|---|---|---|---|
| Algorithm | ✅ | ✅ | ✅ | ✅ |
| User | ✅ | ✅ | ✅ | ✅ |
| Role | ✅ | ✅ | ✅ | ✅ |
| Review | ✅ | ✅ | ✅ | ✅ |
Rules can be assigned to a user by another user who has at least the same permission level as the rules being assigned. Rules can be assigned individually, but default combinations of rules are also available as roles. The following default roles are available in the algorithm store:
Root: Has all permissions.
Developer: Can submit new algorithms to the store and edit them before they are reviewed.
Algorithm Manager: Can assign reviewers to new algorithms, and manage algorithms. Whenever a new algorithm is submitted, users with permission to register new reviews are alerted, so users with this role as well as the root role will be alerted to assign reviewers (if an email server has been set up).
Reviewer: Can approve or reject algorithms that they have been requested to review.
Viewer: Can view all resources in the store.
Store Manager: Can manage the store’s users and their permissions.
Note that all default roles have permission to view all resources. To give an example, the permissions of a reviewer are shown below.
| Resource | View | Create | Edit | Delete |
|---|---|---|---|---|
| Algorithm | ✅ | ❌ | ❌ | ❌ |
| User | ✅ | ❌ | ❌ | ❌ |
| Role | ✅ | ❌ | ❌ | ❌ |
| Review | ✅ | ❌ | ✅ | ❌ |
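The rule model described in this section can be sketched in a few lines of Python. The class and function names are hypothetical, and reading "at least the same permission level" as "the assigner holds every rule being assigned" is an assumption; the reviewer rules follow the reviewer permissions given above.

```python
from enum import Enum


class Operation(Enum):
    VIEW = "view"
    CREATE = "create"
    EDIT = "edit"
    DELETE = "delete"


class Resource(Enum):
    ALGORITHM = "algorithm"
    USER = "user"
    ROLE = "role"
    REVIEW = "review"


# A rule is a (resource, operation) pair; a role is a set of rules.
ROOT_RULES = frozenset((r, o) for r in Resource for o in Operation)
REVIEWER_RULES = frozenset(
    {(r, Operation.VIEW) for r in Resource} | {(Resource.REVIEW, Operation.EDIT)}
)


def is_allowed(user_rules, resource: Resource, operation: Operation) -> bool:
    """Check whether a set of rules permits an operation on a resource."""
    return (resource, operation) in user_rules


def can_assign(assigner_rules, rules_to_assign) -> bool:
    """Assumption: a user may only hand out rules they hold themselves."""
    return set(rules_to_assign) <= set(assigner_rules)
```

In this sketch a reviewer can edit reviews but cannot delete algorithms, and a user holding only the reviewer rules could not grant someone the root rules.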
Store processes¶
The algorithm store manages the lifecycle of vantage6 algorithms, from their initial submission by the algorithm developer, through running them, to their eventual replacement by a newer version. Here, we give an overview of these processes.
Algorithm submission¶
The first step in the lifecycle of an algorithm is its submission to the algorithm store.
An algorithm developer can do this via the algorithm store section of the UI or by using
the Python client’s command client.algorithm.create(). The algorithm developer needs
to provide data such as a name, description, where to find the code and the algorithm
image, and which functions the algorithm provides and how to call them.
Each function of the algorithm is described, apart from its name and description, by the following fields:
Parameters: A list of parameters that the function expects. Each parameter has a name, a description, and a type. For example, if you want to compute an average, a parameter could be a column name. Apart from standard data types like integers, strings and booleans, vantage6 also supports organizations and columns as parameter types. When using these types, the user interface knows to show a dropdown with the available organizations or columns.
Databases: A list of databases that the function expects. Most algorithms use a single database, but some algorithms might need multiple databases (e.g. one with patient data and another with population data). Each database has a name and a description. The user interface will show a dropdown with the available databases when the user needs to select a database.
Visualizations: A list of visualizations that the function can produce. Each visualization has a name, a description, and a type. When viewing the results of an algorithm run in the UI, the UI will attempt to plot the results if a visualization is available. Depending on the visualization type, additional data might be required. For instance, for a line graph, the algorithm developer can set the x-axis and y-axis columns that should be visualized.
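To make the structure of a function description concrete, the sketch below models it with Python dataclasses and serializes it to JSON. All class names, field names, and type strings (such as `"column"` and `"line"`) are illustrative assumptions; the schema that the store actually expects may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    description: str
    type: str  # e.g. "integer", "string", "boolean", "organization", "column"


@dataclass
class Database:
    name: str
    description: str


@dataclass
class Visualization:
    name: str
    description: str
    type: str
    # Extra settings that depend on the visualization type, e.g. axis columns.
    settings: dict = field(default_factory=dict)


@dataclass
class AlgorithmFunction:
    name: str
    description: str
    parameters: List[Parameter] = field(default_factory=list)
    databases: List[Database] = field(default_factory=list)
    visualizations: List[Visualization] = field(default_factory=list)


average = AlgorithmFunction(
    name="average",
    description="Compute the average of a column",
    parameters=[Parameter("column", "Column to average", "column")],
    databases=[Database("patients", "Database with patient data")],
    visualizations=[
        Visualization("trend", "Average per site", "line", {"x": "site", "y": "average"})
    ],
)

print(json.dumps(asdict(average), indent=2))
```

Declaring the parameter with a column type is what would let a user interface offer a dropdown of available columns instead of a free-text field.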
Note that the CLI provides a utility command, v6 algorithm generate-store-json, that
generates a JSON file with which the algorithm can be imported into the store. This is
easier than submitting the algorithm manually, but it is most useful if your algorithm
code has proper docstrings.
Algorithm review¶
After an algorithm is submitted, it needs to undergo a review process. At this point, a store manager will assign reviewers for this particular algorithm. Reviewers are recommended to assess the algorithm using the algorithm-review-checklist. The reviewers can then view the algorithm and provide feedback. If the algorithm is approved, it will be easily runnable by researchers using the UI. While the algorithm is under review, it is not yet available for running tasks via the UI. If any of the reviewers rejects the algorithm, or the store manager does not assign reviewers, the algorithm will not become available. Note that an algorithm may still be run via the Python client, but a node configured to allow only algorithms from a certain algorithm store will not accept algorithms that are not yet approved in that store.
The reviewer can provide comments to the developer when rejecting an algorithm. If the algorithm is rejected, the process is repeated as soon as the developer submits an improved version of the algorithm.
If your algorithm store has been configured with an email SMTP server, emails will be sent all along the process to alert users that, for instance, their review is requested.
It regularly happens that a developer submits an update to an algorithm that was already approved. In such cases, once the changes are approved, the algorithm store invalidates the previous version of the algorithm, which can then no longer be used to run tasks.
At any point, the store manager can also invalidate an approved algorithm without replacing it with a new version. This safeguard ensures that algorithms can be removed from the store quickly if necessary.
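The review lifecycle described here can be summarized as a small state model. The status names and the class below are hypothetical and only illustrate the transitions; they are not taken from the store's code.

```python
from enum import Enum


class Status(Enum):
    UNDER_REVIEW = "under review"
    APPROVED = "approved"
    REJECTED = "rejected"
    INVALIDATED = "invalidated"


class AlgorithmStoreSketch:
    """Toy model of the review lifecycle, keyed by (algorithm name, image)."""

    def __init__(self):
        self.status = {}

    def submit(self, name: str, image: str) -> None:
        self.status[(name, image)] = Status.UNDER_REVIEW

    def reject(self, name: str, image: str) -> None:
        self.status[(name, image)] = Status.REJECTED

    def approve(self, name: str, image: str) -> None:
        # Approving a new version invalidates earlier approved versions.
        for (other_name, other_image), status in list(self.status.items()):
            if other_name == name and status is Status.APPROVED:
                self.status[(other_name, other_image)] = Status.INVALIDATED
        self.status[(name, image)] = Status.APPROVED

    def invalidate(self, name: str, image: str) -> None:
        # A store manager can withdraw an algorithm without a replacement.
        self.status[(name, image)] = Status.INVALIDATED

    def runnable(self, name: str) -> list:
        """Images of this algorithm that are currently approved."""
        return [
            image
            for (n, image), status in self.status.items()
            if n == name and status is Status.APPROVED
        ]
```

In this model, an updated version that is still under review does not affect the approved version; only its approval replaces the old one.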
Everyone involved in the process (developers, store manager, reviewers, and researchers) can only execute their actions after logging into vantage6. Note that one user may fulfill several roles: a store manager may also be an algorithm reviewer. There are, however, some restrictions that can be set up in the algorithm store policies.
Recommended algorithm store policies¶
Setting up an algorithm store is useful to collect all algorithms that are relevant to your project. By having an algorithm store, you can choose to only make algorithms from this store available to the researchers in your project.
The algorithm store allows for the definition of several policies around algorithm review. The following policies are recommended when running algorithms on sensitive data:
Each algorithm must be reviewed by at least two reviewers.
The reviewers must not be involved in the development of the algorithm.
At least one reviewer is a member of a different organization than the developer.
If the developer also has the store manager role, they should not be allowed to assign reviews for their own algorithms.
If a store manager also has the reviewer role, they may be allowed to assign themselves as a reviewer.
To ensure that these policies are followed, the algorithm store enforces them wherever possible. For instance, if an algorithm must be reviewed by at least two reviewers, it will simply not become available when only a single reviewer approves it. However, not all policies can be fully enforced by the software: if developers A and B collaborate on an algorithm, and A submits it, the algorithm store does not know that B was involved in developing the algorithm, so it cannot prevent B from being assigned as a reviewer.
Together, the recommended policies ensure that at least three authenticated researchers (a developer and two reviewers) of at least two different institutes in your project must be involved to approve an algorithm. By involving three trusted researchers in the process, the risk of approving an inadequate algorithm is minimized.
Note
The policies above could be implemented in the algorithm store configuration file as follows:
policies:
minReviewers: 2
assignReviewOwnAlgorithm: false
minReviewingOrganizations: 2
# ... <other policies>
See the algorithm store configuration file section for more details on the available policies.
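How the store might check the recommended review policies can be sketched as follows. The `Review` class and the function are hypothetical, and counting only the reviewers' organizations towards the organization minimum is an assumption; the store's own logic may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Review:
    reviewer: str
    organization: str
    approved: bool


def review_passed(
    reviews: Iterable[Review],
    developer: str,
    min_reviewers: int = 2,
    min_reviewing_organizations: int = 2,
    review_own_algorithm: bool = False,
) -> bool:
    """Return True if the completed reviews satisfy the policies."""
    reviews = list(reviews)

    # A single rejection keeps the algorithm out of the store.
    if any(not r.approved for r in reviews):
        return False

    # Developers may not review their own algorithm unless explicitly allowed.
    if not review_own_algorithm and any(r.reviewer == developer for r in reviews):
        return False

    reviewers = {r.reviewer for r in reviews}
    organizations = {r.organization for r in reviews}
    return (
        len(reviewers) >= min_reviewers
        and len(organizations) >= min_reviewing_organizations
    )
```

As noted above, a check like this cannot detect a reviewer who co-developed the algorithm without being the submitter; that part of the policy has to be upheld by the people involved.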