The vantage6-node requires a configuration file to run. This is a
YAML file with a specific format.
The next sections describe how to configure the node. They first provide a few quick answers on setting up your node, then show an example of all configuration file options, and finally explain where your vantage6 configuration files are stored.
2.4.1. How to create a configuration file
The easiest way to create an initial configuration file is to run vnode new. This allows you to configure the basic settings. For more advanced configuration options, which are listed below, you can view the example configuration file.
2.4.2. Where is my configuration file?
To see where your configuration file is located, you can use the following command:
vnode files
This command will not work if you have put your configuration file in a
custom location. Also, you may need to specify the --system flag
if you put your configuration file in the system folder.
2.4.3. All configuration options
The following example configuration file is intended to list all possible configuration options.
You can download this file here:
application:

  # API key used to authenticate at the server.
  api_key: ***

  # URL of the vantage6 server
  server_url: https://petronas.vantage6.ai

  # port the server listens to
  port: 443

  # API path prefix that the server uses. Usually '/api' or an empty string
  api_path: ''

  # subnet of the VPN server
  vpn_subnet: 10.76.0.0/16

  # add additional environment variables to the algorithm containers.
  # this could be usefull for passwords or other things that algorithms
  # need to know about the node it is running on
  # OPTIONAL
  algorithm_env:

    # in this example the environment variable 'player' has
    # the value 'Alice' inside the algorithm container
    player: Alice

  # specify custom Docker images to use for starting the different
  # components.
  # OPTIONAL
  images:
    node: harbor2.vantage6.ai/infrastructure/node:petronas
    alpine: harbor2.vantage6.ai/infrastructure/alpine
    vpn_client: harbor2.vantage6.ai/infrastructure/vpn_client
    network_config: harbor2.vantage6.ai/infrastructure/vpn_network

  # path or endpoint to the local data source. The client can request a
  # certain database by using its label. The type is used by the
  # auto_wrapper method used by algorithms. This way the algorithm wrapper
  # knows how to read the data from the source. The auto_wrapper currently
  # supports: 'csv', 'parquet', 'sql', 'sparql', 'excel', 'omop'. If your
  # algorithm does not use the wrapper and you have a different type of
  # data source you can specify 'other'.
  databases:
    - label: default
      uri: D:\data\datafile.csv
      type: csv

  # end-to-end encryption settings
  encryption:

    # whenever encryption is enabled or not. This should be the same
    # as the `encrypted` setting of the collaboration to which this
    # node belongs.
    enabled: false

    # location to the private key file
    private_key: /path/to/private_key.pem

  # Define who is allowed to run which algorithms on this node.
  policies:

    # Control which algorithm images are allowed to run on this node. This is
    # expected to be a valid regular expression.
    allowed_algorithms:
      - ^harbor2.vantage6.ai/[a-zA-Z]+/[a-zA-Z]+
      - myalgorithm.ai/some-algorithm

    # Define which users are allowed to run algorithms on your node by their ID
    allowed_users:
      - 2

    # Define which organizations are allowed to run images on your node by
    # their ID or name
    allowed_organizations:
      - 6
      - root

  # credentials used to login to private Docker registries
  docker_registries:
    - registry: docker-registry.org
      username: docker-registry-user
      password: docker-registry-password

  # Create SSH Tunnel to connect algorithms to external data sources. The
  # `hostname` and `tunnel:bind:port` can be used by the algorithm
  # container to connect to the external data source. This is the address
  # you need to use in the `databases` section of the configuration file!
  ssh-tunnels:

    # Hostname to be used within the internal network. I.e. this is the
    # hostname that the algorithm uses to connect to the data source. Make
    # sure this is unique and the same as what you specified in the
    # `databases` section of the configuration file.
    - hostname: my-data-source

      # SSH configuration of the remote machine
      ssh:

        # Hostname or ip of the remote machine, in case it is the docker
        # host you can use `host.docker.internal` for Windows and MacOS.
        # In the case of Linux you can use `172.17.0.1` (the ip of the
        # docker bridge on the host)
        host: host.docker.internal
        port: 22

        # fingerprint of the remote machine. This is used to verify the
        # authenticity of the remote machine.
        fingerprint: "ssh-rsa ..."

        # Username and private key to use for authentication on the remote
        # machine
        identity:
          username: username
          key: /path/to/private_key.pem

      # Once the SSH connection is established, a tunnel is created to
      # forward traffic from the local machine to the remote machine.
      tunnel:

        # The port and ip on the tunnel container. The ip is always
        # 0.0.0.0 as we want the algorithm container to be able to
        # connect.
        bind:
          ip: 0.0.0.0
          port: 8000

        # The port and ip on the remote machine. If the data source runs
        # on this machine, the ip most likely is 127.0.0.1.
        dest:
          ip: 127.0.0.1
          port: 8000

  # Settings for the logger
  logging:

    # Controls the logging output level. Could be one of the following
    # levels: CRITICAL, ERROR, WARNING, INFO, DEBUG, NOTSET
    level: DEBUG

    # Filename of the log-file, used by RotatingFileHandler
    file: my_node.log

    # whenever the output needs to be shown in the console
    use_console: true

    # The number of log files that are kept, used by RotatingFileHandler
    backup_count: 5

    # Size kb of a single log file, used by RotatingFileHandler
    max_size: 1024

    # format: input for logging.Formatter,
    format: "%(asctime)s - %(name)-14s - %(levelname)-8s - %(message)s"
    datefmt: "%Y-%m-%d %H:%M:%S"

  # directory where local task files (input/output) are stored
  task_dir: C:\Users\<your-user>\AppData\Local\vantage6\node\mydir

  # Whether or not your node shares some configuration (e.g. which images are
  # allowed to run on your node) with the central server. This can be useful
  # for other organizations in your collaboration to understand why a task
  # is not completed. Obviously, no sensitive data is shared. Default true
  share_config: true
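The comments in the databases section name the data source types that the auto_wrapper supports. As a minimal sketch (not part of vantage6), the following Python snippet checks an already-parsed configuration dictionary against that list. The function name check_databases and the dictionary layout are assumptions made for illustration; the node itself may validate its configuration differently.

```python
# Types named in the example configuration's comments, plus 'other'.
SUPPORTED_TYPES = {"csv", "parquet", "sql", "sparql", "excel", "omop", "other"}


def check_databases(config):
    """Return a list of human-readable problems found in config['databases'].

    `config` is assumed to be the content below the `application` key,
    already parsed from YAML into plain Python objects (hypothetical helper).
    """
    problems = []
    for index, db in enumerate(config.get("databases", [])):
        # Every entry in the example has a label, a uri and a type.
        for key in ("label", "uri", "type"):
            if key not in db:
                problems.append(f"database #{index}: missing '{key}'")
        db_type = db.get("type")
        if db_type is not None and db_type not in SUPPORTED_TYPES:
            problems.append(f"database #{index}: unsupported type '{db_type}'")
    return problems


example = {
    "databases": [
        {"label": "default", "uri": r"D:\data\datafile.csv", "type": "csv"},
        {"label": "registry", "uri": "/data/registry.json", "type": "json"},
    ]
}

for problem in check_databases(example):
    print(problem)
```

The second entry is reported because 'json' is not in the list; according to the comments in the example, such a source would be declared with type 'other' instead.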
We use DTAP (development, testing, acceptance, production) for key environments. In short:
dev: Development environment. It is OK to break things here.
test: Testing environment. Here, you can verify that everything works as expected. This environment should resemble the target environment where the final solution will be deployed as much as possible.
acc: Acceptance environment. If the tests were successful, you can move on to this environment, where the end users test their analyses to verify that everything meets their expectations.
prod: Production environment. The version of the proposed solution where the final analyses are executed.
You can also specify the key application if you do not want to specify
one of the environments. This is also done in the example configuration
file above.
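To illustrate the choice between an environment and the application key, here is a minimal sketch of how one settings block could be picked from a parsed configuration file. The layout is an assumption: the sketch expects the four environment blocks under a top-level environments key next to application. The helper select_settings is hypothetical and not a vantage6 function, so check a file generated by vnode new for the real layout.

```python
ENVIRONMENTS = ("dev", "test", "acc", "prod")


def select_settings(config, environment=None):
    """Pick one settings block from a parsed configuration (hypothetical).

    Assumed layout: {"application": {...}, "environments": {"dev": {...}, ...}}.
    With environment=None the shared `application` block is returned.
    """
    if environment is None:
        if "application" not in config:
            raise KeyError("configuration has no 'application' section")
        return config["application"]
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    blocks = config.get("environments", {})
    if environment not in blocks:
        raise KeyError(f"configuration has no '{environment}' environment")
    return blocks[environment]


config = {
    "application": {"port": 443},
    "environments": {"test": {"port": 5000}},
}

print(select_settings(config))
print(select_settings(config, "test"))
```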
2.4.4. Configuration file location
The directory where the configuration file is stored depends on your operating system (OS). It is possible to store the configuration file at system or at user level. By default, node configuration files are stored at user level, which makes this configuration available only for your user.
The default directories per OS are as follows:
vnode looks in these directories by default. However, it is
possible to use any directory and specify its location with a
command-line flag. Note that you then have to specify that
flag every time you execute a vnode command.
Similarly, you can put your node configuration file in the system folder
by using the --system flag. Note that in that case, you have to specify the
--system flag for all vnode commands.
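The lookup rules described above can be sketched as a small function. This is an illustration under stated assumptions, not the vantage6 implementation: the function name, its parameters and the file naming scheme are hypothetical. The default directories are passed in as arguments rather than hard-coded.

```python
from pathlib import Path


def resolve_config_file(name, user_dir, system_dir, system=False, config=None):
    """Return the path where a node configuration file is expected.

    All arguments are hypothetical stand-ins:
    - `config` plays the role of an explicitly given custom location,
    - `system` plays the role of the --system flag,
    - `user_dir` / `system_dir` are the per-OS default directories.
    """
    if config is not None:
        # A custom location is never searched for automatically, so it has
        # to be supplied on every call.
        return Path(config)
    # Without any flag, the user-level directory is the default.
    base = Path(system_dir) if system else Path(user_dir)
    # Assumption: the file is named after the configuration.
    return base / f"{name}.yaml"


print(resolve_config_file("mynode", "/user/dir", "/system/dir"))
print(resolve_config_file("mynode", "/user/dir", "/system/dir", system=True))
```

The sketch shows why a flag has to be repeated: nothing is remembered between calls, so each call starts again from the user-level default.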
As a data owner it is important that you take the necessary steps to protect your data. Vantage6 allows algorithms to run on your data and share the results with other parties. It is important that you review the algorithms before allowing them to run on your data.
Once you have approved the algorithm, it is important that you can verify that the approved algorithm is the algorithm that runs on your data. There are two important steps to take to accomplish this:
Set the (optional) allowed_images option in the node configuration file. You can specify a list of regular expressions here. Some examples of what you could define:
^harbor2.vantage6.ai/[a-zA-Z]+/[a-zA-Z]+: allow all images from the vantage6 registry
^harbor2.vantage6.ai/algorithms/glm: only allow the GLM image, but all builds of this image
a1e597d83a47fac21d6af3: allows only this specific build from the GLM image to run on your data
Enable DOCKER_CONTENT_TRUST to verify the origin of the image. For more details, see the documentation from Docker.
With DOCKER_CONTENT_TRUST enabled, you might not be able to use
certain algorithms. You can check this by verifying that the images you want
to use are signed.
If you are using our Docker repository, you need to use harbor2.vantage6.ai, as harbor.vantage6.ai does not have a notary.
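Before adding a pattern to the node configuration, it can help to try it against a few image names. The sketch below uses two of the example patterns listed above. It assumes the patterns are applied with Python's re.match, which anchors at the start of the image name; that is an assumption about the node's behaviour, and is_allowed is a hypothetical helper.

```python
import re

# Patterns copied from the examples above.
ALLOWED_IMAGES = [
    r"^harbor2.vantage6.ai/[a-zA-Z]+/[a-zA-Z]+",
    r"^harbor2.vantage6.ai/algorithms/glm",
]


def is_allowed(image, patterns):
    """True if `image` matches at least one pattern (assumed: re.match)."""
    return any(re.match(pattern, image) for pattern in patterns)


print(is_allowed("harbor2.vantage6.ai/algorithms/glm", ALLOWED_IMAGES))
print(is_allowed("docker.io/library/python", ALLOWED_IMAGES))

# An unescaped '.' matches any character, so this look-alike host passes too.
print(is_allowed("harbor2xvantage6.ai/algorithms/glm", ALLOWED_IMAGES))

# Escaping the dots makes the host part literal.
strict = [r"^harbor2\.vantage6\.ai/[a-zA-Z]+/[a-zA-Z]+"]
print(is_allowed("harbor2xvantage6.ai/algorithms/glm", strict))
```

Since the purpose of the option is to restrict what may run on your data, escaping the dots in the registry name is the stricter choice.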
To configure the logger, look at the logging section in the example configuration file in All configuration options.
vnode files: shows you where the log file is stored
vnode attach: shows live logs of a running node in your current console. This can also be achieved when starting the node with vnode start --attach
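The comments in the logging section of the example configuration refer to RotatingFileHandler and logging.Formatter from Python's standard library. The following sketch shows how those settings could map onto the standard logging module. It is an illustration, not the node's own code: build_logger, the logger name and the temporary log directory are assumptions, and max_size is taken to be in kilobytes as the comment in the example suggests.

```python
import logging
import logging.handlers
import os
import sys
import tempfile

# Values mirror the `logging` section of the example configuration.
settings = {
    "level": "DEBUG",
    "file": "my_node.log",
    "use_console": True,
    "backup_count": 5,
    "max_size": 1024,  # kB
    "format": "%(asctime)s - %(name)-14s - %(levelname)-8s - %(message)s",
    "datefmt": "%Y-%m-%d %H:%M:%S",
}


def build_logger(settings, log_dir, name="node-sketch"):
    """Create a logger from the settings; `log_dir` and `name` are assumptions."""
    logger = logging.getLogger(name)
    logger.setLevel(settings["level"])
    formatter = logging.Formatter(settings["format"], settings["datefmt"])

    # Rotate once the file reaches max_size and keep backup_count old files.
    file_handler = logging.handlers.RotatingFileHandler(
        os.path.join(log_dir, settings["file"]),
        maxBytes=settings["max_size"] * 1024,  # kB -> bytes
        backupCount=settings["backup_count"],
    )
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    if settings["use_console"]:
        console_handler = logging.StreamHandler(sys.stdout)
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)
    return logger


log_dir = tempfile.mkdtemp()
logger = build_logger(settings, log_dir)
logger.info("node logger configured")
```

With these values, at most six files exist at any time: the active log file plus five rotated backups.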