1 - Getting Started with Omni

A short guide on setting up a Talos Linux cluster with Omni.

In this Getting Started guide we will create a high availability Kubernetes cluster in Omni. This guide will use UTM/QEMU, but the same process will work with bare metal machines, cloud instances, and edge devices.

Prerequisites

Network access

If your machines have outgoing access, you are all set. At a minimum, all machines should have outgoing access to the WireGuard endpoint shown on the Home panel, which lists the IP address and UDP port that machines should be able to reach. Machines need to be able to reach that address both on the UDP port specified and on TCP port 443.
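
For a quick sanity check from the machines' network, you can verify the TCP requirement with a generic tool such as nc (the WireGuard UDP port cannot be verified this simply, since UDP is connectionless). The hostname below is a placeholder for the endpoint shown on your Home panel:

# Check that TCP 443 on the Omni endpoint is reachable (hostname is illustrative)
nc -vz omni.example.com 443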

Some virtual or physical machines

The simplest way to experience Omni is to fire up virtual machines. For this tutorial, we suggest any virtualization platform that can boot off an ISO (UTM, Proxmox, VMware Fusion, etc.), although any cloud platform can also be used with minor adjustments. Bare metal can also be used, of course, but is often slower to boot, and not everyone has spare physical servers around.

talosctl

talosctl is the command line tool for issuing API calls and operating system commands to machines in an Omni cluster. It is not required - cluster management is done via the Omni UI or omnictl, but talosctl can be useful to investigate the state of the nodes and explore functionality.

Download talosctl:

curl -sL https://talos.dev/install | sh

You can also download talosctl from within Omni, by selecting the “Download talosctl” button on the right hand side of the Home screen, then selecting the version and platform of talosctl desired. You should rename the downloaded file to talosctl, make it executable, and copy it to a location on your PATH.
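
For example, if you downloaded the macOS arm64 build (the exact filename depends on the version and platform you selected), the steps might look like this:

mv talosctl-darwin-arm64 talosctl
chmod +x talosctl
mv talosctl /usr/local/bin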

kubectl

The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. You use kubectl to deploy applications, inspect and manage cluster resources, view logs, etc.

Download kubectl via one of the methods outlined in the documentation.

Omni integrates all operations (for Omni itself, Kubernetes, and Talos Linux) against the authentication configured for Omni (which may be GitHub, Google, enterprise SAML, etc.). Thus, in order to use kubectl with Omni, you need to install the oidc-login plugin per the documentation.

Note: When using Homebrew on Macs with M1 chips, there have been reports of issues with the plugin being installed to the wrong path and not being found. You may find it simpler to copy the file from GitHub and manually put the kubelogin binary on your PATH under the name kubectl-oidc_login so that the kubectl plugin mechanism can find it.
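
For example, assuming you have downloaded and extracted the kubelogin release binary (named kubelogin) from GitHub, a manual installation might look like this:

mv kubelogin kubectl-oidc_login
chmod +x kubectl-oidc_login
mv kubectl-oidc_login /usr/local/bin
kubectl oidc-login --help   # verify that kubectl can find the plugin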

omnictl

omnictl is also an optional binary. Almost all cluster operations can be done via the Omni Web UI, but omnictl is used for advanced operations, to integrate Omni into CI workflows, or simply if you prefer a CLI to a UI.

Download omnictl from within Omni: on the Home tab, click the “Download omnictl” button on the right hand side, select the appropriate platform, and the “Download” button. Then rename the binary, make it executable, and copy it to a location on your PATH. For example:

Downloads % mv omnictl-darwin-arm64 omnictl
Downloads % chmod +x omnictl
Downloads % mv omnictl /usr/local/bin

Download Installation Media

Omni is a BYO Machine platform - the only thing you need to do is boot your machines off an Omni image. The Omni image has the necessary credentials and endpoints built into it, and you can use the same image to boot all your machines. To download the installation media, go to the Home screen in Omni, and select “Download Installation Media” from the right hand side. Select the appropriate media and platform type - e.g. I will select ISO (arm64) as I am going to boot a virtual machine within UTM on an Apple M1.

Images exist for many platforms, but you will have to follow the specific installation instructions for that platform (which often involve copying the image to S3 type storage, creating a machine image from it, etc.)

Boot machines off the downloaded image

Using your hypervisor, create at least one virtual machine with 2GB of memory (four machines are suggested). Have each virtual machine boot off the ISO image you just downloaded, and start the virtual machines.
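
If you are using plain QEMU rather than UTM, a rough sketch of booting one node off the ISO might look like the following (an amd64 ISO, QEMU's default user-mode networking, and illustrative names and sizes are assumed; repeat for each node):

# Create an empty disk for the node, then boot it from the downloaded ISO
qemu-img create -f qcow2 node1.qcow2 10G
qemu-system-x86_64 -m 2048 -smp 2 \
  -cdrom omni-amd64.iso \
  -drive file=node1.qcow2,format=qcow2,if=virtio \
  -boot d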

After a few seconds, the machines should show in the Machines panel of Omni, with the available tag. They will also have tags showing their architecture, memory, cores and other information.

Create Cluster

Click “Clusters” on the left navigation panel, then “Create Cluster” in the top right. You can give your cluster a name, select the version of Talos Linux to install, and the version of Kubernetes. You can also specify any Patches that should be applied in creating your cluster, but in most cases these are not needed to get started. There are other options on this screen - encryption, backups, machine sets, etc - but we will skip those for this tutorial.

In the section headed “Available Machines”, select at least one machine to be the control plane, by clicking CP. (Ideally, you will have 3 control plane nodes.) Select one machine to be a worker, by clicking W0 next to the machine.

Then click Create Cluster. Your cluster is now being created, and you will be taken to the Cluster Overview page. From this page you can download the kubeconfig and talosconfig files for your cluster, by clicking the buttons on the right hand side.

Access Kubernetes

You can query your Kubernetes cluster using normal Kubernetes operations:

kubectl --kubeconfig ./talos-default-kubeconfig.yaml get nodes

Note: you will have to change the referenced kubeconfig file depending on the name of the cluster you created.

The first time you use the kubectl command to query a cluster, a browser window will open requiring you to authenticate with your identity provider (Google or GitHub most commonly). If you get the error error: unknown command "oidc-login" for "kubectl", followed by Unable to connect to the server, then you need to install the oidc-login plugin as noted above.

Access Talos commands

You can explore Talos API commands. Again, the first time you access the Talos API, a browser window will start to authenticate your request. The downloaded talosconfig file for the cluster includes the Omni endpoint, so you do not need to specify endpoints, just nodes.

talosctl --talosconfig ./talos-default-talosconfig.yaml --nodes 10.5.0.2 get members

In the above example you will need to change the name of the talosconfig file if you changed the cluster name from the default, and also change the node IP, using the actual IP or name of one of the nodes you created (which are shown in Omni).

Explore Omni

Now you have a complete cluster, with a high-availability Kubernetes API endpoint running on the Omni infrastructure, where all authentication is tied in to your enterprise identity provider. It’s a good time to explore all that Omni can offer, including other areas of the UI such as:

  • etcd backup and restores
  • simple cluster upgrades of Kubernetes and the operating system
  • proxying of workload HTTP access
  • simple scaling up and down of clusters
  • the concept of Machine Sets, that let you manage your infrastructure by classes

And if you want to manage your clusters and infrastructure declaratively, as code, check out Cluster Templates.

Destroy the Cluster

When you are all done, you can remove the cluster by clicking “Destroy Cluster”, in the bottom right of the Cluster Overview panel. This will wipe the machines and return them to the Available state.
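
If you prefer the CLI, the cluster can also be removed with omnictl (the same resource-style delete appears in the SAML tutorial later in this document). The cluster name below assumes the default; substitute your own cluster name:

omnictl delete cluster talos-default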

Cluster example

We have a complete example of a managed cluster complete with a monitoring stack and application management. It can be found in our community contrib repo.

Components

The contrib example includes:

  • ArgoCD for application management
  • Rook/Ceph for storage
  • A monitoring stack, with Grafana and Hubble dashboards exposed through the Omni workload proxy

Use

You will need to copy the contents of the omni directory to a git repository that can be accessed by the cluster you create. You will need to update the ArgoCD ApplicationSet template to reference your new git repo, and regenerate the ArgoCD bootstrap patch.

sed -i 's|https://github.com/siderolabs/contrib.git|<your-git-repo>|' apps/argocd/argocd/bootstrap-app-set.yaml
kustomize build apps/argocd/argocd | yq -i 'with(.cluster.inlineManifests.[] | select(.name=="argocd"); .contents=load_str("/dev/stdin"))' infra/patches/argocd.yaml

With these changes made, commit the new values and push them to the git repo.

Next you should register your machines with Omni (see guides for AWS, GCP, Azure, Hetzner, and bare metal) and create machine classes to match your hardware. By default, the example cluster template is configured to use 3 instances of a machine class named omni-contrib-controlplane, and all instances that match a machine class called omni-contrib-workers. You can modify these settings in the cluster-template.yaml, but keep in mind that for Rook/Ceph to work you will need to use at least 3 instances with additional block devices for storage.
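
As a rough sketch of how such a template references machine classes (versions and names here are illustrative; check the template in the contrib repo for the exact fields), the relevant parts might look like this:

kind: Cluster
name: omni-contrib
kubernetes:
  version: v1.27.2
talos:
  version: v1.4.5
---
kind: ControlPlane
machineClass:
  name: omni-contrib-controlplane
  size: 3
---
kind: Workers
machineClass:
  name: omni-contrib-workers
  size: unlimited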

Once machines are registered you can create the cluster using the cluster template in the infra directory.

omnictl cluster template sync --file cluster-template.yaml

This should create the cluster as described, bootstrap ArgoCD, and begin installing applications from your repo. Depending on your infrastructure, it should take 5-10 mins for the cluster to come fully online with all applications working and healthy. Monitoring can be viewed directly from Omni using the workload proxy feature, with links to Grafana and Hubble found on the left-hand side of the Omni UI.

2 - Upgrading Omni Clusters

A guide to keeping your clusters up to date with Omni.

Introduction

Omni makes keeping your cluster up-to-date easy - which is good, as it is important to stay current with Talos Linux and Kubernetes releases, to ensure you are not exposed to already fixed security issues and bugs. Keeping your clusters up-to-date involves updating both the underlying operating system (Talos Linux) and Kubernetes.

Upgrading the Operating System

In order to update the Talos Linux version of all nodes in a cluster, navigate to the overview of the cluster you wish to update. (For example, click the cluster name in the Clusters panel.) If newer Talos Linux versions are available, there will be an indication in the far right, where the current cluster Talos version is listed. Clicking that icon, or the “Update Talos” button in the lower right, will allow you to select the new version of Talos Linux that should be deployed across all nodes of the cluster.

Select the new version, and then “Upgrade” (or “Downgrade”, if you are selecting an older version than currently deployed.) (Omni will ensure that the Kubernetes version running in the cluster is compatible with the selected version of Talos Linux.)

Note: the recommended upgrade path is to always upgrade to the latest patch release of all intermediate minor releases.
For example, if upgrading from Talos 1.5.0 to Talos 1.6.2, the recommended upgrade path would be:

  • upgrade from v1.5.0 to the latest patch of v1.5 - to v1.5.5
  • upgrade from v1.5.5 to the latest patch of v1.6 - to v1.6.2

Omni will then cycle through all nodes in the cluster, safely updating them to the selected version of Talos Linux. Omni will update the control plane nodes first. (Omni ensures the etcd cluster is healthy and will remain healthy after the node being updated leaves the etcd cluster, before allowing a control plane node to be upgraded.)

Omni will drain and cordon each node, update the OS, and then un-cordon the node. Omni always updates nodes with the Talos Linux flag --preserve=true, keeping ephemeral data.

NOTE: If any of your workloads are sensitive to being shut down ungracefully, be sure to use the lifecycle.preStop Pod spec.
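
A minimal sketch of such a preStop hook (the pod name, image, and sleep duration are illustrative) looks like this:

apiVersion: v1
kind: Pod
metadata:
  name: graceful-app
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: app
      image: example.com/app:latest
      lifecycle:
        preStop:
          exec:
            # Give the application time to finish in-flight work before SIGTERM arrives
            command: ["/bin/sh", "-c", "sleep 10"]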

Kubernetes Upgrades

As with the Talos Linux version, Omni will notify you on the right hand side of the cluster overview if there is a new version of Kubernetes available. You may click either the Upgrade icon next to the Kubernetes version, or the Update Kubernetes button on the lower right of the cluster overview. Kubernetes upgrades are done non-disruptively to workloads and are run in several phases:

  • Images for new Kubernetes components are pre-pulled to the nodes to minimize downtime and test for image availability.
  • New static pod definitions are rendered on the configuration update, which is picked up by the kubelet. The upgrade waits for the change to propagate to the API server state.
  • The kube-proxy DaemonSet is updated with the new image version.
  • On every node in the cluster, the kubelet version is updated.

Note: The upgrade operation never deletes any resources from the cluster: obsolete resources should be deleted manually.

Applying changed Kubernetes Manifests

Unlike the Talos Linux command talosctl upgrade-k8s, Omni does not automatically apply updates to Kubernetes bootstrap manifests on a Kubernetes upgrade. This is to prevent Omni from overwriting changes to the bootstrap manifests that you applied manually. (Talos Linux has a --dry-run feature on the upgrade command that shows you changes before the upgrade - Omni shows you the changes after the upgrade, but before they are applied.) Thus, after each Kubernetes upgrade, it is recommended to examine the Bootstrap Manifests of the cluster (as shown in the left hand navigation) and apply the changes, if they are appropriate.

Locking nodes

Omni allows you to control which nodes are upgraded during Talos or Kubernetes upgrade operations. You can lock nodes, which prevents them from receiving configuration updates, upgrades and downgrades. This allows you to ensure that new versions of Talos Linux or Kubernetes, or new config patches, are rolled out in a safe and controlled manner. If you cannot do a blue/green deployment with different clusters, you can roll out a new Kubernetes or Talos Linux release, or config patch, to just some of the nodes in your cluster. Once you have validated your applications perform correctly on the new versions, you can unlock all the nodes, and allow them to be updated also.

Note: you cannot lock control plane nodes, as it is not supported to have the Kubernetes version of a worker higher than that of the control plane nodes in a cluster - this may result in API version incompatibility.

To lock a node, simply select the Lock icon to the right of the node on the Cluster Overview screen, or use the omnictl cluster machine lock command. Upgrade and config patch operations will apply to all other nodes in the cluster, but locked nodes will retain their configuration at the time of locking. Unlock the nodes to allow pending cluster updates to complete.
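
A sketch of the CLI flow (the machine ID and cluster name are placeholders, and the exact flags may differ between versions; check omnictl cluster machine lock --help):

# Lock one node so the rollout skips it
omnictl cluster machine lock --cluster my-cluster <machine-id>
# ...roll out the upgrade or config patch, validate workloads on the updated nodes...
# Unlock the node so pending updates can complete
omnictl cluster machine unlock --cluster my-cluster <machine-id>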

3 - Installing Airgapped Omni

A tutorial on installing Omni in an airgapped environment.

Prerequisites

  • DNS server
  • NTP server
  • TLS certificates

Installed on the machine running Omni:

  • genuuid
    • Used to generate a unique account ID for Omni.
  • Docker
    • Used for running the suite of applications
  • WireGuard
    • Used by SideroLink

Overview

Gathering Dependencies

In this package, we will be installing:

  • Gitea
  • Keycloak
  • Omni

To keep everything organized, I am using the following directory structure to store all the dependencies and I will move them to the airgapped network all at once.

NOTE: The empty directories will be used for the persistent data volumes when we deploy these apps in Docker.

airgap
├── certs
├── gitea
├── keycloak
├── omni
└── registry

Generate Certificates

TLS Certificates

This tutorial will involve configuring all of the applications to be accessed via HTTPS with signed .pem certificates generated with certbot. There are many methods of configuring TLS certificates; this guide will not cover how to generate your own, but there are many resources available online to help with this if you do not have certificates already.

Omni Certificate

Omni uses etcd to store the data for our installation and we need to give it a private key to use for encryption of the etcd database.

  1. First, generate a GPG key.
gpg --quick-generate-key "Omni (Used for etcd data encryption) how-to-guide@siderolabs.com" rsa4096 cert never

This will generate a new GPG key pair with the specified properties.

What’s going on here?

  • --quick-generate-key allows us to quickly generate a new GPG key pair.
  • "Omni (Used for etcd data encryption) how-to-guide@siderolabs.com" is the user ID associated with the key, which generally consists of the real name, a comment, and an email address for the user.
  • rsa4096 specifies the algorithm type and key size.
  • cert means this key can be used to certify other keys.
  • never specifies that this key will never expire.
  2. Add an encryption subkey

We will use the fingerprint of this key to create an encryption subkey.

To find the fingerprint of the key we just created, run:

gpg --list-secret-keys

Next, run the following command to create the encryption subkey, replacing $FPR with your own key’s fingerprint.

gpg --quick-add-key $FPR rsa4096 encr never

In this command:

  • $FPR is the fingerprint of the key we are adding the subkey to.
  • rsa4096 and encr specify that the new subkey will be an RSA encryption key with a size of 4096 bits.
  • never means this subkey will never expire.
  3. Export the secret key

Lastly, we’ll export this key into an ASCII-formatted file so Omni can use it.

gpg --export-secret-key --armor how-to-guide@siderolabs.com > certs/omni.asc
  • --armor is an option which creates the output in ASCII format. Without it, the output would be binary.

Save this file to the certs directory in our package.

Create the app.ini File

Gitea uses a configuration file named app.ini, which we can pre-configure with the necessary information to run Gitea and bypass the initial startup page. When we start the container, we will mount this file as a volume using Docker.

Create the app.ini file

vim gitea/app.ini

Replace the DOMAIN, SSH_DOMAIN, and ROOT_URL values with your own hostname:

APP_NAME=Gitea: Git with a cup of tea
RUN_MODE=prod
RUN_USER=git
I_AM_BEING_UNSAFE_RUNNING_AS_ROOT=false

[server]
CERT_FILE=cert.pem
KEY_FILE=key.pem
APP_DATA_PATH=/data/gitea
DOMAIN=${GITEA_HOSTNAME}
SSH_DOMAIN=${GITEA_HOSTNAME}
HTTP_PORT=3000
ROOT_URL=https://${GITEA_HOSTNAME}:3000/
HTTP_ADDR=0.0.0.0
PROTOCOL=https
LOCAL_ROOT_URL=https://localhost:3000/

[database]
PATH=/data/gitea/gitea.db
DB_TYPE=sqlite3
HOST=localhost:3306
NAME=gitea
USER=root
PASSWD=

[security]
# This is the value which tells Gitea not to run the initial configuration wizard on startup
INSTALL_LOCK=true

NOTE: If running this in a production environment, you will also want to configure the database settings for a production database. This configuration will use an internal sqlite database in the container.

Gathering Images

Next we will gather all the images needed for installing Gitea, Keycloak, and Omni, as well as the images Omni will need for creating and installing Talos.

I’ll be using the following images for the tutorial:

Gitea

  • docker.io/gitea/gitea:1.19.3

Keycloak

  • quay.io/keycloak/keycloak:21.1.1

Omni

  • ghcr.io/siderolabs/omni:v0.11.0
    • Contact Us if you would like the image used to deploy Omni in an airgapped or on-prem environment.
  • ghcr.io/siderolabs/imager:v1.4.5
    • Pull this image to match the version of Talos you would like to use.

Talos

  • ghcr.io/siderolabs/flannel:v0.21.4
  • ghcr.io/siderolabs/install-cni:v1.4.0-1-g9b07505
  • docker.io/coredns/coredns:1.10.1
  • gcr.io/etcd-development/etcd:v3.5.9
  • registry.k8s.io/kube-apiserver:v1.27.2
  • registry.k8s.io/kube-controller-manager:v1.27.2
  • registry.k8s.io/kube-scheduler:v1.27.2
  • registry.k8s.io/kube-proxy:v1.27.2
  • ghcr.io/siderolabs/kubelet:v1.27.2
  • ghcr.io/siderolabs/installer:v1.4.5
  • registry.k8s.io/pause:3.6

NOTE: The Talos images needed may be found using the command talosctl images. If you do not have talosctl installed, you may find the instructions on how to install it here.

Package the images

  1. Pull the images to load them locally into Docker.
  • Run the following command for each of the images listed above except for the Omni image which will be provided to you as an archive file already.
sudo docker pull registry/repository/image-name:tag
  2. Verify all of the images have been downloaded
sudo docker image ls
  3. Save all of the images into an archive file.
  • All of the images can be saved as a single archive file, which can be loaded all at once on our airgapped machine, using the following command.
docker save -o image-tarfile.tar \
  list \
  of \
  images

Here is an example of the command used for the images in this tutorial:

docker save -o registry/all_images.tar \
  docker.io/gitea/gitea:1.19.3 \
  quay.io/keycloak/keycloak:21.1.1 \
  ghcr.io/siderolabs/imager:v1.4.5 \
  ghcr.io/siderolabs/flannel:v0.21.4 \
  ghcr.io/siderolabs/install-cni:v1.4.0-1-g9b07505 \
  docker.io/coredns/coredns:1.10.1 \
  gcr.io/etcd-development/etcd:v3.5.9 \
  registry.k8s.io/kube-apiserver:v1.27.2 \
  registry.k8s.io/kube-controller-manager:v1.27.2 \
  registry.k8s.io/kube-scheduler:v1.27.2 \
  registry.k8s.io/kube-proxy:v1.27.2 \
  ghcr.io/siderolabs/kubelet:v1.27.2 \
  ghcr.io/siderolabs/installer:v1.4.5 \
  registry.k8s.io/pause:3.6

Move Dependencies

Now that we have all the packages necessary for the airgapped deployment of Omni, we’ll create a compressed archive file and move it to our airgapped network.

The directory structure should look like this now:

airgap
├── certs
│   ├── fullchain.pem
│   ├── omni.asc
│   └── privkey.pem
├── gitea
│   └── app.ini
├── keycloak
├── omni
└── registry
    ├── omni-image.tar # Provided to you by Sidero Labs
    └── all_images.tar

Create a compressed archive file to move to our airgap machine.

cd ../
tar czvf omni-airgap.tar.gz airgap/

Now I will use scp to move this file to my machine which does not have internet access. Use whatever method you prefer to move this file.

scp omni-airgap.tar.gz $USERNAME@$AIRGAP_MACHINE:/home/$USERNAME/

Lastly, I will log in to my airgapped machine and extract the compressed archive file in the home directory:

cd ~/
tar xzvf omni-airgap.tar.gz

Log in to the Airgapped Machine

From here on out, the rest of the tutorial will take place on the airgapped machine on which we will be installing Omni, Keycloak, and Gitea.

Gitea

Gitea will be used as a container registry for storing our images, but it also provides many other capabilities, including Git hosting, Large File Storage, and the ability to store packages for many different package types. For more information on what you can use Gitea for, visit their documentation.

Install Gitea

Load the images we moved over. This will load all the images into Docker on the airgapped machine.

docker load -i registry/omni-image.tar
docker load -i registry/all_images.tar

Run Gitea using Docker:

  • The app.ini file is already configured and is mounted below with the -v argument.
sudo docker run -it \
    -v $PWD/certs/privkey.pem:/data/gitea/key.pem \
    -v $PWD/certs/fullchain.pem:/data/gitea/cert.pem \
    -v $PWD/gitea/app.ini:/data/gitea/conf/app.ini \
    -v $PWD/gitea/data/:/data/gitea/ \
    -p 3000:3000 \
    gitea/gitea:1.19.3

You may now log in at https://${GITEA_HOSTNAME}:3000 to begin configuring Gitea to store all the images needed for Omni and Talos.

Gitea setup

This is just the bare minimum setup to run Omni. Gitea has many additional configuration options and security measures to use in accordance with your industry’s security standards. More information on the configuration of Gitea can be found at https://docs.gitea.com/.

Create a user

Click the Register button at the top right corner. The first user created will be an admin; permissions can be adjusted afterwards if you like.

Create organizations

After registering an admin user, the organizations can be created; these will act as the package repositories for storing images. Create the following organizations:

  • siderolabs
  • keycloak
  • coredns
  • etcd-development
  • registry-k8s-io-proxy

NOTE: If you are using self-signed certs and would like to push images to your local Gitea using Docker, you will also need to configure your certs.d directory as described at https://docs.docker.com/engine/security/certificates/.

Push Images to Gitea

Now that all of our organizations have been created, we can push the images we loaded into our Gitea for deploying Keycloak, Omni, and storing images used by Talos.

For all of the images loaded, we first need to tag them for our Gitea.

sudo docker tag original-image:tag ${GITEA_HOSTNAME}:3000/organization/new-image:tag

For example, if I am tagging the kube-proxy image it will look like this:

NOTE: Don’t forget to tag all of the images from registry.k8s.io to go to the registry-k8s-io-proxy organization created in Gitea.

docker tag registry.k8s.io/kube-proxy:v1.27.2 ${GITEA_HOSTNAME}:3000/registry-k8s-io-proxy/kube-proxy:v1.27.2

Finally, push all the images into Gitea.

docker push ${GITEA_HOSTNAME}:3000/registry-k8s-io-proxy/kube-proxy:v1.27.2
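
If you have many images to move, a small loop keeps the tagging and pushing consistent. This is just a sketch for the registry.k8s.io images; the same pattern applies to the other organizations, and you may need to authenticate first (e.g. docker login ${GITEA_HOSTNAME}:3000 with the admin user you created):

for image in \
  registry.k8s.io/kube-apiserver:v1.27.2 \
  registry.k8s.io/kube-controller-manager:v1.27.2 \
  registry.k8s.io/kube-scheduler:v1.27.2 \
  registry.k8s.io/kube-proxy:v1.27.2 \
  registry.k8s.io/pause:3.6; do
  # Re-tag into the registry-k8s-io-proxy organization and push to Gitea
  target="${GITEA_HOSTNAME}:3000/registry-k8s-io-proxy/${image#registry.k8s.io/}"
  sudo docker tag "${image}" "${target}"
  sudo docker push "${target}"
done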

Keycloak

Install Keycloak

The image used for Keycloak is already loaded into Gitea, and there are no files to stage before starting it, so I’ll run the following command to start it. Replace KEYCLOAK_HOSTNAME and GITEA_HOSTNAME with your own hostnames.

sudo docker run -it \
    -p 8080:8080 \
    -p 8443:8443 \
    -v $PWD/certs/fullchain.pem:/etc/x509/https/tls.crt \
    -v $PWD/certs/privkey.pem:/etc/x509/https/tls.key \
    -v $PWD/keycloak/data:/opt/keycloak/data \
    -e KEYCLOAK_ADMIN=admin \
    -e KEYCLOAK_ADMIN_PASSWORD=admin \
    -e KC_HOSTNAME=${KEYCLOAK_HOSTNAME} \
    -e KC_HTTPS_CERTIFICATE_FILE=/etc/x509/https/tls.crt \
    -e KC_HTTPS_CERTIFICATE_KEY_FILE=/etc/x509/https/tls.key \
    ${GITEA_HOSTNAME}:3000/keycloak/keycloak:21.1.1 \
    start

Once Keycloak is installed, you can reach it in your browser at https://${KEYCLOAK_HOSTNAME}:8443.

Configuring Keycloak

For details on configuring Keycloak as a SAML Identity Provider to be used with Omni, follow this guide: Configuring Keycloak SAML

Omni

With Keycloak and Gitea installed and configured, we’re ready to start up Omni and start creating and managing clusters.

Install Omni

To install Omni, first generate a UUID to pass to Omni when we start it.

export OMNI_ACCOUNT_UUID=$(uuidgen)

Next run the following command, replacing hostnames for Omni, Gitea, or Keycloak with your own.

sudo docker run \
  --net=host \
  --cap-add=NET_ADMIN \
  -v $PWD/etcd:/_out/etcd \
  -v $PWD/certs/fullchain.pem:/fullchain.pem \
  -v $PWD/certs/privkey.pem:/privkey.pem \
  -v $PWD/certs/omni.asc:/omni.asc \
  ${GITEA_HOSTNAME}:3000/siderolabs/omni:v0.11.0 \
    --account-id=${OMNI_ACCOUNT_UUID} \
    --name=omni \
    --cert=/fullchain.pem \
    --key=/privkey.pem \
    --siderolink-api-cert=/fullchain.pem \
    --siderolink-api-key=/privkey.pem \
    --private-key-source=file:///omni.asc \
    --event-sink-port=8091 \
    --bind-addr=0.0.0.0:443 \
    --siderolink-api-bind-addr=0.0.0.0:8090 \
    --k8s-proxy-bind-addr=0.0.0.0:8100 \
    --advertised-api-url=https://${OMNI_HOSTNAME}:443/ \
    --siderolink-api-advertised-url=https://${OMNI_HOSTNAME}:8090/ \
    --siderolink-wireguard-advertised-addr=${OMNI_HOSTNAME}:50180 \
    --advertised-kubernetes-proxy-url=https://${OMNI_HOSTNAME}:8100/ \
    --auth-auth0-enabled=false \
    --auth-saml-enabled \
    --talos-installer-registry=${GITEA_HOSTNAME}:3000/siderolabs/installer \
    --talos-imager-image=${GITEA_HOSTNAME}:3000/siderolabs/imager:v1.4.5 \
    --kubernetes-registry=${GITEA_HOSTNAME}:3000/siderolabs/kubelet \
    --auth-saml-url "https://${KEYCLOAK_HOSTNAME}:8443/realms/omni/protocol/saml/descriptor"

What’s going on here:

  • --auth-auth0-enabled=false tells Omni not to use Auth0.
  • --auth-saml-enabled enables SAML authentication.
  • --talos-installer-registry, --talos-imager-image and --kubernetes-registry allow you to set the default images used by Omni to point to your local repository.
  • --auth-saml-url is the URL we saved earlier in the configuration of Keycloak.
    • --auth-saml-metadata may also be used if you would like to pass the metadata as a file instead of a URL, which can be useful if you are using self-signed certificates for Keycloak.

Creating a cluster

Guides on creating a cluster on Omni can be found here:

Because we’re working in an airgapped environment, we will need the following values added to our cluster configs so they know where to pull images from. More information on the Talos MachineConfig.registries can be found here.

NOTE: In this example, cluster discovery is also disabled. You may also configure cluster discovery on your network. More information on the Discovery Service can be found here

machine:
  registries:
    mirrors:
      docker.io:
        endpoints:
          - https://${GITEA_HOSTNAME}:3000
      gcr.io:
        endpoints:
          - https://${GITEA_HOSTNAME}:3000
      ghcr.io:
        endpoints:
          - https://${GITEA_HOSTNAME}:3000
      registry.k8s.io:
        endpoints:
          - https://${GITEA_HOSTNAME}:3000/v2/registry-k8s-io-proxy
        overridePath: true
cluster:
  discovery:
    enabled: false

Specifics on patching machines can be found here:

Closure

With Omni, Gitea, and Keycloak set up, you are ready to start managing and installing Talos clusters on your network! The suite of applications installed in this tutorial is an example of how an airgapped environment can be set up to make the most out of the Kubernetes clusters on your network. Other container registries or authentication providers may also be used with a similar setup, but this suite was chosen to give you a starting point and an example of what your environment could look like.

4 - Using SAML and ACLs

A tutorial on using SAML and ACLs in Omni.

Using SAML and ACLs for fine-grained access control

In this tutorial we will use SAML and ACLs to control fine-grained access to Kubernetes clusters.

Let’s assume that at our organization:

  • We run a Keycloak instance as the SAML identity provider.
  • Have our Omni instance already configured to use Keycloak as the SAML identity provider.
  • Our Omni instance has 2 types of clusters:
    • Staging clusters with the name prefix staging-: staging-1, staging-2, etc.
    • Production clusters with the name prefix prod-: prod-1, prod-2, etc.
  • We want the users with the SAML role omni-cluster-admin to have full access to all clusters.
  • We want the users with the SAML role omni-cluster-support to have full access to staging clusters and read-only access to production clusters.

Sign in as the initial SAML User

If our Omni instance has no users yet, the initial user who signs in via SAML will be automatically assigned to the Omni Admin role.

We sign in as the user admin@example.org and get the Omni Admin role.

Configuring the AccessPolicy

We need to configure an ACL that grants users with the SAML role omni-cluster-support Operator access to staging clusters and Reader access to production clusters, and grants users with the SAML role omni-cluster-admin Operator access to all clusters.

Create the following YAML file acl.yaml:

metadata:
  namespace: default
  type: AccessPolicies.omni.sidero.dev
  id: access-policy
spec:
  usergroups:
    support:
      users:
        - labelselectors:
            - saml.omni.sidero.dev/role/omni-cluster-support=
    admin:
      users:
        - labelselectors:
            - saml.omni.sidero.dev/role/omni-cluster-admin=
  clustergroups:
    staging:
      clusters:
        - match: staging-*
    production:
      clusters:
        - match: prod-*
    all:
      clusters:
        - match: "*"
  rules:
    - users:
        - group/support
      clusters:
        - group/staging
      role: Operator
    - users:
        - group/support
      clusters:
        - group/production
      role: Reader
      kubernetes:
        impersonate:
          groups:
            - read-only
    - users:
        - group/admin
      clusters:
        - group/all
      role: Operator
  tests:
    - name: support engineer has Operator access to staging cluster
      user:
        name: support-eng@example.org
        labels:
          saml.omni.sidero.dev/role/omni-cluster-support: ""
      cluster:
        name: staging-1
      expected:
        role: Operator
    - name: support engineer has Reader access to prod cluster and impersonates read-only group
      user:
        name: support-eng@example.org
        labels:
          saml.omni.sidero.dev/role/omni-cluster-support: ""
      cluster:
        name: prod-1
      expected:
        role: Reader
        kubernetes:
          impersonate:
            groups:
              - read-only
    - name: admin has Operator access to staging cluster
      user:
        name: admin-1@example.org
        labels:
          saml.omni.sidero.dev/role/omni-cluster-admin: ""
      cluster:
        name: staging-1
      expected:
        role: Operator
    - name: admin has Operator access to prod cluster
      user:
        name: admin-1@example.org
        labels:
          saml.omni.sidero.dev/role/omni-cluster-admin: ""
      cluster:
        name: prod-1
      expected:
        role: Operator

As the admin user admin@example.org, apply this ACL using omnictl:

$ omnictl apply -f acl.yaml

Accessing the Clusters

Now, in an incognito window, log in as a support engineer, cluster-support-1@example.org. Since the user is not assigned to any Omni role yet, they cannot use Omni Web.

Download omnictl and the omniconfig file from the UI, and try to list the clusters using them:

$ omnictl --omniconfig ./support-omniconfig.yaml get cluster
NAMESPACE   TYPE   ID   VERSION
Error: rpc error: code = PermissionDenied desc = failed to validate: 1 error occurred:
	* rpc error: code = PermissionDenied desc = unauthorized: access denied: insufficient role: "None"

You won’t be able to list the clusters because the user is not assigned to any Omni role.

Now try to get the cluster staging-1:

$ omnictl --omniconfig ./support-omniconfig.yaml get cluster staging-1
NAMESPACE   TYPE      ID          VERSION
default     Cluster   staging-1   5

You can get the cluster staging-1 because the ACL allows the user to access the cluster.

Finally, try to delete the cluster staging-1:

$ omnictl --omniconfig ./support-omniconfig.yaml delete cluster staging-1
torn down Clusters.omni.sidero.dev staging-1
destroyed Clusters.omni.sidero.dev staging-1

The operation will succeed, because the ACL allows Operator-level access to the cluster for the user.

Try to do the same operations with the cluster prod-1:

$ omnictl --omniconfig ./support-omniconfig.yaml get cluster prod-1
NAMESPACE   TYPE      ID          VERSION
default     Cluster   prod-1   5

$ omnictl --omniconfig ./support-omniconfig.yaml delete cluster prod-1
Error: rpc error: code = PermissionDenied desc = failed to validate: 1 error occurred:
	* rpc error: code = PermissionDenied desc = unauthorized: access denied: insufficient role: "Reader"

The user will be able to get the cluster but not delete it, because the ACL allows only Reader-level access to the cluster for the user.

If you do the same operations as the admin user, you’ll notice that you are able to both get and delete staging and production clusters.

Assigning Omni roles to Users

If you want to allow SAML users to use Omni Web, you need to assign them at least the Reader role. As the admin, sign in to Omni Web and assign the role Reader to both cluster-support-1@example.org and cluster-admin-1@example.org.

Now, as the support engineer, you can sign out & sign in again to Omni Web and see the clusters staging-1 and prod-1 in the UI.