
C3D2 infrastructure based on NixOS

Further documentation: https://nixos.c3d2.de/

Helpful tools

Setup

Enable nix flakes user wide

Add the setting to the user's nix.conf. Only do this once!

mkdir -p ~/.config/nix
echo 'experimental-features = nix-command flakes' >> ~/.config/nix/nix.conf

Enable nix flakes system wide (preferred for NixOS)

Add this to your NixOS configuration:

nix.settings.experimental-features = [ "nix-command" "flakes" ];

nixpkgs/nixos

The nixpkgs/nixos input we use lives at https://github.com/supersandro2000/nixpkgs/tree/nixos-24.05. It is a fork managed by sandro to make backports, cherry-picks and custom fixes dead easy. If you want an additional backport, cherry-pick or other change, please contact sandro.

nixos-modules repo

The nixos-modules repo lives at https://github.com/NuschtOS/nixos-modules and is mirrored to https://gitea.c3d2.de/c3d2/nixos-modules. Auto-generated documentation about all options is available at https://nuschtos.github.io/nixos-modules/. It contains options sandro shares between his private NixOS configs and the C3D2 one, which others have also started to use. It sets many options by default, so when searching for a particular setting you should always grep this repo, too. When in doubt, ask sandro, and consider improving the documentation with comments and README explanations. Should something be changed, added or removed? Please create a PR or start a conversation with your ideas.

SSH access

People who should get root access to all machines need their keys added to ssh-public-keys.nix.
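A hypothetical entry, assuming the file is an attribute set of user names to key lists (the name and key are placeholders; check the existing file for the actual structure):

{
  alice = [
    # placeholder key, not a real one
    "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA... alice@laptop"
  ];
}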

Deployment

Deploy to a remote NixOS system

For every host that has a nixosConfiguration in our Flake, there are two scripts that can be run for deployment via ssh.

  • nix run .#HOSTNAME-nixos-rebuild switch

    Copies the current state to the target system and builds it there. This may fail due to resource limits, e.g. on Raspberry Pis.

  • nix run .#HOSTNAME-nixos-rebuild-local switch

    Builds everything locally, then uses nix copy to transfer the new NixOS system to the target.

    To use the cache from hydra, set the following nix options similar to enabling flakes:

    trusted-public-keys = hydra.hq.c3d2.de:KZRGGnwOYzys6pxgM8jlur36RmkJQ/y8y62e52fj1ps=
    extra-substituters = https://hydra.hq.c3d2.de
    

    This can also be set with the c3d2.addBinaryCache option from the c3d2-user-module.
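On NixOS, the same cache can also be configured declaratively instead of editing nix.conf; a minimal sketch mirroring the values above:

nix.settings = {
  # public key and substituter from this README
  trusted-public-keys = [ "hydra.hq.c3d2.de:KZRGGnwOYzys6pxgM8jlur36RmkJQ/y8y62e52fj1ps=" ];
  extra-substituters = [ "https://hydra.hq.c3d2.de" ];
};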

Checking for updates

nix run .#list-upgradable

(Screenshot: list-upgradable output)

Checks all hosts with a nixosConfiguration in flake.nix.

Update from Hydra build

This is the fastest way to update a system; it is a manual alternative to setting c3d2.autoUpdate = true;

Just run:

update-from-hydra

Deploy a MicroVM

Build a MicroVM remotely and deploy

nix run .#microvm-update-HOSTNAME

Build a MicroVM locally and deploy

nix run .#microvm-update-HOSTNAME-local

Update MicroVM from our Hydra

Our Hydra runs nix flake update daily in the updater.timer, pushing the result to the flake-update branch so that it can build fresh systems. This branch is set up as the source flake in all the MicroVMs, so the following is all that is needed on a MicroVM-hosting server:

microvm -Ru $hostname

Cluster deployment with Skyflake

About

Skyflake provides Hyperconverged Infrastructure to run NixOS MicroVMs on a cluster. Our setup unifies networking with one bridge per VLAN. Persistent storage is replicated with Cephfs.

You can recognize a nixosConfiguration meant for our Skyflake deployment by the self.nixosModules.cluster-options module being included.
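A minimal sketch of how such a host might be wired up in flake.nix (the host name and extra modules are illustrative, not this repo's actual layout):

nixosConfigurations.example = nixpkgs.lib.nixosSystem {
  modules = [
    # marks this host for the Skyflake deployment
    self.nixosModules.cluster-options
    ./hosts/example
  ];
};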

User interface

We use the less-privileged c3d2@ user for deployment. This flake's name on the cluster is config. Other flakes can coexist in the same user so that we can run separately developed projects like dump-dvb. leon and potentially other users can deploy Flakes and MicroVMs without name clashes.

Deploying

git push this repo to any machine in the cluster, preferably to Hydra, because building there won't disturb any services.

You don't deploy all MicroVMs at once. Instead, Skyflake allows you to select NixOS systems by the branches you push to. You must commit before you push!

Example: deploy nixosConfigurations mucbot and sdrweb (HEAD is your current commit)

git push c3d2@hydra.serv.zentralwerk.org:config HEAD:mucbot HEAD:sdrweb

This will:

  1. Build the configuration on Hydra, refusing the branch update on broken builds (through a git hook)
  2. Copy the MicroVM package and its dependencies to the binary cache that is accessible to all nodes with Cephfs
  3. Submit one job per MicroVM into the Nomad cluster

Deleting a nixosConfiguration's branch will stop the MicroVM in Nomad.
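For example, to stop the mucbot MicroVM from above, delete its branch with a standard git push:

git push c3d2@hydra.serv.zentralwerk.org:config :mucbot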

Updating

TODO: how would you like it?

MicroVM status

ssh c3d2@hydra.serv.zentralwerk.org status

Debugging for cluster admins

Nomad

Check the cluster state
nomad server members

Nomad servers coordinate the cluster.

Nomad clients run the tasks.

Browse in the terminal

wander and damon are nice TUIs that are preinstalled on our cluster nodes.

Browse with a browser

First, tunnel TCP port :4646 from a cluster server:

ssh -L 4646:localhost:4646 root@server10.cluster.zentralwerk.org

Then, visit https://localhost:4646 for the full klickibunti.

Reset the Nomad state on a node

After upgrades, Nomad servers may fail rejoining the cluster. Do this to make a Nomad server behave like a newborn:

systemctl stop nomad
rm -rf /var/lib/nomad/server/raft/
systemctl start nomad

Secrets management

Secrets management using sops-nix

Adding a new host

Edit .sops.yaml:

  1. Add an AGE key for this host. Comments in this file tell you how to do it.
  2. Add a creation_rules section for host/$host/*.yaml files (see the sketch below)
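A hypothetical .sops.yaml fragment (the host name and age key are placeholders; follow the comments in the real file):

keys:
  # placeholder age key for the new host
  - &host_example age1exampleexampleexampleexampleexampleexample
creation_rules:
  - path_regex: host/example/.*\.yaml$
    key_groups:
      - age:
          - *host_example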

Editing a host's secrets

Edit .sops.yaml to add files for a new host and its SSH pubkey.

# Get sops
nix develop
# Decrypt, start an EDITOR, encrypt
sops hosts/.../secrets.yaml
# Push
git commit -a -m "Add new secrets for new server"
git push

Secrets management with PGP

Add your gpg-id to the .gpg-id file in secrets and let somebody re-encrypt it for you. Maybe this works for you, maybe not. I did it somehow:

PASSWORD_STORE_DIR=$(pwd) tr '\n' ' ' < .gpg-id | xargs -I{} pass init {}

Your gpg key has to have the Authenticate flag set. If not, update it, push it to a keyserver, and wait. This is necessary so that you can log in to any machine with your gpg key.
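A rough sketch of how that can look with GnuPG (KEYID is a placeholder, and change-usage is an interactive command at the gpg> prompt):

gpg --edit-key KEYID
# at the gpg> prompt: change-usage, toggle Authenticate on, then save
gpg --keyserver keys.openpgp.org --send-keys KEYID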

Laptops / Desktops

In the past, this repo could be used as a module. While still technically possible, this is no longer recommended: the number of flake inputs has increased considerably, and the modules are not designed with that use case in mind.

For end user modules take a look at the c3d2-user-module.

For the deployment options take a look at deployment.

File system setup

Set the disko options for the machine and run:

$(nix --extra-experimental-features "flakes nix-command" build --print-out-paths --no-link -L '.#nixosConfigurations.HOSTNAME.config.system.build.diskoScript')

When adding new disks, use the paths under /dev/disk/by-id/ so that the script stays idempotent across reboots.
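A minimal sketch of such a disko option (device path and partitioning are placeholders, not this repo's actual layout):

disko.devices.disk.main = {
  type = "disk";
  # stable by-id path instead of /dev/sda, so reruns hit the same disk
  device = "/dev/disk/by-id/ata-EXAMPLE_MODEL_SERIAL";
  content = {
    type = "gpt";
    partitions = {
      root = {
        size = "100%";
        content = {
          type = "filesystem";
          format = "ext4";
          mountpoint = "/";
        };
      };
    };
  };
};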

Install new server

  • Copy the nix files from an existing, similar host.
  • Disable all secrets until after the installation is finished.
  • Set the simd.arch option to the output of nix --extra-experimental-features "flakes nix-command" shell nixpkgs#gcc -c gcc -march=native -Q --help=target | grep march and update the comment next to it.
    • If that returns x86_64, look up the processor's ark.intel.com entry with a search engine; the processor model can be found by catting /proc/cpuinfo.
  • Generate networking.hostId with head -c4 /dev/urandom | od -A none -t x4, as described in the option's documentation.
  • Boot live ISO
    • If your ssh key is not baked into the ISO, set a password for the nixos user with passwd to be able to log in over ssh.
  • rsync this directory into the live system.
  • Stop the RAID if one was assembled automatically: mdadm --stop /dev/md126.
  • Generate and apply the disk layout with disko (see above).
  • Generate hardware-configuration.nix with sudo nixos-generate-config --root /mnt.
    • If luks disks should be decrypted in initrd over ssh:
      • Make sure boot.initrd.luks.devices.*.device is set.
      • Enable boot.initrd.network.enable and boot.initrd.network.ssh.enable.
      • Add the required kernel modules for the network interfaces to boot.initrd.availableKernelModules; they can be found with lshw -C network (look for driver=).
      • Enable DHCP for all interfaces in the hardware-configuration.nix.
  • Install nixos system with sudo nixos-install --root /mnt --no-channel-copy --flake .#HOSTNAME.
  • After a reboot, add the host's age key to sops-nix: run nix shell nixpkgs#ssh-to-age and then ssh-to-age < /etc/ssh/ssh_host_ed25519_key.pub.
  • Add /etc/machine-id and luks password to sops secrets.
  • Enable and deploy secrets again.
  • Improve new machine setup by automating easy to automate steps and document others.
  • Commit everything and push.