C3D2 infrastructure based on NixOS
Further documentation
Helpful tools
- NixOS option search
- Code search (including Nix repos we use)
Setup
Enable nix flakes user wide
Add the setting to the user nix.conf. Only do this once!
echo 'experimental-features = nix-command flakes' >> ~/.config/nix/nix.conf
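Because the line above is appended unconditionally, running it twice leaves a duplicate entry. A minimal sketch of an idempotent variant follows; the function name `enable_flakes` and its optional path argument are illustrative and not part of this repo.

```shell
# Append the flakes setting to a nix.conf only if the exact line is missing.
# enable_flakes and its optional path argument are hypothetical helpers.
enable_flakes() {
  conf="${1:-$HOME/.config/nix/nix.conf}"
  line='experimental-features = nix-command flakes'
  mkdir -p "$(dirname "$conf")"
  touch "$conf"
  # -x matches whole lines only, -F treats the pattern as a fixed string
  grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
}
```

Calling `enable_flakes` without an argument targets the user nix.conf; calling it again changes nothing.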
Enable nix flakes system wide (preferred for NixOS)
Add this to your NixOS configuration:
nix.settings.experimental-features = [ "nix-command" "flakes" ];
nixpkgs/nixos
The nixpkgs/nixos input used lives at https://github.com/supersandro2000/nixpkgs/tree/nixos-24.05. We are using a fork managed by sandro to make backports, cherry-picks and custom fixes dead easy. If you want to have an additional backport, cherry-pick or other change, please contact sandro.
nixos-modules repo
The nixos-modules repo lives at https://github.com/NuschtOS/nixos-modules and is mirrored to https://gitea.c3d2.de/c3d2/nixos-modules. Auto-generated documentation about all options is available at https://nuschtos.github.io/nixos-modules/. It contains options sandro shares between his private nixos configs and the C3D2 one, and which others have also started to use. It sets many options by default, so when searching for a particular setting you should always grep this repo, too. If in doubt, ask sandro and consider improving the documentation with comments and readme explanations. Should something be changed, added or removed? Please create a PR or start a conversation with your ideas.
SSH access
If people should get root access to all machines, their keys should be added to ssh-public-keys.nix.
Deployment
Deploy to a remote NixOS system
For every host that has a nixosConfiguration in our Flake, there are two scripts that can be run for deployment via ssh.
- nix run .#HOSTNAME-nixos-rebuild switch
  Copies the current state to build on the target system. This may fail due to resource limits on e.g. Raspberry Pis.
- nix run .#HOSTNAME-nixos-rebuild-local switch
  Builds everything locally, then uses nix copy to transfer the new NixOS system to the target.
To use the cache from hydra, set the following nix options, similar to enabling flakes:
trusted-public-keys = hydra.hq.c3d2.de:KZRGGnwOYzys6pxgM8jlur36RmkJQ/y8y62e52fj1ps=
extra-substituters = https://hydra.hq.c3d2.de
This can also be set with the c3d2.addBinaryCache option from the c3d2-user-module.
Checking for updates
nix run .#list-upgradable
Checks all hosts with a nixosConfiguration in flake.nix.
Update from Hydra build
This is the fastest way to update a system, and a manual alternative to setting c3d2.autoUpdate = true;
Just run:
update-from-hydra
Deploy a MicroVM
Build a microvm remotely and deploy
nix run .#microvm-update-HOSTNAME
Build microvm locally and deploy
nix run .#microvm-update-HOSTNAME-local
Update MicroVM from our Hydra
Our Hydra runs nix flake update daily in the updater.timer, pushing the result to the flake-update branch so that it can build fresh systems. This branch is set up as the source flake in all the MicroVMs, so the following is all that is needed on a MicroVM-hosting server:
microvm -Ru $hostname
Cluster deployment with Skyflake
About
Skyflake provides Hyperconverged Infrastructure to run NixOS MicroVMs on a cluster. Our setup unifies networking with one bridge per VLAN. Persistent storage is replicated with Cephfs.
You can recognize a nixosConfiguration that belongs to our Skyflake deployment by the self.nixosModules.cluster-options module being included.
User interface
We use the less-privileged c3d2@ user for deployment. This flake's name on the cluster is config. Other flakes can coexist in the same user so that we can run separately developed projects like dump-dvb. leon and potentially other users can deploy Flakes and MicroVMs without name clashes.
Deploying
git push this repo to any machine in the cluster, preferably to Hydra, because building there won't disturb any services.
You don't deploy all MicroVMs at once. Instead, Skyflake allows you to select NixOS systems by the branches you push to. You must commit before you push!
Example: deploy nixosConfigurations mucbot and sdrweb (HEAD is your current commit)
git push c3d2@hydra.serv.zentralwerk.org:config HEAD:mucbot HEAD:sdrweb
This will:
- Build the configuration on Hydra, refusing the branch update on broken builds (through a git hook)
- Copy the MicroVM package and its dependencies to the binary cache that is accessible to all nodes with Cephfs
- Submit one job per MicroVM into the Nomad cluster
Deleting a nixosConfiguration's branch will stop the MicroVM in Nomad.
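The branch-per-host convention described above can be sketched as two small helpers that only print the git commands instead of running them. The function names and the `SKYFLAKE_REMOTE` variable are illustrative assumptions; the remote is the one from the example above, and `--delete` is the standard git push option for removing remote branches.

```shell
# Remote taken from the example above; the variable name is hypothetical.
SKYFLAKE_REMOTE='c3d2@hydra.serv.zentralwerk.org:config'

# Print the push command that deploys the given nixosConfigurations.
skyflake_deploy_cmd() {
  refspecs=''
  for host in "$@"; do
    # push the current commit to the branch named after the host
    refspecs="$refspecs HEAD:$host"
  done
  printf 'git push %s%s\n' "$SKYFLAKE_REMOTE" "$refspecs"
}

# Print the push command that deletes the given branches,
# which stops the corresponding MicroVMs.
skyflake_stop_cmd() {
  printf 'git push %s --delete' "$SKYFLAKE_REMOTE"
  for host in "$@"; do
    printf ' %s' "$host"
  done
  printf '\n'
}
```

For example, `skyflake_deploy_cmd mucbot sdrweb` prints the same command as in the example above, and `skyflake_stop_cmd mucbot` prints the command that removes the mucbot branch. Printing first makes it easy to review the branch names before anything is pushed.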
Updating
TODO: how would you like it?
MicroVM status
ssh c3d2@hydra.serv.zentralwerk.org status
Debugging for cluster admins
Nomad
Check the cluster state
nomad server members
Nomad servers coordinate the cluster.
Nomad clients run the tasks.
Browse in the terminal
wander and damon are nice TUIs that are preinstalled on our cluster nodes.
Browse with a browser
First, tunnel TCP port :4646 from a cluster server:
ssh -L 4646:localhost:4646 root@server10.cluster.zentralwerk.org
Then, visit https://localhost:4646 for the full klickibunti.
Reset the Nomad state on a node
After upgrades, Nomad servers may fail rejoining the cluster. Do this to make a Nomad server behave like a newborn:
systemctl stop nomad
rm -rf /var/lib/nomad/server/raft/
systemctl start nomad
Secrets management
Secrets Management Using sops-nix
Adding a new host
Edit .sops.yaml:
- Add an AGE key for this host. Comments in this file tell you how to do it.
- Add a creation_rules section for host/$host/*.yaml files
Editing a host's secrets
Edit .sops.yaml to add files for a new host and its SSH pubkey.
# Get sops
nix develop
# Decrypt, start an EDITOR, encrypt
sops hosts/.../secrets.yaml
# Push
git commit -a -m "Add new secrets for new server"
git push
Secrets management with PGP
Add your gpg-id to the .gpg-id file in secrets and let somebody re-encrypt it for you. Maybe this works for you, maybe not. I did it somehow:
PASSWORD_STORE_DIR=$(pwd) tr '\n' ' ' < .gpg-id | xargs -I{} pass init {}
Your gpg key has to have the Authenticate flag set. If not, update it, push it to a keyserver and wait. This is necessary so that you can log in to any machine with your gpg key.
Laptops / Desktops
In the past, this repo could be used as a module. While this is still technically possible, it is not recommended because the number of flake inputs has grown considerably and the modules are not designed with that in mind.
For end user modules take a look at the c3d2-user-module.
For the deployment options take a look at deployment.
File system setup
Set the disko options for the machine and run:
$(nix build --print-out-paths --no-link -L '.#nixosConfigurations.HOSTNAME.config.system.build.disko')
When adding new disks, the paths under /dev/disk/by-id/ should be used, so that the script is idempotent across device restarts.
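To find the stable name of a disk, it helps to see which kernel device each by-id link currently points to. The following is a minimal sketch; the function name `list_disk_ids` and its optional directory argument are illustrative and not part of this repo.

```shell
# Print every by-id name next to the device it currently resolves to.
# list_disk_ids and its optional directory argument are hypothetical helpers.
list_disk_ids() {
  dir="${1:-/dev/disk/by-id}"
  for link in "$dir"/*; do
    # skip anything that is not a symlink (including an unmatched glob)
    [ -L "$link" ] || continue
    printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
  done
}
```

Run `list_disk_ids` on the target machine and copy the left-hand path of the wanted device into the disko options.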
Install new server
- Copy the nix files from an existing, similar host.
- Disable all secrets until after the installation is finished.
- Set the simd.arch option to the output of nix shell nixpkgs#gcc -c gcc -march=native -Q --help=target | grep march and update the comment next to it.
  - If that returns x86_64, search on a search engine for the ark.intel.com entry for the processor, which can be found by catting /proc/cpuinfo.
- Generate networking.hostId with head -c4 /dev/urandom | od -A none -t x4 according to the option's description.
- Boot the live ISO.
  - If your ssh key is not baked into the iso, set a password for the nixos user with passwd to be able to log in over ssh.
- rsync this directory into the live system.
- Generate and apply the disk layout with disko (see above).
- Generate hardware-configuration.nix with sudo nixos-generate-config --no-filesystems --root /mnt.
  - If luks disks should be decrypted in initrd over ssh, enable DHCP in the hardware-configuration.nix for the interfaces that should be used for that.
- Install the nixos system with sudo nixos-install --root /mnt --no-channel-copy --no-root-passwd --flake .#HOSTNAME.
- After a reboot, add the age key to sops-nix with nix shell nixpkgs#ssh-to-age and ssh-to-age < /etc/ssh/ssh_host_ed25519_key.pub.
- Add /etc/machine-id and the luks password to sops secrets.
- Enable and deploy secrets again.
- Improve the new machine setup by automating the steps that are easy to automate and documenting the others.
- Commit everything and push
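Two values in the list above are read off command output by hand: the simd.arch value and networking.hostId. The helpers below are a rough sketch of how to extract them cleanly. The function names are illustrative, `parse_march` assumes GCC prints the setting as a line starting with `-march=` followed by whitespace and the value, and `gen_host_id` uses the POSIX spelling `-A n` of the od option shown above.

```shell
# Generate a candidate networking.hostId: 4 random bytes as 8 hex digits.
# od pads its output with blanks, so tr strips all whitespace.
gen_host_id() {
  head -c4 /dev/urandom | od -A n -t x4 | tr -d '[:space:]'
}

# Read the output of "gcc -march=native -Q --help=target" on stdin and
# print only the value of the -march= line (assumed output format).
parse_march() {
  awk '$1 == "-march=" { print $2; exit }'
}
```

Usage would look like `nix shell nixpkgs#gcc -c gcc -march=native -Q --help=target | parse_march` for the architecture, and `gen_host_id` for the host ID.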