---
gitea: none
title: Flockige Infrastruktur deklarativ
include_toc: yes
lang: en
---
# C3D2 infrastructure based on NixOS
## Setup
### Enable nix flakes user-wide
Add the setting to your user `nix.conf`. Only do this once, because the command appends the line every time it runs!
```bash
mkdir -p ~/.config/nix
echo 'experimental-features = nix-command flakes' >> ~/.config/nix/nix.conf
```
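If you are not sure whether you already ran this, a guarded variant only appends the line when it is missing. This is a minimal sketch; the function name `enable_flakes_user` and its optional path argument are hypothetical:

```shell
# Hypothetical helper: append the flakes setting to a nix.conf only once.
# The optional first argument overrides the default user config path.
enable_flakes_user() {
  conf=${1:-$HOME/.config/nix/nix.conf}
  line='experimental-features = nix-command flakes'
  mkdir -p "$(dirname "$conf")"
  # grep -qxF: quiet, whole-line, fixed-string match (missing file counts as "not found")
  if ! grep -qxF "$line" "$conf" 2>/dev/null; then
    printf '%s\n' "$line" >> "$conf"
  fi
}
```

It only recognizes the exact line above, so an existing `experimental-features` line with different values still needs to be merged by hand.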
### Enable nix flakes system-wide (preferred for NixOS)
Add this to your NixOS configuration:
```nix
nix.settings.experimental-features = [ "nix-command" "flakes" ];
```
### nixpkgs/nixos
The nixpkgs/nixos input used lives at <https://github.com/supersandro2000/nixpkgs/tree/nixos-23.05>.
We are using a fork managed by sandro to make backports, cherry-picks and custom fixes dead easy.
If you want to have an additional backport, cherry-pick or other change, please contact sandro.
### nixos-modules repo
The nixos-modules repo lives at <https://github.com/supersandro2000/nixos-modules> and is mirrored to <https://gitea.c3d2.de/c3d2/nixos-modules>.
Auto-generated documentation about all options is available at <https://supersandro2000.github.io/nixos-modules/>.
It contains options sandro shares between his private NixOS configs and the C3D2 one.
It sets many options by default, so when searching for a particular setting you should always grep this repo, too.
If in doubt, ask sandro and consider improving the documentation with comments and README explanations.
Should something be changed, added or removed? Please create a PR or start a conversation with your ideas.
### secrets repo
The secrets repo is absolutely deprecated!
Everything new must be done through sops and everything old should be migrated.
If you don't have access to the secrets, ask sandro or astro to get onboarded.
### SSH access
If people should get root access to *all* machines, their keys should be added to ``ssh-public-keys.nix``.
## Deployment
### Deploy to a remote NixOS system
For every host that has a `nixosConfiguration` in our flake, there are two scripts that can be run for deployment via SSH:

- `nix run .#HOSTNAME-nixos-rebuild switch`

  Copies the current state to build on the target system.
  This may fail due to resource limits on e.g. Raspberry Pis.

- `nix run .#HOSTNAME-nixos-rebuild-local switch`

  Builds everything locally, then uses `nix copy` to transfer the new NixOS system to the target.
To use the binary cache from Hydra, set the following Nix options, similar to enabling flakes. Both are restricted settings, so they only take effect in the system-wide `/etc/nix/nix.conf` or for trusted users:
```
trusted-public-keys = nix-cache.hq.c3d2.de:KZRGGnwOYzys6pxgM8jlur36RmkJQ/y8y62e52fj1ps=
trusted-substituters = https://nix-cache.hq.c3d2.de
```
This can also be set with the `c3d2.addBinaryCache` option from the [c3d2-user-module](https://gitea.c3d2.de/c3d2/nix-user-module).
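
Since `trusted-substituters` only permits a cache without enabling it by default, an unprivileged user may still have to request it per command if it is not picked up otherwise. A minimal sketch, assuming the settings above are in the system-wide configuration; `with_c3d2_cache` and `C3D2_CACHE` are hypothetical names:

```shell
# Hypothetical wrapper: run a nix subcommand with the C3D2 Hydra cache added
# as an extra substituter for this invocation only. Unprivileged users may
# do this because the URL is listed in trusted-substituters.
C3D2_CACHE=https://nix-cache.hq.c3d2.de

with_c3d2_cache() {
  subcommand=$1
  shift
  nix "$subcommand" --option extra-substituters "$C3D2_CACHE" "$@"
}
```

Usage, e.g. for a local build: `with_c3d2_cache run .#HOSTNAME-nixos-rebuild-local switch`. The option is placed right after the subcommand, before the installable, because `nix run` passes arguments after the installable on to the program it starts.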
### Checking for updates
```shell
nix run .#list-upgradable
```
![list-upgradable output](doc/list-upgradable.png)

Checks all hosts with a `nixosConfiguration` in `flake.nix`.
### Update from [Hydra build](https://hydra.hq.c3d2.de/jobset/c3d2/nix-config#tabs-jobs)
This is the fastest way to update a system and a manual alternative to setting `c3d2.autoUpdate = true;`.
Just run:
```shell
update-from-hydra
```
### Deploy a MicroVM
#### Build a MicroVM remotely and deploy
```shell
nix run .#microvm-update-HOSTNAME
```
#### Build a MicroVM locally and deploy
```shell
nix run .#microvm-update-HOSTNAME-local
```
#### Update MicroVM from our Hydra
Our Hydra runs `nix flake update` daily in the `updater.timer`,
pushing the result to the `flake-update` branch so that it can build fresh
systems. This branch is set up as the source flake in all the MicroVMs,
so the following is all that is needed on a MicroVM-hosting server:
```shell
microvm -Ru $hostname
```
## Cluster deployment with Skyflake
### About
[Skyflake](https://github.com/astro/skyflake) provides hyperconverged
infrastructure to run NixOS MicroVMs on a cluster. Our setup unifies
networking with one bridge per VLAN. Persistent storage is replicated
with CephFS.

You can recognize the nixosConfigurations for our Skyflake deployment by
the `self.nixosModules.cluster-options` module being included.
### User interface
We use the less-privileged `c3d2@` user for deployment. This flake's
name on the cluster is `config`. Other flakes can coexist under the same
user, so that we can run separately developed projects like
*dump-dvb*. *leon* and potentially other users can deploy flakes and
MicroVMs without name clashes.
#### Deploying
**git push** this repo to any machine in the cluster, preferably to
Hydra, because building there won't disturb any services.

You don't deploy all MicroVMs at once. Instead, Skyflake allows you to
select NixOS systems by the branches you push to. **You must commit
before you push!**

**Example:** deploy the nixosConfigurations `mucbot` and `sdrweb` (`HEAD` is your
current commit):
```bash
git push c3d2@hydra.serv.zentralwerk.org:config HEAD:mucbot HEAD:sdrweb
```
This will:
1. Build the configuration on Hydra, refusing the branch update on
   broken builds (through a git hook)
2. Copy the MicroVM package and its dependencies to the binary cache
   that is accessible to all nodes with CephFS
3. Submit one job per MicroVM into the Nomad cluster

*Deleting* a nixosConfiguration's branch will **stop** the MicroVM in Nomad.
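
The push and delete steps can be wrapped in small shell functions. This is a sketch, not part of Skyflake; `skyflake_deploy`, `skyflake_stop` and `SKYFLAKE_REMOTE` are hypothetical names, and the remote is the one from the example above. `git diff --quiet HEAD` exits non-zero when tracked files differ from the last commit:

```shell
# Hypothetical helpers around the git-based Skyflake workflow above.
SKYFLAKE_REMOTE=c3d2@hydra.serv.zentralwerk.org:config

# Deploy: push the current commit to one branch per nixosConfiguration.
skyflake_deploy() {
  # Uncommitted changes are not part of HEAD and would silently be skipped.
  if ! git diff --quiet HEAD; then
    echo "Uncommitted changes: commit before you push!" >&2
    return 1
  fi
  # Rewrite the arguments from "host" to "HEAD:host" (POSIX sh has no arrays).
  for host in "$@"; do
    set -- "$@" "HEAD:$host"
    shift
  done
  git push "$SKYFLAKE_REMOTE" "$@"
}

# Stop: deleting a nixosConfiguration's branch stops its MicroVM in Nomad.
skyflake_stop() {
  git push "$SKYFLAKE_REMOTE" --delete "$@"
}
```

Usage: `skyflake_deploy mucbot sdrweb` runs the same `git push` as the example above, and `skyflake_stop mucbot` deletes the `mucbot` branch. The guard only looks at tracked files, which is also all that a git-based flake sees.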
#### Updating
**TODO:** how would you like it?
#### MicroVM status
```bash
ssh c3d2@hydra.serv.zentralwerk.org status
```
### Debugging for cluster admins
#### Nomad
##### Check the cluster state
```shell
nomad server members
```
Nomad *servers* **coordinate** the cluster.
Nomad *clients* **run** the tasks.
##### Browse in the terminal
[wander](https://github.com/robinovitch61/wander) and
[damon](https://github.com/hashicorp/damon) are nice TUIs that are
preinstalled on our cluster nodes.
##### Browse with a browser
First, tunnel TCP port `:4646` from a cluster server:
```bash
ssh -L 4646:localhost:4646 root@server10.cluster.zentralwerk.org
```
Then, visit <https://localhost:4646> for the full point-and-click web UI.
##### Reset the Nomad state on a node
After upgrades, Nomad servers may fail to rejoin the cluster. Do this
to make a *Nomad server* behave like a newborn:
```shell
systemctl stop nomad
rm -rf /var/lib/nomad/server/raft/
systemctl start nomad
```
## Secrets management
### Secrets Management Using `sops-nix`
#### Adding a new host
Edit `.sops.yaml`:
1. Add an age key for this host. Comments in this file tell you how to do it.
2. Add a `creation_rules` section for `host/$host/*.yaml` files
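
For a host that is already reachable over SSH, the age key can also be derived remotely from its SSH host key, which is the approach the sops-nix README suggests. A minimal sketch; `host_age_key` is a hypothetical name, and `ssh-keyscan` and `ssh-to-age` (e.g. from `nix shell nixpkgs#ssh-to-age`) must be on the `PATH`:

```shell
# Hypothetical helper: print the age public key derived from a host's
# ed25519 SSH host key, e.g. to paste into .sops.yaml.
host_age_key() {
  # -t ed25519: fetch only the key type that ssh-to-age converts;
  # ssh-keyscan writes its banner comments to stderr, which is discarded.
  ssh-keyscan -t ed25519 "$1" 2>/dev/null | ssh-to-age
}
```

Usage: `host_age_key HOSTNAME`. `ssh-keyscan` does not authenticate the host, so compare the key with the one on the machine before trusting the result.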
#### Editing a host's secrets
For a new host, first edit `.sops.yaml` to add the host's key and a rule for its files (see above).
```bash
# Get sops
nix develop
# Decrypt, start $EDITOR, encrypt again when the editor exits
sops hosts/.../secrets.yaml
# Push (-a only stages files git already tracks; `git add` new ones first)
git commit -a -m "Adding new secrets"
git push origin
```
### Secrets management with PGP
Add your GPG ID to the `.gpg-id` file in the secrets repo and let somebody re-encrypt it for you.
This may or may not work for you, but this is how I did it:
```bash
# Run in the secrets repo root; xargs passes each ID in .gpg-id to pass init
PASSWORD_STORE_DIR="$PWD" xargs pass init < .gpg-id
```
Your GPG key must have the Authenticate capability. If it does not, update the key, push it to a keyserver and wait for it to propagate.
This is necessary so that you can log in to any machine with your GPG key.
## Laptops / Desktops
In the past, this repo could be used as a module. While this is still technically possible, it is not recommended,
because the number of flake inputs has grown considerably and the modules are not designed for that use.

For end user modules, take a look at the [c3d2-user-module](https://gitea.c3d2.de/c3d2/nix-user-module).

For the deployment options, take a look at [deployment](https://gitea.c3d2.de/c3d2/deployment).
## File system setup
Set the `disko` options for the machine and run:
```shell
$(nix build --print-out-paths --no-link -L '.#nixosConfigurations.HOSTNAME.config.system.build.disko')
```
When adding new disks, use the paths under ``/dev/disk/by-id/``: unlike names such as `/dev/sda`, they do not change across reboots, so the script stays idempotent.
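
To find the stable name of a disk that currently shows up as e.g. `/dev/sda`, you can compare symlink targets. A minimal sketch; `disk_ids` is a hypothetical helper, and its optional second argument exists only to make it testable. `ls -l /dev/disk/by-id/` shows the same information by hand:

```shell
# Hypothetical helper: print every /dev/disk/by-id/ entry that points at the
# given device. The optional second argument overrides the directory.
disk_ids() {
  dev=$(readlink -f "$1")
  dir=${2:-/dev/disk/by-id}
  for link in "$dir"/*; do
    # Skip anything that is not a symlink (including an unexpanded glob).
    [ -L "$link" ] || continue
    if [ "$(readlink -f "$link")" = "$dev" ]; then
      printf '%s\n' "$link"
    fi
  done
}
```

A disk usually has several names there (for example with `ata-` and `wwn-` prefixes); partitions get their own `-partN` entries and are not printed for the whole disk.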
## Install new server
- Copy the nix files from an existing, similar host.
- Disable all secrets until after the installation is finished.
- Set the `simd.arch` option to the output of ``nix shell nixpkgs#gcc -c gcc -march=native -Q --help=target | grep march`` and update the comment next to it.
- If that returns `x86_64`, search for the processor's `ark.intel.com` entry; the processor model can be found by catting ``/proc/cpuinfo``.
- Generate `networking.hostId` with ``head -c4 /dev/urandom | od -A none -t x4`` according to the option's description.
- Boot the live ISO.
- If your SSH key is not baked into the ISO, set a password for the `nixos` user with `passwd` to be able to log in over SSH.
- `rsync` this directory into the live system.
- Generate and apply the disk layout with disko (see above).
- Generate `hardware-configuration.nix` with ``sudo nixos-generate-config --no-filesystems --root /mnt``.
- If LUKS disks should be decrypted in the initrd over SSH, enable DHCP in `hardware-configuration.nix` for the interfaces that should be used for that.
- Install the NixOS system with ``sudo nixos-install --root /mnt --no-channel-copy --no-root-passwd --flake .#HOSTNAME``.
- After a reboot, add the host's age key to sops-nix with ``nix shell nixpkgs#ssh-to-age`` and ``ssh-to-age < /etc/ssh/ssh_host_ed25519_key.pub``.
- Add ``/etc/machine-id`` and the LUKS password to the sops secrets.
- Enable and deploy the secrets again.
- Improve the new machine setup by automating easy-to-automate steps and documenting the others.
- Commit everything and push.
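
The two generated values in the checklist can be produced by small helpers. A minimal sketch; `simd_arch` and `gen_host_id` are hypothetical names, `simd_arch` assumes the `-march=` line format printed by `gcc -Q --help=target`, and `od -A n` is the POSIX spelling of `-A none`:

```shell
# Hypothetical helpers for the "Install new server" checklist above.

# Print the CPU architecture gcc selects for -march=native (for simd.arch).
# Needs gcc on the PATH, e.g. inside `nix shell nixpkgs#gcc`.
simd_arch() {
  # gcc -Q --help=target prints a line like "  -march=   skylake"
  gcc -march=native -Q --help=target | awk '$1 == "-march=" { print $2; exit }'
}

# Print a random networking.hostId: 4 random bytes as 8 hex characters,
# with the padding that od adds removed.
gen_host_id() {
  head -c4 /dev/urandom | od -A n -t x4 | tr -d ' \n'
}
```

For example, `printf 'networking.hostId = "%s";\n' "$(gen_host_id)"` prints a ready-to-paste line. If `simd_arch` prints `x86_64`, fall back to the `ark.intel.com` lookup, using the model name from `grep -m1 'model name' /proc/cpuinfo`.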