---
gitea: none
title: Flockige Infrastruktur deklarativ
include_toc: yes
lang: en
---

# C3D2 infrastructure based on NixOS

## Setup

### Enable nix flakes user-wide

Add the setting to your user `nix.conf`. Run this only once, because the command appends to the file on every run.

```bash
# Make sure the config directory exists before appending
mkdir -p ~/.config/nix
echo 'experimental-features = nix-command flakes' >> ~/.config/nix/nix.conf
```

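Because the command above appends on every run, a guarded variant can be handy. The following is a minimal sketch and not part of this repo: `append_once` is a hypothetical helper name, and it only detects an exact duplicate of the line, not a differing `experimental-features` entry.

```shell
# Hypothetical helper (not provided by this repo):
# append LINE to FILE only if that exact line is not present yet.
# Usage: append_once FILE LINE
append_once() {
  file=$1
  line=$2
  # Create the directory and the file if they do not exist yet
  mkdir -p "$(dirname "$file")"
  touch "$file"
  # -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, `append_once ~/.config/nix/nix.conf 'experimental-features = nix-command flakes'` can be run repeatedly without duplicating the line.
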
### Enable nix flakes system-wide (preferred for NixOS)

Add this to your NixOS configuration:

```nix
nix.settings.experimental-features = [ "nix-command" "flakes" ];
```

### The secrets repo

The secrets repo is deprecated. Everything should be done through sops.
If you don't have secrets access, ask sandro or astro to get onboarded.

### SSH access

If people should get root access to *all* machines, add their keys to `ssh-public-keys.nix`.

## Deployment

### Deploy to a remote NixOS system

For every host that has a `nixosConfiguration` in our Flake, two scripts are available for deployment via SSH:

- `nix run .#HOSTNAME-nixos-rebuild switch`

  Copies the current state to the target system and builds it there.
  This may fail due to resource limits on e.g. Raspberry Pis.

- `nix run .#HOSTNAME-nixos-rebuild-local switch`

  Builds everything locally, then uses `nix copy` to transfer the new NixOS system to the target.

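The two variants differ only in the app name, which can be sketched as a small helper that prints the command for review before you run it. `deploy_cmd` and its `remote`/`local` mode names are illustrative assumptions, not scripts provided by this repo.

```shell
# Hypothetical helper: print the deployment command for a host.
# Usage: deploy_cmd HOSTNAME [remote|local]
deploy_cmd() {
  host=$1
  mode=${2:-remote}
  case "$mode" in
    # Build on the target system
    remote) printf 'nix run .#%s-nixos-rebuild switch\n' "$host" ;;
    # Build locally, then transfer the result to the target
    local) printf 'nix run .#%s-nixos-rebuild-local switch\n' "$host" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

Calling `deploy_cmd HOSTNAME local` prints the local-build variant, which is the one to reach for when the target is too small to build its own system.
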
To use the cache from Hydra, set the following Nix options, similar to enabling flakes:

```
trusted-public-keys = nix-cache.hq.c3d2.de:KZRGGnwOYzys6pxgM8jlur36RmkJQ/y8y62e52fj1ps=
trusted-substituters = https://nix-cache.hq.c3d2.de
```

This can also be set with the `c3d2.addBinaryCache` option from the [c3d2-user-module](https://gitea.c3d2.de/c3d2/nix-user-module).

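To check whether a given `nix.conf` already mentions the cache, something like the following might work. It is a rough sketch: `cache_configured` is a hypothetical helper, and it assumes both settings are written on single lines named exactly `trusted-public-keys` and `trusted-substituters`. It does not know about `extra-` variants or included files.

```shell
# Hypothetical helper: succeed if FILE mentions the C3D2 cache in both settings.
# Usage: cache_configured FILE
cache_configured() {
  conf=$1
  # Both the signing key and the cache URL have to be present
  grep -q '^trusted-public-keys *=.*nix-cache\.hq\.c3d2\.de:' "$conf" &&
    grep -q '^trusted-substituters *=.*https://nix-cache\.hq\.c3d2\.de' "$conf"
}
```

For example, `cache_configured ~/.config/nix/nix.conf && echo configured` prints `configured` only when both lines are found.
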
### Checking for updates

```shell
nix run .#list-upgradable
```

![list-upgradable output](doc/list-upgradable.png)

Checks all hosts with a `nixosConfiguration` in `flake.nix`.

### Update from [Hydra build](https://hydra.hq.c3d2.de/jobset/c3d2/nix-config#tabs-jobs)

This is the fastest way to update a system and a manual alternative to setting
`c3d2.autoUpdate = true;`.

Just run:

```shell
update-from-hydra
```

### Deploy a MicroVM

#### Build a MicroVM remotely and deploy

```shell
nix run .#microvm-update-HOSTNAME
```

#### Build a MicroVM locally and deploy

```shell
nix run .#microvm-update-HOSTNAME-local
```

#### Update MicroVM from our Hydra

Our Hydra runs `nix flake update` daily in the `updater.timer`,
pushing the result to the `flake-update` branch so that it can build fresh
systems. This branch is set up as the source flake in all the MicroVMs,
so the following is all that is needed on a MicroVM-hosting server:

```shell
microvm -Ru $hostname
```

## Cluster deployment with Skyflake

### About

[Skyflake](https://github.com/astro/skyflake) provides hyperconverged
infrastructure to run NixOS MicroVMs on a cluster. Our setup unifies
networking with one bridge per VLAN. Persistent storage is replicated
with CephFS.

You can recognize a nixosConfiguration that belongs to our Skyflake
deployment by its inclusion of the `self.nixosModules.cluster-options` module.

### User interface

We use the less-privileged `c3d2@` user for deployment. This flake's
name on the cluster is `config`. Other flakes can coexist under the same
user so that we can run separately developed projects like
*dump-dvb*. *leon* and potentially other users can deploy Flakes and
MicroVMs without name clashes.

#### Deploying

**git push** this repo to any machine in the cluster, preferably to
Hydra, because building there won't disturb any services.

You don't deploy all MicroVMs at once. Instead, Skyflake lets you
select NixOS systems by the branches you push to. **You must commit
before you push!**

**Example:** deploy the nixosConfigurations `mucbot` and `sdrweb` (`HEAD` is your
current commit):

```bash
git push c3d2@hydra.serv.zentralwerk.org:config HEAD:mucbot HEAD:sdrweb
```

This will:

1. Build the configuration on Hydra, refusing the branch update on
   broken builds (through a git hook)
2. Copy the MicroVM package and its dependencies to the binary cache
   that is accessible to all nodes via CephFS
3. Submit one job per MicroVM into the Nomad cluster

*Deleting* a nixosConfiguration's branch will **stop** the MicroVM in Nomad.

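The branch-per-system convention can be sketched as two small helpers that only print the `git push` command, so it can be reviewed first. Both function names are hypothetical. `git push --delete` is standard git, but that it is the intended way to stop a MicroVM on this cluster is an assumption based on the sentence above.

```shell
# Hypothetical helper: print the push command that deploys the given
# nixosConfigurations by pushing HEAD to one branch per name.
# Usage: skyflake_push_cmd REMOTE NAME...
skyflake_push_cmd() {
  remote=$1
  shift
  cmd="git push $remote"
  for name in "$@"; do
    cmd="$cmd HEAD:$name"
  done
  printf '%s\n' "$cmd"
}

# Hypothetical helper: print the push command that deletes the branches,
# which stops the corresponding MicroVMs.
# Usage: skyflake_stop_cmd REMOTE NAME...
skyflake_stop_cmd() {
  remote=$1
  shift
  printf 'git push --delete %s' "$remote"
  # printf repeats the format once per remaining argument
  printf ' %s' "$@"
  printf '\n'
}
```

With the remote from the example, `skyflake_push_cmd c3d2@hydra.serv.zentralwerk.org:config mucbot sdrweb` prints the same command as shown above.
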
#### Updating

**TODO:** how would you like it?

#### MicroVM status

```bash
ssh c3d2@hydra.serv.zentralwerk.org status
```

### Debugging for cluster admins

#### Nomad

##### Check the cluster state

```shell
nomad server members
```

Nomad *servers* **coordinate** the cluster.

Nomad *clients* **run** the tasks.

##### Browse in the terminal

[wander](https://github.com/robinovitch61/wander) and
[damon](https://github.com/hashicorp/damon) are nice TUIs that are
preinstalled on our cluster nodes.

##### Browse with a browser

First, tunnel TCP port `:4646` from a cluster server:

```bash
ssh -L 4646:localhost:4646 root@server10.cluster.zentralwerk.org
```

Then, visit https://localhost:4646 for the full point-and-click web UI.

##### Reset the Nomad state on a node

After upgrades, Nomad servers may fail to rejoin the cluster. Do this
to make a *Nomad server* behave like a newborn:

```shell
systemctl stop nomad
rm -rf /var/lib/nomad/server/raft/
systemctl start nomad
```

## Secrets management

### Secrets management using `sops-nix`

#### Adding a new host

Edit `.sops.yaml`:

1. Add an age key for this host. Comments in this file tell you how to do it.
2. Add a `creation_rules` section for `host/$host/*.yaml` files

#### Editing a host's secrets

Edit `.sops.yaml` to add files for a new host and its SSH pubkey.

```bash
# Get sops
nix develop
# Decrypt, start an editor ($EDITOR), encrypt
sops hosts/.../secrets.yaml
# Push (the commit message must be quoted)
git commit -a -m "Adding new secrets"
git push origin
```

### Secrets management with PGP

Add your gpg-id to the `.gpg-id` file in secrets and let somebody reencrypt it for you.
The following may or may not work for you; it is roughly how I did it:

```bash
# Re-initialise the password store in the current directory for all IDs in .gpg-id.
# The variable has to be set for xargs so that pass inherits it.
PASSWORD_STORE_DIR="$PWD" xargs pass init < .gpg-id
```

Your GPG key must have the Authenticate flag set. If it does not, update the key, push it to a keyserver, and wait.
This is necessary so that you can log in to any machine with your GPG key.

## Laptops / Desktops

In the past, this repo could be used as a module. While this is still technically possible, it is not recommended,
because the number of flake inputs has grown considerably and the modules are not designed with that use in mind.

For end user modules take a look at the [c3d2-user-module](https://gitea.c3d2.de/c3d2/nix-user-module).

For the deployment options take a look at [deployment](https://gitea.c3d2.de/c3d2/deployment).

## ZFS setup

Set the `disko` options for the machine, then build and run the resulting disko script in one step:

```
$(nix build --print-out-paths --no-link -L '.#nixosConfigurations.HOSTNAME.config.system.build.diskoNoDeps')
```

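Since the command above executes the built script right away, it can help to build it first and look at it. A minimal sketch, assuming the same flake attribute as above; `disko_script_path` is a hypothetical helper name, not something this repo provides:

```shell
# Hypothetical helper: build the disko script for a host and print its
# store path without executing it.
# Usage: disko_script_path HOSTNAME
disko_script_path() {
  nix build --print-out-paths --no-link -L \
    ".#nixosConfigurations.$1.config.system.build.diskoNoDeps"
}
```

`script=$(disko_script_path HOSTNAME)` stores the path, so the script can be inspected before it is executed with `"$script"`.
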