---
gitea: none
title: Flockige Infrastruktur deklarativ
include_toc: yes
lang: en
---
# C3D2 infrastructure based on NixOS
## Setup
## Enable nix flakes user wide
Add the setting to your user `nix.conf`. Only do this once, because the command appends to the file!
```bash
# Create the config directory in case it does not exist yet
mkdir -p ~/.config/nix
echo 'experimental-features = nix-command flakes' >> ~/.config/nix/nix.conf
```
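Because `>>` appends unconditionally, running the command twice duplicates the line. The following is a minimal sketch of an idempotent variant. `enable_flakes` is a hypothetical helper name, not something this repo provides, and it only checks for the exact line, so a differently worded `experimental-features` entry would not be detected.

```shell
# Hypothetical helper (not part of this repo): enable flakes idempotently.
# Usage: enable_flakes [path/to/nix.conf]
enable_flakes() (
  conf="${1:-$HOME/.config/nix/nix.conf}"
  line='experimental-features = nix-command flakes'
  # Make sure the directory and the file exist before reading them
  mkdir -p "$(dirname "$conf")"
  touch "$conf"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
)
```

Calling `enable_flakes` without an argument targets `~/.config/nix/nix.conf`. Passing a path is mainly useful for trying it out on a scratch file.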
## Enable nix flakes system wide (preferred for NixOS)
Add this to your NixOS configuration:
```nix
nix.settings.experimental-features = "nix-command flakes";
```
### The secrets repo
The secrets repo is deprecated. Everything should be done through sops.
If you don't have secrets access, ask sandro or astro to get onboarded.
## Deployment
### Deploy to a remote NixOS system
For every host that has a `nixosConfiguration` in our Flake, there are two scripts that can be run for deployment via ssh.

- `nix run .#HOSTNAME-nixos-rebuild switch`

  Copies the current state to the target system and builds it there.
  This may fail due to resource limits on e.g. Raspberry Pis.

- `nix run .#HOSTNAME-nixos-rebuild-local switch`

  Builds everything locally, then uses `nix copy` to transfer the new NixOS system to the target.
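
When several hosts need the same treatment, the two script names above can be generated in a loop. This is only a sketch: `print_deploy_cmds` is a hypothetical helper, `host-a` and `host-b` are placeholder names, and the commands are printed for review instead of being executed.

```shell
# Hypothetical helper: print the deployment command for each given host.
# The first argument is the script variant:
#   nixos-rebuild        build on the target system
#   nixos-rebuild-local  build locally, then copy to the target
print_deploy_cmds() (
  variant="$1"
  shift
  for host in "$@"; do
    printf 'nix run .#%s-%s switch\n' "$host" "$variant"
  done
)

# Example with placeholder host names
print_deploy_cmds nixos-rebuild-local host-a host-b
```

Printing first keeps the loop harmless. Once the list looks right, the output can be piped to `sh` from the repository root.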

To use the binary cache from Hydra, set the following Nix options in the same way as when enabling flakes:
```
trusted-public-keys = nix-serve.hq.c3d2.de:KZRGGnwOYzys6pxgM8jlur36RmkJQ/y8y62e52fj1ps=
trusted-substituters = https://nix-serve.hq.c3d2.de
```
### Checking for updates
```shell
nix run .#list-upgradable
```
![list-upgradable output](doc/list-upgradable.png)
Checks all hosts with a `nixosConfiguration` in `flake.nix`.
### Update from [Hydra build](https://hydra.hq.c3d2.de/jobset/c3d2/nix-config#tabs-jobs)
This is the fastest way to update a system, and a manual alternative to setting
`c3d2.autoUpdate = true;`.
Just run:
```shell
update-from-hydra
```
### Deploy a MicroVM
#### Build a MicroVM remotely and deploy
```shell
nix run .#microvm-update-HOSTNAME
```
#### Build a MicroVM locally and deploy
```shell
nix run .#microvm-update-HOSTNAME-local
```
#### Update MicroVM from our Hydra
Our Hydra runs `nix flake update` daily in the `updater.timer`,
pushing it to the `flake-update` branch so that it can build fresh
systems. This branch is set up as the source flake in all the MicroVMs,
so the following is all that is needed on a MicroVM-hosting server:
```shell
microvm -Ru $hostname
```
## Cluster deployment with Skyflake
### About
[Skyflake](https://github.com/astro/skyflake) provides Hyperconverged
Infrastructure to run NixOS MicroVMs on a cluster. Our setup unifies
networking with one bridge per VLAN. Persistent storage is replicated
with Glusterfs.

You can recognize a nixosConfiguration that belongs to our Skyflake
deployment by its inclusion of the `self.nixosModules.cluster-options` module.
### User interface
We use the less-privileged `c3d2@` user for deployment. This flake's
name on the cluster is `config`. Other flakes can coexist under the same
user, so that we can run separately developed projects like
*dump-dvb*. *leon* and potentially other users can deploy flakes and
MicroVMs without name clashes.
#### Deploying
**git push** this repo to any machine in the cluster, preferably to
Hydra, because building there won't disturb any services.

You don't deploy all MicroVMs at once. Instead, Skyflake allows you to
select NixOS systems by the branches you push to. **You must commit
before you push!**

**Example:** deploy nixosConfigurations `mucbot` and `sdrweb` (`HEAD` is your
current commit)
```bash
git push c3d2@hydra.serv.zentralwerk.org:config HEAD:mucbot HEAD:sdrweb
```
This will:

1. Build the configuration on Hydra, refusing the branch update on
   broken builds (through a git hook)
2. Copy the MicroVM package and its dependencies to the binary cache
   that is accessible to all nodes with Glusterfs
3. Submit one job per MicroVM into the Nomad cluster

*Deleting* a nixosConfiguration's branch will **stop** the MicroVM in Nomad.
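
Both directions, deploying by pushing to a branch and stopping by deleting it, can be sketched as plain `git push` invocations. The helper names `skyflake_deploy_cmd` and `skyflake_stop_cmd` are hypothetical, the remote is the one from the example above, and the commands are only printed, not run. `git push <remote> --delete <branch>` is standard git. That it triggers the stop is taken from the description above, not verified here.

```shell
# Assumption: same remote as in the deployment example above
remote='c3d2@hydra.serv.zentralwerk.org:config'

# Print the push command that deploys the given nixosConfigurations,
# using one HEAD:<name> refspec per configuration.
skyflake_deploy_cmd() (
  cmd="git push $remote"
  for host in "$@"; do
    cmd="$cmd HEAD:$host"
  done
  printf '%s\n' "$cmd"
)

# Print the push command that deletes the given branches,
# which stops the corresponding MicroVMs.
skyflake_stop_cmd() (
  printf 'git push %s --delete %s\n' "$remote" "$*"
)

skyflake_deploy_cmd mucbot sdrweb
skyflake_stop_cmd mucbot
```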
#### Updating
**TODO:** how would you like it?
#### MicroVM status
```bash
ssh c3d2@hydra.serv.zentralwerk.org status
```
### Debugging for cluster admins
#### Glusterfs
Glusterfs holds our MicroVMs' state. The volumes *must always be mounted*,
otherwise we get a split brain.
```bash
gluster volume info
gluster volume status
```
##### Restart glusterd
```bash
systemctl restart glusterd
```
##### Remount volumes
```bash
systemctl restart /glusterfs/fast
systemctl restart /glusterfs/big
```
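Before remounting, it can help to see which volumes are actually missing. The following is a minimal sketch, assuming the `mountpoint` tool from util-linux is available on the node. `unmounted_volumes` is a hypothetical helper name.

```shell
# Hypothetical helper: print every given path that is not currently a
# mount point. Exits non-zero if at least one path is unmounted.
unmounted_volumes() (
  status=0
  for dir in "$@"; do
    # mountpoint -q is silent and only reports through its exit status
    if ! mountpoint -q "$dir"; then
      printf '%s\n' "$dir"
      status=1
    fi
  done
  exit "$status"
)
```

On a cluster node this would be called as `unmounted_volumes /glusterfs/fast /glusterfs/big`. Empty output means both volumes are mounted.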
#### Nomad
##### Check the cluster state
```shell
nomad server members
```
Nomad *servers* **coordinate** the cluster.
Nomad *clients* **run** the tasks.
##### Browse in the terminal
[wander](https://github.com/robinovitch61/wander) and
[damon](https://github.com/hashicorp/damon) are nice TUIs that are
preinstalled on our cluster nodes.
##### Browse with a browser
First, tunnel TCP port `:4646` from a cluster server:
```bash
ssh -L 4646:localhost:4646 root@server10.cluster.zentralwerk.org
```
Then, visit https://localhost:4646 for the full klickibunti (colorful point-and-click UI).
##### Reset the Nomad state on a node
After upgrades, Nomad servers may fail to rejoin the cluster. Do this
to make a *Nomad server* behave like a newborn:
```shell
systemctl stop nomad
rm -rf /var/lib/nomad/server/raft/
systemctl start nomad
```
## Secrets management
### Secrets Management Using `sops-nix`
#### Adding a new host
Edit `.sops.yaml`:
1. Add an AGE key for this host. Comments in this file tell you how to do it.
2. Add a `creation_rules` section for `host/$host/*.yaml` files
#### Editing a host's secrets
Edit `.sops.yaml` to add files for a new host and its SSH pubkey.
```bash
# Get sops
nix develop
# Decrypt, start your $EDITOR, encrypt
sops hosts/.../secrets.yaml
# Push
git commit -a -m "Adding new secrets"
git push origin
```
### Secrets management with PGP
Add your gpg-id to the `.gpg-id` file in secrets and let somebody re-encrypt it for you.
Maybe this works for you, maybe not. I did it somehow:
```bash
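# Pass every ID listed in .gpg-id to a single `pass init` call.
# The variable must prefix the command that runs pass, not an earlier pipeline stage.
PASSWORD_STORE_DIR="$(pwd)" xargs pass init < .gpg-id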
```
Your gpg key has to have the Authenticate flag set. If not, update it, push it to a keyserver, and wait.
This is necessary so that you can log in to any machine with your gpg key.
## Laptops / Desktops
This repository contains a NixOS module that can be used with personal machines
as well. This module appends the host keys of registered HQ hosts to
`/etc/ssh/ssh_known_hosts`, and optionally appends static IPv6 addresses local
to HQ to `/etc/hosts`. Simply import the module to use it. As
an example:
```nix
# /etc/nixos/configuration.nix
{ config, pkgs, lib, ... }:
let
  # Using a flake is recommended instead
  c3d2Config = builtins.fetchGit { url = "https://gitea.c3d2.de/C3D2/nix-config.git"; };
in {
  imports = [
    "${c3d2Config}/modules/c3d2.nix"
  ];
  c3d2 = {
    ...
  };
}
```