# nix-community infrastructure
Welcome to the Nix Community infrastructure project. This project holds all
the NixOS and Terraform configuration for this organization.
## Support
If you hit any issues, ping us on Matrix in the
[nix-community](https://matrix.to/#/!PbtOpdWBSRFbEZRLIf:numtide.com?via=numtide.com&via=nixos.dev)
room (see the admin list below) or create an issue here:
[New Issue](https://github.com/nix-community/infra/issues/new).
### Administrators
* @adisbladis
* @flokli
* @grahamc
* @Mic92
* @nlewo
* @ryantm
* @zimbatm
## Services
* BuildKite agent - on build01
* GitLab agent - on build01
* hound - on build01
* https://hydra.nix-community.org - on build01
* marvin-mk2 - on build01
* matterbridge - on build01
* ryantm-updater bot - on build02
## Hosts
### `build01` ![build01](https://healthchecks.io/badge/c9e58e14-c706-4084-959b-17b06fbd124f/QFBOLbO1/build01.svg)
This machine is perfect for running heavy builds.
* Provider: Hetzner
* CPU: AMD Ryzen 7 1700X Eight-Core Processor
* RAM: 64GB
* Drives: 2 x 512 GB SATA SSD
### `build02`
This machine currently just runs r-ryantm/nixpkgs-update.
* Provider: Hetzner
* CPU: AMD Ryzen 7 3700X Eight-Core Processor
* RAM: 64GB DDR4 ECC
* Drives: 2 x 1 TB NVME in RAID 1
### `build03`
This machine is a replacement for build01.
* Provider: Hetzner
* CPU: AMD Ryzen 5 3600 6-Core Processor
* RAM: 64GB DDR4 ECC
* Drives: 2 x 512 GB NVME in RAID 1
### `build04`
This machine serves as an aarch64 builder for the Hydra instance running on build03.
* Provider: Oracle cloud
* Instance type: [Ampere A1 Compute](https://www.oracle.com/cloud/compute/arm/)
* CPU: 4 VCPUs on an Ampere Altra (arm64)
* RAM: 24GB
* Drives: 200 GB Block
## Cache
All the builds on these machines are pushed to https://nix-community.cachix.org/
Thanks to Cachix for sponsoring our binary cache!
## File hierarchy
* ./build\d+ - build machines
* ./ci.sh - What is executed by CI
* ./deploy - NixOps deploy script
* ./nix - pinned Nix dependencies and overlays
* ./roles - shared NixOS configuration modules
* ./secrets - git-crypt encrypted secrets
* ./services - single instances of NixOS services
* ./terraform - Setup DNS
* ./users - NixOS configuration of our admins
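
The `./build\d+` entry above is a regular expression: each build machine has its own top-level directory (`build01`, `build02`, and so on). As a rough sketch, the helper below lists the directories under a checkout that match this pattern. The function name `list_hosts` is made up for illustration and is not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the names of the
# directories directly under $1 (default: current directory) that match build\d+.
list_hosts() {
  local root=${1:-.} dir name
  for dir in "$root"/build*/; do
    name=${dir%/}      # strip the trailing slash left by the glob
    name=${name##*/}   # keep only the directory name
    case ${name#build} in
      ''|*[!0-9]*) continue ;;  # no digits, or something other than digits
    esac
    printf '%s\n' "$name"
  done
}
```

Calling `list_hosts .` from the repository root should print one host name per line.
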
## Deployment commands
```console
$ ./deploy
```
If you want to reboot a machine, use the following
command to also deploy secrets afterwards:
```console
$ ./deploy --force-reboot --include build02
```
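
To reboot several machines one after another, the reboot command can be wrapped in a loop. The sketch below only prints the commands it would run. It assumes `./deploy` is invoked once per host with the flags shown above, and the function name `reboot_plan` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print one reboot-and-deploy command per host.
# Nothing is executed; review the output before running the commands by hand.
reboot_plan() {
  local host
  for host in "$@"; do
    printf './deploy --force-reboot --include %s\n' "$host"
  done
}

reboot_plan build01 build02
# prints:
# ./deploy --force-reboot --include build01
# ./deploy --force-reboot --include build02
```
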
## Install/Fix system from Hetzner recovery mode
1. Format and/or mount all filesystems to /mnt:
``` console
# format the disks as follows:
# - partition 1 is the BIOS boot partition, needed for legacy (BIOS) boot
# - partition 2 is for /boot
# - partition 3 takes up the rest of the space and is for the system
$ sgdisk -n 1:2048:4095 -n 2:4096:+2G -N 3 -t 1:ef02 -t 2:8304 -t 3:8304 /dev/nvme0n1
$ sgdisk -n 1:2048:4095 -n 2:4096:+2G -N 3 -t 1:ef02 -t 2:8304 -t 3:8304 /dev/nvme1n1
# create mdadm raid for /boot with ext4
$ mdadm --create --verbose /dev/md127 --raid-devices=2 --level=1 /dev/nvme{0,1}n1p2
$ mkfs.ext4 -F /dev/md127
# format zpool
# use partuuids as they are more stable than device names
$ ls -la /dev/disk/by-partuuid/
$ zpool create zroot -O acltype=posixacl -O xattr=sa -O compression=lz4 mirror /dev/disk/by-partuuid/long-uuid1 /dev/disk/by-partuuid/long-uuid2
# or, alternatively, with plain device names (run only one of the two `zpool create` commands)
$ zpool create zroot -O acltype=posixacl -O xattr=sa -O compression=lz4 mirror /dev/nvme{0,1}n1p3
$ zfs create -o mountpoint=none zroot/root
$ zfs create -o mountpoint=legacy zroot/root/nixos
$ zfs create -o mountpoint=legacy zroot/root/home
# and finally mount
$ mount -t zfs zroot/root/nixos /mnt
$ mkdir /mnt/{home,boot}
$ mount -t zfs zroot/root/home /mnt/home
$ mount -t ext4 /dev/md127 /mnt/boot
```
2. Install the kexec image from the Hetzner recovery system as described in [kexec.nix](roles/kexec.nix) and boot into it
3. Download the infra repo
``` console
$ nix-shell -p git --run "git clone https://github.com/nix-community/infra && cd infra && nix-shell"
# Just in case, generate hardware-configuration.nix and compare it with what we have in the repo
$ nixos-generate-config --root /mnt
$ diff -aur /mnt/etc/nixos/hardware-configuration.nix buildXX/hardware-configuration.nix
```
4. Build and install system
```console
$ nixos-install --system $(nix-build -A buildXX-system)
```
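
Steps 3 and 4 can be tied together so that the install command is only produced once the hardware configuration has been checked. This is a minimal sketch, not a script that ships with the repository: the function names are hypothetical, the host name must be substituted for the `buildXX` placeholder, and the install command is printed rather than executed.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; neither function exists in this repository.

# Succeeds if the generated hardware configuration matches the committed one.
# Otherwise the unified diff is printed and a non-zero status is returned.
hw_config_matches() {
  local generated=$1 committed=$2
  if diff -aur "$generated" "$committed"; then
    echo "hardware-configuration.nix matches"
  else
    echo "hardware-configuration.nix differs, review before installing" >&2
    return 1
  fi
}

# Prints (does not run) the install command for a host named build<digits>.
install_command() {
  local host=$1
  case $host in
    build*) ;;
    *) echo "unexpected host name: $host" >&2; return 1 ;;
  esac
  case ${host#build} in
    ''|*[!0-9]*) echo "unexpected host name: $host" >&2; return 1 ;;
  esac
  printf 'nixos-install --system "$(nix-build -A %s-system)"\n' "$host"
}

# Example, with build03 chosen purely for illustration:
#   hw_config_matches /mnt/etc/nixos/hardware-configuration.nix \
#     build03/hardware-configuration.nix && install_command build03
```

Printing the command instead of running it leaves the final decision to the administrator, which seems appropriate for a step that writes to the target disks.
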
### Debug VM
You can start a VM from the rescue system to debug the boot process:
```console
$ nix-shell -p qemu_kvm --run 'qemu-kvm -m 10G -hda /dev/sda -hdb /dev/sdb -curses -cpu host -enable-kvm'
```