Kubernetes
Tips
StatefulSets
Using a StatefulSet is very helpful when you need predictable DNS names to access pods within your cluster. For example, say you have a service called webserver
in the namespace samswebfarm
.
Using a StatefulSet you would have a DNS name of webserver.samswebfarm.svc.cluster.local
which balances across the pods. This gives you something stable to point other services at if/when needed.
If you need to target individual pods, they are named servicename-0, -1, -2, etc., and (via the StatefulSet's headless service) each gets a DNS record of the form pod-name.service-name.namespace.svc.cluster.local. So if you have 2 replicas, your pods would be webserver-0
and webserver-1
which would give you DNS names of webserver-0.webserver.samswebfarm.svc.cluster.local
and webserver-1.webserver.samswebfarm.svc.cluster.local
. If you had 3 replicas, the third would be webserver-2.webserver.samswebfarm.svc.cluster.local
and so on.
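A minimal sketch of what that looks like, using the names from the example above (the image and ports are placeholders, not anything from this post):

```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: webserver
  namespace: samswebfarm
spec:
  clusterIP: None        # headless: required for the per-pod DNS records
  selector:
    app: webserver
  ports:
    - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: webserver
  namespace: samswebfarm
spec:
  serviceName: webserver  # must match the headless service above
  replicas: 2
  selector:
    matchLabels:
      app: webserver
  template:
    metadata:
      labels:
        app: webserver
    spec:
      containers:
        - name: webserver
          image: nginx:alpine   # placeholder image
          ports:
            - containerPort: 80
```

This gives you pods webserver-0 and webserver-1, each with its own DNS record under the governing service.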
Intel Quick Sync
Check whether your nodes expose the resource needed for Intel Quick Sync (gpu.intel.com/i915):
kubectl get nodes -o=jsonpath="{range .items[*]}{.metadata.name}{'\n'}{' i915: '}{.status.allocatable.gpu\.intel\.com/i915}{'\n'}"
https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/gpu_plugin/README.md
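Once the Intel GPU device plugin (second link above) is installed, a pod can request the resource; a rough sketch, where the pod name and image are placeholders of my own choosing:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: quicksync-test     # placeholder name
spec:
  containers:
    - name: transcode
      image: jellyfin/jellyfin   # example image; anything that can use /dev/dri
      resources:
        limits:
          gpu.intel.com/i915: "1"   # schedules onto a node advertising the i915 resource
```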
Troubleshooting
CoreDNS Keeps Crashing
This happens when the system you're running on doesn't have a resolv.conf, or its contents differ from what CoreDNS expects. To fix this, edit your coredns ConfigMap.
Then replace the section: forward . /etc/resolv.conf
with something like forward . 1.1.1.1
Now delete the CoreDNS pods, or wait until they pick up the update.
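In context, the edit looks roughly like this inside the coredns ConfigMap's Corefile (only the forward line changes; the rest is whatever your distro ships, so treat this as a sketch rather than your exact file):

```text
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    # was: forward . /etc/resolv.conf
    forward . 1.1.1.1
    cache 30
    loop
    reload
}
```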
https://docs.k0sproject.io/stable/troubleshooting/#coredns-in-crashloop
https://stackoverflow.com/questions/53559291/kubernetes-coredns-in-crashloopbackoff
https://www.turek.dev/posts/disable-systemd-resolved-cleanly/
Bare Metal; the Hard-way
I've often said the best way to learn something is to teach it. The best way to teach it is to understand it. To understand it, you gotta build it. At least that's how I do most things in IT. I can't say I fully understand everything about Kubernetes, but I'm damn close to understanding the basics. I understand that this isn't going to be for most people, and that most people are completely happy never building this on bare metal. After all, you have to be a little crazy to do this. HOWEVER, if you actually do it, you'll learn a lot; at least I did.
Using K0S to Get Closer to Raw K8S
So I'm going to take a shortcut and cheat a little. While yes, I could totally build out k8s using upstream packages, I've found that for my purposes k0s is damn near identical and a hell of a lot faster to get started with. While some people may jump into something like k3s, microk8s, etc., those abstract a lot of stuff away from you. For example, do you know how flannel works? Have you ever deployed it? When it breaks, do you understand it enough to fix it? If you've never built it, then I'm guessing the answer is no. So let's fix that!
First we need k0sctl, which will let us deploy the config to our nodes pretty quickly. I also recommend grabbing k9s while you're at it.
Deploying k0s via k0sctl
Once you have that you'll need a config file; you can generate one with k0sctl or just use this one:
---
apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
    - ssh:
        address: 2601:123:456:7890::11
        user: root
        port: 22
        keyPath: ~/.ssh/id_ed25519
      role: controller+worker
      noTaints: true
    - ssh:
        address: 2601:123:456:7890::12
        user: root
        port: 22
        keyPath: ~/.ssh/id_ed25519
      role: worker
    - ssh:
        address: 2601:123:456:7890::13
        user: root
        port: 22
        keyPath: ~/.ssh/id_ed25519
      role: worker
  k0s:
    version: null
    versionChannel: stable
    dynamicConfig: false
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: ClusterConfig
      metadata:
        creationTimestamp: null
        name: k0s
      spec:
        network:
          kubeProxy:
            mode: ipvs
            ipvs:
              strictARP: true
Since we'll be deploying metallb, I've included strictARP for kube-proxy. This config will download the latest stable version of k0s onto the nodes and get them set up.
You can deploy it and update your kubeconfig by doing:
mkdir -p ~/.kube
k0sctl apply --config k0sctl.yaml
k0sctl kubeconfig --config k0sctl.yaml | tee ~/.kube/config
I recommend you make an Ansible inventory file to help with deploying and fixing things.
---
all:
  hosts:
    controller:
      ansible_host: 2601:123:456:7890::11
      ansible_user: root
    worker1:
      ansible_host: 2601:123:456:7890::12
      ansible_user: root
    worker2:
      ansible_host: 2601:123:456:7890::13
      ansible_user: root
  children:
    k0s:
      hosts:
        controller:
        worker1:
        worker2:
Fixing Your Messes
If for some reason you need to unfuck yourself if (when) you blow it up, don't worry. It's easy to fix: go to the nodes, run k0s reset
, make sure the /etc/k0s
and /var/lib/k0s
dirs are removed, then give it a reboot.
You can use this playbook to help automate things:
---
- name: Reset & Remove K0S
  hosts: k0s
  tags: [k0s]
  vars:
    k0s_services:
      - k0scontroller
      - k0sworker
    k0s_paths:
      - /var/lib/k0s
      - /etc/k0s
  handlers:
    - name: Reboot
      ansible.builtin.reboot:
  tasks:
    - name: Reset & Remove K0S
      notify: Reboot
      ignore_errors: true
      block:
        - name: Run Stop Command
          ansible.builtin.command: "k0s stop"
        - name: Stop K0S Services if still running
          ansible.builtin.systemd:
            service: "{{ item }}"
            state: stopped
          loop: "{{ k0s_services }}"
          register: k0s_stop_results
          until: k0s_stop_results is success
          retries: 1
          delay: 5
        - name: Run Reset Command
          ansible.builtin.command: "k0s reset"
        - name: Remove Configs
          ansible.builtin.file:
            path: "{{ item }}"
            state: absent
            force: true
          loop: "{{ k0s_paths }}"
At this point you have a full cluster ready to go. You can deploy whatever you want to it, use it, abuse it, destroy it, etc.
Helm Charts
However, I'm guessing we'll want to install some helm charts. In this case I'm going to be using traefik and metallb to handle my load balancing & cluster IPs, and openebs and NFS for data storage.
Just because we're doing this the hard way doesn't mean we have to do everything the hard way. You can run everything manually, but I'm going to use Ansible to add my repos and deploy my charts.
---
- name: Helm Repos, Plugins, & Charts
  hosts: k0s
  tags: [k0s]
  vars:
    local_user_name: samshamshop
    nfs_server: nfs.server=nfs.samsfantastichams.org
    nfs_path: nfs.path=/mnt/nfs/datavol
  tasks:
    - name: Add Helm Repos & Plugins
      become: true
      become_user: "{{ local_user_name }}"
      delegate_to: localhost
      run_once: true
      block:
        - name: Install Helm env plugin
          kubernetes.core.helm_plugin:
            plugin_path: https://github.com/adamreese/helm-env
            state: present
        - name: Install Helm diff plugin
          kubernetes.core.helm_plugin:
            plugin_path: https://github.com/databus23/helm-diff
            state: present
        - name: Add nfs-subdir-external-provisioner repository
          kubernetes.core.helm_repository:
            name: nfs-subdir-external-provisioner
            repo_url: https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
        - name: Add traefik repository
          kubernetes.core.helm_repository:
            name: traefik
            repo_url: https://traefik.github.io/charts
        - name: Add cert-manager repository
          kubernetes.core.helm_repository:
            name: jetstack
            repo_url: https://charts.jetstack.io
        - name: Add metallb repository
          # needed by the MetalLB deploy below
          kubernetes.core.helm_repository:
            name: metallb
            repo_url: https://metallb.github.io/metallb
        - name: Add openebs repository
          kubernetes.core.helm_repository:
            name: openebs
            repo_url: https://openebs.github.io/charts
        - name: Add longhorn repository
          kubernetes.core.helm_repository:
            name: longhorn
            repo_url: https://charts.longhorn.io
    - name: Deploy Helm Charts
      become: true
      become_user: "{{ local_user_name }}"
      delegate_to: localhost
      run_once: true
      block:
        - name: Deploy cert-manager
          kubernetes.core.helm:
            name: cert-manager
            chart_ref: jetstack/cert-manager
            release_namespace: cert-manager
            create_namespace: true
            set_values:
              - value: installCRDs=true
        - name: Deploy MetalLB
          kubernetes.core.helm:
            name: metallb
            chart_ref: metallb/metallb
            release_namespace: metallb-system
            create_namespace: true
        - name: Deploy traefik
          kubernetes.core.helm:
            name: traefik
            chart_ref: traefik/traefik
            release_namespace: traefik
            create_namespace: true
        - name: Deploy OpenEBS cStor
          kubernetes.core.helm:
            name: openebs
            chart_ref: openebs/openebs
            release_namespace: openebs
            create_namespace: true
            set_values:
              - value: cstor.enabled=true
        - name: Deploy nfs-subdir-external-provisioner
          # This fails if run a second time, hence the ignore
          # https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner
          kubernetes.core.helm:
            name: nfs-subdir-external-provisioner
            chart_ref: nfs-subdir-external-provisioner/nfs-subdir-external-provisioner
            release_namespace: kube-system
            create_namespace: true
            set_values:
              - value: "{{ nfs_server }}"
              - value: "{{ nfs_path }}"
          ignore_errors: true
Give that some time to work through everything and come up; it's going to take a while. You can use k9s
to monitor things in the meantime. Once everything is up and running for the most part, we can move on.
OpenEBS Configmap Edits
Out of the box, openebs does not support LVM for my purposes, so we need to tweak the NDM ConfigMap. If you're using LVM or something like it, you'll need to do this to make cStor / NDM work for you.
You'll want to remove /dev/dm-
from the path-filter excludes:
- key: path-filter
  name: path filter
  state: true
  include: ""
  exclude: "/dev/loop,/dev/fd0,/dev/sr0,/dev/ram,/dev/md,/dev/rbd,/dev/zd"
This will enable you to use LVM volumes. Though if you have raw disks attached, you really shouldn't need this.
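With NDM seeing your disks, a cStor StorageClass can be sketched roughly like this. Note the pool-cluster name cstor-disk-pool is my own placeholder; it has to match a CStorPoolCluster you've actually created, per the OpenEBS cStor docs linked at the bottom:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cstor-csi
provisioner: cstor.csi.openebs.io
allowVolumeExpansion: true
parameters:
  cas-type: cstor
  cstorPoolCluster: cstor-disk-pool   # assumed name; create the CSPC first
  replicaCount: "1"                   # bump this once you have multiple pools
```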
Traefik Dashboard & Metallb Pools
Now that we have openebs sorted, we need to give ourselves some way to access the cluster. Since we're on bare metal, let's use metallb and traefik!
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: metal-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.10.10.20-10.10.10.40
    - 2601:123:456:7890::1000:1/64
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: metal-l2
  namespace: metallb-system
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: dashboard
spec:
  entryPoints:
    - web
  routes:
    - match: PathPrefix(`/dashboard`) || PathPrefix(`/api`)
      kind: Rule
      services:
        - name: api@internal
          kind: TraefikService
We can apply this with kubectl by running kubectl apply -f against the files, or we can stick with the theme of Ansible, because once we have it done in Ansible we can just add more later without adding more work.
---
- name: Let's deploy some stuff!
  hosts: k0s
  tags: [k0s]
  vars:
    local_user_name: samshamshop
    k8s_dir: "{{ playbook_dir }}/../k8s"
  tasks:
    - name: Run k8s configs
      become: true
      become_user: "{{ local_user_name }}"
      delegate_to: localhost
      run_once: true
      block:
        - name: Install metallb pools
          kubernetes.core.k8s:
            src: "{{ k8s_dir }}/metallb-pool.yaml"
            state: present
            namespace: metallb-system
        - name: Install traefik dashboard
          kubernetes.core.k8s:
            src: "{{ k8s_dir }}/traefik-dashboard.yaml"
            state: present
            namespace: traefik
At this point we have a basic cluster setup and you can start doing whatever you want!
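As a quick sanity check, any Service of type LoadBalancer should now get an address from the metal-pool range above. A throwaway sketch (the names and image are placeholders I picked, not part of the setup):

```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: nginx:alpine   # placeholder image
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  type: LoadBalancer   # metallb assigns an IP from metal-pool
  selector:
    app: hello
  ports:
    - port: 80
```

If kubectl get svc hello shows a pending EXTERNAL-IP, check the L2Advertisement and the strictARP setting from the k0sctl config.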
OpenEBS:
https://openebs.io/docs/user-guides/cstor
https://openebs.io/docs/user-guides/cstor/launch-sample-application
https://openebs.io/docs/troubleshooting/volume-provisioning
NFS:
https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner
K0s:
https://github.com/k0sproject/k0sctl
https://docs.k0sproject.io
MetalLB:
https://metallb.universe.tf/
Traefik:
https://doc.traefik.io/traefik/
Ansible:
https://docs.ansible.com/ansible/latest/collections/kubernetes/core/index.html#plugins-in-kubernetes-core
Cert-Manager:
https://cert-manager.io/docs/installation/helm/