MicroK8s for LLM and other PoCs with HTTPS integration
This is a detailed step-by-step guide to deploying a Kubernetes cluster with MicroK8s on a dedicated machine, tailored for proof-of-concept (PoC) projects. At first it might look complicated for Kubernetes newbies, but once you have finished the round trip and set up ArgoCD you will be excited, promise. The setup includes key components such as:
- HTTPS Configuration: Implemented using Cert-Manager with Let’s Encrypt for real-world security and credibility.
- Multi-Domain Ingress: Supporting multiple applications or services under a single ingress controller.
- Application Deployment: Flexible setups using Dockerized applications, enabling support for various frameworks and languages.
- Autoscaling: Configured with Horizontal Pod Autoscalers (HPA) to handle variable workloads effectively.
Handling Helm charts is intentionally left out, because of the learning strategy. For the app deployment, however, you can easily generate a simple Helm template that covers the parametrization.
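If you later want to go that route, a chart skeleton is quickly generated; this is only a minimal sketch, and app101 is a placeholder chart name:
helm create app101                 # scaffolds the chart; values.yaml carries the parametrization
helm template app101 ./app101      # renders the manifests locally so you can inspect them before installing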
1. Why This Approach?
A. Cost Efficiency
Using MicroK8s is a much cheaper alternative to hyperscalers (AWS, GCP, Azure) for running PoC projects. You avoid paying hourly or monthly fees for cloud-managed Kubernetes clusters while still benefiting from a full-featured Kubernetes environment on your local or on-premise setup. A decent dedicated 128 GB machine with a GPU is available for $150–200/month; on a hyperscaler you pay a multiple of that.
B. Real HTTPS for Credibility
Having HTTPS configured using Cert-Manager with Let’s Encrypt ensures that your projects are secure and showcase-ready. Many PoC projects fail to gain traction due to a lack of real-world deployment standards like HTTPS. This guide ensures that your Kubernetes setup is production-like, helping you demonstrate your work to stakeholders and users without compromising on best practices.
C. Early Detection of Roadblocks
Deploying applications in a real Kubernetes cluster early in the development lifecycle helps identify potential issues, such as:
- Configuration mismatches
- Networking challenges
- Resource scaling limitations
This proactive approach saves time and ensures that the transition from PoC to production is smoother.
D. Flexibility with Frameworks and Languages
This guide leverages the Dockerized approach, enabling you to deploy applications built in different languages and frameworks. Whether you’re using Streamlit, Chainlit, or any other technology, Kubernetes allows you to orchestrate and manage them efficiently. You can integrate tools built in Python, JavaScript, or any other language with ease.
E. Showcasing Versatility
With multi-domain ingress and containerized deployments, you can run multiple independent PoC projects simultaneously on the same cluster. For instance:
- A Streamlit application for data visualization
- A Chainlit based chatbot
- APIs or backend services written in Python, Node.js, or Go
This makes the setup ideal for showcasing several independent projects side by side on a single machine.
2. Setting Up the Kubernetes Cluster
These steps are executed automatically by the Pulumi script, so you can start with the manifests immediately.
1. Install MicroK8s and kubectl:
sudo snap install microk8s --classic
sudo snap install kubectl --classic
2. Start microk8s
microk8s start
microk8s status
3. Enable the essential add-ons (enabling one at a time is best practice):
microk8s enable dns
microk8s enable ingress
microk8s enable storage
4. Enable MetalLB as the built-in load balancer for MicroK8s (see the example below).
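For example (the address range is only a placeholder; pick free addresses from your own network):
microk8s enable metallb:10.64.140.43-10.64.140.49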
3. Deploying the Ingress Controller
Deploy the NGINX ingress controller to handle HTTP/HTTPS traffic.
# ingress-nginx-controller.yaml
# sync wave helps to apply manifests in some order
# ingress-nginx-controller -> letsencrypt -> deployment/service -> ingress
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress
  annotations:
    argocd.argoproj.io/sync-wave: "-1" # https://argo-cd.readthedocs.io/en/stable/user-guide/sync-waves/
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 80
      protocol: TCP
    - name: https
      port: 443
      targetPort: 443
      protocol: TCP
  selector:
    name: nginx-ingress-microk8s
Apply the configuration:
kubectl apply -f ingress-nginx-controller.yaml
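You can check that MetalLB assigned an external IP to the controller Service (the ingress namespace comes from the manifest above):
kubectl get svc -n ingress ingress-nginx-controller
# EXTERNAL-IP should show an address from your MetalLB range instead of <pending>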
4. Configuring HTTPS with Cert-Manager
Cert-Manager automates certificate provisioning using Let’s Encrypt.
1. Install Cert-Manager:
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.10.0/cert-manager.yaml
2. Create a ClusterIssuer and apply it. It is responsible for obtaining the HTTPS certificates.
# cluster-issuer-lets-encrypt.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dev
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your-email@example.com
    privateKeySecretRef:
      name: letsencrypt-dev
    solvers:
      - http01:
          ingress:
            class: nginx
kubectl apply -f cluster-issuer-lets-encrypt.yaml
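A quick check that cert-manager is running and the issuer has registered with the ACME server:
kubectl get pods -n cert-manager
kubectl get clusterissuer letsencrypt-dev
# READY should be True; if not, "kubectl describe clusterissuer letsencrypt-dev" shows the ACME registration events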
5. Deploying Application and Service
Here's an example deployment configuration, including the service that the ingress accesses. Very important: don't forget to apply the secret for your registry.
You get access to the Artifact Registry in Google Cloud by creating a secret with the credentials of a service account you have set up. If your images are public, you don't need it.
kubectl create secret docker-registry gcr-json-key \
--docker-server=europe-west3-docker.pkg.dev \
--docker-username=_json_key \
--docker-password="$(cat credentials.json)" \
--docker-email=you@mail.com
# app101-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app101-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app101
  template:
    metadata:
      labels:
        app: app101
    spec:
      containers:
        - name: app101
          image: your-docker-image:latest
          ports:
            - containerPort: 80
      imagePullSecrets:
        - name: gcr-json-key # Not necessary if your images are public.
---
apiVersion: v1
kind: Service
metadata:
  name: app101-service
  namespace: default
spec:
  selector:
    app: app101
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP
kubectl apply -f app101-deployment.yaml
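A short check that the rollout succeeded and the service has endpoints:
kubectl rollout status deployment/app101-deployment
kubectl get pods -l app=app101
kubectl get endpoints app101-service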
6. Add new *A records* in your domain registration
I use AWS Route 53 for the domain, but it should be similarly simple with any registrar. Just add an A record that points the intended subdomain to the dedicated IP of your machine.
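Once the record has propagated you can verify it from any machine (the hostname is the example used in the ingress below):
dig +short app101.api101.net
# should return the dedicated IP of your server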
7. Multi-Domain Ingress Configuration
Configure ingress rules for multiple domains:
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: multi-domain-ingress
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-dev"
spec:
  tls:
    - hosts:
        - landing-page.api101.net
        - app101.api101.net
      secretName: multi-domain-tls
  rules:
    - host: landing-page.api101.net
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: landing-page-service
                port:
                  number: 80
    - host: app101.api101.net
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app101-service
                port:
                  number: 80 # Must match the port of app101-service above.
kubectl apply -f ingress.yaml
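After the HTTP-01 challenge has been solved (usually within a minute or two), the certificate should be ready:
kubectl get certificate multi-domain-tls
kubectl describe certificate multi-domain-tls   # events show why issuance might be stuck
curl -I https://app101.api101.net               # should succeed without -k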
8. Notes on using Google Cloud
The difference when using a managed cloud Kubernetes cluster instead of MicroK8s is that you don't need the ingress-nginx-controller Service from above! You can start in GCP with:
gcloud container clusters create my-cluster \
--region=europe-west3 \
--enable-autoscaling \
--min-nodes=1 --max-nodes=3
gcloud container clusters get-credentials my-cluster --region=europe-west3
9. ArgoCD
I like ArgoCD very much, and once you have set it up you will never want to miss it. Please check out: https://argo-cd.readthedocs.io/en/stable/
Anyway, here are some hints on how to get started:
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "NodePort"}}'
kubectl get svc -n argocd
# This is the first login password
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
Now you can access the ArgoCD GUI via the node IP and the NodePort shown by the service listing above.
- In Settings you need to set up your Git repo.
- In Applications you set up your project with the repo from above and the proper path to the manifests (infrastructure/manifests); a minimal declarative alternative is sketched below.
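Instead of clicking through the GUI, the same application can also be declared as a manifest. This is only a minimal sketch; the repository URL is a placeholder you need to replace:
# argocd-application.yaml (sketch)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: poc-cluster
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-repo.git   # placeholder, use your own repo
    targetRevision: main
    path: infrastructure/manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true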
If everything works out, it looks like the picture at the top.
10. Ollama
For Ollama we also have a deployment, a service, and an HPA, kept together in one file for simplicity.
- Works without issues with an NVIDIA GPU (microk8s enable nvidia)
- For the PV you need to adjust the nodeAffinity; if you use another StorageClass, adjust that as well. Plan for enough space for the models, too!
- Stress tests are on my list.
The host URL is defined in the config map (see: configmap). Check out the necessary modifications you'll need to make.
Ollama Deployment
First we need the ollama-deployment with some resource configuration, which the Horizontal Pod Autoscaler uses for its scaling decisions. Also think of a good location to persist the models, because they demand space in the tens of GB.
# ollama-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-deployment
  namespace: default
  labels:
    app: ollama
    environment: production
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
              #nvidia.com/gpu: 1
            limits:
              cpu: "650m"
              memory: "2Gi"
              #nvidia.com/gpu: 1
          volumeMounts:
            - name: ollama-data
              mountPath: /root/.ollama
      volumes:
        - name: ollama-data
          persistentVolumeClaim:
            claimName: ollama-pvc
Ollama Persistent Volume
This is the location where we persist the models, mounted to a local path on the node. Provide enough space; some models take ~50+ GB if you want to use them.
# ollama-pvc-pv.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
  namespace: default
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: "" # Keep empty for manual binding
  resources:
    requests:
      storage: 10Gi # Check how much you need for the models!
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ollama-pv
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  capacity:
    storage: 10Gi # Check how much you need for the models!
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: "" # Matches PVC
  local:
    path: /home/ubuntu/.ollama # Change if necessary
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - ip-172-31-16-98 # Node where the PV is available, e.g. check with "kubectl get nodes -o wide"
Ollama Service
The service is defined as a NodePort so you can use the ollama-service from outside the cluster. I like it for testing, but be aware of the security implications!
# ollama-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
  namespace: default
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  selector:
    app: ollama
  type: NodePort
  ports:
    - protocol: TCP
      port: 11434
      targetPort: 11434
      nodePort: 30000 # Optional if you want to expose the service outside the cluster
So you can use something like this, as long as the model has already been pulled:
curl http://server-ip:30000/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain me Luhmanns system theory in 500 words",
  "stream": false
}' | jq
Check out the API in more detail here: ollama-api
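For example, listing the models that are currently available on the service via the same NodePort:
curl http://server-ip:30000/api/tags | jq '.models[].name'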
Ollama Autoscaler
Here we can experiment with different settings to make the service as responsive as possible.
# ollama-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-ollama
  namespace: default
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ollama-deployment
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
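Note that the HPA needs resource metrics to act on; on MicroK8s you can enable the metrics-server add-on and then watch the autoscaler:
microk8s enable metrics-server
kubectl get hpa hpa-ollama -w   # shows current CPU utilization vs. the 65% target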
Ollama Models
The ollama-service starts without any models, but with a Job manifest the pulls are handled automatically once the service is ready.
# ollama-init-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ollama-init-job
  namespace: default
spec:
  template:
    spec:
      containers:
        - name: post-install
          image: curlimages/curl:latest # Lightweight curl image
          command: ["/bin/sh", "-c"]
          args:
            - |
              echo "Waiting for the service to become available...";
              while ! nc -z ollama-service 11434; do sleep 1; done;
              echo "Service is available, sending requests...";
              curl http://ollama-service:11434/api/pull -d '{"model": "llama3.2:1b"}'
              curl http://ollama-service:11434/api/pull -d '{"model": "llama3.2:3b"}'
              echo "Requests completed.";
      restartPolicy: Never
  backoffLimit: 4
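Apply the job and follow its logs to watch the model pulls (the model names come from the manifest above):
kubectl apply -f ollama-init-job.yaml
kubectl logs -f job/ollama-init-job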
11. Appendix
The Docker image builds are handled automatically with GitHub Actions (see: github-actions).
To test the approach I used an EC2 instance built with Pulumi (see: ec2-build).