MariaDB Operator Upgrade Runbook
Unified procedure for upgrading the MariaDB Operator Helm chart through the required progressive upgrade path. Sequential upgrades are mandatory due to CRD evolution and changes in replication and backup behaviour across versions.
Never Delete CRDs
Never delete CRDs during the upgrade; doing so will delete the MariaDB database pods. Uninstalling the operator Helm release alone does not cause DB downtime.
Prerequisites
Check the currently installed versions
helm list -A | grep mariadb
helm status mariadb-operator -n mariadb-system
helm status mariadb-operator-crds -n mariadb-system
Update Helm repo and list available chart versions
helm repo update mariadb-operator
helm search repo mariadb-operator/mariadb-operator --versions | head -20
Pre-Upgrade Checks
Repeat Before Every Step
Run these checks before every upgrade step in the path.
1. Retrieve MariaDB root password
export MARIADB_ROOT_PASSWORD=$(kubectl get secret mariadb -n openstack \
-o jsonpath='{.data.root-password}' | base64 -d)
2. Create full database backup
Identify the primary pod first, then run the backup from it:
PRIMARY_POD=$(kubectl get mariadb mariadb-cluster -n openstack -o jsonpath="{.status.currentPrimary}")
echo "Primary pod: $PRIMARY_POD"
kubectl exec -i "$PRIMARY_POD" -n openstack -- mariadb-dump \
-u root -p"$MARIADB_ROOT_PASSWORD" \
--all-databases \
--single-transaction \
--routines \
--triggers > mariadb-cluster-full-backup-$(date +%Y%m%d-%H%M).sql
Verify the backup:
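A quick sanity check on the dump file (the glob matches the filename pattern used above; `mariadb-dump` appends a completion marker unless dump dates are disabled):

```shell
# Confirm the dump exists and is non-empty
ls -lh mariadb-cluster-full-backup-*.sql
# A complete dump ends with a "Dump completed" marker
tail -n 1 mariadb-cluster-full-backup-*.sql | grep "Dump completed"
```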
3. Check database sizes
kubectl exec -it "$PRIMARY_POD" -n openstack -- mariadb -u root -p"$MARIADB_ROOT_PASSWORD" -e "
SELECT
table_schema AS 'Database',
ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS 'Size_MB'
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'performance_schema', 'mysql', 'sys')
GROUP BY table_schema
ORDER BY Size_MB DESC;"
4. Verify cluster health
Determine your cluster topology and run the appropriate checks:
kubectl exec -it "$PRIMARY_POD" -n openstack -- mariadb -u root -p"$MARIADB_ROOT_PASSWORD" -e "
SHOW STATUS LIKE 'wsrep_cluster_size';
SHOW STATUS LIKE 'wsrep_cluster_status';
SHOW STATUS LIKE 'wsrep_ready';"
Expected Values
| Variable | Expected |
|---|---|
| wsrep_cluster_size | 3 |
| wsrep_cluster_status | Primary |
| wsrep_ready | ON |
How to identify your topology
Check the MariaDB CR spec:
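One way to check, assuming the CR name `mariadb-cluster` used throughout this runbook:

```shell
# Empty or "false" means a Primary/Replica (replication) cluster
kubectl get mariadb mariadb-cluster -n openstack \
  -o jsonpath='{.spec.galera.enabled}'
```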
- Returns true → Galera cluster
- Returns empty/false → Primary/Replica (replication)
5. Verify cluster and pod status
kubectl get mariadb -n openstack
kubectl get pods -l app.kubernetes.io/name=mariadb -n openstack
kubectl get mariadb -n openstack -o yaml | grep -B1 autoFailover  # Read "Disclaimer" below
kubectl get crd | grep mariadb
Disclaimer
The autoFailover field is only present on operator versions >= 25.10.4.
6. Verify current operator, webhook, and MariaDB image versions
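A sketch of one way to gather these versions (adjust names to your environment):

```shell
# Operator and webhook images currently deployed
kubectl get deployments -n mariadb-system \
  -o jsonpath="{..image}" | tr -s '[:space:]' '\n' | sort -u
# MariaDB server image declared on the CR
kubectl get mariadb mariadb-cluster -n openstack -o jsonpath='{.spec.image}'
```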
Per-Version Upgrade Procedure
Repeat this procedure for each version in the upgrade path. Version-specific notes are listed in the next section.
Preflight: Update MariaDB image and enable autoUpdateDataPlane
Update the MariaDB image in /opt/genestack/base-kustomize/mariadb-cluster/base/mariadb-replication.yaml
to match the version compatible with the operator chart version being deployed.
For every release, you must update this image before upgrading the
mariadb-cluster (Galera or Replication).
Finding the compatible MariaDB image
Check the config.mariadbImage value in the upstream chart's values.yaml at the
corresponding tag:
https://github.com/mariadb-operator/mariadb-operator/blob/v<VERSION>/deploy/charts/mariadb-operator/values.yaml
For example, for chart version 25.8.4:
https://github.com/mariadb-operator/mariadb-operator/blob/v25.8.4/deploy/charts/mariadb-operator/values.yaml
shows mariadbImage: docker-registry1.mariadb.com/library/mariadb:11.8.2.
Update the image in the base manifest. For example, when upgrading to chart 25.8.4, set the MariaDB image compatible with that release:
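A sketch of the relevant portion of the manifest (the surrounding fields in your copy may differ; the image value comes from the upstream values.yaml shown above):

```yaml
# /opt/genestack/base-kustomize/mariadb-cluster/base/mariadb-replication.yaml
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
spec:
  image: docker-registry1.mariadb.com/library/mariadb:11.8.2
```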
Then set autoUpdateDataPlane: true using one of the following methods:
Edit /etc/genestack/kustomize/mariadb-cluster/overlay/kustomization.yaml:
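One way to express this in the overlay. This is a sketch: the patch target and the overlay's existing contents may differ in your deployment, so merge rather than overwrite:

```yaml
# /etc/genestack/kustomize/mariadb-cluster/overlay/kustomization.yaml (sketch)
patches:
  - target:
      kind: MariaDB
      name: mariadb-cluster
    patch: |-
      - op: add
        path: /spec/updateStrategy/autoUpdateDataPlane
        value: true
```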
Why autoUpdateDataPlane?
Enabling autoUpdateDataPlane uses a ReplicasFirstPrimaryLast strategy instead of
RollingUpdate. This applies to both cluster topologies:
- Galera: Updates replica nodes first, then the primary. Avoids breaking the quorum by ensuring a majority of nodes remain available during the rollout.
- Primary/Replica: Updates replicas first, then the primary. Prevents write downtime and data inconsistency by keeping the primary running until all replicas are updated and healthy.
In both cases, updating the primary first risks downtime and data inconsistency.
Apply the changes:
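Assuming the overlay path referenced above:

```shell
kubectl apply -k /etc/genestack/kustomize/mariadb-cluster/overlay
```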
Step 1: Scale down operator and remove webhooks
kubectl scale deployment mariadb-operator -n mariadb-system --replicas=0
kubectl scale deployment mariadb-operator-webhook -n mariadb-system --replicas=0
kubectl delete validatingwebhookconfiguration mariadb-operator-webhook
kubectl delete mutatingwebhookconfiguration mariadb-operator-webhook 2>/dev/null || true
Step 2: Update chart version
Edit /etc/genestack/helm-chart-versions.yaml:
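The key name below is illustrative; keep whatever key the file already uses and change only the version value:

```yaml
# /etc/genestack/helm-chart-versions.yaml (illustrative key name)
mariadb_operator: 25.8.4
```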
Confirm the version has been set:
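For example:

```shell
grep -i mariadb /etc/genestack/helm-chart-versions.yaml
```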
Step 3: Uninstall and reinstall operator
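The runbook elsewhere refers to an install script for this step; as a sketch using plain Helm (the version shown is an example, and the overrides path is the one referenced later in this runbook):

```shell
helm uninstall mariadb-operator -n mariadb-system
# Note: do not uninstall the mariadb-operator-crds release; CRDs must never be deleted
helm install mariadb-operator mariadb-operator/mariadb-operator \
  --namespace mariadb-system \
  --version 25.8.4 \
  -f /etc/genestack/helm-configs/mariadb-operator/mariadb-operator-helm-overrides.yaml
```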
Step 4: Verify deployment
helm list -A | grep mariadb
kubectl get pods -n mariadb-system -o wide
kubectl get pods <new-operator-pod> -n mariadb-system \
-o jsonpath="{..image}" | tr -s '[:space:]' '\n' | sort -u
Step 5: Post-upgrade validation
Run the cluster health checks from the pre-upgrade section again.
Step 6: Disable autoUpdateDataPlane
Set autoUpdateDataPlane: false as described in the preflight section, then re-apply the overlay.
Step 7: Run the migration (Replication clusters only)
Replication Migration — Primary/Replica clusters only
This step is only required for Primary/Replica (replication) clusters. If you are running a Galera cluster, skip this step.
After the operator upgrade, you must run the replication migration script to reset and re-establish replication on each replica pod.
The script automatically identifies the primary pod (skips it) and processes only the replicas. For each replica it:
- Runs STOP SLAVE and RESET SLAVE ALL via kubectl exec
- Deletes the pod so it gets recreated
- Waits for it to come back ready
- Verifies replication is working again
Set the required environment variables:
export MARIADB_NAME=mariadb-cluster
export MARIADB_NAMESPACE=openstack
export MARIADB_ROOT_PASSWORD=$(kubectl get secret mariadb -n openstack \
-o jsonpath='{.data.root-password}' | base64 -d)
Save the script to a file (e.g. /tmp/migrate-replication.sh):
migrate-replication.sh (click to expand)
#!/bin/bash
set -eo pipefail
if [[ -z "$MARIADB_NAME" || -z "$MARIADB_NAMESPACE" || -z "$MARIADB_ROOT_PASSWORD" ]]; then
echo "Error: MARIADB_NAME, MARIADB_NAMESPACE and MARIADB_ROOT_PASSWORD env vars must be set."
exit 1
fi
function exec_sql {
local pod=$1
local sql=$2
kubectl exec -n "$MARIADB_NAMESPACE" "$pod" -- mariadb -u root -p"$MARIADB_ROOT_PASSWORD" -e "$sql"
}
function wait_for_ready_replication {
local pod=$1
local timeout=300 # 5 minutes
local interval=10 # Check every 10 seconds
local elapsed=0
echo "Waiting for ready replication on $pod..."
while [[ $elapsed -lt $timeout ]]; do
local status
status=$(exec_sql "$pod" "SHOW REPLICA STATUS\G" | tee "/tmp/replication_status_${pod}_${MARIADB_NAMESPACE}.txt")
if grep -q "Slave_IO_Running: Yes" "/tmp/replication_status_${pod}_${MARIADB_NAMESPACE}.txt" && \
grep -q "Slave_SQL_Running: Yes" "/tmp/replication_status_${pod}_${MARIADB_NAMESPACE}.txt"; then
echo "Replication is ready on $pod."
return 0
fi
echo "Replication not ready on $pod. Retrying in $interval seconds..."
sleep $interval
((elapsed+=interval))
done
echo "Error: Replication did not become ready on $pod within 5 minutes."
exit 1
}
echo "Migrating replication on $MARIADB_NAME instance..."
PODS=$(kubectl get pods -n "$MARIADB_NAMESPACE" \
-l app.kubernetes.io/instance=$MARIADB_NAME \
-o jsonpath="{.items[*].metadata.name}")
PRIMARY_POD=$(kubectl get mariadb "$MARIADB_NAME" -n "$MARIADB_NAMESPACE" \
-o jsonpath="{.status.currentPrimary}")
for POD in $PODS; do
if [[ "$POD" == "$PRIMARY_POD" ]]; then
printf "\nSkipping primary pod: $POD\n"
continue
fi
printf "\nProcessing replica pod: $POD\n"
echo "Resetting replication on $POD..."
exec_sql "$POD" "STOP SLAVE 'mariadb-operator';"
exec_sql "$POD" "RESET SLAVE 'mariadb-operator' ALL;"
echo "Deleting pod $POD..."
kubectl delete pod "$POD" -n "$MARIADB_NAMESPACE"
echo "Waiting for pod $POD to become ready..."
kubectl wait --for=condition=Ready pod/"$POD" -n "$MARIADB_NAMESPACE" --timeout=5m
echo "Pod $POD is ready."
wait_for_ready_replication "$POD"
done
echo "Replication migration completed successfully on $MARIADB_NAME instance."
Make it executable and run:
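Using the path from above:

```shell
chmod +x /tmp/migrate-replication.sh
/tmp/migrate-replication.sh
```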
Stuck replication threads
If stuck replication threads are observed, identify and kill them:
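A sketch using information_schema.processlist; substitute the thread id returned by the first query into the KILL statement:

```shell
# List replication-related threads on the primary
kubectl exec -it "$PRIMARY_POD" -n openstack -- mariadb -u root -p"$MARIADB_ROOT_PASSWORD" -e "
  SELECT id, user, command, state, time
  FROM information_schema.processlist
  WHERE command LIKE 'Slave%' OR command LIKE 'Binlog%';"
# Kill a stuck thread by id (substitute <id> from the query above)
kubectl exec -it "$PRIMARY_POD" -n openstack -- mariadb -u root -p"$MARIADB_ROOT_PASSWORD" -e "KILL <id>;"
```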
Version-Specific Notes
0.37.1
Skip 0.37.0
References
Follow the standard procedure above. No special steps required.
0.38.1
Skip 0.38.0
25.8.4
Skip 25.08.0
Patch-by-patch upgrade is NOT required for 25.8.x
You do not need to step through each patch release within 25.8.x; follow the standard procedure above.
If you are migrating from 0.38.1, check the helm search repo mariadb-operator/mariadb-operator --versions output:
mariadb-operator/mariadb-operator 25.8.4 25.8.4 Run and operate MariaDB in a cloud native way
mariadb-operator/mariadb-operator 25.8.3 25.8.3 Run and operate MariaDB in a cloud native way
mariadb-operator/mariadb-operator 25.8.2 25.8.2 Run and operate MariaDB in a cloud native way
mariadb-operator/mariadb-operator 25.8.1 25.8.1 Run and operate MariaDB in a cloud native way
mariadb-operator/mariadb-operator 25.08.0 25.08.0 Run and operate MariaDB in a cloud native way
mariadb-operator/mariadb-operator 0.38.1 0.38.1 Run and operate MariaDB in a cloud native way
mariadb-operator/mariadb-operator 0.38.0 0.38.0 Run and operate MariaDB in a cloud native way
mariadb-operator/mariadb-operator 0.37.1 0.37.1 Run and operate MariaDB in a cloud native way
mariadb-operator/mariadb-operator 0.37.0 0.37.0 Run and operate MariaDB in a cloud native way
You can jump directly from 0.38.1 → 25.8.4.
Kubernetes version requirement
25.8.4 requires Kubernetes >= 1.34.
Known Issue: Webhook cert validation loop
If operator pods are stuck logging:
Check the certificate:
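Assuming cert-manager manages the webhook certificate (the message below is a cert-manager condition), one way to inspect it:

```shell
kubectl get certificate -n mariadb-system
kubectl describe certificate -n mariadb-system | grep -A3 Conditions
```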
If you see Existing private key is not up to date for spec: [spec.privateKey.algorithm],
fix by deleting the secret and webhook deployment:
kubectl delete secret mariadb-operator-webhook-cert -n mariadb-system
kubectl delete deployment mariadb-operator-webhook -n mariadb-system
Then re-run the install script to recreate them.
25.10.4
Skip 25.10.0
Patch-by-patch upgrade is NOT required for 25.10.x
You do not need to step through each patch release within 25.10.x; follow the standard procedure above. No special steps are required beyond the standard preflight and upgrade.
26.3.0
References
Breaking Change: syncBinlog type change (Replication clusters)
syncBinlog changed from boolean to integer. This affects
Primary/Replica (replication) clusters. Galera clusters without
replication in their spec are not affected.
If the existing MariaDB CR has syncBinlog: true, the new webhook will
reject patches. Remove the webhook before patching:
kubectl delete validatingwebhookconfiguration mariadb-operator-webhook
kubectl -n openstack patch mariadb mariadb-cluster --type merge \
-p '{"spec":{"replication":{"syncBinlog":1}}}'
Then re-run the install script to restore the webhook.
Breaking Change: Image configuration format
Image configuration in Helm values changed from string to structured format:
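An illustrative before/after; the exact schema comes from the upstream chart's values.yaml at the 26.3.0 tag, and the tag below is a placeholder:

```yaml
# Old (string) format
maxscaleImage: docker.io/mariadb/maxscale:<tag>

# New (structured) format
maxscaleImage:
  repository: docker.io/mariadb/maxscale
  tag: <tag>
```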
Ensure /etc/genestack/helm-configs/mariadb-operator/mariadb-operator-helm-overrides.yaml
uses the new format for the maxscaleImage, exporterImage, and exporterMaxscaleImage sections before upgrading to 26.3.0.
New CRD
A new CRD is added with this version: pointintimerecoveries.k8s.mariadb.com
Troubleshooting
Webhook blocks patches due to old stored values
If the webhook rejects changes because the existing resource in etcd has
old-format values (e.g. syncBinlog: true), temporarily remove the webhook:
kubectl scale deployment mariadb-operator -n mariadb-system --replicas=0
kubectl scale deployment mariadb-operator-webhook -n mariadb-system --replicas=0
kubectl delete validatingwebhookconfiguration mariadb-operator-webhook
kubectl delete mutatingwebhookconfiguration mariadb-operator-webhook 2>/dev/null || true
Apply the fix, then re-run the install script to restore everything.
Webhook cert validation loop
If operator pods are stuck in a Validating certs loop, delete the webhook
cert secret and deployment (see the commands under the 25.8.4 known issue above), then re-run the install script.