Track A: Superpowers for Helm Charts
Duration: 90 minutes | Track: A — Kubernetes / Helm
Introduction
In this lab, you apply the Superpowers workflow — brainstorm, TDD, implement, debug, verify, and code review — to a real IaC project. You will NOT copy-paste templates. You will generate production Helm chart improvements from scratch, using structured context as your only starting point.
What you are doing: Adding five production-hardening resources to the existing reference app Helm chart:
- HorizontalPodAutoscaler — autoscaling under load
- PodDisruptionBudget — availability guarantee during node drain
- ServiceMonitor — Prometheus metrics discovery
- Resource limits — prevent unbounded pod resource consumption
- NOTES.txt — post-install usage instructions
What you will NOT have: Starter files. Per the Superpowers approach, you start with context — a project CLAUDE.md — and let the AI generate everything. The structured context you write is your real work product.
A bare prompt like "add HPA to my Helm chart" produces something generic that probably won't lint. A CLAUDE.md with system state, existing values, constraints, and gaps produces something that works the first time — or close to it. This lab demonstrates that gap in action.
Prerequisites:
- Helm v3.x installed (helm version --short)
- kubectl installed and KIND cluster running (from setup guide)
- Reference app deployed to KIND cluster
- Claude Code (or Crush) configured and ready
Phase 0: Setup + Context (10 min)
Step 1 — Verify tools
helm version --short
Expected result: v3.x.x+... (version 3.x.x with some build metadata)
kubectl cluster-info
Expected result: Kubernetes control plane is running at https://127.0.0.1:... (your KIND cluster)
If kubectl cluster-info shows a cloud cluster instead of a local one, switch your context: kubectl config use-context kind-kind (or your KIND cluster context name).
Step 2 — Navigate to the chart directory
cd reference-app/helm/reference-app
ls templates/
Expected result: You should see deployment, service, and configmap files for api-gateway, catalog, worker, and dashboard — 13+ template files total.
Step 3 — Create the project CLAUDE.md
This is your "starter." In the Superpowers workflow, you do not start with a pre-written template — you start with structured context that tells the AI exactly what the project is, what is already there, and what is missing.
Create a file called CLAUDE.md in the chart directory (reference-app/helm/reference-app/CLAUDE.md):
# Helm Chart Hardening — Reference App
## System State
- Existing chart: reference-app/helm/reference-app/
- Chart version: 1.0.0 (apiVersion: v2)
- Services: api-gateway (8080), catalog (8081), worker (8082), dashboard (3000, NodePort 30080)
- Current state: resource requests set (cpu: 50m, memory: 64Mi), liveness/readiness probes configured
- Deployment target: KIND cluster (local development)
## What Is Missing (Production Gaps)
- No resource limits (only requests) — pods can consume unlimited resources
- No HorizontalPodAutoscaler — no autoscaling under load
- No PodDisruptionBudget — no availability guarantee during node drain
- No ServiceMonitor — Prometheus cannot discover service metrics
- No NOTES.txt — helm install shows no usage instructions
## Constraints
- Must pass helm lint with zero errors and zero warnings
- Must pass kubectl apply --dry-run=client on rendered output
- All new resources must be toggleable via values.yaml (enabled: true/false pattern)
- Resource limits should be 2x the existing requests (cpu: 100m, memory: 128Mi)
- K8s API versions: use autoscaling/v2 for HPA (not v2beta2, removed in K8s 1.26+)
- HPA apiVersion autoscaling/v2 requires K8s 1.23+; KIND v0.31 ships with K8s 1.32
Expected result: CLAUDE.md created. The AI can now understand exactly what the project is, what already works, and what you need it to build.
Writing the CLAUDE.md is the most important step. It encodes: system vocabulary, existing state, gap analysis, and hard constraints. This is context engineering — not prompt writing. A good CLAUDE.md makes the AI generate correct code. A weak one wastes 20 minutes of debug time.
Phase 1: Brainstorm (15 min)
Goal: What needs to change?
Open Claude Code (or Crush) in the chart directory and ask:
Read the CLAUDE.md and the existing chart at reference-app/helm/reference-app/.
List every production gap, categorize by severity (critical/important/nice-to-have),
and propose a prioritized implementation order with reasons.
Expected result: The AI produces a ranked analysis. Your job is to review it and adjust priorities based on your actual constraints.
What to verify in the AI's output:
- Gap list includes at minimum: resource limits, HPA, PDB, ServiceMonitor, NOTES.txt
- It understands the existing templates reference apiGateway, catalog, worker, and dashboard sections in values.yaml
- It recognizes the enablement pattern requirement (toggling via values.yaml)
Do NOT accept generic output. If the AI says "add resource limits" without referencing the specific cpu/memory values from values.yaml, ask it to be more specific. Push back with: "Reference the exact current values from values.yaml and propose specific new limit values."
If you're completing this lab self-paced without an instructor, spend 3-5 minutes writing down your own gap analysis BEFORE asking the AI. Then compare. This builds your pattern recognition for IaC production readiness — a skill that transfers to every future chart you work with.
Key teaching point
The brainstorm produces a plan, not code. The AI's analysis quality depends entirely on the CLAUDE.md context you provided. Notice: if you had just asked "what should I add to my Helm chart?" with no CLAUDE.md, the AI would give you a generic list. With the CLAUDE.md, it gives you context-specific, constraint-aware recommendations.
Phase 2: TDD — RED (20 min)
Goal: Define what success looks like before writing code
The TDD "test" for Helm charts uses helm lint and helm template | kubectl apply --dry-run=client as the verification toolchain. No extra test frameworks needed — these tools already exist in your workflow.
Step 1 — Establish baseline
Run the current chart through lint to confirm it passes before you change anything:
helm lint reference-app/helm/reference-app/
Expected result: 1 chart(s) linted, 0 chart(s) failed with no warnings.
If this fails, stop and fix lint errors first. Never add code to a broken baseline.
Step 2 — Define success criteria
Before generating any templates, write a checklist of what the enhanced chart MUST produce. This is your TDD test definition:
- helm template output MUST contain a HorizontalPodAutoscaler resource
- helm template output MUST contain a PodDisruptionBudget resource
- helm template output MUST contain a ServiceMonitor resource
- helm install --dry-run output MUST show NOTES.txt content
- All deployments MUST have resource limits set
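Before automating these checks, you can eyeball them with plain grep. A minimal sketch — the heredoc is a stand-in for real render output; in the lab you would pipe in `helm template test-release <chart>` instead:

```shell
# Stand-in for rendered chart output (replace the heredoc with a real
# `helm template` render when working against the actual chart).
rendered=$(cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
---
apiVersion: v1
kind: Service
EOF
)

# Count how many manifests of each required kind currently render.
# grep -c prints 0 and exits nonzero on no match, so guard with || true.
for kind in HorizontalPodAutoscaler PodDisruptionBudget ServiceMonitor; do
  count=$(echo "$rendered" | grep -c "^kind: $kind" || true)
  echo "$kind: $count rendered"
done
# → HorizontalPodAutoscaler: 0 rendered
# → PodDisruptionBudget: 0 rendered
# → ServiceMonitor: 0 rendered
```

Zero counts across the board are exactly the RED state the verification script below formalizes.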
Step 3 — Create the verification script
Create verify-chart.sh in the chart root:
#!/bin/bash
set -e
CHART_DIR="reference-app/helm/reference-app"
ERRORS=0
echo "=== Helm Chart Verification ==="
# Lint check
echo "Running helm lint..."
helm lint "$CHART_DIR" || { echo "FAIL: helm lint failed"; ERRORS=$((ERRORS+1)); }
# Render the chart (fail fast if templating itself errors)
RENDERED=$(helm template test-release "$CHART_DIR" -f "$CHART_DIR/values.yaml" 2>/dev/null) || { echo "FAIL: helm template failed; rerun without 2>/dev/null to see the render error"; exit 1; }
# Check for required resources
echo "Checking for HorizontalPodAutoscaler..."
echo "$RENDERED" | grep -q "kind: HorizontalPodAutoscaler" || { echo "FAIL: no HPA found"; ERRORS=$((ERRORS+1)); }
echo "Checking for PodDisruptionBudget..."
echo "$RENDERED" | grep -q "kind: PodDisruptionBudget" || { echo "FAIL: no PDB found"; ERRORS=$((ERRORS+1)); }
echo "Checking for ServiceMonitor..."
echo "$RENDERED" | grep -q "kind: ServiceMonitor" || { echo "FAIL: no ServiceMonitor found"; ERRORS=$((ERRORS+1)); }
# Dry-run against cluster (check kubectl's exit status, not tail's)
echo "Running kubectl dry-run..."
echo "$RENDERED" | kubectl apply --dry-run=client -f - 2>&1 | tail -5
[ "${PIPESTATUS[1]}" -eq 0 ] || { echo "FAIL: kubectl dry-run failed"; ERRORS=$((ERRORS+1)); }
# Check NOTES.txt
echo "Checking NOTES.txt..."
helm install test-release "$CHART_DIR" --dry-run 2>/dev/null | grep -q "NOTES:" || { echo "FAIL: no NOTES.txt in install output"; ERRORS=$((ERRORS+1)); }
echo ""
if [ $ERRORS -eq 0 ]; then
echo "ALL CHECKS PASS"
else
echo "$ERRORS check(s) failed — fix before proceeding"
exit 1
fi
Make it executable:
chmod +x verify-chart.sh
Step 4 — Run the verification script (MUST FAIL — this is the RED state)
./verify-chart.sh
Expected result: Multiple FAIL lines. helm lint still passes (the baseline was clean), but the chart contains none of the new resources yet, so every resource check fails:
FAIL: no HPA found
FAIL: no PDB found
FAIL: no ServiceMonitor found
FAIL: no NOTES.txt in install output
4 check(s) failed — fix before proceeding
A failing test is the starting point. If all checks passed already, you would not need to build anything. The failing script proves your success criteria are real and currently unmet. This is how TDD works — you define done before you write code.
If the kubectl apply --dry-run=client step fails on ServiceMonitor with "no kind is registered", it means the Prometheus CRD is not installed in your KIND cluster. This is expected in a base KIND setup. For this lab, you can skip the dry-run check for ServiceMonitor specifically — the helm lint and helm template checks are sufficient to validate the template structure. The dry-run validates everything else.
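If you want to keep the dry-run check for everything else, one option is to filter ServiceMonitor documents out of the rendered stream before piping to kubectl. A sketch in plain bash/awk — filter_out_kind is a hypothetical helper, not part of the lab script:

```shell
# Drop every YAML document whose `kind:` line matches the argument,
# keeping the `---` separators for the documents that remain.
filter_out_kind() {
  awk -v target="kind: $1" '
    function flush() { if (buf != "" && !skip) printf "---\n%s", buf; buf = ""; skip = 0 }
    /^---$/ { flush(); next }
    { buf = buf $0 "\n"; if ($0 == target) skip = 1 }
    END { flush() }
  '
}

# Demo on a two-document stream; in verify-chart.sh you would use:
#   echo "$RENDERED" | filter_out_kind ServiceMonitor | kubectl apply --dry-run=client -f -
printf -- '---\nkind: Deployment\n---\nkind: ServiceMonitor\n' | filter_out_kind ServiceMonitor
# → prints only the Deployment document
```

This assumes helm emits `kind:` at the top level with no trailing whitespace, which is the normal case for rendered manifests.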
Phase 3: Implement — GREEN (20 min)
Goal: Generate the minimum code to pass the verification script
Prompt AI with:
Using the CLAUDE.md context and the verification requirements in verify-chart.sh,
generate Helm templates for:
1. templates/hpa.yaml — HorizontalPodAutoscaler for api-gateway, catalog, and worker
2. templates/pdb.yaml — PodDisruptionBudget for api-gateway, catalog, and worker
3. templates/servicemonitor.yaml — ServiceMonitor for all 4 services
4. templates/NOTES.txt — post-install usage instructions using template functions
5. Update templates/deployment-api-gateway.yaml, deployment-catalog.yaml,
deployment-worker.yaml, deployment-dashboard.yaml to add resource limits
Each new resource must be:
- Toggleable via values.yaml using an `enabled: true/false` pattern
- Using correct apiVersion for K8s 1.32 (autoscaling/v2 for HPA, policy/v1 for PDB)
- Using matchLabels that match the existing deployment selectors
Before placing the generated files, review each one:
- Does hpa.yaml use autoscaling/v2 (not v2beta2)?
- Does pdb.yaml use policy/v1 (not v1beta1)?
- Do the matchLabels in HPA/PDB reference the same selectors as the existing Deployments?
- Does servicemonitor.yaml reference port names that match your Service definitions?
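For calibration, here is one shape a passing hpa.yaml could take for a single service. This is a sketch, not the required answer: the metadata name and label conventions below are illustrative, and your chart's actual helper templates may differ.

```yaml
{{- if .Values.apiGateway.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ .Release.Name }}-api-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ .Release.Name }}-api-gateway  # must match the Deployment's metadata.name
  minReplicas: {{ .Values.apiGateway.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.apiGateway.autoscaling.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.apiGateway.autoscaling.targetCPUUtilizationPercentage }}
{{- end }}
```

Note the whole resource is wrapped in the enabled toggle, satisfying the values.yaml constraint from CLAUDE.md.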
Step 1 — Place the generated files
# Copy AI-generated templates into the chart
# Replace this with the actual file paths the AI provides
cp hpa.yaml reference-app/helm/reference-app/templates/
cp pdb.yaml reference-app/helm/reference-app/templates/
cp servicemonitor.yaml reference-app/helm/reference-app/templates/
cp NOTES.txt reference-app/helm/reference-app/templates/
Step 2 — Update values.yaml
The AI should also generate new values.yaml additions. Add them to the existing values.yaml:
# Add under each service section (example for apiGateway):
apiGateway:
# ... existing values ...
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 100m
memory: 128Mi
autoscaling:
enabled: false
minReplicas: 1
maxReplicas: 3
targetCPUUtilizationPercentage: 70
podDisruptionBudget:
enabled: true
minAvailable: 1
serviceMonitor:
enabled: false
interval: 30s
path: /metrics
Apply the same pattern for catalog, worker, and dashboard sections.
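For reference, a minimal NOTES.txt the AI might produce — the wording is illustrative, but it shows the built-in template objects (.Chart, .Release) the review checklist expects, and the NodePort value comes from the dashboard service in CLAUDE.md:

```
{{ .Chart.Name }} has been installed as release "{{ .Release.Name }}"
in namespace "{{ .Release.Namespace }}".

Check pod status:
  kubectl get pods -n {{ .Release.Namespace }}

The dashboard is exposed on NodePort 30080:
  http://localhost:30080
```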
Step 3 — Run verification (MUST PASS — this is the GREEN state)
./verify-chart.sh
Expected result:
=== Helm Chart Verification ===
Running helm lint...
1 chart(s) linted, 0 chart(s) failed
Checking for HorizontalPodAutoscaler... [found]
Checking for PodDisruptionBudget... [found]
Checking for ServiceMonitor... [found]
Running kubectl dry-run... [success]
Checking NOTES.txt... [found]
ALL CHECKS PASS
If any check still fails, do NOT manually fix the templates yet. Failing checks feed directly into the Debug phase: note which ones fail and proceed to Phase 4. The systematic debug methodology there handles them, using the same approach you would use for real AI-generated code in production.
Phase 4: Debug (10 min)
Goal: Fix what the AI got wrong using systematic debugging
This phase uses REAL errors from AI generation. You are not chasing planted bugs — you are practicing the skill of recognizing and fixing the specific categories of errors that AI tools produce when generating Helm templates.
Common AI generation errors for Helm charts
| Error Type | What the AI produces | What it should be |
|---|---|---|
| Wrong HPA apiVersion | autoscaling/v2beta2 | autoscaling/v2 (v2beta2 removed in K8s 1.26) |
| Wrong PDB apiVersion | policy/v1beta1 | policy/v1 (v1beta1 removed in K8s 1.25) |
| Wrong matchLabels | app: reference-app | Must match the selector in the Deployment (e.g., app.kubernetes.io/name: api-gateway) |
| Wrong ServiceMonitor port | port: 8080 | Must be the port name from the Service resource, not the number |
| NOTES.txt wrong syntax | {{ Release.Name }} | {{ .Release.Name }} (missing the dot prefix) |
| HPA targetRef name | Hardcoded name: api-gateway | Should use {{ include "reference-app.fullname" . }} or equivalent helper |
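The first two error categories are easy to catch mechanically before lint even runs. A sketch of a pure-grep scan — scan_deprecated is a hypothetical helper, and the demo runs against a throwaway fixture rather than your real chart:

```shell
# Scan template files for the deprecated apiVersions from the table above.
# In the lab you would run: scan_deprecated templates/
scan_deprecated() {
  grep -rnE 'apiVersion: *(autoscaling/v2beta2|policy/v1beta1)' "$1" || true
}

# Demo against a throwaway fixture; a clean directory prints nothing.
tmp=$(mktemp -d)
printf 'apiVersion: autoscaling/v2beta2\nkind: HorizontalPodAutoscaler\n' > "$tmp/hpa.yaml"
scan_deprecated "$tmp"  # prints the offending file and line number
rm -rf "$tmp"
```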
Systematic Debugging Workflow
Step 1 — Read the error output:
./verify-chart.sh 2>&1 | head -30
Identify which check(s) fail. The error message tells you which resource is missing or invalid.
Step 2 — Trace the error to a specific template:
If lint fails: helm lint reference-app/helm/reference-app/ --debug shows the exact template file and line number.
If dry-run fails: helm template test-release reference-app/helm/reference-app/ 2>&1 | head -20 shows the render error.
Step 3 — Form a hypothesis:
Common patterns:
- "no HPA found" → check if the HPA template has an {{ if .Values.apiGateway.autoscaling.enabled }} condition and your values.yaml has enabled: false
- Lint error on v2beta2 → replace with v2 in the template
- Label mismatch → run helm template ... | grep -A5 "kind: Deployment" | grep -A3 "matchLabels" to find the actual selector values, then update the HPA/PDB to match
Step 4 — Apply the smallest fix and re-run:
# Fix one thing, verify immediately
./verify-chart.sh
If you have made 3 fixes and the verification still fails on the same check, stop trying individual fixes. Ask the AI: "Here is my current templates/hpa.yaml and the exact error from helm lint --debug. What is wrong?" Provide both the file content AND the exact error. AI generation errors are faster to fix by asking the AI with full context than by scanning YAML manually.
Phase 5: Verify + Code Review (15 min)
Verification — Prove it works
Run all three verification commands in sequence:
# 1. Lint — schema and syntax validation
helm lint reference-app/helm/reference-app/
Expected result: 1 chart(s) linted, 0 chart(s) failed
# 2. Template + dry-run — validate rendered output against the cluster API
helm template course-app reference-app/helm/reference-app/ \
-f reference-app/helm/reference-app/values.yaml \
| kubectl apply --dry-run=client -f -
Expected result: Multiple ... configured (dry run) lines, zero errors.
# 3. Install dry-run — validates the full release installation including NOTES.txt
helm install course-app reference-app/helm/reference-app/ --dry-run
Expected result: Output ends with NOTES: section showing the content from your NOTES.txt.
All three must succeed with zero errors before proceeding.
Code Review — Improve what was generated
Prompt AI with:
Review the generated Helm templates in reference-app/helm/reference-app/templates/
against these 5 dimensions:
1. Code Quality — naming conventions, DRY, no hardcoded values
2. Architecture — are resource relationships correct? Does HPA target the right Deployment?
3. Testing — does the verify-chart.sh cover all critical cases?
4. Requirements — are all items from CLAUDE.md implemented?
5. Production Readiness — are resource limits reasonable? Is the HPA min/max sensible?
Does the PDB allow rolling updates (minAvailable: 1 with replicaCount: 1 means
no pods can be evicted — is that intentional)?
Common code review findings to act on:
- If the PDB has minAvailable: 1 and the deployment has replicaCount: 1, node drain will block because the PDB protects the only pod. For development, this is often intentional. For production, you would increase replicas. Document the trade-off in a comment.
- If the HPA has minReplicas: 1 and maxReplicas: 3 but no memory-based metric, ask: "Is CPU-only scaling sufficient for our workload?" The CLAUDE.md described API and worker services — those likely have both CPU and memory pressure.
- Check that NOTES.txt uses {{ .Release.Namespace }} and {{ .Release.Name }} to give meaningful post-install output.
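One way to act on the first finding is to document the trade-off directly in values.yaml. A sketch, with key names following the pattern from Phase 3:

```yaml
apiGateway:
  replicaCount: 1
  podDisruptionBudget:
    enabled: true
    # With replicaCount: 1, minAvailable: 1 blocks node drain entirely,
    # because the PDB protects the only pod. Acceptable for local dev;
    # in production, raise replicaCount to >= 2 so one pod can be evicted.
    minAvailable: 1
```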
Apply any agreed improvements from the code review, then run the full verification suite one final time.
Compare against baseline
See exactly what was added during this lab:
# List all new files
git status reference-app/helm/reference-app/templates/
# Diff the values.yaml changes
git diff reference-app/helm/reference-app/values.yaml
Expected result: New template files listed (hpa.yaml, pdb.yaml, servicemonitor.yaml, NOTES.txt), and the values.yaml diff shows the added resource limits and autoscaling configuration.
Wrap-Up
What you accomplished in 90 minutes:
- Wrote structured context (CLAUDE.md) encoding system state, gaps, and constraints
- Defined success criteria as executable verification before writing any code
- Generated 5 production-hardening Helm chart additions using AI
- Debugged AI generation errors using a systematic methodology
- Validated the result with three independent verification methods
- Improved quality through AI-assisted code review
Reflection questions:
- How did the CLAUDE.md context affect the quality of AI generation compared to a bare prompt?
- Which debug step was fastest: reading the error, tracing to the file, or asking the AI with full context?
- Looking at the verify-chart.sh, what cases are NOT covered that a production chart should also test?
The pattern you practiced today applies to:
- Any Helm chart: adding network policies, Ingress resources, RBAC
- Terraform modules: adding monitoring, security groups, IAM policies
- Kubernetes manifests: adding sidecar containers, init containers, PodSecurityPolicy
The skill is structured context + TDD cycle + systematic debugging — the domain changes, the workflow stays the same.
Next: Module 6 introduces AI Workflow Tools (GSD cycle, memory systems, plan modes) that automate this cycle across larger projects.
If you were not able to get the cluster running locally, you can still complete Phases 0-2 (context creation, brainstorm, verification script authoring) and Phase 3 (template generation) without a live cluster. Skip the kubectl apply --dry-run=client step in verify-chart.sh and focus on helm lint validation only. The TDD and AI generation workflow is identical — you just cannot do the dry-run step without a cluster.