Module 7: Agent Skills — Teaching Agents Runbooks
Duration: 60 minutes (hands-on lab). Day 3, Session 1.
The hands-on lab for this module lives in this site at Module 7 Lab. Track C learners should use the dedicated Track C lab for a linear Kubernetes-only path. Read the Concepts and Reference pages first, then open the lab guide.
Track C (Kubernetes): The Track C lab uses a real KIND cluster with learner-applied failure scenarios. No mock mode, no environment variable setup.
What This Module Is About
You've been writing runbooks for years. Great runbooks have decision trees, escalation paths, exact CLI commands, and conditional steps based on what you find. The problem is that runbooks are written for humans — they rely on implied context, organizational memory, and judgment that humans accumulate over time.
SKILL.md is a machine-readable runbook. It encodes the same operational expertise in a structured format that an AI agent can read, follow, and apply at runtime. This module teaches you how to write one — and why the format matters.
Learning Objectives
By the end of this module, you will be able to:
- Write a domain-specific SKILL.md with decision trees, conditional steps, and escalation rules that an AI agent can execute reliably
- Explain Retrieval-Augmented Generation (RAG) using operational analogies — and understand when agents need retrieved knowledge versus when skills are sufficient
- Distinguish the three memory types (short-term conversation, long-term cross-session, procedural skills) and explain which problems each solves
- Articulate why machine-readable skills beat wiki runbooks for AI agent reliability — and what specifically makes them more reliable
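The three-way memory distinction can be sketched in code. This is a hypothetical structure for illustration only, not the Hermes API; the class and method names are invented:

```python
# Illustrative sketch of the three agent memory types (not the Hermes API).
class AgentMemory:
    def __init__(self, skills):
        self.short_term = []   # conversation turns, discarded when the session ends
        self.long_term = {}    # facts persisted across sessions
        self.skills = skills   # procedural memory: step-by-step runbooks (SKILL.md)

    def remember_turn(self, role, text):
        self.short_term.append((role, text))

    def store_fact(self, key, value):
        self.long_term[key] = value

    def lookup_skill(self, task):
        # Procedural memory is selected by task name and executed step by step,
        # not summarized or paraphrased like conversational context.
        return self.skills.get(task)

mem = AgentMemory(skills={"pod-failure-diagnosis": ["kubectl get pods", "kubectl describe pod"]})
mem.remember_turn("user", "my pod keeps restarting")
mem.store_fact("cluster_context", "kind-lab")
print(mem.lookup_skill("pod-failure-diagnosis")[0])
```

Each attribute maps to one memory type from the objectives: `short_term` solves "what were we just talking about", `long_term` solves "what did we learn last week", and `skills` solves "how do we do this procedure".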
Choose Your Track
The lab has four track options — pick the one closest to your domain. Stay with this track through Module 8 and beyond — your Module 7 skill gets attached to your agent profile in Module 8, and Modules 10-13 build on the same track.
| Track | Focus | Primary Tools |
|---|---|---|
| Track A — Database Health | RDS slow query investigation | psql, pg_stat_statements, AWS RDS, CloudWatch metrics, index recommendations |
| Track B — FinOps | EC2 cost anomaly investigation | aws ec2, aws ce (Cost Explorer), idle-resource detection, cost attribution |
| Track C — Kubernetes Health | Pod failure diagnosis and self-healing | kubectl, KIND cluster, 6 failure modes (ImagePullBackOff, CrashLoopBackOff, OOMKilled, Liveness probe, missing Secret/ConfigMap, port mismatch) |
| Track D — Observability | Alert noise analysis | CloudWatch alarms, dedup detection, correlation scoring, snooze recommendations |
All four tracks produce the same artifact: a domain-specific SKILL.md your agent can execute. Track C has a dedicated lab with concrete kubectl commands and the six Kubernetes failure scenarios; Tracks A, B, and D share the unified lab with track-specific callouts.
Prerequisites
- Modules 1-6 completed
- Hermes installed and running (from Module 3)
- Familiarity with at least one of the four track domains
- Track C only: KIND cluster running (from Module 6). Verify with `kubectl cluster-info --context kind-lab`
Module Contents
| Section | Content | Time |
|---|---|---|
| Reading | Concepts: RAG, Memory Types, and Procedural Skills | 20 min |
| Reading | Reference: SKILL.md Format and Skill Lifecycle | 10 min |
| Lab | Write Your Domain-Specific SKILL.md (or Track C version) | 60 min |
| Quiz | Module 7 Assessment | 10 min |
| Exploratory | Stretch Projects | Optional |
Key Insight: The Runbook Reliability Problem
A well-written wiki runbook gets followed accurately by an experienced engineer. The same runbook, given to an AI agent as plain text, produces inconsistent results — because the agent fills in gaps with reasoning rather than executing specified steps.
SKILL.md solves this by making the runbook unambiguous:
- Inputs are typed and validated
- Steps are numbered with exact commands
- Conditions are explicit (`if latency > 200ms: step 4a, else step 4b`)
- Escalation paths are named, not implied
- Success and failure criteria are measurable
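To make these properties concrete, here is a minimal sketch of what a skill file might look like, using the Track A scenario. The field names and structure are illustrative assumptions, not the exact schema from the Reference page:

```markdown
# SKILL: rds-slow-query-investigation

## Inputs
- db_instance (string, required): RDS instance identifier
- latency_threshold_ms (integer, default: 200)

## Steps
1. Run: SELECT query, mean_exec_time FROM pg_stat_statements
   ORDER BY mean_exec_time DESC LIMIT 10;
2. Record the slowest query's mean latency.
3. If mean latency > latency_threshold_ms: go to step 4a. Else: go to step 4b.
   - 4a. Check the filtered columns for missing indexes; recommend an index.
   - 4b. Close as within threshold; record findings.

## Escalation
- If the instance is unreachable or replication lag exceeds 60s: escalate to dba-oncall

## Success criteria
- Slowest query identified with a measured latency, or an explicit escalation recorded
```

Note how each bullet above appears in the file: inputs are typed with defaults, steps are numbered with exact commands, the latency condition branches explicitly to named steps, the escalation target is named (`dba-oncall`), and success is a measurable outcome.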
This is not prompt engineering. This is context engineering — encoding your operational expertise in a format your agent reads as structured knowledge, not prose to interpret.