Module 7: Agent Skills — Teaching Agents Runbooks
Duration: 60 minutes (hands-on lab). Day 3, Session 1.
The hands-on lab for this module lives in this site at Module 7 Lab. Track C learners should use the dedicated Track C lab for a linear Kubernetes-only path. Read the Concepts and Reference pages first, then open the lab guide.
Track C (Kubernetes): The Track C lab uses a real KIND cluster with learner-applied failure scenarios. No mock mode, no environment variable setup.
What This Module Is About
You've been writing runbooks for years. Great runbooks have decision trees, escalation paths, exact CLI commands, and conditional steps based on what you find. The problem is that runbooks are written for humans — they rely on implied context, organizational memory, and judgment that humans accumulate over time.
SKILL.md is a machine-readable runbook. It encodes the same operational expertise in a structured format that an AI agent can read, follow, and apply at runtime. This module teaches you how to write one — and why the format matters.
Learning Objectives
By the end of this module, you will be able to:
- Write a domain-specific SKILL.md with decision trees, conditional steps, and escalation rules that an AI agent can execute reliably
- Explain Retrieval-Augmented Generation (RAG) using operational analogies — and understand when agents need retrieved knowledge versus when skills are sufficient
- Distinguish the three memory types (short-term conversation, long-term cross-session, procedural skills) and explain which problems each solves
- Articulate why machine-readable skills beat wiki runbooks for AI agent reliability — and what specifically makes them more reliable
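The three-way memory distinction can be sketched in code. This is a hypothetical structure for illustration only, not the Hermes API; the class and method names are invented:

```python
# Illustrative sketch of the three agent memory types (not the Hermes API).
class AgentMemory:
    def __init__(self, skills):
        self.short_term = []   # conversation turns, discarded when the session ends
        self.long_term = {}    # facts persisted across sessions
        self.skills = skills   # procedural memory: step-by-step runbooks (SKILL.md)

    def remember_turn(self, role, text):
        self.short_term.append((role, text))

    def store_fact(self, key, value):
        self.long_term[key] = value

    def lookup_skill(self, task):
        # Procedural memory is selected by task name and executed step by step,
        # not summarized or paraphrased like conversational context.
        return self.skills.get(task)

mem = AgentMemory(skills={"pod-failure-diagnosis": ["kubectl get pods", "kubectl describe pod"]})
mem.remember_turn("user", "my pod keeps restarting")
mem.store_fact("cluster_context", "kind-lab")
print(mem.lookup_skill("pod-failure-diagnosis")[0])
```

Each attribute maps to one memory type from the objectives: `short_term` solves "what were we just talking about", `long_term` solves "what did we learn last week", and `skills` solves "how do we do this procedure".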
Choose Your Track
The lab has four track options — pick the one closest to your domain. Stay with this track through Module 8 and beyond — your Module 7 skill gets attached to your agent profile in Module 8, and Modules 10-13 build on the same track.
| Track | Focus | Primary Tools |
|---|---|---|
| Track A — Database Health | RDS slow query investigation | psql, pg_stat_statements, AWS RDS, CloudWatch metrics, index recommendations |
| Track B — FinOps | EC2 cost anomaly investigation | aws ec2, aws ce (Cost Explorer), idle-resource detection, cost attribution |
| Track C — Kubernetes Health | Pod failure diagnosis and self-healing | kubectl, KIND cluster, 6 failure modes (ImagePullBackOff, CrashLoopBackOff, OOMKilled, Liveness probe, missing Secret/ConfigMap, port mismatch) |
| Track D — Observability | Alert noise analysis | CloudWatch alarms, dedup detection, correlation scoring, snooze recommendations |
All four tracks produce the same artifact: a domain-specific SKILL.md your agent can execute. Track C has a dedicated lab with concrete kubectl commands and the six Kubernetes failure scenarios; Tracks A, B, and D share the unified lab with track-specific callouts.
Prerequisites
- Modules 1-6 completed
- Hermes installed and running (from Module 3)
- Familiarity with at least one of the four track domains
- Track C only: KIND cluster running (from Module 6). Verify with `kubectl cluster-info --context kind-lab`
Module Contents
| Section | Content | Time |
|---|---|---|
| Reading | Concepts: RAG, Memory Types, and Procedural Skills | 20 min |
| Reading | Reference: SKILL.md Format and Skill Lifecycle | 10 min |
| Lab | Write Your Domain-Specific SKILL.md (or Track C version) | 60 min |
| Quiz | Module 7 Assessment | 10 min |
| Exploratory | Stretch Projects | Optional |
Key Insight: The Runbook Reliability Problem
A well-written wiki runbook gets followed accurately by an experienced engineer. The same runbook, given to an AI agent as plain text, produces inconsistent results — because the agent fills in gaps with reasoning rather than executing specified steps.
SKILL.md solves this by making the runbook unambiguous:
- Inputs are typed and validated
- Steps are numbered with exact commands
- Conditions are explicit (`if latency > 200ms: step 4a, else step 4b`)
- Escalation paths are named, not implied
- Success and failure criteria are measurable
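To make these properties concrete, here is a minimal sketch of what a skill file might look like, using the Track A scenario. The field names and structure are illustrative assumptions, not the exact schema from the Reference page:

```markdown
# SKILL: rds-slow-query-investigation

## Inputs
- db_instance (string, required): RDS instance identifier
- latency_threshold_ms (integer, default: 200)

## Steps
1. Run: SELECT query, mean_exec_time FROM pg_stat_statements
   ORDER BY mean_exec_time DESC LIMIT 10;
2. Record the slowest query's mean latency.
3. If mean latency > latency_threshold_ms: go to step 4a. Else: go to step 4b.
   - 4a. Check the filtered columns for missing indexes; recommend an index.
   - 4b. Close as within threshold; record findings.

## Escalation
- If the instance is unreachable or replication lag exceeds 60s: escalate to dba-oncall

## Success criteria
- Slowest query identified with a measured latency, or an explicit escalation recorded
```

Note how each bullet above appears in the file: inputs are typed with defaults, steps are numbered with exact commands, the latency condition branches explicitly to named steps, the escalation target is named (`dba-oncall`), and success is a measurable outcome.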
This is not prompt engineering. This is context engineering — encoding your operational expertise in a format your agent reads as structured knowledge, not prose to interpret.