Module 7: Agent Skills — Teaching Agents Runbooks

Duration: 60 minutes (hands-on lab)
Day: Day 3, Session 1

Lab Location

The hands-on lab for this module lives on this site at Module 7 Lab. Track C learners should use the dedicated Track C lab for a linear, Kubernetes-only path. Read the Concepts and Reference pages first, then open the lab guide.

Track C (Kubernetes): The Track C lab uses a real KIND cluster with learner-applied failure scenarios. No mock mode, no environment variable setup.

What This Module Is About

You've been writing runbooks for years. Great runbooks have decision trees, escalation paths, exact CLI commands, and conditional steps based on what you find. The problem is that runbooks are written for humans — they rely on implied context, organizational memory, and judgment that humans accumulate over time.

SKILL.md is a machine-readable runbook. It encodes the same operational expertise in a structured format that an AI agent can read, follow, and apply at runtime. This module teaches you how to write one — and why the format matters.

Learning Objectives

By the end of this module, you will be able to:

  1. Write a domain-specific SKILL.md with decision trees, conditional steps, and escalation rules that an AI agent can execute reliably
  2. Explain Retrieval-Augmented Generation (RAG) using operational analogies — and understand when agents need retrieved knowledge versus when skills are sufficient
  3. Distinguish the three memory types (short-term conversation, long-term cross-session, procedural skills) and explain which problems each solves
  4. Articulate why machine-readable skills beat wiki runbooks for AI agent reliability — and what specifically makes them more reliable

Choose Your Track

The lab has four track options — pick the one closest to your domain. Stay with this track through Module 8 and beyond — your Module 7 skill gets attached to your agent profile in Module 8, and Modules 10-13 build on the same track.

| Track | Focus | Primary Tools |
| --- | --- | --- |
| Track A — Database Health | RDS slow query investigation | psql, pg_stat_statements, AWS RDS, CloudWatch metrics, index recommendations |
| Track B — FinOps | EC2 cost anomaly investigation | aws ec2, aws ce (Cost Explorer), idle-resource detection, cost attribution |
| Track C — Kubernetes Health | Pod failure diagnosis and self-healing | kubectl, KIND cluster, 6 failure modes (ImagePullBackOff, CrashLoopBackOff, OOMKilled, liveness probe, missing Secret/ConfigMap, port mismatch) |
| Track D — Observability | Alert noise analysis | CloudWatch alarms, dedup detection, correlation scoring, snooze recommendations |

All four tracks produce the same artifact: a domain-specific SKILL.md your agent can execute. Track C has a dedicated lab with concrete kubectl commands and the six Kubernetes failure scenarios — Tracks A, B, and D share the unified lab with track-specific callouts.

Prerequisites

  • Modules 1-6 completed
  • Hermes installed and running (from Module 3)
  • Familiarity with at least one of the four track domains
  • Track C only: KIND cluster running (from Module 6). Verify with: kubectl cluster-info --context kind-lab

Module Contents

| Section | Content | Time |
| --- | --- | --- |
| Reading | Concepts: RAG, Memory Types, and Procedural Skills | 20 min |
| Reading | Reference: SKILL.md Format and Skill Lifecycle | 10 min |
| Lab | Write Your Domain-Specific SKILL.md (or Track C version) | 60 min |
| Quiz | Module 7 Assessment | 10 min |
| Exploratory | Stretch Projects | Optional |

Key Insight: The Runbook Reliability Problem

A well-written wiki runbook gets followed accurately by an experienced engineer. The same runbook, given to an AI agent as plain text, produces inconsistent results — because the agent fills in gaps with reasoning rather than executing specified steps.

SKILL.md solves this by making the runbook unambiguous:

  • Inputs are typed and validated
  • Steps are numbered with exact commands
  • Conditions are explicit (e.g., if latency > 200ms, go to step 4a; else go to step 4b)
  • Escalation paths are named, not implied
  • Success and failure criteria are measurable
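The properties above can be sketched as a SKILL.md fragment. This is an illustrative assumption, not the exact schema this course uses — the skill name, input fields, and thresholds below are invented for the example, though the kubectl commands are real:

```markdown
---
name: pod-failure-triage
description: Diagnose a failing pod and route to the matching remediation
inputs:
  namespace: {type: string, required: true}
  pod: {type: string, required: true}
---

## Steps

1. Read the container state:
   kubectl get pod <pod> -n <namespace> -o jsonpath='{.status.containerStatuses[0].state}'
2. If the waiting reason is ImagePullBackOff: go to step 3a.
   If it is CrashLoopBackOff: go to step 3b.
   Otherwise: escalate via unknown-failure.
3. a. Verify the image name and tag against the registry, fix, and re-apply the manifest.
   b. Pull the crashed container's previous logs:
      kubectl logs <pod> -n <namespace> --previous

## Escalation

- unknown-failure: page the platform on-call and attach the kubectl describe pod output.

## Success criteria

- Pod is Running with zero new restarts for 5 consecutive minutes.
```

Note how each property from the list shows up concretely: typed, required inputs; numbered steps with exact commands; explicit branch conditions; a named escalation path; and a measurable success criterion the agent can check rather than judge.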

This is not prompt engineering. This is context engineering — encoding your operational expertise in a format your agent reads as structured knowledge, not prose to interpret.