Agentic DevOps: Building Agentic Skills for Infrastructure Automation
An advanced 3-day workshop for experienced DevOps practitioners who are completely new to AI and agentic systems. Also available as a self-paced Udemy course.
You already know the hard part — Terraform, Kubernetes, CI/CD pipelines, CloudWatch, incident response. This course teaches you how to build AI agents that encode that expertise and work alongside you at scale.
What This Course Is
This is not a course about writing clever prompts.
It is about context engineering — structuring the right domain knowledge, system state, and operational constraints so that AI produces expert-level output on your infrastructure problems. The AI's intelligence does not change. What changes is the context you give it.
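As a rough illustration of that idea, here is a minimal Python sketch that combines the three kinds of context named above (domain knowledge, system state, operational constraints) into one prompt string. It does not show how Hermes or Claude Code actually build context. The `OpsContext` class, its field names, the section headings, and the sample runbook note are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class OpsContext:
    """Hypothetical container for the three context layers: knowledge, state, constraints."""

    domain_knowledge: list[str] = field(default_factory=list)  # e.g. runbook excerpts
    system_state: dict[str, str] = field(default_factory=dict)  # e.g. current alarm values
    constraints: list[str] = field(default_factory=list)  # e.g. read-only or change-freeze rules

    def render(self, task: str) -> str:
        """Assemble one prompt string. The model stays the same; only this text changes."""
        sections = [
            "## Domain knowledge",
            *(f"- {item}" for item in self.domain_knowledge),
            "## Current system state",
            # Sort keys so the same state always renders in the same order.
            *(f"- {key}: {value}" for key, value in sorted(self.system_state.items())),
            "## Operational constraints",
            *(f"- {rule}" for rule in self.constraints),
            "## Task",
            task,
        ]
        return "\n".join(sections)


if __name__ == "__main__":
    # Illustrative values only; not taken from any real environment.
    ctx = OpsContext(
        domain_knowledge=["When replica lag rises, check for long-running transactions on the primary."],
        system_state={"alarm": "RDS ReplicaLag in ALARM", "replica_lag_seconds": "45"},
        constraints=["Read-only: do not run any command that modifies the database."],
    )
    print(ctx.render("Explain the likely cause of this alarm and the next diagnostic step."))
```

Giving the same task to the same model with an empty `OpsContext` versus a populated one is a simple way to see the point of this section. The difference in output comes only from the context.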
By the end of this course, you will have built a production-ready domain agent for your own operational environment and a 30-day deployment roadmap.
Format
Live 3-day workshop — instructor-led, cohort-based, team exercises. Hands-on throughout.
Udemy self-paced — all labs are designed to be completable solo, with no team exercise dependencies. Complete at your own pace.
Prerequisites
This is an advanced workshop. You need working knowledge of:
- Docker: comfortable running containers and reading `docker` output
- Kubernetes basics: comfortable with `kubectl` and have deployed to a cluster before
- Terraform basics: have written and applied a module
- CI/CD: GitHub Actions, Jenkins, or equivalent
- AWS basics: console navigation, IAM concepts, EC2/RDS/CloudWatch familiarity
- Git workflows: branching, PRs, code review
- CLI comfort: bash/zsh; you live in the terminal
Zero AI knowledge required. Your mental model is built from scratch using operational analogies.
Module Overview
Day 1: Foundations and Assessment
| Module | Topic | Duration |
|---|---|---|
| Module 1: AI Foundations | LLMs from an ops perspective — tokenization, context windows, inference pipeline using operational analogies | 90 min |
| Module 2: Platform AI | AWS built-in AI features — GuardDuty, CodeGuru, DevOps Guru, Bedrock | 60 min |
| Module 3: Platform → Custom Agents | Bridge from platform AI to custom agents — live Hermes demo | 45 min |
| Module 4: Impact Assessment | Structured framework for scoring and prioritizing your top 10 automation candidates | 60 min |
Day 2: Structured Coding and IaC
| Module | Topic | Duration |
|---|---|---|
| Module 5: Superpowers for IaC | TDD, verification, debugging, and code review applied to Helm charts and Terraform modules — the Superpowers workflow on real IaC projects | 90 min |
| Module 6: AI Workflow Tools | GSD planning workflow, CLAUDE.md context engineering, cross-session memory, plan mode selection | 90 min |
| Module 7: Agent Skills | SKILL.md authoring — encoding runbooks as machine-readable domain knowledge | 90 min |
| Module 8: Tool Integration | Wiring agents to real infrastructure tools — AWS CLI, kubectl, Terraform, monitoring APIs | 60 min |
Day 3: Agent Systems and Capstone
| Module | Topic | Duration |
|---|---|---|
| Module 9: Design Patterns | Agent patterns — triage-and-delegate, noise reduction, change gating, cost guardian | 60 min |
| Module 10: Domain Agent Build | Build a complete domain agent — three tracks (Track A: Database, Track B: FinOps, Track C: Kubernetes) | 120 min |
| Module 11: Fleet Management | Multi-agent systems, including the fleet coordinator pattern, delegation loops, and the Morgan agent profile | 60 min |
| Module 12: Triggers and Automation | Event-driven agent invocation — CI/CD hooks, CloudWatch triggers, scheduled runs | 60 min |
| Module 13: Governance and Safety | L1–L4 governance levels, SOUL.md safety design, audit trails, production criteria | 60 min |
| Module 14: Capstone | Present your agent and 30-day deployment plan | 60 min |
Tools You Will Use
Primary Coding Tool (Modules 1–8)
Claude Code — Anthropic's terminal AI coding agent. Uses your Claude Pro or Team subscription. No additional billing.
OpenCode — Alternative terminal agent from opencode.ai. Supports 75+ LLM providers including free-tier options (Gemini 2.5 Flash, Groq). Use this if you prefer not to use a Claude subscription.
Agent Framework (Modules 7–13)
Hermes — The agent framework you will use to build your domain agent. Python-based, runs locally, supports multiple LLM providers. Four agent profiles ship with the course as starting templates.
Learning Objectives
By the end of this course, you will:
- Explain how LLMs work — tokenization, context windows, inference pipeline — using operational analogies
- Demonstrate progressive context engineering on real infrastructure data (CloudWatch alarms, cost anomalies, deployment failures)
- Build a working domain agent using Hermes that encodes your team's operational expertise
- Wire your agent to real infrastructure tools — AWS CLI, kubectl, Terraform, monitoring APIs
- Evaluate the automation potential of your top 10 operational tasks using a structured scoring framework
- Deliver a 30-day roadmap for deploying your agent in production
How to Navigate This Site
Workshop participants: Your instructor sets the pace. Use this site for reference — open the module page before each session to review objectives and prerequisites.
Udemy learners: Work through modules in order. Each module page has a lab link — the lab is the primary content. The reading and explainer materials are supplementary context.
Before you start: Complete the Setup Guide first. You need the reference app running and an AI coding tool configured before Module 1.
Resources
| Resource | Description |
|---|---|
| Setup Guide | Full environment setup: Day 1 (Docker/KIND/Claude Code) and Days 2–3 (Hermes) |
| Skills Reference | The 4 SKILL.md example files with descriptions and authoring guidance |
| Agent Profiles | Aria, Finley, Kiran, Morgan — identity docs and safety model for each |
| Governance Templates | L1–L4 governance YAML with diffs showing the progression |
This course was developed for both live 3-day workshop delivery and self-paced Udemy learning. All labs can be completed with free-tier infrastructure and either a single Claude Pro subscription or a free-tier LLM provider.