
Agentic DevOps: Building Agentic Skills for Infrastructure Automation

A 3-day advanced workshop for DevOps practitioners who are completely new to AI and agentic systems — also available as a self-paced Udemy course.

You already know the hard part — Terraform, Kubernetes, CI/CD pipelines, CloudWatch, incident response. This course teaches you how to build AI agents that encode that expertise and work alongside you at scale.


What This Course Is

This is not a course about writing clever prompts.

It is about context engineering — structuring the right domain knowledge, system state, and operational constraints so that AI produces expert-level output on your infrastructure problems. The AI's intelligence does not change. What changes is the context you give it.
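As a rough illustration of the idea, the sketch below assembles the three kinds of context named above (domain knowledge, system state, operational constraints) into a single prompt string. `AgentContext`, `build_context`, the section headings, and the sample data are all hypothetical; they are not part of Claude Code, OpenCode, or Hermes.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """Hypothetical container for the three kinds of context an agent needs."""
    domain_knowledge: list[str] = field(default_factory=list)  # runbook excerpts, team conventions
    system_state: dict[str, str] = field(default_factory=dict)  # live facts gathered from tools
    constraints: list[str] = field(default_factory=list)        # hard operational rules


def build_context(ctx: AgentContext, task: str) -> str:
    """Render the context into one prompt; the model itself never changes."""
    sections = ["## Domain knowledge"]
    sections += [f"- {item}" for item in ctx.domain_knowledge]
    sections.append("## Current system state")
    # Sort keys so the same state always renders in the same order.
    sections += [f"- {key}: {value}" for key, value in sorted(ctx.system_state.items())]
    sections.append("## Constraints (must not be violated)")
    sections += [f"- {rule}" for rule in ctx.constraints]
    sections.append("## Task")
    sections.append(task)
    return "\n".join(sections)


if __name__ == "__main__":
    ctx = AgentContext(
        domain_knowledge=["RDS failover is handled by the DB on-call, not by app teams"],
        system_state={"alarm": "HighCPU on orders-db", "cpu_percent": "97"},
        constraints=["Never restart a production database without human approval"],
    )
    print(build_context(ctx, "Propose next diagnostic steps."))
```

The same model given a bare "orders-db is slow" versus this structured context will behave very differently; only the context changed.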

By the end of this course, you will have built a production-ready domain agent for your own operational environment and a 30-day deployment roadmap.


Format

Live 3-day workshop — instructor-led, cohort-based, team exercises. Hands-on throughout.

Udemy self-paced — all labs are designed to be completable solo, with no team exercise dependencies. Complete at your own pace.


Prerequisites

This is an advanced workshop. You need working knowledge of:

  • Docker: comfortable running containers and reading Docker output
  • Kubernetes basics: you have used kubectl and deployed to a cluster before
  • Terraform basics: you have written and applied a module
  • CI/CD: GitHub Actions, Jenkins, or equivalent
  • AWS basics: console navigation, IAM concepts, EC2/RDS/CloudWatch familiarity
  • Git workflows: branching, PRs, code review
  • CLI comfort: bash/zsh; you live in the terminal

Zero AI knowledge required. Your mental model is built from scratch using operational analogies.


Module Overview

Day 1: Foundations and Assessment

| Module | Topic | Duration |
|---|---|---|
| Module 1: AI Foundations | LLMs from an ops perspective: tokenization, context windows, and the inference pipeline, explained through operational analogies | 90 min |
| Module 2: Platform AI | AWS built-in AI features: GuardDuty, CodeGuru, DevOps Guru, Bedrock | 60 min |
| Module 3: Platform → Custom Agents | Bridging from platform AI to custom agents, with a live Hermes demo | 45 min |
| Module 4: Impact Assessment | A structured framework for scoring and prioritizing your top 10 automation candidates | 60 min |
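Module 4 teaches its own scoring framework. Purely to show the general shape of "score, then rank the top 10", here is a hypothetical weighted score: weekly toil minutes divided by a 1 to 5 risk rating. The criteria (`frequency_per_week`, `minutes_per_run`, `risk`), the formula, and the sample tasks are assumptions, not the course's framework.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical automation candidate with illustrative scoring inputs."""
    name: str
    frequency_per_week: int  # how often the task occurs
    minutes_per_run: int     # manual effort per occurrence
    risk: int                # 1 (safe to automate) .. 5 (dangerous to automate)


def score(c: Candidate) -> float:
    """Weekly toil saved, discounted by risk. The weighting is an assumption."""
    if not 1 <= c.risk <= 5:
        raise ValueError(f"risk must be between 1 and 5, got {c.risk}")
    toil_minutes = c.frequency_per_week * c.minutes_per_run
    return toil_minutes / c.risk


def top_candidates(candidates: list[Candidate], n: int = 10) -> list[Candidate]:
    """Return the n highest-scoring candidates, best first."""
    return sorted(candidates, key=score, reverse=True)[:n]


if __name__ == "__main__":
    tasks = [
        Candidate("Triage CloudWatch CPU alarms", 20, 15, 2),
        Candidate("Rotate IAM access keys", 1, 60, 4),
        Candidate("Investigate cost anomalies", 3, 45, 1),
    ]
    for t in top_candidates(tasks):
        print(f"{score(t):7.1f}  {t.name}")
```

Here alarm triage (300 toil minutes at risk 2) outranks key rotation (60 minutes at risk 4), which is the kind of trade-off any real framework has to make explicit.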

Day 2: Structured Coding and IaC

| Module | Topic | Duration |
|---|---|---|
| Module 5: Superpowers for IaC | TDD, verification, debugging, and code review applied to Helm charts and Terraform modules, using the Superpowers workflow on real IaC projects | 90 min |
| Module 6: AI Workflow Tools | GSD planning workflow, CLAUDE.md context engineering, cross-session memory, plan mode selection | 90 min |
| Module 7: Agent Skills | SKILL.md authoring: encoding runbooks as machine-readable domain knowledge | 90 min |
| Module 8: Tool Integration | Wiring agents to real infrastructure tools: AWS CLI, kubectl, Terraform, monitoring APIs | 60 min |
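Module 7 covers SKILL.md authoring. Assuming the course's files follow the common Agent Skills convention (YAML frontmatter between `---` lines with at least `name` and `description`, followed by Markdown instructions), this stdlib-only sketch splits a skill into metadata and body. It handles only flat `key: value` lines, not full YAML, and the `rds-high-cpu` skill is a hypothetical example.

```python
def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (frontmatter fields, markdown body).

    Only flat `key: value` frontmatter lines are handled; nested
    structures would need a real YAML parser.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    # Find the closing '---' of the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("unterminated frontmatter")
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    for required in ("name", "description"):
        if required not in meta:
            raise ValueError(f"frontmatter is missing '{required}'")
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


EXAMPLE = """\
---
name: rds-high-cpu
description: Diagnose sustained high CPU on an RDS instance before paging the DBA.
---
# RDS high CPU runbook
1. Check the CPUUtilization metric for the last hour.
2. List the top SQL statements in Performance Insights.
3. Do not reboot the instance; escalate instead.
"""

if __name__ == "__main__":
    meta, body = parse_skill(EXAMPLE)
    print(meta["name"], "->", meta["description"])
    print(body.splitlines()[0])
```

The `description` line matters most in practice: it is what lets an agent decide whether a skill is relevant before loading the full runbook body.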

Day 3: Agent Systems and Capstone

| Module | Topic | Duration |
|---|---|---|
| Module 9: Design Patterns | Agent patterns: triage-and-delegate, noise reduction, change gating, cost guardian | 60 min |
| Module 10: Domain Agent Build | Build a complete domain agent in one of three tracks (Track A: Database, Track B: FinOps, Track C: Kubernetes) | 120 min |
| Module 11: Fleet Management | Multi-agent systems: the fleet coordinator pattern, delegation loops, and the Morgan agent profile | 60 min |
| Module 12: Triggers and Automation | Event-driven agent invocation: CI/CD hooks, CloudWatch triggers, scheduled runs | 60 min |
| Module 13: Governance and Safety | L1–L4 governance levels, SOUL.md safety design, audit trails, production criteria | 60 min |
| Module 14: Capstone | Present your agent and 30-day deployment plan | 60 min |
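For Module 12's CloudWatch triggers, one common route is an Amazon EventBridge rule matching events with `"detail-type": "CloudWatch Alarm State Change"`. The sketch below filters such events so an agent is invoked only when an alarm enters the ALARM state, not when it recovers. The event fields follow the EventBridge alarm event shape; `should_invoke_agent`, `build_agent_task`, and the sample alarm are hypothetical and not part of Hermes.

```python
import json


def should_invoke_agent(event: dict) -> bool:
    """Invoke only on a transition into ALARM; recoveries to OK are ignored."""
    if event.get("source") != "aws.cloudwatch":
        return False
    if event.get("detail-type") != "CloudWatch Alarm State Change":
        return False
    detail = event.get("detail", {})
    current = detail.get("state", {}).get("value")
    previous = detail.get("previousState", {}).get("value")
    return current == "ALARM" and previous != "ALARM"


def build_agent_task(event: dict) -> str:
    """Turn the alarm event into a short task description for the agent."""
    detail = event["detail"]
    return (
        f"Alarm {detail['alarmName']} in {event.get('region', 'unknown region')} "
        f"entered ALARM: {detail['state'].get('reason', 'no reason given')}"
    )


SAMPLE_EVENT = json.loads("""
{
  "source": "aws.cloudwatch",
  "detail-type": "CloudWatch Alarm State Change",
  "region": "us-east-1",
  "detail": {
    "alarmName": "orders-db-high-cpu",
    "state": {"value": "ALARM", "reason": "Threshold Crossed"},
    "previousState": {"value": "OK"}
  }
}
""")

if __name__ == "__main__":
    if should_invoke_agent(SAMPLE_EVENT):
        print(build_agent_task(SAMPLE_EVENT))
```

Filtering before invocation keeps noisy or recovering alarms from spending model calls, which ties into the noise-reduction pattern from Module 9.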

Tools You Will Use

Primary Coding Tool (Modules 1–8)

Claude Code — Anthropic's terminal AI coding agent. Uses your Claude Pro or Team subscription. No additional billing.

OpenCode — Alternative terminal agent from opencode.ai. Supports 75+ LLM providers including free-tier options (Gemini 2.5 Flash, Groq). Use this if you prefer not to use a Claude subscription.

Agent Framework (Modules 7–13)

Hermes — The agent framework you will use to build your domain agent. Python-based, runs locally, supports multiple LLM providers. Four agent profiles ship with the course as starting templates.


Learning Objectives

By the end of this course, you will:

  1. Explain how LLMs work — tokenization, context windows, inference pipeline — using operational analogies
  2. Demonstrate progressive context engineering on real infrastructure data (CloudWatch alarms, cost anomalies, deployment failures)
  3. Build a working domain agent using Hermes that encodes your team's operational expertise
  4. Wire your agent to real infrastructure tools — AWS CLI, kubectl, Terraform, monitoring APIs
  5. Evaluate the automation potential of your top 10 operational tasks using a structured scoring framework
  6. Deliver a 30-day roadmap for deploying your agent in production
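Objective 4 involves letting an agent call CLIs such as kubectl and the AWS CLI. One common safety approach, shown here as an assumption rather than as how Hermes works, is to let the agent run only an allowlist of non-mutating commands, executed without a shell. `READ_ONLY_PREFIXES`, `is_allowed`, and `run_tool` are hypothetical names; the kubectl, aws, and terraform subcommands themselves are real.

```python
import shlex
import subprocess

# Hypothetical allowlist: subcommands that do not change infrastructure.
READ_ONLY_PREFIXES = [
    ["kubectl", "get"],
    ["kubectl", "describe"],
    ["kubectl", "logs"],
    ["aws", "cloudwatch", "describe-alarms"],
    ["terraform", "plan"],
]


def is_allowed(argv: list[str]) -> bool:
    """True if argv starts with one of the allowlisted command prefixes."""
    return any(argv[: len(prefix)] == prefix for prefix in READ_ONLY_PREFIXES)


def run_tool(command: str, timeout: int = 60) -> str:
    """Run an allowlisted command without a shell and return its stdout."""
    argv = shlex.split(command)  # no shell, so ';' or '&&' cannot chain commands
    if not is_allowed(argv):
        raise PermissionError(f"command not in read-only allowlist: {command!r}")
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout, check=True)
    return result.stdout


if __name__ == "__main__":
    print(is_allowed(shlex.split("kubectl get pods -n payments")))  # True
    print(is_allowed(shlex.split("kubectl delete pod api-0")))      # False
```

Mutating actions such as `kubectl delete` or `terraform apply` would then go through a separate, human-approved path rather than the agent's default tool.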

How to Navigate This Site

Workshop participants: Your instructor sets the pace. Use this site for reference — open the module page before each session to review objectives and prerequisites.

Udemy learners: Work through modules in order. Each module page has a lab link — the lab is the primary content. The reading and explainer materials are supplementary context.

Before you start: Complete the Setup Guide first. You need the reference app running and an AI coding tool configured before Module 1.


Resources

| Resource | Description |
|---|---|
| Setup Guide | Full environment setup: Day 1 (Docker/KIND/Claude Code) and Days 2–3 (Hermes) |
| Skills Reference | The 4 example SKILL.md files, with descriptions and authoring guidance |
| Agent Profiles | Aria, Finley, Kiran, Morgan: identity docs and safety model for each |
| Governance Templates | L1–L4 governance YAML, with diffs showing the progression between levels |

This course was developed for live 3-day workshop delivery and self-paced Udemy learning. All labs can be completed with free-tier infrastructure plus either a Claude Pro subscription or a free-tier LLM provider.