
Capstone: 30-Day Deployment Roadmap

This template is your post-workshop implementation plan. Use it to commit to specific milestones before leaving the workshop (or completing the course). The goal is to move your agent from the workshop environment to actual use in your organization within 30 days.

Solo Learner

You do not need a team for this roadmap. Every milestone is designed for individual completion. "Share with a teammate" in Week 4 means sharing the output with one colleague — it does not require a formal review process.


Before You Fill This In

Review your capstone presentation. The roadmap should connect directly to:

  • The specific agent you built (domain, pattern, autonomy level)
  • The infrastructure you will wire it to (real RDS instance, real KIND cluster, real Cost Explorer account)
  • The promotion criteria you defined in the governance spec

Fill in the brackets [like this] with your specific details as you work through the template.


Week 1: Foundation

Objective: Get the agent running against your real infrastructure (read-only).

Actions

| # | Action | Done When |
|---|--------|-----------|
| 1 | Install Hermes in your work environment (laptop or team VM) | `hermes --version` returns successfully |
| 2 | Configure your LLM provider: `hermes login --provider [claude-code / google-ai-studio / openrouter]` | `hermes chat "hello"` returns a response |
| 3 | Copy your Module 10 agent profile to your work environment | Profile directory exists with `SOUL.md` + `config.yaml` + `skills/` |
| 4 | Update the agent's database/cluster connection to point to your real infrastructure | `hermes run "[read-only diagnostic task]"` returns real data |
| 5 | Verify the skill runs correctly on real data: `hermes run "[first read-only scenario]"` | Agent output matches expected format, no errors |

Success criteria: The agent completes one read-only diagnostic task on your real infrastructure and produces output that a colleague could act on.

If something blocks you: The most common Week 1 blocker is LLM provider configuration (token expiry, rate limits, or firewall rules). Also check your team's acceptable use policy for AI tools before connecting the agent to production data.

Rollback plan: If the agent produces unexpected errors on real data, disconnect from production and re-run against the mock data from Module 10. The agent should be fully functional against mock data before connecting to real infrastructure.
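One way to make this rollback a single command is to keep the mock and real connection configs side by side and swap a symlink. A minimal sketch, assuming nothing about the Hermes config format — the file names, paths, and YAML contents here are purely illustrative:

```shell
set -euo pipefail

profile=$(mktemp -d)   # stand-in for your agent profile directory
printf 'host: mock-db.local\n' > "$profile/config.mock.yaml"
printf 'host: prod-rds.internal\n' > "$profile/config.real.yaml"

# Point the agent at real infrastructure for Week 1...
ln -sf config.real.yaml "$profile/config.yaml"

# ...and roll back to mock data with one command if runs misbehave.
ln -sf config.mock.yaml "$profile/config.yaml"
cat "$profile/config.yaml"   # → host: mock-db.local
```

Because the active config is only ever a symlink, you can always see at a glance (`ls -l config.yaml`) whether the agent is pointed at mock or real data.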


Week 2: First Real Run

Objective: Run the agent on one real operational task and document honestly what it got right and wrong.

Actions

| # | Action | Done When |
|---|--------|-----------|
| 1 | Select one specific operational scenario: [describe the exact task — e.g., "morning slow query review for orders-db"] | Scenario defined and documented |
| 2 | Run the agent against real data for this scenario | Agent completes run without errors |
| 3 | Review the output: Is it correct? Is it actionable? Is it complete? | Output reviewed, findings documented |
| 4 | Document what the agent got right | List of correct findings |
| 5 | Document what the agent got wrong or missed | List of failures with "why did it fail?" for each |
| 6 | Document what context was missing from the SKILL.md | List of gaps |

Success criteria: You have a written document (even a few paragraphs) recording what the agent produced, whether it was accurate, and specifically what context would have made it more accurate.

What to record:

```text
Date: [date]
Scenario: [what you ran]
Input context: [what data the agent had access to]
Output: [paste key parts of the agent's output]
What was correct: [list]
What was wrong/missing: [list]
SKILL.md gaps identified: [list]
```
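To keep these records consistent across runs, you can stub the fields into a dated file before each run and fill it in afterwards. A hedged sketch — the file location and name are illustrative, not part of the course tooling:

```shell
# Create a dated, pre-structured record file for today's run.
record="$(mktemp -d)/run-$(date +%F).md"
cat > "$record" <<EOF
Date: $(date +%F)
Scenario: [what you ran]
Input context: [what data the agent had access to]
Output: [paste key parts of the agent's output]
What was correct:
What was wrong/missing:
SKILL.md gaps identified:
EOF

echo "Record template written to $record"
```

A folder of these files becomes the raw material for the Week 3 gap analysis and the Month 2 track-record review.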

Rollback plan: If the agent produces a recommendation you are unsure about, do not act on it until you have verified it independently. In Week 2, treat all agent output as advisory regardless of the autonomy level — you are still evaluating its accuracy.


Week 3: Skill Iteration

Objective: Refine the SKILL.md based on Week 2 observations. Verify the improvements against real scenarios.

Actions

| # | Action | Done When |
|---|--------|-----------|
| 1 | List the specific SKILL.md gaps from Week 2 | Gaps documented with "what the agent should have done" for each |
| 2 | Update SKILL.md: add the missing decision branches | SKILL.md has new branches for each identified gap |
| 3 | Run the Week 2 scenario again — does the updated skill fix the gaps? | Agent output improves for the Week 2 failures |
| 4 | Run 4 additional scenarios in the same domain | 5 total scenarios run |
| 5 | Track accuracy: [N]/5 scenarios produced correct, actionable output | Accuracy score recorded |

Success criteria: SKILL.md updated with at least one new decision branch. Accuracy tracked across 5 real scenarios. You know your agent's accuracy rate in its domain.
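The accuracy score from action 5 can be tallied from a simple pass/fail log. A sketch with made-up scenario names and results — the log format is an assumption, not something Hermes produces:

```shell
# One line per scenario: "<scenario-name> pass|fail" (illustrative data).
results=$(mktemp)
cat > "$results" <<'EOF'
slow-query-review pass
lock-contention pass
disk-growth fail
replica-lag pass
connection-spike pass
EOF

total=$(grep -c '' "$results")          # line count = scenarios run
correct=$(grep -c ' pass$' "$results")  # lines ending in " pass"
echo "Accuracy: $correct/$total"        # → Accuracy: 4/5
```

Keeping the log in this shape means the same one-liner works for the Month 2 track-record count.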

The key skill iteration question: When the agent failed in Week 2, was it because the SKILL.md decision tree was incomplete, or because the agent needed different data to be retrieved first? These require different fixes: incomplete decision tree → add branches; wrong data → add a retrieval step earlier in the skill flow.

Rollback plan: Keep a copy of the original SKILL.md before making changes. If the updated skill produces worse results than the original, you can restore and take a different approach.
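A minimal sketch of this snapshot-and-restore loop — the directory layout and file contents are illustrative stand-ins for your real profile:

```shell
set -euo pipefail

profile=$(mktemp -d)   # stand-in for your agent profile directory
printf 'v1 decision tree\n' > "$profile/SKILL.md"

# Snapshot the current skill before touching it.
cp "$profile/SKILL.md" "$profile/SKILL.md.bak"

# ...edit SKILL.md, re-run the Week 2 scenario, compare results...
printf 'v2 decision tree with new branch\n' > "$profile/SKILL.md"

# If the updated skill performs worse, restore the snapshot.
cp "$profile/SKILL.md.bak" "$profile/SKILL.md"
cat "$profile/SKILL.md"   # → v1 decision tree
```

If your profile directory is under version control, `git stash` or a branch per iteration achieves the same thing with a full change history.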


Week 4: Production Trial

Objective: Run the agent on a scheduled basis, set up audit logging, and get one piece of external feedback.

Actions

| # | Action | Done When |
|---|--------|-----------|
| 1 | Schedule the first automated run: `hermes cron add "[agent-name]" --schedule "0 7 * * 1-5" --task "[morning review task]"` | Cron job added, first scheduled run completes |
| 2 | Configure audit log output: verify agent runs are logged to [cloudwatch log group / file / slack channel] | Audit logs visible after first scheduled run |
| 3 | Review the first automated run output: is the output at the same quality as manual runs? | Automated run produces comparable output to manual runs |
| 4 | Share the output from one run with [colleague name] for feedback | Colleague has reviewed and provided feedback |
| 5 | Document the feedback: what did they find useful, what was missing, what would they want to see differently? | Feedback recorded |

Success criteria: The agent runs on schedule at least twice in Week 4. Audit logs exist. One colleague has reviewed an output and provided feedback.
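The schedule string in action 1 is standard five-field cron syntax (minute, hour, day of month, month, day of week), assuming `hermes cron` follows that convention. A quick way to read the fields:

```shell
schedule="0 7 * * 1-5"

set -f            # disable globbing so the "*" fields stay literal
set -- $schedule  # split the schedule into its five fields
minute=$1 hour=$2 dom=$3 month=$4 dow=$5
set +f

echo "minute=$minute hour=$hour day-of-month=$dom month=$month day-of-week=$dow"
# → minute=0 hour=7 day-of-month=* month=* day-of-week=1-5
```

So `0 7 * * 1-5` means 07:00 every Monday through Friday — a weekday morning review. Adjust the hour field to your own working start time.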

What "production trial" means: This is not full production deployment — you are running the agent on real data but treating all output as advisory. You are verifying that automated runs work, that logging is functional, and that the output quality is consistent. This is L1 at scale before any promotion.

Rollback plan: If scheduled runs produce unexpected errors or incorrect output, pause the cron job (`hermes cron pause [agent-name]`) and investigate. Do not let a broken cron job run unattended — automated failures accumulate.


Month 2: Expansion

Objective: Add capability and consider promotion based on the 30-day track record.

Actions

| # | Action | Done When |
|---|--------|-----------|
| 1 | Review the 30-day track record: count accurate vs. inaccurate outputs | Track record documented |
| 2 | If track record meets L1→L2 criteria from your governance spec: initiate team review | Team review scheduled or completed |
| 3 | Add one new skill or expand the existing skill to a new scenario | New skill or branch authored, tested against mock data |
| 4 | Identify the next Module 10 track to add: [database health / cost optimization / K8s health] | Next track identified, initial planning started |
| 5 | Document lessons learned from Month 1 for the next agent build | Lessons recorded in a document you can reference |

The promotion decision: L1→L2 promotion is not automatic when the track record threshold is met. It requires a team review. Schedule that conversation explicitly — it will not happen unless you schedule it. Bring the track record data, the failure analysis from Weeks 2-3, and the SKILL.md update history.

If the track record does not meet criteria: Identify the failure pattern, update the SKILL.md again, and reset the tracking window. Promotion is earned — if the agent is not ready, a month of additional iteration is a better outcome than a premature promotion that fails in production.
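The threshold check itself is simple arithmetic. A sketch with illustrative counts and a made-up threshold — the real bar comes from your governance spec:

```shell
accurate=23     # illustrative: accurate outputs over the 30 days
total=26        # illustrative: total outputs over the 30 days
threshold=85    # percent; replace with the bar from your governance spec

pct=$(( accurate * 100 / total ))   # integer percent
if [ "$pct" -ge "$threshold" ]; then
  echo "Track record ${pct}% meets the bar: schedule the team review"
else
  echo "Track record ${pct}% below the bar: iterate and reset the tracking window"
fi
```

Remember that clearing the numeric bar only starts the conversation — the team review, not the arithmetic, makes the promotion decision.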


Commitment Section

Fill this in before the workshop ends (or before completing the Udemy course). Making the commitment specific and time-bound increases the probability of follow-through.

My agent: [domain + pattern + autonomy level — e.g., "RDS investigator at L1"]

Week 1 first milestone (by [date]):
[specific action — e.g., "Agent runs against dev-rds-01 and produces output by Tuesday"]

Week 2 scenario I will run (by [date]):
[specific task — e.g., "Morning slow query review on 2026-05-12"]

Who I will share Week 4 output with:
[name and role — e.g., "Alex (SRE lead)"]

My 30-day promotion decision date:
[date — e.g., 2026-06-04 — the date you will review the track record]

Continue to: Evaluation Rubric