
Exploratory: AI Superpowers Workflow

These are stretch projects — not required for course completion.

If you finished the main lab early, or want to explore how AI tools integrate with disciplined engineering workflows, these projects are for you. Each builds on the reference app or lab infrastructure from the main sessions.

What are Superpowers?

Superpowers are structured engineering disciplines — TDD, systematic debugging, code review — that become more powerful when combined with AI assistance. The insight is that AI tools produce better output when given structured context, and these workflows provide exactly that structure.

This is context engineering applied to development discipline, not just infrastructure.


Project 1: TDD with AI Assistance

Time: 30 minutes
Target: Add a new endpoint to the reference app's catalog service using Test-Driven Development

What You'll Build

A /items/:id endpoint that returns a single item by ID from the catalog service. The service already has a /items endpoint that returns all items — you'll add the single-item lookup.

Why TDD Works Better with AI

The failing test IS the specification. When you give Claude Code a failing test as context, it knows exactly what behavior to implement — not a vague description, but a precise, executable specification. The AI writes minimal code that satisfies the test, nothing more.

Without a test, AI generates plausible code that may or may not match your intent. With a failing test, there's no ambiguity.

The TDD Cycle

Step 1 — RED: Write a failing test

Open the catalog service source at reference-app/services/catalog/src/main.rs and add a #[cfg(test)] module at the bottom (or create a separate test module).

Write a test for the new endpoint before writing any implementation:

#[cfg(test)]
mod tests {
    use super::*;
    use axum::http::StatusCode;
    use tower::ServiceExt; // provides `oneshot`

    #[tokio::test]
    async fn test_get_item_by_id_returns_item() {
        // Given: a catalog with items
        let app = create_test_app().await;

        // When: GET /items/1
        let response = app
            .oneshot(
                axum::http::Request::builder()
                    .uri("/items/1")
                    .body(axum::body::Body::empty())
                    .unwrap(),
            )
            .await
            .unwrap();

        // Then: returns 200 with item data
        assert_eq!(response.status(), StatusCode::OK);
    }

    #[tokio::test]
    async fn test_get_item_by_id_returns_404_for_missing_item() {
        let app = create_test_app().await;

        let response = app
            .oneshot(
                axum::http::Request::builder()
                    .uri("/items/999")
                    .body(axum::body::Body::empty())
                    .unwrap(),
            )
            .await
            .unwrap();

        assert_eq!(response.status(), StatusCode::NOT_FOUND);
    }
}

Run the tests — they should FAIL (the endpoint doesn't exist yet):

cargo test -p catalog

Expected result: The first test fails — GET /items/1 hits a nonexistent route and returns 404 instead of 200. (The second test may already pass, since an unrouted path also returns 404; the 200 test is the one driving the implementation.) This is correct. A test that passes before you write the code is not a test.


Step 2 — GREEN: Implement the minimum code to pass

Ask Claude Code to implement the endpoint, providing the failing test as context:

@reference-app/services/catalog/src/main.rs

The tests in the test module are failing because /items/:id doesn't exist.
Implement the minimal code to make both tests pass. The existing /items
endpoint returns Vec<Item> from a pool query. Follow the same pattern.
Do not add any behavior beyond what the tests require.

Run the tests again:

cargo test -p catalog

Expected result: Both tests pass. The implementation is minimal — exactly what the tests require.
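The exact implementation will come from Claude Code, but its core is a lookup that either finds the item or yields a 404. A minimal sketch of that lookup logic in plain stdlib Rust — `Item` and the in-memory slice here are stand-ins for the real pool-backed types, not the actual service code:

```rust
// Stand-in for the service's Item type.
#[derive(Debug, PartialEq)]
struct Item {
    id: u32,
    name: String,
}

// The lookup the handler wraps: Some(item) maps to 200 OK,
// None maps to 404 Not Found.
fn find_item(items: &[Item], id: u32) -> Option<&Item> {
    items.iter().find(|item| item.id == id)
}

fn main() {
    let items = vec![
        Item { id: 1, name: "widget".into() },
        Item { id: 2, name: "gadget".into() },
    ];
    assert!(find_item(&items, 1).is_some()); // the 200 case
    assert!(find_item(&items, 999).is_none()); // the 404 case
    println!("lookup logic matches what the tests specify");
}
```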


Step 3 — REFACTOR: Clean up without changing behavior

Ask Claude Code to review the new code for cleanup opportunities:

The /items/:id tests now pass. Review the implementation for:
- Unnecessary code duplication with /items
- Missing error handling (what if the ID is not a valid integer?)
- Inconsistent naming conventions with the existing handlers

Refactor only. Do not add new behavior.

Run the tests again to confirm nothing broke:

cargo test -p catalog

Expected result: All tests still pass. The code is cleaner. No new behavior was introduced.
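One concrete target from the refactor checklist is the invalid-ID case: a non-integer ID should become a client error, never a panic. In an axum handler a typed Path extractor typically rejects non-integers for you; as an illustration of the concern, here is a hedged stdlib sketch (the function name is ours, not the service's):

```rust
// Illustrative: a non-integer or negative ID must map to a client
// error response, not a panic inside the handler.
fn parse_item_id(raw: &str) -> Result<u32, &'static str> {
    raw.parse::<u32>()
        .map_err(|_| "item id must be a non-negative integer")
}

fn main() {
    assert_eq!(parse_item_id("1"), Ok(1));
    assert!(parse_item_id("abc").is_err()); // handler maps this to a 4xx
    assert!(parse_item_id("-5").is_err()); // u32 parsing rejects negatives
    println!("id parsing covers the edge cases");
}
```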


Key insight: The failing test defined the specification precisely. Claude Code had no ambiguity about what "returns a single item by ID" means — it had executable evidence. This is why TDD produces better AI-generated code than vague natural language descriptions.

Project 1 Deliverable: A passing test + working /items/:id endpoint in the catalog service.


Project 2: Systematic Debugging with AI Assistance

Time: 20 minutes
Target: Find and fix 3 deliberate errors in a broken Helm values file using a structured debugging approach

What You'll Debug

A broken values.yaml for the reference app Helm chart. Three errors have been deliberately introduced:

  1. A wrong port number (silent failure — services won't connect)
  2. A misspelled image name (deployment fails with ImagePullBackOff — Kubernetes can't pull an image that doesn't exist)
  3. Missing resource requests (pod scheduling may fail on resource-constrained nodes)

Why Structured Debugging Works Better with AI

Vague prompts produce vague answers. "It doesn't work" gives the AI nothing to work with. Providing the error message, the relevant config, and the surrounding context gives the AI the same information an expert debugger would start with.

The systematic approach: Read errors → Reproduce → Check recent changes → Trace data flow → Form hypothesis → Test smallest fix.

The Debugging Exercise

Step 1 — Create the broken values file

Create values-broken.yaml in your course working directory:

# values-broken.yaml — contains 3 deliberate errors
replicaCount: 1

apiGateway:
  image:
    repository: course/api-gateway
    tag: latest
  service:
    port: 8090 # Error 1: wrong port (should be 8080)

catalog:
  image:
    repository: course/catlog # Error 2: typo in image name (should be catalog)
    tag: latest
  service:
    port: 8081

worker:
  image:
    repository: course/worker
    tag: latest
  service:
    port: 8082
  # Error 3: missing resources block (required for scheduling on constrained nodes)

Step 2 — Run helm lint

helm lint reference-app/helm/reference-app/ -f values-broken.yaml

Expected result: Helm lint may or may not catch all three errors — some are semantic, not syntactic. This is realistic: many infra errors only surface at apply time.

Step 3 — Apply structured debugging with AI

Do NOT ask Claude Code "what's wrong with this file?" Instead, provide the error context:

@reference-app/helm/reference-app/values.yaml
@values-broken.yaml

I ran: helm lint reference-app/helm/reference-app/ -f values-broken.yaml
Output: [paste actual lint output]

I also tried to dry-run apply:
kubectl apply --dry-run=server -f ... [paste any errors]

Using the systematic debugging approach:
1. What does the error tell us? (Read errors)
2. Can I reproduce it? (the lint output is the reproduction)
3. What changed recently? (compare values-broken.yaml to the original values.yaml)
4. What's the data flow from the values file to the running pod?

Identify all issues, starting with the most likely to cause a pod startup failure.

Expected result: AI identifies all 3 errors with reasoning, not just guesses. The structured context (error output + diff + data flow question) produces a systematic diagnosis.

Step 4 — Fix and verify

Fix each error, then re-run:

helm lint reference-app/helm/reference-app/ -f values-broken.yaml

Expected result: helm lint passes with no errors or warnings.
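For reference, the corrected file should look roughly like this — the specific resource request values are illustrative assumptions, not mandated by the chart:

```yaml
# values-broken.yaml after the three fixes
replicaCount: 1

apiGateway:
  image:
    repository: course/api-gateway
    tag: latest
  service:
    port: 8080 # Fix 1: correct port restored

catalog:
  image:
    repository: course/catalog # Fix 2: image name typo corrected
    tag: latest
  service:
    port: 8081

worker:
  image:
    repository: course/worker
    tag: latest
  service:
    port: 8082
  resources: # Fix 3: resource requests restored (values are illustrative)
    requests:
      cpu: 100m
      memory: 128Mi
```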


Key insight: AI debugging is most useful when you provide the error message AND surrounding context, not just the symptom. "It doesn't work" is the worst input. "Here's the error, here's the config, here's what changed" is the best input. The systematic approach structures the context you provide.

Project 2 Deliverable: All 3 errors identified and fixed, helm lint passing.


Project 3: AI Code Review for Production Readiness

Time: 15 minutes
Target: Review the reference app's api-gateway Dockerfile for production readiness issues

What You'll Review

reference-app/services/api-gateway/Dockerfile — the container image build file for the api-gateway service.

Why Structured Code Review Works Better with AI

Generic "review this" prompts produce generic observations. Specifying what to review for — multi-stage build efficiency, security hardening, layer caching, image size — focuses the review on dimensions that matter for production operations.

This is context engineering applied to code review: you control what the AI looks for.

The Review Exercise

Step 1 — Generic review (baseline)

Ask Claude Code for a review without specifying criteria:

@reference-app/services/api-gateway/Dockerfile

Review this Dockerfile.

Note the output. How specific is it? Does it focus on what matters for a production Rust service?

Expected result: Generic observations. May mention common Dockerfile patterns but probably misses specifics relevant to this context (Rust binary size, cargo caching, multi-stage optimization).


Step 2 — Structured review with specified criteria

Ask again with explicit review dimensions:

@reference-app/services/api-gateway/Dockerfile

Review this Dockerfile for production readiness. Evaluate specifically:

1. Multi-stage build efficiency: Is the build stage properly separated from the runtime stage?
Is the final image minimal (scratch or distroless)?

2. Security: Does the container run as a non-root user? Are there any secrets or credentials
in the build layers?

3. Layer caching: Are dependencies cached separately from source code? A Rust service should
cache cargo dependencies before copying src/. Does this Dockerfile do that?

4. Image size: What's the approximate final image size? What's the bottleneck?

5. .dockerignore coverage: Which files would bloat the build context? Is there a .dockerignore?

For each issue found: severity (critical/important/minor), specific line reference, and recommended fix.

Expected result: Specific, actionable findings with line references and severity ratings. The AI now has a precise rubric — it knows what "production readiness" means in this context.
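To make the rubric concrete, here is the shape of Dockerfile the five criteria point toward. This is an illustrative sketch, not the actual reference-app Dockerfile — the binary name, base image tags, and dummy-main caching trick are all assumptions:

```dockerfile
# Illustrative multi-stage build for a Rust service.
FROM rust:1.79 AS builder
WORKDIR /app

# Criterion 3 — layer caching: build dependencies against a dummy main
# so the cargo layer is reused when only src/ changes.
COPY Cargo.toml Cargo.lock ./
RUN mkdir src && echo "fn main() {}" > src/main.rs && cargo build --release

# Now copy the real source and rebuild only the application crate.
COPY src ./src
RUN touch src/main.rs && cargo build --release

# Criterion 1 — minimal runtime stage; Criterion 2 — non-root user.
FROM gcr.io/distroless/cc-debian12:nonroot
COPY --from=builder /app/target/release/api-gateway /usr/local/bin/api-gateway
USER nonroot
ENTRYPOINT ["/usr/local/bin/api-gateway"]
```

A .dockerignore excluding target/, .git/, and local config (criterion 5) keeps the build context small; the distroless runtime stage is usually what dominates the final image-size answer for criterion 4.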


Step 3 — Implement one finding

Pick one "important" finding from the structured review. Ask Claude Code to implement the fix:

The review identified [paste the specific finding]. Implement this fix.
Show me the diff and explain why this is better for production.

Expected result: A targeted fix to the Dockerfile addressing the specific finding. The AI works from the review context it already has — no need to re-explain the situation.


Key insight: AI code review is most useful when you specify WHAT to review for. A generic "review this" produces generic output. A review with explicit dimensions (security, caching, size) produces a structured, actionable report. The review criteria ARE the context — specify them precisely.

Project 3 Deliverable: A review report with specific, actionable findings and at least one implemented fix.


Going Further

These three patterns — TDD, systematic debugging, structured code review — are covered in depth in the Claude Code superpowers documentation:

  • TDD: ~/.claude/superpowers/tdd.md
  • Systematic debugging: ~/.claude/superpowers/debugging.md
  • Code review: ~/.claude/superpowers/code-review.md

The exploratory projects above are entry points. The full superpowers workflow integrates these patterns with the GSD structured workflow — so every feature has a test, every debug follows a hypothesis cycle, and every significant change gets a structured review.

That integration is what "production-grade AI-assisted DevOps" looks like in practice.