Workflow Blueprint Template
Use this template to design AI-native workflows that are auditable and resilient. Output: 1 diagram + 2-page blueprint.
What Is an AI-Native Workflow?
An AI-native workflow embeds AI into the core decision path, not "off to the side." It includes:
- AI task (what the system decides)
- Human checkpoints (where humans verify or override)
- Stop cards (when humans must intervene)
- Handoff rules (when to escalate or defer)
- Audit trail (proof of decision integrity)
Step 1: Workflow Diagram (High-Level)
┌────────────────────────────────────────────────────────────────┐
│                     SUPPORT TICKET INTAKE                      │
└────────────────────────────────────────────────────────────────┘
                                ↓
┌────────────────────────────────────────────────────────────────┐
│ AI: Classify Ticket Category                                   │
│ (Prompting Skill)                                              │
│ ✓ Input: Ticket text                                           │
│ ✓ Output: Category + confidence                                │
└────────────────────────────────────────────────────────────────┘
                                ↓
┌────────────────────────────────────────────────────────────────┐
│ Decision Logic (Critical Evaluation Skill)                     │
│ IF category = "Safety/Fraud":                                  │
│   → STOP CARD: Escalate immediately                            │
│ ELIF confidence > 0.85:                                        │
│   → Route automatically                                        │
│ ELIF 0.65 < confidence ≤ 0.85:                                 │
│   → Flag for human review                                      │
│ ELSE (confidence ≤ 0.65):                                      │
│   → STOP CARD: Human review required                           │
└────────────────────────────────────────────────────────────────┘
                                ↓
       ┌────────────────┬────────────────┬────────────────┐
       │                │                │                │
       ▼                ▼                ▼                ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│ Auto-Route  │  │ Queue for   │  │ Human       │  │ Escalate to │
│ (Confidence │  │ Review      │  │ Review      │  │ Manager     │
│ >0.85)      │  │ (0.65–0.85) │  │ (≤0.65)     │  │ (Safety/    │
│             │  │             │  │             │  │ Fraud)      │
└─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘
       ↓                ↓                ↓                ↓
┌────────────────────────────────────────────────────────────────┐
│ Audit Trail: Log Decision, Outcome, Override (if any)          │
│ (Workflow Integration & Ethics & Trust Skills)                 │
└────────────────────────────────────────────────────────────────┘
Step 2: Detailed Workflow Blueprint
1. Workflow Name & Purpose
Name: Customer Support Ticket Triage
Owner: [Name/Team]
Version: 1.0
Last Updated: [Date]
Goal: Route 80% of tickets automatically while keeping the misdirection rate (tickets sent to the wrong queue) below 2%
2. Inputs & Outputs
| Input | Source | Format | Frequency |
|---|---|---|---|
| Customer support ticket | Zendesk / Help Desk | Text (subject + body) | Real-time |
| Ticket metadata | Help Desk | JSON (priority, account type) | Real-time |
| Output | Consumer | Format | Action |
|---|---|---|---|
| Routing decision | Ticketing system | API call | Route to queue |
| Confidence score | Human reviewer | JSON | Display in dashboard |
| Override record | Audit system | Log entry | Store in database |
3. AI Task Definition
What the AI does:
Classify a customer support ticket into 1 of 8 predefined categories:
1. Billing & Payment
2. Technical Support
3. Account Access
4. Product Feedback
5. Complaint / Escalation
6. General Inquiry
7. Other
8. Safety / Fraud Alert
Model: Claude 3.5 Sonnet (via Anthropic API)
Latency SLA: <2 seconds per ticket
Cost: ~$0.001 per classification
What the AI does NOT do:
- Draft responses to customers (humans do this)
- Decide ticket priority or SLA
- Close tickets without human sign-off
- Contact customers directly
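Because every downstream rule depends on a well-formed category and confidence score, it helps to validate the model's reply before acting on it. The sketch below assumes the prompt asks the model to answer with a JSON object of the form {"category": ..., "confidence": ...}; that reply format and the names `Category` and `parse_classification` are hypothetical, not part of the Anthropic API. Anything malformed raises `ValueError`, so the caller can fall back to manual routing instead of trusting a bad answer.

```python
import json
from enum import Enum
from typing import Tuple


class Category(str, Enum):
    """The eight predefined categories from the AI task definition."""
    BILLING = "Billing & Payment"
    TECHNICAL = "Technical Support"
    ACCOUNT_ACCESS = "Account Access"
    PRODUCT_FEEDBACK = "Product Feedback"
    COMPLAINT = "Complaint / Escalation"
    GENERAL = "General Inquiry"
    OTHER = "Other"
    SAFETY_FRAUD = "Safety / Fraud Alert"


def parse_classification(raw_reply: str) -> Tuple[Category, float]:
    """Validate a reply of the assumed shape {"category": str, "confidence": number}."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    try:
        category = Category(data["category"])  # rejects labels outside the 8 categories
    except (KeyError, ValueError) as exc:
        raise ValueError(f"unknown or missing category: {data.get('category')!r}") from exc
    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return category, float(confidence)
```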
4. Decision Logic & Stop Cards
4a. Confidence-Based Routing
**IF category == "Safety/Fraud" (checked first, at any confidence):**
└─ STOP CARD: ESCALATE IMMEDIATELY
└─ Notify manager + compliance team
└─ Do NOT auto-route
└─ SLA: <15 minutes
**ELIF confidence_score > 0.85:**
└─ AUTO-ROUTE to appropriate queue
└─ Update ticket status: "Routed (automated)"
└─ Log decision with confidence score
└─ Human reviewer spot-checks 5% of auto-routed tickets daily
**ELIF 0.65 < confidence_score <= 0.85:**
└─ FLAG FOR HUMAN REVIEW (add to review queue)
└─ Show model's top 2 category suggestions
└─ Show confidence score
└─ Wait for human decision (<4-hour SLA)
**ELSE (confidence_score <= 0.65):**
└─ STOP CARD: MANUAL REVIEW REQUIRED
└─ Route to most experienced agent
└─ Flag as "uncertain: human expertise needed"
└─ SLA: <2 hours
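A minimal Python sketch of these routing rules. It checks Safety/Fraud before the confidence bands so a safety ticket escalates at any confidence, and it sends exactly 0.85 to human review and exactly 0.65 to manual review, matching the `>` and `<=` boundaries in 4a. The constant names, the `RoutingDecision` shape, and the use of the category-list label "Safety / Fraud Alert" are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constants; the label is taken from the category list.
SAFETY_FRAUD = "Safety / Fraud Alert"
AUTO_ROUTE_THRESHOLD = 0.85
REVIEW_FLOOR = 0.65


@dataclass(frozen=True)
class RoutingDecision:
    action: str                 # "escalate", "auto_route", "human_review", "manual_review"
    sla_minutes: Optional[int]  # None: no human SLA (spot-checks happen later)
    stop_card: bool


def route(category: str, confidence: float) -> RoutingDecision:
    # Safety/Fraud first, so it can never fall through to auto-routing.
    if category == SAFETY_FRAUD:
        return RoutingDecision("escalate", sla_minutes=15, stop_card=True)
    if confidence > AUTO_ROUTE_THRESHOLD:
        return RoutingDecision("auto_route", sla_minutes=None, stop_card=False)
    if confidence > REVIEW_FLOOR:  # 0.65 < confidence <= 0.85
        return RoutingDecision("human_review", sla_minutes=4 * 60, stop_card=False)
    # confidence <= 0.65: stop card, most experienced agent
    return RoutingDecision("manual_review", sla_minutes=2 * 60, stop_card=True)
```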
4b. Stop Card Conditions (Non-Negotiable Escalation)
| Stop Card | Trigger | Action | Owner |
|---|---|---|---|
| Safety Alert | Model classifies as "Safety/Fraud" OR human detects harm/fraud mention | Escalate to manager within 15 min | Support Mgr |
| Model Drift | Weekly accuracy drops below 92% | Pause auto-routing, review and revise the prompt, fine-tune or change model if needed | ML Engineer |
| System Failure | API latency >10 seconds OR >3 consecutive failures | Switch to manual routing, page on-call engineer | SRE |
| Override Storm | >20% of human reviews contradict model in a day | Review workflow logic, update prompt | Product Manager |
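The Model Drift, System Failure, and Override Storm triggers are numeric, so they can be evaluated automatically from a metrics snapshot (the per-ticket Safety Alert belongs in the routing step instead). This sketch assumes a hypothetical `HealthSnapshot` record; where those numbers come from (ticketing system, API client, review tool) depends on your stack.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HealthSnapshot:
    """Hypothetical metrics snapshot; field names are illustrative."""
    weekly_accuracy: float             # fraction of AI categories confirmed correct (0-1)
    api_latency_seconds: float         # latest classification latency
    consecutive_api_failures: int
    reviews_today: int                 # human reviews completed today
    contradicting_reviews_today: int   # reviews where the human changed the category


def triggered_stop_cards(s: HealthSnapshot) -> List[str]:
    """Return the names of the operational stop cards whose triggers are met."""
    cards = []
    if s.weekly_accuracy < 0.92:
        cards.append("Model Drift")     # pause auto-routing; owner: ML Engineer
    if s.api_latency_seconds > 10 or s.consecutive_api_failures > 3:
        cards.append("System Failure")  # switch to manual routing; page SRE
    if s.reviews_today > 0 and s.contradicting_reviews_today / s.reviews_today > 0.20:
        cards.append("Override Storm")  # review workflow logic; owner: PM
    return cards
```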
5. Handoff Points (Workflow Integration Skill)
5a. AI → Human Review
Trigger: Confidence 0.65–0.85
Responsibility: Human support agent
SLA: <4 hours
Output: Corrected category + reason for correction (logged)
5b. Human Review → AI Learning Loop
If human decision differs from model:
1. Log disagreement (category, confidence, correction)
2. Weekly: Aggregate corrections
3. Monthly: Revise the prompt (or fine-tune the model, where supported) if a recurring correction pattern exceeds 5% of reviewed tickets
4. Update prompt library
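One way to make the 5% pattern rule concrete, assuming a "pattern" is a single model-to-human category swap (for example, model said Technical Support, human said Account Access) that accounts for more than 5% of reviewed tickets in the period. The function name, tuple format, and choice of denominator are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (model_category, human_category) for each review where the human disagreed
Correction = Tuple[str, str]


def recurring_patterns(corrections: Iterable[Correction], tickets_reviewed: int,
                       threshold: float = 0.05) -> List[Tuple[Correction, float]]:
    """Return correction pairs whose share of reviewed tickets exceeds the
    threshold, most frequent first."""
    if tickets_reviewed <= 0:
        return []
    counts = Counter(corrections)
    return [(pair, n / tickets_reviewed)
            for pair, n in counts.most_common()
            if n / tickets_reviewed > threshold]
```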
5c. Escalation → Manager
Trigger: Stop card conditions (safety, model drift, system failure)
Responsibility: Support manager
SLA: Varies (safety = 15 min, others = 1 day)
Output: Incident report + corrective action
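The handoff SLAs can be tracked as deadlines computed from the time a ticket enters each queue. This sketch uses wall-clock time (an assumption; your SLAs may count business hours only) and hypothetical handoff keys; the durations are the ones stated in this blueprint.

```python
from datetime import datetime, timedelta, timezone

# SLAs stated in this blueprint; the dictionary keys are hypothetical names.
SLA = {
    "safety_escalation": timedelta(minutes=15),
    "manual_review": timedelta(hours=2),     # confidence <= 0.65 stop card
    "human_review": timedelta(hours=4),      # confidence 0.65-0.85
    "other_escalation": timedelta(days=1),   # drift, system failure, etc.
}


def due_at(handoff: str, entered_queue: datetime) -> datetime:
    """Deadline for a handoff, measured in wall-clock time from queue entry."""
    return entered_queue + SLA[handoff]


def is_breached(handoff: str, entered_queue: datetime, now: datetime) -> bool:
    return now > due_at(handoff, entered_queue)


t0 = datetime(2026, 2, 6, 14, 23, 45, tzinfo=timezone.utc)
print(due_at("safety_escalation", t0).isoformat())  # 2026-02-06T14:38:45+00:00
```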
6. Audit Trail & Evidence (Ethics & Trust Skill)
What gets logged for every ticket:
{
"ticket_id": "TKT-12345",
"timestamp": "2026-02-06T14:23:45Z",
"input": {
"subject": "Can't reset my password",
"body": "I've tried three times. Says 'invalid token.' Please help.",
"customer_account_type": "premium"
},
"ai_output": {
"category": "Account Access",
"confidence": 0.92,
"reasoning": "Customer explicitly mentions password reset failure."
},
"routing_decision": {
"auto_routed": true,
"queue": "account-support",
"confidence_threshold_met": true
},
"human_override": {
"occurred": false,
"override_reason": null,
"human_category": null
},
"outcome": {
"actual_category": "Account Access",
"resolved_correctly": true,
"resolution_time_minutes": 12
}
}
Retention: 2 years minimum (for regulatory audit)
Access: Logged-in support agents + compliance team
Security: Encrypted at rest, masked PII in logs
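A sketch of how one audit record in the format above might be assembled, with simple regex-based masking applied to the free-text fields before they are logged. The two patterns catch only email addresses and long digit runs; they are placeholders, not a complete PII filter, and the function name and defaults are illustrative. The override and outcome sections start empty and would be filled in by later steps.

```python
import json
import re
from datetime import datetime, timezone

# Placeholder PII patterns: email addresses and runs of 6+ digits
# (e.g., card or account numbers). Not a complete PII filter.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
LONG_DIGITS_RE = re.compile(r"\b\d{6,}\b")


def mask_pii(text: str) -> str:
    return LONG_DIGITS_RE.sub("[NUMBER]", EMAIL_RE.sub("[EMAIL]", text))


def audit_record(ticket_id: str, subject: str, body: str, account_type: str,
                 category: str, confidence: float, reasoning: str,
                 auto_routed: bool, queue: str, threshold: float = 0.85) -> str:
    """Build one audit-log entry as a JSON string mirroring the example format."""
    record = {
        "ticket_id": ticket_id,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "input": {
            "subject": mask_pii(subject),
            "body": mask_pii(body),
            "customer_account_type": account_type,
        },
        "ai_output": {"category": category, "confidence": confidence,
                      "reasoning": reasoning},
        "routing_decision": {
            "auto_routed": auto_routed,
            "queue": queue,
            "confidence_threshold_met": confidence > threshold,
        },
        # Filled in later by the review and resolution steps.
        "human_override": {"occurred": False, "override_reason": None,
                           "human_category": None},
        "outcome": {"actual_category": None, "resolved_correctly": None,
                    "resolution_time_minutes": None},
    }
    return json.dumps(record)
```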
7. Monitoring & KPIs (Critical Evaluation Skill)
Daily Dashboard
| Metric | Target | Alert If |
|---|---|---|
| Accuracy (AI vs. human-verified category) | ≥95% | <92% |
| Auto-route rate | 75–85% | <70% or >90% |
| Human override rate | <5% | >10% |
| Stop card activation rate | <2% | >3% |
| Average confidence score | 0.80+ | <0.75 |
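The "Alert If" column can be expressed as bounds and checked in code. This sketch assumes the daily metrics arrive as fractions between 0 and 1 (not percentages) under hypothetical key names.

```python
from typing import Dict, Optional, Tuple

# metric -> (low, high): alert if value < low or value > high; None = unbounded.
ALERT_BOUNDS: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "accuracy": (0.92, None),
    "auto_route_rate": (0.70, 0.90),
    "human_override_rate": (None, 0.10),
    "stop_card_rate": (None, 0.03),
    "avg_confidence": (0.75, None),
}


def dashboard_alerts(metrics: Dict[str, float]) -> Dict[str, float]:
    """Return the metrics that are outside their alert bounds."""
    alerts = {}
    for name, (low, high) in ALERT_BOUNDS.items():
        value = metrics[name]
        if (low is not None and value < low) or (high is not None and value > high):
            alerts[name] = value
    return alerts
```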
Weekly Review
- Sample 100 auto-routed tickets; verify category accuracy
- Aggregate human overrides; identify patterns
- Check for category drift (e.g., more "Safety" alerts than before?)
- Review stop card triggers; were they valid?
Monthly Review
- Refine the prompt if accuracy <93%
- Benchmark against baseline (human-only routing)
- Calculate net savings: (labor cost of FTEs saved, e.g., 4 FTEs) - (model cost + human review overhead)
- Report to business stakeholder + compliance team
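The net-savings formula as a small function. Every input is a placeholder for your own figures; the usage values are hypothetical, except that 60,000 tickets/month is what "5% of traffic = 100 tickets/day" implies (about 2,000 tickets/day) and $0.001 is the per-classification estimate from the AI task definition.

```python
def monthly_net_savings(fte_saved: float, monthly_cost_per_fte: float,
                        tickets_per_month: int, cost_per_classification: float,
                        review_hours: float, hourly_review_cost: float) -> float:
    """Net savings = labor saved - (model cost + human review overhead)."""
    labor_saved = fte_saved * monthly_cost_per_fte
    model_cost = tickets_per_month * cost_per_classification
    review_overhead = review_hours * hourly_review_cost
    return labor_saved - (model_cost + review_overhead)


# Hypothetical inputs: 4 FTEs at $6,000/month, 60,000 tickets at $0.001,
# 80 review hours at $40/hour.
print(monthly_net_savings(4, 6000, 60000, 0.001, 80, 40))  # 20740.0
```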
Step 3: Integration Checklist (AI Strategy Skill)
- Workflow goal is measurable (e.g., "reduce ticket routing time from 8 min to <1 min")
- AI task is specific & scoped (classify ≠ decide priority ≠ respond)
- Confidence thresholds are defined (>0.85 auto-route, 0.65–0.85 human review, ≤0.65 manual-review stop card)
- Stop cards are crisp & unambiguous (safety, drift, failure = immediate escalation)
- Handoff rules are clear (AI → human review: <4 hrs; human review → AI learning: monthly check)
- Audit trail captures everything (input, output, override, outcome, timestamp)
- Monitoring is automated (daily dashboard, weekly spot-check, monthly full review)
- Escalation path is documented (who gets paged for safety? for drift? for outage?)
Step 4: Pilot & Validation
Pilot Phase (Week 1)
- Deploy to 5% of traffic (100 tickets/day)
- Monitor accuracy daily
- Document all human overrides + reason
- Alert if any Stop Card triggered
Go/No-Go Decision:
- If accuracy ≥92%: proceed to expansion
- If accuracy <92%: debug prompt, retest
Expansion Phase (Weeks 2–4)
- 25% → 50% → 100% of traffic (if pilot successful)
- Monitor key metrics daily
- Implement weekly review process
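To step from 5% to 25%, 50%, and 100% of traffic without reshuffling which tickets are in the AI path, one option is deterministic hash bucketing. This sketch hashes the ticket ID with SHA-256 into 100 buckets; the function names are hypothetical, and you might bucket by customer account instead so one customer's tickets are handled consistently.

```python
import hashlib


def rollout_bucket(ticket_id: str) -> int:
    """Map an ID to a stable bucket in [0, 100) using SHA-256."""
    digest = hashlib.sha256(ticket_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_ai_routing(ticket_id: str, rollout_percent: int) -> bool:
    """True if this ticket falls inside the current rollout percentage.

    Buckets are fixed per ID, so raising the percentage (5 -> 25 -> 50 -> 100)
    only adds buckets: anything inside the 5% cohort stays inside at 25%.
    """
    return rollout_bucket(ticket_id) < rollout_percent
```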
Production Phase (Ongoing)
- Continuous monitoring per KPIs
- Monthly prompt refresh
- Quarterly business review with stakeholders
Foundational Skills Checklist
- AI Strategy: Workflow goal tied to business metric (e.g., cost savings, faster resolution)
- Prompting: Prompt is versioned and tested for accuracy across models
- Workflow Integration: Handoff rules, stop cards, and audit trail fully designed
- Critical Evaluation: KPIs defined, monitoring dashboard built, spot-check process documented
- Ethics & Trust: Guardrails (confidence thresholds, stop cards) designed to survive model changes; audit trail captures all decisions
Real-World Example: Comparison
Workflow Anti-Pattern (❌ Not AI-Native)
Tickets come in → AI classifier runs in background →
Sometimes teams see the result, sometimes they don't →
No one knows if accuracy is good →
AI quietly misdirects critical issues →
Customers complain
Workflow Best Practice (✓ AI-Native)
Tickets come in → AI classifier + confidence score →
IF safety flag → escalate immediately
IF high confidence → auto-route + log → weekly spot-check
IF low confidence → human review (with AI suggestion)
Every decision → audit trail captured
Monitoring dashboard shows accuracy, override rate, SLA metrics
Monthly: Review outcomes, refine the prompt if needed, celebrate wins
The difference: Intentional human oversight + continuous learning + measurable outcomes.