Strategic AI Brief Template
Use this template to justify AI deployment decisions to stakeholders. Output: 1-2 pages.
Executive Summary (1 paragraph)
Describe the business opportunity, the AI capability being deployed, and the key risk threshold. State clearly whether this deployment is recommended (GO), conditional (GO with controls), or not recommended (NO GO).
Example: "We propose deploying Claude 3.5 Sonnet for customer service intent classification (70% automation goal). Hallucinogenic errors in safety-critical cases present unacceptable risk without a human verification layer. Recommendation: GO with Circuit Breaker Rule."
Opportunity Assessment
Business Problem
- What business pain point does this solve? (e.g., 4 FTEs spent on intent routing; 18% mislabeling rate)
- Why now? (e.g., new regulatory requirement, competitive threat, cost pressure)
AI Capability
- Which model/system? (e.g., OpenAI GPT-4, Claude, Gemini, internal fine-tune)
- Task definition: What exactly will the AI do? (Be specific: "classify support tickets into 8 categories" not "improve support")
- Success metric: How will we know it works? (Accuracy target, latency, cost reduction %)
Deployment Context
- Scale: How many decisions/outputs per month?
- Stakeholders affected: Who uses this output or is impacted by errors?
- Replacement scenario: What human process does this replace/augment?
Risk Threshold Definition
Consequence Classes (rank by severity)
| Consequence | Definition | Risk Level |
|---|---|---|
| Critical | Regulatory violation, direct financial loss >$100K, safety/injury | Red |
| Major | Customer loss of trust, reputational damage, rework costs $10-100K | Orange |
| Minor | Operational inefficiency, customer friction, <$10K rework | Yellow |
| Negligible | Cosmetic errors, no downstream impact | Green |
Current Hallucination/Error Rate (from testing)
- False positive rate: ___% (e.g., "classified as 'billing issue' when it was 'technical support'")
- False negative rate: ___% (e.g., "missed critical issues")
- Rare edge-case errors: List 2-3 examples from testing (e.g., "confused 'credit card fraud' with 'billing question'")
Acceptable Risk Tolerance
- For critical consequences: How many critical errors can we tolerate per month? (e.g., "0-1" vs. "up to 5")
- For major consequences: Same (e.g., "up to 10")
- Threshold: If error rate exceeds this, we pause the deployment.
Control Architecture (Refusal Stack)
Layer 1: Pre-Action Controls
What prevents a bad decision from being made?
- Human review step (e.g., all "other" category tickets reviewed by human before routing)
- Confidence threshold (e.g., only auto-route if model confidence >0.85)
- Scope limits (e.g., only route non-urgent tickets)
- Other: ________________
Effectiveness: What % of errors will this catch? ____%
Layer 2: Circuit Breaker Rule
Under what condition do we stop the deployment?
- Example: "If weekly hallucination rate exceeds 2%, pause auto-routing and escalate to human review."
- Monitoring metric: _____________________
- Pause threshold: _____________________
- Who gets alerted: _____________________
Layer 3: Audit Trail / Logging
What data do we capture to prove safe operation?
- Model input (the ticket text)
- Model output (predicted category + confidence)
- Human override (if human changed the decision)
- Outcome (actual correct category, from customer feedback or manual review)
- Timestamp and operator
Retention: How long do we keep logs? _____ (recommended: 2 years minimum)
Financial Model
| Input | Value | Notes |
|---|---|---|
| Implementation cost | $___K | (infrastructure, training, validation) |
| Ongoing monthly cost | $___K | (API calls, human oversight, monitoring) |
| Savings per month | $___K | (e.g., 4 FTEs × $6K/month = $24K) |
| Payback period | ___ months | (implementation ÷ monthly savings) |
| ROI (annual) | __% |
Risk-Adjusted Scenario
If actual accuracy is 10% worse than testing, what does ROI look like?
- New monthly savings: $___K
- New ROI: ___%
Governance Checkpoints
Approval Timeline
- Week 1: Pilot with 5% of traffic (100 tickets/day). Daily error rate monitoring.
- Week 2: Expand to 25% of traffic if error rate stays <threshold.
- Week 3: Expand to 50% of traffic.
- Week 4: Full rollout if all thresholds met.
Quarterly Review
- Audit 100 random decisions (manual review of input + output + outcome).
- Report actual hallucination rate to governance committee.
- Update risk threshold based on new model releases.
Annual Renewal
- Business case still valid? (ROI, regulatory environment)
- Recommend upgrade to newer model or retire capability?
Sign-Off
| Role | Name | Date | Approval |
|---|---|---|---|
| Business Sponsor | __________ | //____ | ☐ Approve ☐ Conditional ☐ Reject |
| Risk/Compliance | __________ | //____ | ☐ Approve ☐ Conditional ☐ Reject |
| Tech Lead | __________ | //____ | ☐ Approve ☐ Conditional ☐ Reject |
Conditions (if applicable):
Foundational Skills Checklist
- AI Strategy: Deployment rationale and risk threshold documented
- Critical Evaluation: Error rates from testing quantified and ranked
- Workflow Integration: Human handoff points and circuit-breaker defined
- Ethics & Trust: Guardrails and audit trail designed to survive model changes