Your agents are making
thousands of decisions.
You can see none of them.
WhyAgent captures every decision your AI agent makes—model routing, cache vs API calls, escalation thresholds—with full context, alternatives considered, and rationale. Debug the reasoning, not just the execution traces.
Free during beta. No credit card required.
See how it works
You can trace every function call.
You still can't explain the behavior.
Your agent called Claude 3.5 Sonnet on Bedrock ($6/$30 per MTok) eighteen times for simple text classification that DeepSeek v3.2 ($0.62/$1.85 per MTok) handles just as well, at roughly a thirtieth of the per-query cost. It bypassed the DynamoDB cache to pull "fresh" external API data that hadn't changed in hours. It escalated to human review when better confidence calibration would have let it self-correct.
What your current tools show
Execution traces answer "what happened" but can't tell you why your agent made those choices.
[2026-02-19 01:45:33] Agent called claude-3.5-sonnet (Bedrock)
[2026-02-19 01:45:35] Response: 200 OK (1.1s)
[2026-02-19 01:45:35] Tokens: 156 input, 89 output
[2026-02-19 01:45:35] Cost: $3.61
[2026-02-19 01:42:15] Agent called mistral-large-3 (Bedrock)
[2026-02-19 01:42:17] Response: 200 OK (1.8s)
[2026-02-19 01:42:17] Tokens: 89 input, 45 output
[2026-02-19 01:42:17] Cost: $0.11
[2026-02-19 01:38:22] Agent called Live API
[2026-02-19 01:38:22] Response: 200 OK (0.5s)
[2026-02-19 01:38:22] Cost: $0.08
❓ Why did it choose the premium model for simple tasks?
❓ Why bypass the cache?
❓ How to prevent this waste?
What WhyAgent shows
Decision audit trails answer "why it decided to do that" with actionable insights for optimization.
Decision: Model Selection
Task: Generate docstring for Python function
Options Considered:
• deepseek-v3.2 (Bedrock): $0.13, 420ms, sufficient quality
• mistral-large-3 (Bedrock): $0.11, 680ms, high quality
• claude-3.5-sonnet (Bedrock): $3.61, 1100ms, overkill for task
Choice: claude-3.5-sonnet
Reasoning: Default routing to "flagship" model
Optimal: NO (-$3.48, +680ms, +0% quality)

Decision: Data Source Selection
Task: Product pricing lookup
Options Considered:
• DynamoDB cache: $0.001, 45ms, fresh (cached 60s ago)
• External API: $0.08, 450ms, identical data
Choice: External API
Reasoning: Cache invalidation policy too conservative
Optimal: NO (-$0.079, +405ms, +0% quality)
✓ Clear reasoning for every choice
✓ Cost impact of decisions
✓ Optimization paths to fix waste
Current observability tools (LangSmith, Arize, Braintrust) track inputs, outputs, latency, and token counts. That's essential.
WhyAgent shows why your agents made those specific choices. That's what's missing.
Instrument decisions,
not just executions
WhyAgent adds decision observability to your existing stack. One SDK call captures every choice your agent makes with full context.
// Agent decision point
const model = await modelSelector.choose({
task: "classify_sentiment",
inputLength: 67,
context: "user_feedback",
platform: "bedrock",
priority: "balanced"
})
// Decision: claude-3.5-sonnet selected
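Recording that choice for later audit might look like the following. This is a minimal sketch: the @whyagent/sdk package name, the recordDecision method, and its fields are illustrative assumptions, not a confirmed SDK surface.

// Hypothetical capture call -- names and fields are illustrative only
import { WhyAgent } from "@whyagent/sdk"

const whyagent = new WhyAgent({ apiKey: process.env.WHYAGENT_API_KEY })

await whyagent.recordDecision({
  type: "model_selection",
  task: "classify_sentiment",
  optionsConsidered: [
    { option: "deepseek-v3.2", estCostUsd: 0.13, estLatencyMs: 420 },
    { option: "mistral-large-3", estCostUsd: 0.11, estLatencyMs: 680 },
    { option: "claude-3.5-sonnet", estCostUsd: 3.61, estLatencyMs: 1100 }
  ],
  choice: "claude-3.5-sonnet",
  reasoning: "Default routing to flagship model"
})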
Query Like Data
Find patterns across your agent's full decision history with SQL-like queries (sample query below)
Optimize Precisely
Identify exact decision points causing inefficiency, not just symptoms
Learn & Improve
Understand your agent's reasoning patterns and systematically improve them
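For example, the sample query mentioned above might look like this. Again a sketch only: the whyagent.query method and the decisions table schema are assumptions, not a documented API.

// Hypothetical query API -- method name and schema are illustrative only
const wasteful = await whyagent.query(`
  SELECT task, choice, COUNT(*) AS calls, SUM(cost_usd) AS total_cost
  FROM decisions
  WHERE type = 'model_selection'
    AND choice = 'claude-3.5-sonnet'
  GROUP BY task, choice
  ORDER BY total_cost DESC
`)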
Debug what matters:
Why your agents make expensive choices
Three common production scenarios where decision observability turns mysterious behavior into actionable optimization opportunities.
Model Cost Optimization
Premium Bedrock model calls for simple tasks are burning through your AI budget
Your customer support agent called Claude 3.5 Sonnet on Bedrock 2,800 times last week for FAQ responses and sentiment analysis. DeepSeek v3.2 or Mistral Large 3 handles these perfectly at 30x lower cost ($0.13 vs $3.61 per query).
WhyAgent solution
Query Bedrock decision history, identify over-provisioning patterns, and create intelligent routing that uses DeepSeek/Mistral for simple tasks and reserves Claude 3.5 Sonnet for complex reasoning that requires its premium capabilities.
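A routing rule derived from that history might look like the sketch below; the task list, complexity scores, and thresholds are illustrative assumptions, not a prescribed policy.

// Illustrative routing sketch: cheap Bedrock models for simple tasks,
// the flagship only for genuinely complex reasoning
const SIMPLE_TASKS = new Set(["faq_response", "classify_sentiment"])

function routeModel(task, complexityScore) {
  if (SIMPLE_TASKS.has(task) || complexityScore < 0.3) {
    return "deepseek-v3.2"     // ~$0.13/query on these workloads
  }
  if (complexityScore < 0.7) {
    return "mistral-large-3"   // strong quality, still low cost
  }
  return "claude-3.5-sonnet"   // premium reasoning only when warranted
}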
Cache Hit Rate Recovery
Agents bypassing DynamoDB cache and hitting expensive APIs unnecessarily
Your data enrichment agent makes 22,000 live external API calls daily, but 74% request data that was cached in DynamoDB less than 2 minutes ago and hasn't changed. Each cache miss costs 40x more than a cache hit.
WhyAgent solution
Analyze cache bypass decision patterns, understand why agents prioritize "freshness" over data already cached in DynamoDB, and optimize cache policies based on actual data volatility in your domain.
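In code, a volatility-aware cache check might look like this sketch; the data classes and change intervals are assumed for illustration.

// Illustrative cache policy: size TTLs to observed data volatility
// instead of a blanket "always refresh" rule
const VOLATILITY_MS = {
  product_pricing: 15 * 60_000,  // assumed: prices change ~every 15 min
  inventory: 2 * 60_000          // assumed: stock moves faster
}

function shouldUseCache(entry, dataClass) {
  // stay well inside the observed change window for this data class
  const ttl = (VOLATILITY_MS[dataClass] ?? 60_000) / 2
  return Date.now() - entry.cachedAt < ttl
}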
Escalation Logic Debugging
Agents escalating to humans when they could self-correct
Your code review agent escalates 34% of pull requests to senior developers, but analysis shows 78% of those escalations were false positives the agent could have resolved itself.
WhyAgent solution
Analyze escalation decision patterns, identify confidence thresholds causing false positives, and calibrate self-correction logic.
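Calibration can start from replaying past escalations against candidate thresholds, as in this sketch; the record shape and sweep range are assumptions.

// Illustrative threshold sweep: replay past escalations and pick the
// confidence cutoff that minimizes wrong calls in either direction
function bestThreshold(history) {
  let best = 0.5
  let fewestErrors = Infinity
  for (let i = 10; i <= 19; i++) {
    const t = i / 20  // sweep 0.50 .. 0.95 in 0.05 steps
    const errors = history.filter(r =>
      (r.confidence < t && !r.wasTrulyNeeded) ||  // escalated needlessly
      (r.confidence >= t && r.wasTrulyNeeded)     // should have escalated
    ).length
    if (errors < fewestErrors) { fewestErrors = errors; best = t }
  }
  return best
}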
These scenarios happen every day in production AI systems.
WhyAgent makes the invisible decisions visible.
Ready to debug your agent decisions?
Join our design partner program. Help us build decision observability that actually works for production AI teams.
Request Early Access
Tell us about your AI agent setup. We'll prioritize teams with production agents facing decision optimization challenges.
We'll review your application and reach out within 48 hours.
What you'll get in beta
Design partner benefits
Not ready to apply yet?
Follow our progress at updates@whyagent.dev