Decision observability for AI agents

Your agents are making
thousands of decisions.
You can see none of them.

WhyAgent captures every decision your AI agent makes—model routing, cache vs API calls, escalation thresholds—with full context, alternatives considered, and rationale. Debug the reasoning, not just the execution traces.

Free during beta. No credit card required.

See how it works

Terminal
$ whyagent query \
    --type model_selection \
    --where "chosen=claude-3.5-sonnet" \
    --where "platform=bedrock" \
    --where "optimal=false" \
    --last 7d
The observability gap

You can trace every function call.
You still can't explain the behavior.

Your agent called Claude 3.5 Sonnet on Bedrock ($6/$30 per MTok) eighteen times for simple text classification that DeepSeek v3.2 ($0.62/$1.85 per MTok) handles perfectly—a 30x cost difference. It bypassed DynamoDB cache for "fresh" external API data that hadn't changed in hours. It escalated to human review when it could have self-corrected with better confidence calibration.

Monthly Overspend
$43,200
On Claude 3.5 Sonnet over-selection
Latency Increase
48%
From cache bypassing
Escalation Rate
3x
From unexplained agent drift

What your current tools show

Execution traces answer "what happened" but can't tell you why your agent made those choices.

LangSmith / Arize / Braintrust
[2026-02-19 01:45:33] Agent called claude-3.5-sonnet (Bedrock)
[2026-02-19 01:45:35] Response: 200 OK (1.1s)
[2026-02-19 01:45:35] Tokens: 156 input, 89 output
[2026-02-19 01:45:35] Cost: $3.61

[2026-02-19 01:42:15] Agent called mistral-large-3 (Bedrock)
[2026-02-19 01:42:17] Response: 200 OK (1.8s)
[2026-02-19 01:42:17] Tokens: 89 input, 45 output
[2026-02-19 01:42:17] Cost: $0.11

[2026-02-19 01:38:22] Agent called Live API
[2026-02-19 01:38:22] Response: 200 OK (0.5s)
[2026-02-19 01:38:22] Cost: $0.08

Why did it choose the premium model for simple tasks?
Why did it bypass the cache?
How do you prevent the waste?

What WhyAgent shows

Decision audit trails answer "why it made that choice," with actionable insights for optimization.

WhyAgent Decision Audit
Decision: Model Selection
Task: Generate docstring for Python function
Options Considered:
  • deepseek-v3.2 (Bedrock): $0.13, 420ms, sufficient quality
  • mistral-large-3 (Bedrock): $0.11, 680ms, high quality  
  • claude-3.5-sonnet (Bedrock): $3.61, 1100ms, overkill for task
Choice: claude-3.5-sonnet
Reasoning: Default routing to "flagship" model
Optimal: NO (-$3.48, +680ms, +0% quality)

Decision: Data Source Selection  
Task: Product pricing lookup
Options Considered:
  • DynamoDB cache: $0.001, 45ms, fresh (cached 60s ago)
  • External API: $0.08, 450ms, identical data
Choice: External API
Reasoning: Cache invalidation policy too conservative
Optimal: NO (-$0.079, +405ms, +0% quality)

Clear reasoning for every choice
Cost impact of decisions
Optimization paths to fix waste

Current observability tools (LangSmith, Arize, Braintrust) track inputs, outputs, latency, and token counts. That's essential.
WhyAgent shows why your agents made those specific choices. That's what's missing.

Complements your existing LLMOps stack
The solution

Instrument decisions,
not just executions

WhyAgent adds decision observability to your existing stack. One SDK call captures every choice your agent makes with full context.

Step 1: Agent makes decision
// Agent decision point
const model = await modelSelector.choose({
  task: "classify_sentiment",
  inputLength: 67,
  context: "user_feedback",
  platform: "bedrock",
  priority: "balanced"
})
// Decision: claude-3.5-sonnet selected
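
A minimal sketch of what the capture step could look like. The whyagent client, method name, and fields below are illustrative assumptions, not the shipped SDK surface; the point is that the options considered, the choice, and the reasoning travel together as one queryable record.

// Illustrative only: record the decision alongside the choice
// (client name, method, and fields are assumptions, not the final API)
const whyagent = new WhyAgent({ apiKey: process.env.WHYAGENT_API_KEY })

await whyagent.recordDecision({
  type: "model_selection",
  task: "classify_sentiment",
  options: [
    { name: "deepseek-v3.2", estCost: 0.13, estLatencyMs: 420 },
    { name: "claude-3.5-sonnet", estCost: 3.61, estLatencyMs: 1100 }
  ],
  choice: "claude-3.5-sonnet",
  reasoning: "default routing to flagship model",
  context: { platform: "bedrock", priority: "balanced" }
})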

Query Like Data

Find decision patterns with SQL-like queries across all your agent decisions (example query below)

Optimize Precisely

Identify exact decision points causing inefficiency, not just symptoms

Learn & Improve

Understand your agent's reasoning patterns and systematically improve them
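
For example, a query in the same style as the terminal example at the top of the page could surface the cache-bypass pattern; the decision type and filter fields here are illustrative, not a documented schema.

$ whyagent query \
    --type data_source_selection \
    --where "chosen=external_api" \
    --where "cache_age<120s" \
    --where "optimal=false" \
    --last 24h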

Real debugging scenarios

Debug what matters:
Why your agents make expensive choices

Three common production scenarios where decision observability turns mysterious behavior into actionable optimization opportunities.

Cost Optimization

Model Cost Optimization

Agents burning your AI budget on premium Bedrock model calls for simple tasks

Your customer support agent called Claude 3.5 Sonnet on Bedrock 2,800 times last week for FAQ responses and sentiment analysis. DeepSeek v3.2 or Mistral Large 3 handle these perfectly at 30x lower cost ($0.13 vs $3.61 per query).

Current metrics

Weekly AI cost: $10,108
Avg response time: 1.1s
Cost per query: $3.61

WhyAgent solution

Query Bedrock decision history, identify over-provisioning patterns, and create intelligent routing that uses DeepSeek/Mistral for simple tasks and reserves Claude 3.5 Sonnet for complex reasoning that requires its premium capabilities (sketched below).

96% cost reduction

Optimized metrics

Weekly AI cost: $364
Avg response time: 0.7s
Cost per query: $0.13
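
As an illustration, the routing that falls out of this analysis can be a simple complexity gate in front of the model selector; the task categories, length cutoff, and model choices below are assumptions for the sketch, not a prescribed policy.

// Illustrative routing rule derived from decision-history analysis
// Thresholds and task categories are assumptions, not WhyAgent output
function pickModel(task) {
  const simpleTasks = ["faq_response", "classify_sentiment", "generate_docstring"]
  if (simpleTasks.includes(task.type) && task.inputLength < 2000) {
    return "deepseek-v3.2"        // sufficient quality at ~$0.13 per query
  }
  if (task.requiresLongContext || task.type === "multi_step_reasoning") {
    return "claude-3.5-sonnet"    // reserve the premium model for hard cases
  }
  return "mistral-large-3"        // balanced default
}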
Performance

Cache Hit Rate Recovery

Agents bypassing DynamoDB cache and hitting expensive APIs unnecessarily

Your data enrichment agent makes 22,000 live external API calls daily, but 74% request data that was cached in DynamoDB less than 2 minutes ago and hasn't changed. Each cache miss costs 40x more than a cache hit.

Current metrics

Cache hit rate: 11%
Daily API costs: $1,540
Avg latency: 620ms

WhyAgent solution

Analyze cache bypass decision patterns, understand why agents prioritize 'freshness' over DynamoDB cached data, and optimize cache policies based on actual data volatility in your domain (sketched below).

86% cost reduction

Optimized metrics

Cache hit rate: 89%
Daily API costs: $220
Avg latency: 85ms
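
One way that fix can land in code: a freshness check that compares cache age against how often the data actually changes, instead of treating every read as stale. The max-age table and the helper names below are illustrative assumptions.

// Illustrative cache policy informed by observed data volatility
// maxAgeSeconds values and the dynamo/fetch helpers are assumptions for the sketch
const maxAgeSeconds = {
  product_pricing: 3600,    // changes a few times a day
  inventory_level: 120      // changes frequently
}

async function lookup(kind, key) {
  const cached = await dynamo.get(kind, key)
  if (cached) {
    const ageSeconds = (Date.now() - cached.updatedAt) / 1000
    if (ageSeconds < (maxAgeSeconds[kind] ?? 300)) {
      return cached.value                      // ~$0.001, ~45ms cache hit
    }
  }
  return fetchFromExternalApi(kind, key)       // ~$0.08, ~450ms live call
}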
Workflow

Escalation Logic Debugging

Agents escalating to humans when they could self-correct

Your code review agent escalates 34% of pull requests to senior developers, but analysis shows 78% of those escalations were false positives the agent could have resolved on its own.

Current metrics

Human escalation rate: 34%
Dev interruptions/day: 47
Review bottleneck: 6.2 hrs

WhyAgent solution

Analyze escalation decision patterns, identify confidence thresholds causing false positives, and calibrate self-correction logic (sketched below).

68% fewer escalations

Optimized metrics

Human escalation rate: 11%
Dev interruptions/day: 12
Review bottleneck: 1.8 hrs
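
A sketch of what the recalibration can look like once false-positive escalations are visible; the threshold value and retry behavior below are placeholders, not recommended settings.

// Illustrative escalation gate with a calibrated confidence threshold
// The 0.55 cutoff and the self-correction retry are assumptions for the sketch
const ESCALATION_THRESHOLD = 0.55

async function reviewPullRequest(pr) {
  const first = await agent.review(pr)
  if (first.confidence >= ESCALATION_THRESHOLD) {
    return first                               // agent resolves on its own
  }
  const retry = await agent.review(pr, { selfCorrect: true })
  if (retry.confidence >= ESCALATION_THRESHOLD) {
    return retry                               // self-correction succeeded
  }
  return escalateToHuman(pr, retry)            // only the genuinely hard cases
}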

These scenarios happen every day in production AI systems.
WhyAgent makes the invisible decisions visible.

Works with your existing observability stack
Design Partner Program

Ready to debug your agent decisions?

Join our design partner program. Help us build decision observability that actually works for production AI teams.

Request Early Access

Tell us about your AI agent setup. We'll prioritize teams with production agents facing decision optimization challenges.

Help us understand your current setup and pain points

We'll review your application and reach out within 48 hours.

What you'll get in beta

SDK Integration
Python, Node.js, and REST APIs for any stack
Privacy First
Your decision data stays in your infrastructure
Real-time Queries
Query decisions as they happen, not just in batch

Design partner benefits

Free access during entire beta period
Direct influence on product roadmap
Custom integrations for your use case
1-on-1 implementation support
Early access to all new features
Now Accepting
Design partners for early access

Not ready to apply yet?
Follow our progress at updates@whyagent.dev