Platform Engineering

Agent Fabric: The Infrastructure Behind Praxion’s AI

How Praxion orchestrates financial AI agents across a shared six-plane platform — with unified observability, multi-model provider support, governance, and durable orchestration.

Published May 27, 2026 · Praxion Engineering

Scope: This article describes Praxion’s internal Agent Fabric platform — the shared runtime substrate all Praxion AI agents run on. It is written for technical readers: engineers, platform architects, and practitioners evaluating AI agent infrastructure. For the product perspective on how the five agents (Tax Strategy, Shock Simulation, Withdrawal Sequencing, Data Health, Critic) help users — and the canonical mapping from each backend agent to the AI Review / AI Optimizer / Data Health UI surfaces — see AI Agents and Retirement Optimization.

1. The Agent Coordination Problem

Every team that ships AI agents eventually reinvents the same infrastructure: kill-switches to halt a misbehaving agent without a code deploy, retry policies that don’t amplify cost on failure, per-call cost attribution so you know which agent consumed your LLM budget, audit trails for regulated-adjacent features, and a way to swap model providers without touching agent business logic.

The naive approach is to build this ad hoc, per agent. The result: inconsistent governance, duplicated telemetry code, provider-specific LLM calls scattered across a codebase, and a kill-switch that works for agent A but was never wired for agent B.

Praxion solved this once, as a platform: Agent Fabric. It is a modular platform that provides the shared runtime substrate for all Praxion AI agents. Agents implement one common interface and get safety cut-offs, policy governance, schema validation, structured telemetry, provider-agnostic model access, cross-session memory, and durable orchestration without writing any of that plumbing themselves.

2. Architecture: Six Capability Planes

Agent Fabric organizes capabilities into six named planes plus foundational layers. Each plane is an independent module — agents use only what they need. A runtime layer wires the wrapper, identity, model access, tools, and triggers together; the memory and orchestration planes are constructed by the consuming service and passed in. The runtime adapts automatically to whether it is running in a short-lived serverless function or a long-running process, configuring telemetry flushing accordingly.

  ┌─────────────────────────────────────────────────────────────────┐
  │                     Trigger Layer                               │
  │        Request  ·  Queue  ·  Scheduled  ·  Manual              │
  └──────────────────────────┬──────────────────────────────────────┘
                             │
  ┌──────────────────────────▼──────────────────────────────────────┐
  │                          Wrapper                                │
  │  Safety Cut-off → Policy → Validate Input → run() → Validate Out │
  │  ↳ emits a structured telemetry event on every lifecycle step   │
  └──────┬──────────┬──────────┬──────────┬──────────┬─────────────┘
         │          │          │          │          │
  ┌──────▼──┐ ┌────▼────┐ ┌──▼──────┐ ┌─▼──────┐ ┌▼───────────┐
  │ Model   │ │ Tool    │ │ Memory  │ │Orchestr│ │ Policy     │
  │ Plane   │ │ Plane   │ │ Plane   │ │ Plane  │ │ Plane      │
  └──────┬──┘ └────┬────┘ └──┬──────┘ └─┬──────┘ └────────────┘
         │         │         │           │
  ┌──────▼─────────▼─────────▼───────────▼─────────────────────┐
  │            Platform Services (authenticated)                │
  │   Model Registry · Audit · Memory · Orchestration           │
  │   Safety Cut-off · Secrets · Identity · State               │
  └─────────────────────────────────────────────────────────────┘

🛡️Wrapper

The single invocation interface. Enforces the lifecycle: safety cut-off check → policy check → input validation → run() → output validation. Emits a structured telemetry event on every step. Telemetry destinations are pluggable.

🧠Model Plane

A provider-agnostic abstraction over multiple model providers. Per-agent model profiles (provider, model, parameters, limits) live in a registry. Tracks token counts and estimated cost per call. Swap providers by updating one configuration record.

🔧Tool Plane

Dispatches tool calls from the model turn loop. Sanitizes sensitive arguments before audit logging. Compresses large outputs. Records each tool call in telemetry automatically.

🧩Memory Plane

Cross-session key/value store with expiry and tag-based search, isolated per tenant and agent. Agents store decisions, risk flags, and profile observations and retrieve them on the next session to avoid cold starts.

🔀Orchestration Plane

Durable multi-step workflows with parallel fan-out/fan-in for multi-agent execution. Approval gates let an agent pause, the user confirm, and the workflow resume. Cross-agent delegation uses short-lived signed tokens.

⚖️Policy Plane

Governance enforcement: a data-sensitivity ceiling, a minimum compliance level, and request rate limits. Two modes: audit (log violations, allow) and enforce (block). Runs inside the Wrapper before input reaches agent code.

3. Multi-Model Provider Support

The Model Plane routes all model calls through a single, provider-agnostic interface. Every provider is normalized to a common request/response shape, so no provider-specific code lives in agent business logic.

General-purpose: Balanced quality and cost
Low-latency: Fast interactive responses
Cost-efficient: High-volume background work
High-reasoning: Complex, multi-step tasks
Local: Development without external calls
Fallback: Automatic failover on provider errors

Each agent has a model profile in the registry: the provider, model, generation parameters, and limits to use — plus an optional fallback. Changing providers, or failing over during an outage, is a configuration update; there is no code change or deploy. On every successful call, the Model Plane records token counts, latency, and an estimated cost — regardless of provider.
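A minimal sketch of how a profile with an optional fallback might be resolved at call time. Everything named here (`ModelProfile`, `REGISTRY`, `call_with_fallback`, the two error classes, the provider names, and the retry count) is hypothetical and chosen for illustration; it is not Agent Fabric's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ProviderError(Exception):
    """Hypothetical base class for any error returned by a provider."""


class TransientError(ProviderError):
    """Hypothetical marker for errors worth retrying (timeouts, throttling)."""


@dataclass(frozen=True)
class ModelProfile:
    provider: str
    model: str
    max_tokens: int
    fallback: Optional["ModelProfile"] = None


# Hypothetical registry: one record per agent. Editing a record changes
# routing without touching agent code.
REGISTRY = {
    "planning-advisor": ModelProfile(
        provider="provider-a", model="general-purpose", max_tokens=1024,
        fallback=ModelProfile("provider-b", "low-latency", 1024),
    ),
}


def call_with_fallback(agent: str, prompt: str,
                       send: Callable[[ModelProfile, str], str],
                       retries: int = 2) -> tuple[str, str]:
    """Return (provider_used, response).

    Transient errors are retried on the same provider; once retries are
    exhausted, or on a non-transient provider error, the fallback profile
    is tried if one is configured. Backoff sleeps are omitted for brevity.
    """
    profile: Optional[ModelProfile] = REGISTRY[agent]
    last_error: Optional[Exception] = None
    while profile is not None:
        for _ in range(retries + 1):
            try:
                return profile.provider, send(profile, prompt)
            except TransientError as exc:
                last_error = exc  # retry the same provider
            except ProviderError as exc:
                last_error = exc
                break  # non-transient: go straight to the fallback
        profile = profile.fallback
    raise RuntimeError(f"all providers failed for {agent}") from last_error
```

The `send` callable stands in for the provider-normalized call, so the same function works for any provider that accepts a profile and a prompt.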

4. The Integration Pattern

Building a new agent takes three steps: define the agent, wire the wrapper, attach a trigger. Governance, telemetry, and provider abstraction are supplied by the platform rather than written per agent.

Step 1 — Define the Agent

Every agent implements one common interface with a single entry point. That entry point receives validated input plus a run context that exposes helpers for telemetry, memory, tools, and orchestration. The agent author writes business logic only — the plumbing is provided by the platform.
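As an illustration of that shape, here is a sketch of a single-entry-point interface and one agent that implements it. The names `RunContext`, `Agent`, and `run`, and the fields on the context, are assumptions; the ordering rule inside the agent is placeholder logic, not Praxion's method.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class RunContext:
    """Hypothetical run context: the helpers the platform hands to an agent."""
    tenant_id: str
    memory: dict[str, Any] = field(default_factory=dict)
    events: list[str] = field(default_factory=list)

    def emit(self, name: str) -> None:
        # Stand-in for the telemetry helper.
        self.events.append(name)


class Agent(Protocol):
    """The one entry point every agent implements (method name assumed)."""

    def run(self, payload: dict[str, Any], ctx: RunContext) -> dict[str, Any]:
        ...


class WithdrawalSequencingAgent:
    """Business logic only: no telemetry wiring and no provider calls."""

    def run(self, payload: dict[str, Any], ctx: RunContext) -> dict[str, Any]:
        ctx.emit("ordering_accounts")
        # Placeholder rule for illustration: largest balance first.
        ordered = sorted(payload["accounts"],
                         key=lambda a: a["balance"], reverse=True)
        return {"order": [a["name"] for a in ordered]}
```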

Step 2 — Wire the Wrapper

The wrapper is configured with the agent plus its telemetry destinations, safety cut-off, and policy rules. From that point on, every invocation is automatically governed and instrumented. Service-to-service calls are authenticated by the platform’s identity layer rather than per-service credentials.

Step 3 — Attach a Trigger

The same wrapped agent can be exposed through several trigger types — request-based, queue-driven, scheduled, or manual — without changing the agent. Each trigger resolves the tenant and validates the caller before the agent runs.

Agent Manifest

Every agent ships a declarative manifest, validated against a shared schema and consumed by the build pipeline, the policy plane, and the platform dashboard. It declares which other agents this agent may call, its rate limits, its data-sensitivity level, its trigger schedule, reliability targets, and the memory keys it uses. Because these are declared rather than coded, they can be reviewed and enforced consistently across every agent.
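A sketch of what such a manifest and its validation could look like. The field names, the sensitivity level names, and the specific checks are assumptions made for illustration; the real schema is not shown in this article.

```python
from dataclasses import dataclass

SENSITIVITY_LEVELS = ("public", "internal", "regulated")  # assumed names


@dataclass(frozen=True)
class AgentManifest:
    name: str
    allowed_callees: tuple[str, ...]   # agents this agent may call
    rate_limit_per_minute: int
    data_sensitivity: str
    schedule: str                      # e.g. a cron expression; "" if unscheduled
    availability_target: float         # reliability target as a fraction
    memory_keys: tuple[str, ...]


def validate(manifest: AgentManifest) -> list[str]:
    """Return a list of problems; an empty list means the manifest is valid."""
    problems = []
    if manifest.data_sensitivity not in SENSITIVITY_LEVELS:
        problems.append(f"unknown data_sensitivity: {manifest.data_sensitivity}")
    if manifest.rate_limit_per_minute <= 0:
        problems.append("rate_limit_per_minute must be positive")
    if not 0.0 < manifest.availability_target <= 1.0:
        problems.append("availability_target must be in (0, 1]")
    if manifest.name in manifest.allowed_callees:
        problems.append("an agent may not list itself as a callee")
    return problems
```

Because validation returns every problem instead of stopping at the first, a build pipeline could report all manifest errors in a single pass.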

5. Observability: Metrics, Envelopes, Audit Trail

Observability is not opt-in. Every wrapped agent emits three layers of telemetry automatically on every invocation, with no per-agent instrumentation code required.

Layer 1 — Metrics

Each telemetry event is emitted in a standard structured format and forwarded to the platform’s metrics system, which extracts metrics automatically. Every metric carries a consistent set of dimensions, so any metric can be sliced by agent, team, tenant, environment, or compliance level without extra instrumentation.

Standard dimensions (on every metric):

Agent · Version · Team · Cost center · Tenant · Environment · Compliance level

Tracked metric categories:

Invocations: Each agent run
Success / failure: Outcome of each run
Latency: Time per invocation
Token usage: Prompt + completion, per model call
Estimated cost: Per model call
Tool calls: Each tool dispatch
Tool failures: Each failed tool call
Provider fallback: When failover is used
Safety cut-offs: When a run is blocked
Validation failures: Input or output schema failure
Approval pauses: When a workflow gate is hit
Compliance exceptions: Non-standard compliance level

Layer 2 — Structured Event Record

A consistent, flat record is emitted on every lifecycle step — start, model call, tool call, completion, error, safety cut-off, policy decision, pause, and approval. The record format is versioned and forward-compatible; new fields are additive.
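One way to picture a versioned, flat, additive record is sketched below. The field set, the `schema_version` field, and the helper names are assumptions; the point is only that a reader which ignores unknown keys stays compatible when writers add fields.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional

SCHEMA_VERSION = 1  # assumed versioning scheme


@dataclass(frozen=True)
class EventRecord:
    schema_version: int
    request_id: str
    step: str            # "start", "model_call", "tool_call", "completion", ...
    agent: str
    tenant: str
    environment: str
    latency_ms: Optional[int] = None
    tokens_in: Optional[int] = None
    tokens_out: Optional[int] = None


def to_json(record: EventRecord) -> str:
    """Serialize to one flat JSON object, dropping unset optional fields."""
    flat = {k: v for k, v in asdict(record).items() if v is not None}
    return json.dumps(flat, sort_keys=True)


def from_json(line: str) -> EventRecord:
    """Parse a record, ignoring unknown keys so newer writers stay readable."""
    data = json.loads(line)
    known = {f.name for f in fields(EventRecord)}
    return EventRecord(**{k: v for k, v in data.items() if k in known})
```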

Layer 3 — Audit Trail

An append-only store for every event record. The service exposes no update or delete operations, so a stored record cannot be altered or removed through the platform API. Records are isolated per tenant, written idempotently, and retained according to a configurable retention tier. Every record associated with a given request can be looked up directly, which supports replay and incident investigation.
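The properties described here (append-only, tenant-isolated, idempotent writes, lookup by request) can be sketched with an in-memory stand-in. The class and method names are hypothetical, and a real store would be a durable service, not a dictionary.

```python
from typing import Any


class AuditStore:
    """In-memory stand-in: append and read only, no update or delete."""

    def __init__(self) -> None:
        self._records: dict[tuple[str, str], dict[str, Any]] = {}
        self._by_request: dict[tuple[str, str], list[str]] = {}

    def append(self, tenant: str, event_id: str, request_id: str,
               record: dict[str, Any]) -> bool:
        """Write once. A repeated event_id is a no-op, so a retried write
        is safe. Returns True only when a new record was stored."""
        key = (tenant, event_id)
        if key in self._records:
            return False
        self._records[key] = dict(record)  # copy so callers cannot mutate it
        self._by_request.setdefault((tenant, request_id), []).append(event_id)
        return True

    def by_request(self, tenant: str, request_id: str) -> list[dict[str, Any]]:
        """All records for one request, in write order, within one tenant."""
        ids = self._by_request.get((tenant, request_id), [])
        return [dict(self._records[(tenant, i)]) for i in ids]
```

Keying every structure by tenant first means a lookup with the wrong tenant returns nothing, which is the isolation property in miniature.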

Turn-Level Observability in the UI

Every agent turn is visible at two levels: the Thinking Panel that users see as the agent reasons in real time, and the Turn Breakdown that surfaces per-round LLM latency, token counts, and tool dispatch order. The Thinking Panel is driven by the structured SSE (server-sent events) stream emitted by the Wrapper; the Turn Breakdown is built from the model-call and tool-call records returned with the response.

Reasoning Chain (user-facing)

▾ How Praxion AI reasoned (6 steps)

1. Analyzing your planning question…
2. Running projection engine…
3. Projection engine complete, incorporating results…
4. Running tax impact analyzer…
5. Tax analyzer complete, incorporating results…
6. Composing your personalized recommendation…

Turn Breakdown (per-round telemetry)

🔧 3 tool calls · llm → tool

🧭 Routing
  routing_source   llm → tool
  agent_used       planning-advisor
  router_intent    withdrawal_strategy
  provider         (configured)
  model            (per profile)

LLM rounds
  Round  Latency  Tokens (prompt + completion)  Next action
  R1     1.2s     2,847p + 312c                 run_projection
  R2     1.8s     4,102p + 448c                 tax_analyzer
  R3     2.4s     5,614p + 612c                 (response)

Tool calls
  Tool             Duration  Result
  run_projection   842ms     ✓ success
  tax_analyzer     1,203ms   ✓ success
  policy_lookup    12ms      ✓ success

The Thinking Panel updates from a live event stream emitted as the agent reasons. The Turn Breakdown is driven by the model-call and tool-call records returned in the response diagnostics — the same data that flows into the platform's metrics.

Cost Attribution

Every model call records an estimated cost alongside the agent and cost-center dimensions. This means cost rolls up automatically into per-agent, per-team, and per-tenant aggregates without any additional instrumentation. The two views below show how cost surfaces: at the per-invocation level (single session breakdown) and at the fleet level (aggregate by agent over a rolling window).

Per-Invocation Cost Breakdown

💸 Session cost: $0.0022

TOKENS
  tokens_in  (prompt)              9,461
  tokens_out (completion)          1,372
  total_tokens                     10,833

COST (USD)
  prompt     ($0.15 / 1M tok)      $0.0014
  completion ($0.60 / 1M tok)      $0.0008
  tool calls (3 × deterministic)   $0.0000
  total_cost_usd_cents             0.22¢

ATTRIBUTION
  agent_name     sample-planning-advisor
  cost_center    ai.planning
  tenant_id      acme-corp-prod
  llm_calls      3
  wall_clock     5,412ms

Fleet Cost by Agent (rolling 30 days)

Agent                Cost center     Calls    Tokens   Cost
planning-advisor     ai.planning     8,241    89.3M    $13.40
review-orchestrator  ai.review       1,104    48.7M    $7.31
monitor-agent        ai.monitoring   22,640   31.2M    $4.68
clarity-agent        ai.planning     3,917    19.8M    $2.97
scenario-simulator   ai.review       892      11.4M    $1.71
platform-validator   platform.ops    440      2.1M     $0.32
All agents (30d)                     37,234   202.5M   $30.39

Source: platform telemetry aggregated by agent and cost center. Deterministic tool calls (projection engine, tax calculator) have zero model cost — only the conversational turns are billed.
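To show how tagged costs roll up, the sketch below groups the per-agent figures from the fleet table by cost center. The function name and the tuple layout are assumptions; the agent names, cost centers, and dollar amounts are the ones listed above.

```python
from collections import defaultdict
from decimal import Decimal

# (agent, cost_center, cost_usd), copied from the fleet table above.
FLEET = [
    ("planning-advisor", "ai.planning", Decimal("13.40")),
    ("review-orchestrator", "ai.review", Decimal("7.31")),
    ("monitor-agent", "ai.monitoring", Decimal("4.68")),
    ("clarity-agent", "ai.planning", Decimal("2.97")),
    ("scenario-simulator", "ai.review", Decimal("1.71")),
    ("platform-validator", "platform.ops", Decimal("0.32")),
]


def rollup_by_cost_center(rows):
    """Sum cost per cost center. Decimal avoids float rounding on money."""
    totals = defaultdict(Decimal)  # Decimal() is 0, so new keys start at zero
    for _agent, cost_center, cost in rows:
        totals[cost_center] += cost
    return dict(totals)
```

With these figures, ai.planning rolls up to $16.37 and ai.review to $9.02, and the four cost centers together sum to the table's $30.39 total.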

6. Sample Telemetry Output

Every completed invocation produces a single, flat record populated entirely by the wrapper — the agent author writes none of it. Grouped by purpose, each record captures:

Identity & tracing: a unique request identifier and trace links, so any invocation can be found and replayed.
Tenant & cost attribution: the tenant, the owning team/cost center, and the calling agent (if any).
Governance: the data-sensitivity level, compliance level, and any approved exceptions.
Timing & outcome: start/end time, total latency, and success or error.
Model & tool usage: token counts and an estimated cost for each model call, plus the outcome of each tool call.
Validation & policy: input/output validation results and the policy decision for the run.

What this record enables: cost attribution, latency monitoring, usage visibility, trust-chain audit, validation-regression detection, and replay — all from one flat record, with no joins required.

7. Governance: Safety Cut-off, Policy, Trust

Three enforcement layers fire inside the Wrapper before an agent’s run() is called. Agent business logic cannot bypass them.

🔴Safety Cut-off

A per-agent or per-tenant emergency stop. Activated without a code deploy — a single control flips the switch. When active, the wrapper records the event and the agent's logic is never called. A failure in the safety check itself is treated as a stop (fail-safe). Used for runaway cost events, safety incidents, and provider failures requiring an immediate halt.

⚖️Policy Plane

Declares three governance rules per agent: the most sensitive data the agent may process, the minimum compliance level required, and request rate limits. Audit mode logs violations; enforce mode blocks them before the agent runs.

🔐Identity Plane

Cross-agent calls require a short-lived signed token issued by the calling agent. The set of agents each agent may call is declared in configuration and enforced when the token is issued: an agent will not issue a token for any agent outside its allowed set, and the receiver verifies the token before running. Every telemetry record carries the full trust chain.

The wrapper runs the same lifecycle on every invocation, in order:

1. Safety cut-off check: if the agent is stopped, the run is blocked and recorded; agent logic never executes.
2. Policy check: violations are blocked in enforce mode, or logged in audit mode.
3. Input validation: malformed input is rejected before reaching agent logic.
4. Run: the agent’s business logic executes, wrapped in a configurable retry policy for transient errors.
5. Output validation: responses are validated against the agent’s declared output shape.
6. Completion: the final record is emitted and all telemetry is flushed before the run returns.
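The lifecycle above can be sketched as one function. This is an illustration under stated assumptions, not the platform's wrapper: the parameter names, the result dictionaries, the event names, and the use of `TimeoutError` as the stand-in for a transient error are all hypothetical.

```python
from typing import Any, Callable


def invoke(agent_run: Callable[[dict], Any], payload: dict, *,
           is_stopped: Callable[[], bool],
           policy_violations: Callable[[dict], list],
           enforce: bool,
           validate_input: Callable[[dict], bool],
           validate_output: Callable[[Any], bool],
           emit: Callable[[str], None],
           max_attempts: int = 3) -> dict:
    """Run one invocation through the lifecycle and return a result dict."""
    # 1. Safety cut-off. A failing check is treated as a stop (fail-safe).
    try:
        stopped = is_stopped()
    except Exception:
        stopped = True
    if stopped:
        emit("safety_cutoff")
        return {"status": "blocked", "reason": "safety_cutoff"}

    # 2. Policy. Audit mode logs and continues; enforce mode blocks.
    if policy_violations(payload):
        emit("policy_violation")
        if enforce:
            return {"status": "blocked", "reason": "policy"}

    # 3. Input validation.
    if not validate_input(payload):
        emit("input_invalid")
        return {"status": "rejected", "reason": "input"}

    # 4. Run, retrying transient errors up to max_attempts (assumed >= 1).
    for attempt in range(1, max_attempts + 1):
        try:
            output = agent_run(payload)
            break
        except TimeoutError:
            if attempt == max_attempts:
                emit("error")
                return {"status": "error", "reason": "retries_exhausted"}
            emit("retry")

    # 5. Output validation, then 6. completion.
    if not validate_output(output):
        emit("output_invalid")
        return {"status": "error", "reason": "output"}
    emit("completion")
    return {"status": "ok", "output": output}
```

Because the checks run before `agent_run` is ever called, nothing inside the agent's own code can skip them.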

8. Memory Plane: Cross-Session Context

Without memory, every agent session starts cold. The agent has no knowledge of what was discussed last week, which risk flags were surfaced last month, or which strategies the user has already considered and rejected. For a financial planning product, this is a meaningful limitation — recommendations feel generic, not personalized.

The Memory Plane provides a cross-session key/value store, isolated per tenant and per agent and exposed to the agent through its run context. Agents store facts during a session — accepted strategies, risk flags, goal changes — each with an expiry and optional tags, and retrieve them on the next session by key or by tag. Retrieved facts are added as additional context so the agent can pick up where the last conversation left off. Records can also be removed, for example when a user resets a strategy.

Backend: Managed cloud key/value store
Isolation: Tenant + agent scoped (no cross-contamination)
Expiry: Per-record time-to-live
Search: Tag-based filter, scoped to the tenant + agent
Auth: Authenticated service-to-service calls
Environments: In-memory for tests, networked for production

Example use: A planning advisor agent looks up prior user decisions at session start and adds them as context, so it can reference earlier strategy acceptances and goal changes without the user having to repeat themselves.
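The store's behavior (scoping, expiry, tag search, removal) can be sketched with an in-memory version, similar in spirit to the test backend mentioned above. The class name, method names, and injectable clock are assumptions made for illustration.

```python
import time
from typing import Any, Callable, Iterable, Optional


class MemoryStore:
    """In-memory sketch of a tenant + agent scoped key/value store."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # (tenant, agent, key) -> (value, expires_at, tags)
        self._items: dict[tuple[str, str, str], tuple[Any, float, frozenset]] = {}

    def put(self, tenant: str, agent: str, key: str, value: Any,
            ttl_s: float, tags: Iterable[str] = ()) -> None:
        expires_at = self._clock() + ttl_s
        self._items[(tenant, agent, key)] = (value, expires_at, frozenset(tags))

    def get(self, tenant: str, agent: str, key: str) -> Optional[Any]:
        item = self._items.get((tenant, agent, key))
        if item is None or item[1] <= self._clock():
            return None  # missing or expired
        return item[0]

    def search(self, tenant: str, agent: str, tag: str) -> dict[str, Any]:
        """Unexpired records carrying the tag, within one tenant + agent."""
        now = self._clock()
        return {k[2]: value
                for k, (value, expires_at, tags) in self._items.items()
                if k[:2] == (tenant, agent) and tag in tags and expires_at > now}

    def delete(self, tenant: str, agent: str, key: str) -> None:
        self._items.pop((tenant, agent, key), None)
```

Passing the clock in as a parameter keeps expiry testable without sleeping.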

9. Orchestration: Durable Workflows & Approval Gates

The Orchestration Plane provides durable multi-step workflow coordination. Workflow state is persisted, so it survives across separate executions — a step that takes minutes or requires human approval doesn’t have to complete in a single run. The plane is constructed by the consuming service and threaded into the agent’s run context.

Durable Workflows

A workflow is a sequence of steps with explicit dependencies. Each step can call another agent or run a unit of work; its result is recorded before the next dependent step begins. Because state is durable, a later step can resume in a completely separate execution from the one that started the workflow.

Fan-Out / Fan-In (Parallel Multi-Agent)

For complex questions that need several specialist perspectives at once, a parent step fans out into parallel branches — for example, separate tax, income, and estate analyses. A join step declares a dependency on every branch and waits until all of them complete, then feeds their combined outputs into a synthesizing step.
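The fan-out/fan-in shape can be illustrated in-process with `asyncio.gather`. This sketch is not durable: it holds state in memory within one process, whereas the plane described here persists each branch result. The branch functions and their outputs are placeholders.

```python
import asyncio
from typing import Awaitable, Callable

Branch = Callable[[str], Awaitable[dict]]


async def tax_branch(question: str) -> dict:
    await asyncio.sleep(0)  # stands in for a call to a specialist agent
    return {"tax": f"tax view of: {question}"}


async def income_branch(question: str) -> dict:
    await asyncio.sleep(0)
    return {"income": f"income view of: {question}"}


async def estate_branch(question: str) -> dict:
    await asyncio.sleep(0)
    return {"estate": f"estate view of: {question}"}


async def fan_out_fan_in(question: str, branches: list[Branch]) -> dict:
    """Run every branch concurrently, wait for all, then merge the outputs."""
    results = await asyncio.gather(*(branch(question) for branch in branches))
    merged: dict = {}
    for result in results:  # gather preserves the order of the branches
        merged.update(result)
    return merged
```

The merged dictionary plays the role of the combined input to the synthesizing step.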

Approval Gates

For irreversible or high-stakes recommendations — a specific Roth conversion amount, a rebalancing election, or a Social Security filing date — the agent pauses at an approval gate. The step waits until the user explicitly approves or rejects; the workflow resumes only on approval, and a rejection cancels the downstream chain. The orchestration service does not notify the user itself — the consuming service watches for the status change and notifies the user.
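The gate behavior can be modeled as a small state machine. The class, the status names, and the step names below are hypothetical, and persistence is omitted; the sketch only shows that a workflow stops at the gate, resumes on approval, and cancels downstream steps on rejection.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    WAITING_APPROVAL = "waiting_approval"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


class Workflow:
    def __init__(self, steps: list[str], gates: set[str]) -> None:
        self.steps = list(steps)   # step names, in execution order
        self.gates = set(gates)    # steps that need explicit approval
        self.status = {s: Status.PENDING for s in self.steps}
        self.cursor = 0

    def advance(self) -> None:
        """Run steps until a gate or the end. With status and cursor
        persisted, this could be called again from a separate execution."""
        while self.cursor < len(self.steps):
            step = self.steps[self.cursor]
            if self.status[step] is not Status.PENDING:
                return  # waiting or cancelled: nothing to do
            if step in self.gates:
                self.status[step] = Status.WAITING_APPROVAL
                return
            self.status[step] = Status.COMPLETED  # real work would run here
            self.cursor += 1

    def decide(self, approved: bool) -> None:
        """Record the user's decision for the step waiting at the gate."""
        if self.cursor >= len(self.steps):
            raise RuntimeError("workflow already finished")
        step = self.steps[self.cursor]
        if self.status[step] is not Status.WAITING_APPROVAL:
            raise RuntimeError("no step is waiting for approval")
        if approved:
            self.status[step] = Status.COMPLETED
            self.cursor += 1
            self.advance()
        else:
            for downstream in self.steps[self.cursor:]:
                self.status[downstream] = Status.CANCELLED
```

A consuming service would poll or subscribe to the status and notify the user when a step enters `WAITING_APPROVAL`.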

Design principle: Praxion agents surface recommendations and require user confirmation for any action the user might call “executing a strategy.” Agents do not autonomously initiate financial transactions. The approval gate is the platform-level enforcement of this principle — not a convention that could be accidentally omitted.

10. Benefits in Practice

These are the concrete outcomes of running Praxion's financial AI agents on Agent Fabric, measured against what ad-hoc per-agent implementations would require.

⚡Days to production

A new agent needs a manifest, an implementation of the common agent interface, and a trigger. The platform wires the safety cut-off, policy, telemetry, identity, model access, and tools out of the box; memory and orchestration are added by the consuming service. What previously took weeks of platform plumbing now takes days of business logic.

💸Unified cost visibility

Every model call records an estimated cost tagged with the agent, team, and tenant. One dashboard shows which agent consumed what budget, the cost-per-invocation trend, and cost attribution by team. No manual instrumentation, no spreadsheet reconciliation.

🔄Zero-downtime provider switching

A provider outage is handled by updating one configuration record to point at a fallback provider. No code changes, no deploys. Agents pick up the new profile on the next invocation, and the switch is logged in the audit trail.

🛡️Compliance by default

An agent that handles regulated user data declares that sensitivity level in its manifest. The Policy Plane enforces the ceiling at the wrapper boundary — not in business logic where it could be forgotten. A new engineer cannot accidentally route regulated data through an unapproved agent.

🔁Regression replay

Every invocation is stored in the audit trail and can be re-run against a new agent version before it is promoted to production. If outputs diverge beyond a threshold, the release is flagged. Regressions are caught before they reach users.

🧠Personalization across sessions

The Memory Plane persists user decisions (accepted strategies, risk flag acknowledgements, goals). On the next session, the agent retrieves this context and reasons about it — without the user needing to re-explain. Recommendations feel like a conversation, not a cold start.

11. Frequently Asked Questions

Is Agent Fabric open-source?

Agent Fabric is Praxion's internal platform, not currently published as open-source. The architecture and patterns described here reflect the production system that runs all Praxion AI agents.

Which LLM providers does Praxion use?

The model layer is provider-agnostic and supports multiple large language model providers. The active provider for each agent is a configuration setting, so providers can be changed without code changes.

How does Praxion prevent one user's agent from accessing another user's data?

Every platform service — memory, orchestration, audit trail, and state — is isolated per tenant. Cross-agent calls require a short-lived signed token, and the set of agents each agent may call is declared in configuration and enforced at the boundary.

What happens if an LLM provider goes down?

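The platform first retries transient errors with backoff. If the agent's profile names a fallback provider, it is selected automatically when the primary provider returns an error; otherwise, updating one configuration record redirects all calls for the affected agent to another provider. Neither path requires a code deploy.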

Can Praxion agents take financial actions automatically?

No. Praxion agents analyze, model, and recommend — they do not execute trades, initiate transfers, or take irreversible financial actions. The Orchestration Plane's approval gate requires explicit user confirmation before any workflow step that surfaces an irreversible recommendation. The deterministic retirement engine computes all numerical outcomes; agents orchestrate and explain.

See It in Action

Every Praxion AI response is powered by an Agent Fabric agent — with full telemetry, deterministic grounding, and cross-session memory. Try Praxion AI with your own financial profile.
