Agent Fabric: The Infrastructure Behind Praxion’s AI
How Praxion orchestrates financial AI agents across a shared six-plane platform — with unified observability, multi-model provider support, governance, and durable orchestration.
Published · Praxion Engineering
1. The Agent Coordination Problem
Every team that ships AI agents eventually reinvents the same infrastructure: kill-switches to halt a misbehaving agent without a code deploy, retry policies that don’t amplify cost on failure, per-call cost attribution so you know which agent consumed your LLM budget, audit trails for regulated-adjacent features, and a way to swap model providers without touching agent business logic.
The naive approach is to build this ad hoc, per agent. The result: inconsistent governance, duplicated telemetry code, provider-specific LLM calls scattered across a codebase, and a kill-switch that works for agent A but was never wired for agent B.
Praxion solved this once, as a platform: Agent Fabric. It is a modular platform that provides the shared runtime substrate for all Praxion AI agents. Agents implement one common interface and get safety cut-offs, policy governance, schema validation, structured telemetry, provider-agnostic model access, cross-session memory, and durable orchestration without writing any of that plumbing themselves.
2. Architecture: Six Capability Planes
Agent Fabric organizes capabilities into six named planes plus foundational layers. Each plane is an independent module — agents use only what they need. A runtime layer wires the wrapper, identity, model access, tools, and triggers together; the memory and orchestration planes are constructed by the consuming service and passed in. The runtime adapts automatically to whether it is running in a short-lived serverless function or a long-running process, configuring telemetry flushing accordingly.
┌─────────────────────────────────────────────────────────────────┐
│ Trigger Layer │
│ Request · Queue · Scheduled · Manual │
└──────────────────────────┬──────────────────────────────────────┘
│
┌──────────────────────────▼──────────────────────────────────────┐
│ Wrapper │
│ Safety Cut-off → Policy → Validate Input → run() → Validate Out │
│ ↳ emits a structured telemetry event on every lifecycle step │
└──────┬──────────┬──────────┬──────────┬──────────┬─────────────┘
│ │ │ │ │
┌──────▼──┐ ┌────▼────┐ ┌──▼──────┐ ┌─▼──────┐ ┌▼───────────┐
│ Model │ │ Tool │ │ Memory │ │Orchestr│ │ Policy │
│ Plane │ │ Plane │ │ Plane │ │ Plane │ │ Plane │
└──────┬──┘ └────┬────┘ └──┬──────┘ └─┬──────┘ └────────────┘
│ │ │ │
┌──────▼─────────▼─────────▼───────────▼─────────────────────┐
│ Platform Services (authenticated) │
│ Model Registry · Audit · Memory · Orchestration │
│ Safety Cut-off · Secrets · Identity · State │
└─────────────────────────────────────────────────────────────┘
Wrapper
The single invocation interface. Enforces the lifecycle: safety cut-off check → policy check → input validation → run() → output validation. Emits a structured telemetry event on every step. Telemetry destinations are pluggable.
Model Plane
A provider-agnostic abstraction over multiple model providers. Per-agent model profiles (provider, model, parameters, limits) live in a registry. Tracks token counts and estimated cost per call. Swap providers by updating one configuration record.
Tool Plane
Dispatches tool calls from the model turn loop. Sanitizes sensitive arguments before audit logging. Compresses large outputs. Records each tool call in telemetry automatically.
Memory Plane
Cross-session key/value store with expiry and tag-based search, isolated per tenant and agent. Agents store decisions, risk flags, and profile observations and retrieve them on the next session to avoid cold starts.
Orchestration Plane
Durable multi-step workflows with parallel fan-out/fan-in for multi-agent execution. Approval gates let an agent pause, the user confirm, and the workflow resume. Cross-agent delegation uses short-lived signed tokens.
Policy Plane
Governance enforcement: a data-sensitivity ceiling, a minimum compliance level, and request rate limits. Two modes: audit (log violations, allow) and enforce (block). Runs inside the Wrapper before input reaches agent code.
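As a rough illustration of the Policy Plane's audit and enforce modes, the sketch below checks a data-sensitivity ceiling, a minimum compliance level, and a rate limit, then either allows or blocks the run. Every name here (`PolicyRules`, `check_policy`, the integer level scales) is hypothetical and chosen for illustration; it is not Praxion's actual API.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    AUDIT = "audit"      # record violations, still allow the run
    ENFORCE = "enforce"  # block the run on any violation


@dataclass(frozen=True)
class PolicyRules:
    # Hypothetical integer scales: a higher number is more sensitive / stricter.
    max_sensitivity: int
    min_compliance: int
    max_requests_per_minute: int
    mode: Mode = Mode.ENFORCE


@dataclass(frozen=True)
class PolicyDecision:
    allowed: bool
    violations: tuple


def check_policy(rules, sensitivity, compliance, requests_last_minute):
    """Evaluate the three rules and apply the configured mode."""
    violations = []
    if sensitivity > rules.max_sensitivity:
        violations.append("sensitivity_ceiling")
    if compliance < rules.min_compliance:
        violations.append("compliance_level")
    if requests_last_minute >= rules.max_requests_per_minute:
        violations.append("rate_limit")
    # Audit mode reports violations but never blocks.
    allowed = not violations or rules.mode is Mode.AUDIT
    return PolicyDecision(allowed, tuple(violations))
```

In this sketch the decision always carries the list of violations, so the same record can be logged in audit mode and used to reject the call in enforce mode.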
3. Multi-Model Provider Support
The Model Plane routes all model calls through a single, provider-agnostic interface. Every provider is normalized to a common request/response shape, so no provider-specific code lives in agent business logic.
Each agent has a model profile in the registry: the provider, model, generation parameters, and limits to use — plus an optional fallback. Changing providers, or failing over during an outage, is a configuration update; there is no code change or deploy. On every successful call, the Model Plane records token counts, latency, and an estimated cost — regardless of provider.
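A minimal sketch of what a model profile and a configuration-only provider switch could look like. The field names, the in-memory `REGISTRY`, and the per-1,000-token prices are assumptions made for illustration; the article does not describe the real registry schema or pricing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelProfile:
    provider: str
    model: str
    max_output_tokens: int
    # Hypothetical prices in USD per 1,000 tokens.
    input_price_per_1k: float
    output_price_per_1k: float
    fallback: Optional["ModelProfile"] = None


# Stand-in for the registry: one configuration record per agent.
REGISTRY = {}


def use_fallback(agent):
    """Point an agent at its fallback profile by rewriting one registry record."""
    current = REGISTRY[agent]
    if current.fallback is None:
        raise ValueError(f"agent {agent!r} has no fallback profile")
    REGISTRY[agent] = current.fallback


def estimate_cost(profile, input_tokens, output_tokens):
    """Estimated cost of one call, computed the same way for every provider."""
    return (
        input_tokens / 1000 * profile.input_price_per_1k
        + output_tokens / 1000 * profile.output_price_per_1k
    )
```

Because agent code only ever asks the registry for its profile, the switch in `use_fallback` touches configuration and nothing else.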
4. The Integration Pattern
Building a new agent takes three steps: define the agent, wire the wrapper, attach a trigger. Governance, telemetry, and provider abstraction are supplied by the platform without additional agent code.
Step 1 — Define the Agent
Every agent implements one common interface with a single entry point. That entry point receives validated input plus a run context that exposes helpers for telemetry, memory, tools, and orchestration. The agent author writes business logic only — the plumbing is provided by the platform.
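To make the shape of that interface concrete, here is a small sketch. The `Agent` protocol, the `RunContext` class, and the example `SavingsRateAgent` are all hypothetical names; the real run context also exposes tools and orchestration helpers, which are omitted here.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class RunContext:
    """Hypothetical run context with only telemetry and memory helpers."""
    request_id: str
    memory: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def emit(self, name: str, **fields: Any) -> None:
        self.events.append({"event": name, "request_id": self.request_id, **fields})


class Agent(Protocol):
    """The single entry point every agent is assumed to implement."""
    name: str

    def run(self, payload: dict, ctx: RunContext) -> dict: ...


class SavingsRateAgent:
    """Business logic only: no telemetry setup, provider calls, or policy code."""
    name = "savings-rate"

    def run(self, payload: dict, ctx: RunContext) -> dict:
        rate = payload["annual_savings"] / payload["annual_income"]
        ctx.emit("savings_rate_computed", rate=rate)
        return {"savings_rate": round(rate, 4)}
```

The agent receives input that has already been validated, so the example does no defensive checking of its own.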
Step 2 — Wire the Wrapper
The wrapper is configured with the agent plus its telemetry destinations, safety cut-off, and policy rules. From that point on, every invocation is automatically governed and instrumented. Service-to-service calls are authenticated by the platform’s identity layer rather than per-service credentials.
Step 3 — Attach a Trigger
The same wrapped agent can be exposed through several trigger types — request-based, queue-driven, scheduled, or manual — without changing the agent. Each trigger resolves the tenant and validates the caller before the agent runs.
Agent Manifest
Every agent ships a declarative manifest, validated against a shared schema and consumed by the build pipeline, the policy plane, and the platform dashboard. It declares which other agents this agent may call, its rate limits, its data-sensitivity level, its trigger schedule, reliability targets, and the memory keys it uses. Because these are declared rather than coded, they can be reviewed and enforced consistently across every agent.
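A manifest check might look roughly like the following. The field names, the three sensitivity levels, and the `validate_manifest` function are assumptions for illustration; the shared schema itself is not shown in this article.

```python
# Hypothetical manifest fields, mirroring the items listed above.
REQUIRED_FIELDS = {
    "name": str,
    "allowed_callees": list,
    "rate_limit_per_minute": int,
    "data_sensitivity": str,
    "schedule": str,
    "reliability_target": float,
    "memory_keys": list,
}

# Hypothetical sensitivity scale.
SENSITIVITY_LEVELS = {"public", "internal", "regulated"}


def validate_manifest(manifest: dict) -> list:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for field_name, expected in REQUIRED_FIELDS.items():
        if field_name not in manifest:
            errors.append(f"missing field: {field_name}")
        elif not isinstance(manifest[field_name], expected):
            errors.append(f"{field_name}: expected {expected.__name__}")
    level = manifest.get("data_sensitivity")
    if isinstance(level, str) and level not in SENSITIVITY_LEVELS:
        errors.append(f"data_sensitivity: unknown level {level!r}")
    return errors
```

Returning every error at once, rather than failing on the first, suits a build pipeline where the author wants one complete report per run.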
5. Observability: Metrics, Envelopes, Audit Trail
Observability is not opt-in. Every agent running inside the Wrapper emits three layers of telemetry automatically on every invocation, with no per-agent instrumentation code required.
Layer 1 — Metrics
Each telemetry event is emitted in a standard structured format and forwarded to the platform’s metrics system, which extracts metrics automatically. Every metric carries a consistent set of dimensions, so any metric can be sliced by agent, team, tenant, environment, or compliance level without extra instrumentation.
Standard dimensions (on every metric):
- Agent
- Version
- Team
- Cost center
- Tenant
- Environment
- Compliance level
Layer 2 — Structured Event Record
A consistent, flat record is emitted on every lifecycle step — start, model call, tool call, completion, error, safety cut-off, policy decision, pause, and approval. The record format is versioned and forward-compatible; new fields are additive.
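One way to picture a flat, versioned, additive record is sketched below. The field names, `SCHEMA_VERSION`, and both functions are hypothetical; only the list of lifecycle steps comes from the text above.

```python
import json
import time
import uuid

SCHEMA_VERSION = 1  # hypothetical version number

LIFECYCLE_STEPS = {
    "start", "model_call", "tool_call", "completion", "error",
    "safety_cutoff", "policy_decision", "pause", "approval",
}

# Fields an older reader knows about; anything else is a later addition.
KNOWN_FIELDS = {
    "schema_version", "event_id", "step", "request_id",
    "agent", "tenant", "timestamp",
}


def build_event(step, request_id, agent, tenant, **extra):
    """Build one flat record; extra fields are additive and must be scalars."""
    if step not in LIFECYCLE_STEPS:
        raise ValueError(f"unknown lifecycle step: {step}")
    record = {
        "schema_version": SCHEMA_VERSION,
        "event_id": str(uuid.uuid4()),
        "step": step,
        "request_id": request_id,
        "agent": agent,
        "tenant": tenant,
        "timestamp": time.time(),
    }
    record.update(extra)
    for key, value in record.items():
        if isinstance(value, (dict, list)):
            raise TypeError(f"{key}: nested values are not allowed in a flat record")
    return record


def read_event(line: str) -> dict:
    """Parse a record as an older reader would, ignoring fields it does not know."""
    raw = json.loads(line)
    return {k: v for k, v in raw.items() if k in KNOWN_FIELDS}
```

Forward compatibility in this sketch comes from the reader side: unknown fields are dropped rather than rejected, so adding a field never breaks an existing consumer.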
Layer 3 — Audit Trail
An append-only store for every event record — tamper-evident because the service exposes no update or delete operations. Records are isolated per tenant, written idempotently, and retained according to a configurable retention tier. Every record associated with a given request can be looked up directly, which supports replay and incident investigation.
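The properties described above (append-only, idempotent writes, tenant isolation, lookup by request) can be sketched with an in-memory stand-in. `AuditTrail` and its method names are hypothetical, and a real service would sit on durable storage rather than a dictionary.

```python
class AuditTrail:
    """In-memory stand-in: exposes append and read, but no update or delete."""

    def __init__(self):
        # Keyed by (tenant, event_id); dicts keep insertion order.
        self._records = {}

    def append(self, tenant: str, record: dict) -> bool:
        """Store a record once. A repeated write with the same event_id is a no-op."""
        key = (tenant, record["event_id"])
        if key in self._records:
            return False
        self._records[key] = dict(record)  # copy so callers cannot mutate history
        return True

    def by_request(self, tenant: str, request_id: str) -> list:
        """Return every record for one request, scoped to a single tenant."""
        return [
            dict(rec)
            for (owner, _), rec in self._records.items()
            if owner == tenant and rec["request_id"] == request_id
        ]
```

Idempotency here means a retried write after a timeout cannot create a duplicate entry, which keeps replay and investigation results stable.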
Turn-Level Observability in the UI
Every agent turn is visible at two levels: the Thinking Panel, which users see as the agent reasons in real time, and the Turn Breakdown, which surfaces per-round LLM latency, token counts, and tool dispatch order. Both are built from telemetry the Wrapper already emits.
[Figure: Reasoning Chain (user-facing)]
[Figure: Turn Breakdown (per-round telemetry)]
The Thinking Panel updates from a live event stream emitted as the agent reasons. The Turn Breakdown is driven by the model-call and tool-call records returned in the response diagnostics — the same data that flows into the platform's metrics.
Cost Attribution
Every model call records an estimated cost alongside the agent and cost-center dimensions. This means cost rolls up automatically into per-agent, per-team, and per-tenant aggregates without any additional instrumentation. The two views below show how cost surfaces: at the per-invocation level (single session breakdown) and at the fleet level (aggregate by agent over a rolling window).
[Figure: Per-Invocation Cost Breakdown]
[Figure: Fleet Cost by Agent (rolling 30 days)]
Source: platform telemetry aggregated by agent and cost center. Deterministic tool calls (projection engine, tax calculator) have zero model cost — only the conversational turns are billed.
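Because each model-call record already carries its dimensions, a cost roll-up reduces to a group-by. The sketch below assumes hypothetical field names (`agent`, `cost_center`, `estimated_cost_usd`); the real record layout may differ.

```python
from collections import defaultdict


def roll_up(records, dimension):
    """Sum estimated cost by one dimension, e.g. 'agent' or 'cost_center'."""
    totals = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["estimated_cost_usd"]
    return dict(totals)
```

Calling `roll_up(records, "agent")` and `roll_up(records, "cost_center")` over the same records gives the per-agent and per-team views without any extra instrumentation. Records for deterministic tool calls would simply carry a cost of zero.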
6. Sample Telemetry Output
Every completed invocation produces a single, flat record populated entirely by the wrapper — the agent author writes none of it. Grouped by purpose, each record captures:
- Identity & tracing: a unique request identifier and trace links, so any invocation can be found and replayed.
- Tenant & cost attribution: the tenant, the owning team/cost center, and the calling agent (if any).
- Governance: the data-sensitivity level, compliance level, and any approved exceptions.
- Timing & outcome: start/end time, total latency, and success or error.
- Model & tool usage: token counts and an estimated cost for each model call, plus the outcome of each tool call.
- Validation & policy: input/output validation results and the policy decision for the run.
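The groups above could map onto a flat record along the following lines. This is an illustrative layout only: the field names are assumptions, and every value in the sample is a placeholder rather than real platform output.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class InvocationRecord:
    # Identity and tracing
    request_id: str
    trace_id: str
    # Tenant and cost attribution
    tenant: str
    cost_center: str
    calling_agent: Optional[str]
    # Governance
    data_sensitivity: str
    compliance_level: str
    approved_exceptions: str
    # Timing and outcome
    started_at: str
    ended_at: str
    latency_ms: int
    outcome: str
    # Model and tool usage (totals across the invocation)
    input_tokens: int
    output_tokens: int
    estimated_cost_usd: float
    tool_calls_ok: int
    tool_calls_failed: int
    # Validation and policy
    input_valid: bool
    output_valid: bool
    policy_decision: str


# Placeholder values for illustration only.
sample = InvocationRecord(
    request_id="req-0001", trace_id="trace-0001",
    tenant="tenant-a", cost_center="planning", calling_agent=None,
    data_sensitivity="regulated", compliance_level="standard", approved_exceptions="",
    started_at="2000-01-01T00:00:00Z", ended_at="2000-01-01T00:00:02Z",
    latency_ms=2000, outcome="success",
    input_tokens=0, output_tokens=0, estimated_cost_usd=0.0,
    tool_calls_ok=0, tool_calls_failed=0,
    input_valid=True, output_valid=True, policy_decision="allowed",
)
flat = asdict(sample)  # one flat dict, ready to serialize
```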
7. Governance: Safety Cut-off, Policy, Trust
Three enforcement layers fire inside the Wrapper before an agent’s run() is called. Agent business logic cannot bypass them.
Safety Cut-off
A per-agent or per-tenant emergency stop. Activated without a code deploy: a single control flips the switch. When active, the wrapper records the event and the agent's logic is never called. A failure in the safety check itself is treated as a stop (fail-safe). Used for runaway cost events, safety incidents, and provider failures requiring an immediate halt.
Policy
Declares three governance rules per agent: the most sensitive data the agent may process, the minimum compliance level required, and request rate limits. Audit mode logs violations; enforce mode blocks them before the agent runs.
Trust
Cross-agent calls require a short-lived signed token issued by the calling agent. The set of agents each agent may call is declared in configuration and enforced when the token is issued: an agent will not issue a token for any agent outside its allowed set, and the receiver verifies the token before running. Every telemetry record carries the full trust chain.
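The token exchange for cross-agent calls can be sketched with an HMAC signature from the Python standard library. This is an assumption about mechanism, not a description of Praxion's implementation: the article does not say which signing scheme is used, and `SECRET`, `ALLOWED_CALLEES`, `issue_token`, and `verify_token` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-only-secret"  # hypothetical; a real key would come from a secrets service
ALLOWED_CALLEES = {"planner": {"tax-analyst", "income-analyst"}}  # hypothetical config


def issue_token(caller, callee, ttl_seconds=60, now=None):
    """Refuse to issue a token for any callee outside the caller's allowed set."""
    if callee not in ALLOWED_CALLEES.get(caller, set()):
        raise PermissionError(f"{caller} may not call {callee}")
    issued = time.time() if now is None else now
    claims = {"caller": caller, "callee": callee, "exp": issued + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{signature}"


def verify_token(token, callee, now=None):
    """Receiver-side check: signature, intended callee, and expiry."""
    body, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    if claims["callee"] != callee or current >= claims["exp"]:
        raise PermissionError("token not valid for this call")
    return claims
```

The short expiry limits how long a leaked token is useful, and binding the token to one callee prevents it from being replayed against a different agent.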
The wrapper runs the same lifecycle on every invocation, in order:
- Safety cut-off check — if the agent is stopped, the run is blocked and recorded; agent logic never executes.
- Policy check — violations are blocked in enforce mode, or logged in audit mode.
- Input validation — malformed input is rejected before reaching agent logic.
- Run — the agent’s business logic executes, wrapped in a configurable retry policy for transient errors.
- Output validation — responses are validated against the agent’s declared output shape.
- Completion — the final record is emitted and all telemetry is flushed before the run returns.
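The lifecycle steps listed above can be sketched as a single function. All names are hypothetical, the policy check is assumed to run in enforce mode, and `TimeoutError` stands in for whatever the platform classifies as a transient error.

```python
class Blocked(Exception):
    """Raised when the run is stopped before agent logic executes."""


def invoke(agent_run, payload, *, is_stopped, policy_allows,
           validate_input, validate_output, emit, max_attempts=3):
    # 1. Safety cut-off: a failing check is treated as a stop (fail-safe).
    try:
        stopped = is_stopped()
    except Exception:
        stopped = True
    if stopped:
        emit("safety_cutoff")
        raise Blocked("safety cut-off active")

    # 2. Policy check (enforce mode assumed in this sketch).
    if not policy_allows(payload):
        emit("policy_decision", allowed=False)
        raise Blocked("policy violation")
    emit("policy_decision", allowed=True)

    # 3. Input validation: expected to raise on malformed input.
    validate_input(payload)

    # 4. Run, retrying transient errors up to max_attempts.
    emit("start")
    for attempt in range(1, max_attempts + 1):
        try:
            result = agent_run(payload)
            break
        except TimeoutError:
            emit("error", attempt=attempt)
            if attempt == max_attempts:
                raise

    # 5. Output validation, then 6. completion.
    validate_output(result)
    emit("completion")
    return result
```

Because the cut-off and policy checks sit ahead of the call to `agent_run`, agent code has no path around them. A production version would also add backoff between retries and flush telemetry before returning.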
8. Memory Plane: Cross-Session Context
Without memory, every agent session starts cold. The agent has no knowledge of what was discussed last week, which risk flags were surfaced last month, or which strategies the user has already considered and rejected. For a financial planning product, this is a meaningful limitation — recommendations feel generic, not personalized.
The Memory Plane provides a cross-session key/value store, isolated per tenant and per agent and exposed to the agent through its run context. Agents store facts during a session — accepted strategies, risk flags, goal changes — each with an expiry and optional tags, and retrieve them on the next session by key or by tag. Retrieved facts are added as additional context so the agent can pick up where the last conversation left off. Records can also be removed, for example when a user resets a strategy.
Example use: A planning advisor agent looks up prior user decisions at session start and adds them as context, so it can reference earlier strategy acceptances and goal changes without the user having to repeat themselves.
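A toy version of such a store is sketched below: keys are namespaced by tenant and agent, each entry has an expiry, and entries can be found by tag. `MemoryStore` and its method signatures are hypothetical, and the injectable clock exists only to make expiry easy to demonstrate.

```python
import time


class MemoryStore:
    """In-memory stand-in for a cross-session key/value store."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._items = {}  # (tenant, agent, key) -> (value, expires_at, tags)

    def put(self, tenant, agent, key, value, ttl_seconds, tags=()):
        expires_at = self._clock() + ttl_seconds
        self._items[(tenant, agent, key)] = (value, expires_at, frozenset(tags))

    def get(self, tenant, agent, key):
        item = self._items.get((tenant, agent, key))
        if item is None or item[1] <= self._clock():
            self._items.pop((tenant, agent, key), None)  # drop expired entries
            return None
        return item[0]

    def search(self, tenant, agent, tag):
        """Return unexpired entries carrying the tag, within one tenant and agent."""
        now = self._clock()
        return {
            k: value
            for (t, a, k), (value, expires_at, tags) in self._items.items()
            if (t, a) == (tenant, agent) and tag in tags and expires_at > now
        }

    def delete(self, tenant, agent, key):
        self._items.pop((tenant, agent, key), None)
```

At session start, an advisor agent in this sketch would call `search(tenant, agent, "strategy")` and add the returned facts to its context.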
9. Orchestration: Durable Workflows & Approval Gates
The Orchestration Plane provides durable multi-step workflow coordination. Workflow state is persisted, so it survives across separate executions — a step that takes minutes or requires human approval doesn’t have to complete in a single run. The plane is constructed by the consuming service and threaded into the agent’s run context.
Durable Workflows
A workflow is a sequence of steps with explicit dependencies. Each step can call another agent or run a unit of work; its result is recorded before the next dependent step begins. Because state is durable, a later step can resume in a completely separate execution from the one that started the workflow.
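The idea of resuming in a separate execution can be shown with workflow state that is plain, serializable data. In this sketch, `new_workflow` and `advance` are hypothetical helpers, and a JSON string stands in for the persisted state.

```python
import json


def new_workflow(dependencies: dict) -> dict:
    """Build serializable state from a mapping of step name -> list of dependencies."""
    return {
        "steps": {
            name: {"depends_on": list(deps), "status": "pending", "result": None}
            for name, deps in dependencies.items()
        }
    }


def advance(state: dict, handlers: dict) -> dict:
    """Run every pending step whose dependencies are done and which has a handler here.

    Each result is recorded in the state before dependent steps are considered.
    """
    progressed = True
    while progressed:
        progressed = False
        for name, step in state["steps"].items():
            deps_done = all(
                state["steps"][dep]["status"] == "done" for dep in step["depends_on"]
            )
            if step["status"] == "pending" and deps_done and name in handlers:
                inputs = {dep: state["steps"][dep]["result"] for dep in step["depends_on"]}
                step["result"] = handlers[name](inputs)
                step["status"] = "done"
                progressed = True
    return state
```

A first execution can call `advance`, save `json.dumps(state)`, and exit; a later execution loads the saved state and calls `advance` again with the handlers it has, picking up exactly where the first one stopped.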
Fan-Out / Fan-In (Parallel Multi-Agent)
For complex questions that need several specialist perspectives at once, a parent step fans out into parallel branches — for example, separate tax, income, and estate analyses. A join step declares a dependency on every branch and waits until all of them complete, then feeds their combined outputs into a synthesizing step.
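Within a single process, the fan-out/fan-in shape can be sketched with `asyncio.gather`, which waits for every branch and returns results in the order the branches were submitted. The `analyze` coroutine is a placeholder for a specialist agent call, and the durable, cross-execution version described above would persist branch results rather than hold them in memory.

```python
import asyncio


async def analyze(domain: str, question: str) -> dict:
    """Placeholder for a call to a specialist agent."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"domain": domain, "finding": f"{domain} view on {question}"}


def synthesize(results: list) -> dict:
    """Combine branch outputs into a single input for the synthesizing step."""
    return {r["domain"]: r["finding"] for r in results}


async def fan_out_fan_in(question: str, domains: list) -> dict:
    branches = [analyze(domain, question) for domain in domains]  # fan out
    results = await asyncio.gather(*branches)                     # join: wait for all
    return synthesize(results)                                    # fan in
```

A caller would run it with `asyncio.run(fan_out_fan_in("Roth conversion", ["tax", "income", "estate"]))`.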
Approval Gates
For irreversible or high-stakes recommendations — a specific Roth conversion amount, a rebalancing election, or a Social Security filing date — the agent pauses at an approval gate. The step waits until the user explicitly approves or rejects; the workflow resumes only on approval, and a rejection cancels the downstream chain. The orchestration service does not notify the user itself — the consuming service watches for the status change and notifies the user.
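An approval gate behaves like a small state machine: it waits, then either releases or cancels everything downstream. The `ApprovalGate` class and its status names below are hypothetical.

```python
from enum import Enum


class GateStatus(Enum):
    WAITING = "waiting"
    APPROVED = "approved"
    REJECTED = "rejected"


class ApprovalGate:
    def __init__(self, downstream_steps):
        self.status = GateStatus.WAITING
        # Downstream steps stay blocked until the user decides.
        self.downstream = {name: "blocked" for name in downstream_steps}

    def decide(self, approved: bool) -> None:
        """Record the user's explicit decision; a gate can be decided only once."""
        if self.status is not GateStatus.WAITING:
            raise RuntimeError("gate already decided")
        self.status = GateStatus.APPROVED if approved else GateStatus.REJECTED
        new_state = "ready" if approved else "cancelled"
        for name in self.downstream:
            self.downstream[name] = new_state
```

The gate itself sends no notification in this sketch. As described above, the consuming service would watch `status` and notify the user when it changes.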
10. Benefits in Practice
These are the concrete outcomes of running Praxion's financial AI agents on Agent Fabric, measured against what ad-hoc per-agent implementations would require.
A new agent needs a manifest, an implementation of the common agent interface, and a trigger. The platform wires the safety cut-off, policy, telemetry, identity, model access, and tools out of the box; memory and orchestration are added by the consuming service. What previously took weeks of platform plumbing now takes days of business logic.
Every model call records an estimated cost tagged with the agent, team, and tenant. One dashboard shows which agent consumed what budget, the cost-per-invocation trend, and cost attribution by team. No manual instrumentation, no spreadsheet reconciliation.
A provider outage is handled by updating one configuration record to point at a fallback provider. No code changes, no deploys. Agents pick up the new profile on the next invocation, and the switch is logged in the audit trail.
An agent that handles regulated user data declares that sensitivity level in its manifest. The Policy Plane enforces the ceiling at the wrapper boundary — not in business logic where it could be forgotten. A new engineer cannot accidentally route regulated data through an unapproved agent.
Every invocation is stored in the audit trail and can be re-run against a new agent version before it is promoted to production. If outputs diverge beyond a threshold, the release is flagged. Regressions are caught before they reach users.
The Memory Plane persists user decisions (accepted strategies, risk flag acknowledgements, goals). On the next session, the agent retrieves this context and reasons about it — without the user needing to re-explain. Recommendations feel like a conversation, not a cold start.
11. Frequently Asked Questions
Is Agent Fabric open-source?
Agent Fabric is Praxion's internal platform, not currently published as open-source. The architecture and patterns described here reflect the production system that runs all Praxion AI agents.
Which LLM providers does Praxion use?
The model layer is provider-agnostic and supports multiple large language model providers. The active provider for each agent is a configuration setting, so providers can be changed without code changes.
How does Praxion prevent one user's agent from accessing another user's data?
Every platform service — memory, orchestration, audit trail, and state — is isolated per tenant. Cross-agent calls require a short-lived signed token, and the set of agents each agent may call is declared in configuration and enforced at the boundary.
What happens if an LLM provider goes down?
Updating one configuration record redirects all calls for the affected agent to a fallback provider — no code deploy required. The platform retries transient errors with backoff before a provider switch is needed.
Can Praxion agents take financial actions automatically?
No. Praxion agents analyze, model, and recommend — they do not execute trades, initiate transfers, or take irreversible financial actions. The Orchestration Plane's approval gate requires explicit user confirmation before any workflow step that surfaces an irreversible recommendation. The deterministic retirement engine computes all numerical outcomes; agents orchestrate and explain.
See It in Action
Every Praxion AI response is powered by an Agent Fabric agent — with full telemetry, deterministic grounding, and cross-session memory. Try Praxion AI with your own financial profile.