
If you've used Claude, you've seen what good streaming feels like. You type a question, and the response starts arriving almost immediately. Words appear in real time. You can read, react, and follow the reasoning as it forms without any dead gaps between asking and receiving a response.
That experience set the bar for us at Atomicwork. We wanted Atom, our universal agent for IT, to feel the same way. But our problem is harder than rendering text tokens progressively. Atom doesn't just generate responses. It looks up policies, checks access, runs actions across tools, and coordinates with other specialist agents. All of that takes time. And the way we handle that time shapes whether people trust Atom or decide to navigate complicated portals by themselves.
So we rebuilt our streaming architecture from the ground up as a structural change in how Atom communicates with the people it works alongside.
I've covered the "why" and the top-level approach in this article. For the design decisions behind our new streaming system, covering the event model, state synchronization, and failure handling, you can read the full engineering deep-dive here.
Most AI chat products have a simple loop: you type, the model generates text, you read it. The hard part is making the text good.
Atom's loop is different. When an employee asks "Can I get access to the design system repo?", Atom checks who's asking, looks up their role, finds the access policy for that repository, determines whether approval is needed, and if not, provisions the access right there. Then it tells the employee what happened.
A single question triggers a sequence of actions, each with its own intermediate state. All of those states are meaningful to the person waiting. Showing "thinking..." for twelve seconds while they happen in the background wastes a perfectly good conversation.
And it gets more complex. Atom is evolving from a single universal agent into a coordinated team of specialist coworkers: a hardware specialist, a software specialist, a security specialist, an HR ops agent. A coordinator manages handoffs between them. A single employee question might touch two or three specialists before a resolution lands. Which means there will be multiple agents, multiple actions, happening across seconds or minutes.
If your streaming architecture can only handle "append text to a message," you're stuck. You can't show the employee that their request moved from the coordinator to the software specialist. You can't surface that an approval was triggered. You can't progressively render a structured response that includes both a text explanation and an access confirmation.
We needed streaming that could carry all of that.
Our original implementation was the same one everyone starts with: open a connection, push text tokens as they're generated, render them in the UI. It worked fine when Atom was primarily a conversational interface.
It stopped working once we hit real-world conditions: flaky networks, reconnections, structured outputs, and multiple agents emitting events at once. Once we saw streaming as a state-synchronization problem rather than a text feed, the architecture we needed became much clearer.
The core idea is to stop thinking about messages and start thinking about items whose state evolves through a stream of updates.
Each update describes a change: append some text, update a status, add a structured element, replace a state entirely. This maps naturally to how AI systems produce output: incrementally, non-linearly, and often across multiple items at once.
On the client side, we normalize state into a map of items by ID with an ordered list for rendering. When Atom streams twenty updates per second during multi-agent coordination, only the individual items being updated re-render. The rest of the chat stays untouched.
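To make the item-and-update model concrete, here is a minimal sketch of what a client-side store like this could look like. The event names and fields are illustrative assumptions, not Atom's actual wire format.

```typescript
// Illustrative update events; names and shapes are assumptions,
// not Atomicwork's real schema.
type StreamUpdate =
  | { type: "item.added"; itemId: string; kind: "text" | "card" }
  | { type: "text.delta"; itemId: string; delta: string }
  | { type: "status.changed"; itemId: string; status: "running" | "done" | "failed" }
  | { type: "item.replaced"; itemId: string; state: unknown };

interface Item {
  id: string;
  kind: string;
  text: string;
  status: string;
  state?: unknown;
}

// Normalized client state: a map of items by ID, plus an ordered
// list of IDs for rendering. An update touches exactly one item,
// so only that item's component needs to re-render.
class ChatState {
  items = new Map<string, Item>();
  order: string[] = [];

  apply(update: StreamUpdate): void {
    switch (update.type) {
      case "item.added":
        this.items.set(update.itemId, {
          id: update.itemId,
          kind: update.kind,
          text: "",
          status: "running",
        });
        this.order.push(update.itemId);
        break;
      case "text.delta": {
        const item = this.items.get(update.itemId);
        if (item) item.text += update.delta; // append-only text growth
        break;
      }
      case "status.changed": {
        const item = this.items.get(update.itemId);
        if (item) item.status = update.status;
        break;
      }
      case "item.replaced": {
        const item = this.items.get(update.itemId);
        if (item) item.state = update.state; // swap structured state wholesale
        break;
      }
    }
  }
}
```

A renderer can key components off `order` and item identity, which is what keeps twenty updates per second from re-rendering the whole chat.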
We also introduced the concept of turns: a single interaction cycle from user input through system processing to completion signal. A turn isn't tied to a single message. One turn can produce multiple items, multiple update types, structured outputs alongside text.

The turn gives us a boundary.
Without turns, streaming is an unbounded flow. With them, it's structured progress.
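One way to sketch that boundary is a small tracker that knows which turns are still open, so the UI can distinguish "fully resolved" from "still in flight." The event names here are hypothetical, chosen for illustration.

```typescript
// Hypothetical turn lifecycle events; names are illustrative,
// not Atom's real wire format.
type TurnEvent =
  | { type: "turn.started"; turnId: string }
  | { type: "turn.item"; turnId: string; itemId: string } // any item-level update
  | { type: "turn.completed"; turnId: string };

// Tracks open turns: a turn can span many items and update types,
// but it has exactly one start and one completion signal.
class TurnTracker {
  private open = new Set<string>();

  handle(event: TurnEvent): void {
    if (event.type === "turn.started") this.open.add(event.turnId);
    if (event.type === "turn.completed") this.open.delete(event.turnId);
  }

  // The bounded question the UI gets to ask: has this
  // interaction cycle finished, or is work still streaming in?
  isResolved(turnId: string): boolean {
    return !this.open.has(turnId);
  }
}
```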
Streaming isn't a nice-to-have animation layer on top of a real system. It's how the system communicates trust.
When customers ask Atom to provision access and the response starts appearing in 200 milliseconds, even if the full resolution takes 10 seconds, they know the system heard them. They can see progress. They can see Atom checking the access policy, then confirming the provisioning, then summarizing what happened. Each of those intermediate states is a signal: "I'm on it, here's where I am."
Compare that to a loading spinner and a 10-second wait. Same outcome. Completely different experience.
This is especially important for Atomicwork's end users, the employees across an organization who interact with Atom through web chat, a browser extension, Microsoft Teams, or Slack. These are people in the middle of their workday who need a password reset or a software license and want to get back to what they were doing. Every second of dead time is a second they spend wondering whether to just file a manual ticket instead.
Shortening perceived time-to-answer is one of the biggest levers for adoption we've found.
Beyond the immediate UX win, it's also the foundation for the multi-agent experience we're building toward. When Atom's coordinator hands a request to the security specialist, and the security specialist triggers an access review, and the result flows back through the coordinator to the employee, that interaction involves three agents, multiple tool calls, a structured approval flow, and a conversational summary.
Streaming all of that coherently to the UI requires exactly the kind of event model and turn-based coordination we've built. And the need will only grow as organizations bring more specialized, collaborative AI agents into the workforce.
I want to be honest about tradeoffs.
This architecture is more complex than what we had before. The event model requires discipline: every update must be well-typed and sequenced. Client-side state management is more sophisticated. We spent real engineering time on failure handling, including deduplication, timeout-based completion synthesis, and periodic snapshots to recover from partial corruption.
The complexity is deliberate. It buys us correctness under unreliable networks, extensibility for non-text outputs, consistency between what's streamed live and what's persisted, and the ability to scale with event frequency instead of collapsing under it.
If you want to dive deeper into the "how," covering the event model, the state synchronization approach, reconnection handling, partial state corruption, and the design of turns, my detailed post walks through the entire system design.
If you're building something similar, or if you're curious about how AI-native platforms think about the infrastructure behind chat, I think you'll find it useful.