Scaling AI token streams with Centrifugo
In a previous post we showed how to stream LLM responses through Centrifugo: the backend receives tokens from an LLM API and publishes them to a channel, while the browser subscribes and renders text as it arrives.
That post covered the basics. This one looks at what Centrifugo adds beyond delivering tokens from point A to point B: automatic recovery after disconnects, horizontal scaling, transport flexibility, and multi-tab synchronization backed by a database. We built an interactive playground that demonstrates these concepts, so you can run it locally and see every feature in action.
The playground
The source code is on GitHub. Run it:
git clone https://github.com/centrifugal/examples.git
cd examples/v6/scale-ai
docker compose up --build
Open http://localhost:9000.
The playground simulates an AI token stream without requiring an actual LLM API. You control the token rate, total token count, and other parameters. The backend picks a random AI-related question, generates random words as the "answer", and publishes them to Centrifugo — the delivery path is identical to a real LLM integration.
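The publish side of that path can be sketched in a few lines. This is a minimal illustration, not the playground's actual backend code: the channel name, API key placeholder, and word list are assumptions, while the `POST /api/publish` endpoint and `X-API-Key` header follow Centrifugo's server HTTP API.

```python
import json
import random

# Hypothetical word pool standing in for LLM output.
WORDS = ["neural", "token", "stream", "model", "context", "vector"]

def make_token_events(n):
    """Generate n fake answer tokens, mimicking the playground's random words."""
    return [{"index": i, "token": random.choice(WORDS)} for i in range(n)]

def publish_payload(channel, event):
    """Build the request body for Centrifugo's POST /api/publish endpoint."""
    return json.dumps({"channel": channel, "data": event})

# Publishing each token as it is "generated" (sketch; URL, key, and
# channel name are assumptions, and `requests` must be installed):
# import requests
# for event in make_token_events(50):
#     requests.post(
#         "http://localhost:8000/api/publish",
#         headers={"X-API-Key": "<CENTRIFUGO_API_KEY>"},
#         data=publish_payload("stream:demo", event),
#     )
```

Whether the tokens come from a real LLM API or a random generator, the delivery path is the same: one publish call per token, with Centrifugo fanning the channel out to subscribers.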
The architecture:
┌─────────────────────┐
│ nginx :9000 │
└──┬───────────────┬──┘
│ │
▼ ▼
