#Cloudflare #Streaming #Infrastructure

Streaming LLM responses reliably on Cloudflare Workers

vooy runs its agents, landing, and API entirely on Cloudflare. Here's how we stream long LLM responses without stalls at the edge — worker lifetimes, backpressure, and the OpenNext story.

Daniel Cho · Infrastructure · 3 min read

Nearly everything at vooy — the agent runtime, the API, and the blog you're reading right now — runs on Cloudflare. The edge is fast and cheap, but it was a poor match out of the box for our workload: "an LLM stream pushing tokens for tens of seconds." This is the record of closing that gap.

Why the edge

The reasoning is simple. Users are scattered worldwide, and messenger webhooks are latency-sensitive. Receiving at the edge visibly cuts time-to-first-token (TTFT), and cold starts are an order of magnitude faster than containers.

In return, the edge runtime imposes constraints: only part of the Node API is available, there's a CPU time budget, and above all a worker's lifetime is bound to the request.

Streaming and worker lifetime

The naive version looks like this:

naive.ts
export default {
  async fetch(req: Request) {
    const llm = await model.stream(prompt); // the stream itself can run for tens of seconds
    return new Response(llm.toReadableStream());
  },
};

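The catch: the worker has to stay alive until the response body stream closes, and it is only guaranteed to stay alive that long. If you kick off other async work (log shipping, memory updates) without awaiting it and the response finishes first, that work evaporates with the worker. The fix is to hand background work to ctx.waitUntil(), which extends the worker's lifetime until the promise you pass it settles (within the platform's limits).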

stream.ts
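const encoder = new TextEncoder();

export default {
  async fetch(req: Request, env: Env, ctx: ExecutionContext) {
    const { readable, writable } = new TransformStream();
    const writer = writable.getWriter();

    ctx.waitUntil(
      (async () => {
        try {
          const llm = await model.stream(prompt); // awaited, as in naive.ts
          for await (const chunk of llm) {
            // awaiting write() lets a slow client pace the producer
            await writer.write(encoder.encode(chunk));
          }
          await writer.close();  // end the response body first...
          await persistTurn();   // ...then wrap up; waitUntil keeps the worker alive
        } catch (err) {
          console.error(err);
          // surface the failure to the client instead of leaving the stream open
          await writer.abort(err).catch(() => {});
        }
      })()
    );

    return new Response(readable, {
      headers: { "content-type": "text/event-stream" },
    });
  },
};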

Splitting read and write with a TransformStream lets you return the response immediately while generation continues in the background.
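
One detail worth checking: the response is labelled text/event-stream, and a Server-Sent Events parser (the browser's EventSource, for example) expects each message as one or more "data:" lines followed by a blank line. If your client parses SSE rather than reading raw bytes, each chunk needs framing before it is written. A minimal sketch, assuming chunks are plain strings; toSseEvent is a hypothetical helper for illustration, not something from our codebase:

```typescript
// Frame one chunk of model output as a Server-Sent Events message.
// Multi-line payloads become several "data:" lines; the client joins
// them back together with "\n". A blank line terminates the message.
function toSseEvent(data: string, eventName?: string): string {
  const dataLines = data.split(/\r\n|\r|\n/).map((line) => `data: ${line}`);
  const head = eventName ? [`event: ${eventName}`] : [];
  return [...head, ...dataLines].join("\n") + "\n\n";
}
```

In the streaming loop this would wrap the chunk before encoding, as in writer.write(encoder.encode(toSseEvent(chunk))). Because the parser strips only a single space after "data:", tokens that begin with a space survive the round trip.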

Backpressure and stalls

The bug we hit most often at the edge was the slow consumer. When a user on a mobile network reads slower than we write tokens, the buffer balloons and slams into the memory limit.

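The key is to always await the promise returned by writer.write(). That's your backpressure signal: it stays pending until the stream has accepted the chunk, which in turn depends on the client reading. Skip the await and a fast producer queues chunks in memory faster than a slow consumer can drain them.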

If you ever feel tempted to drop the await from await writer.write(...), you're almost certainly planting a bug.
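
To see what that await buys you, here is a minimal sketch that runs in any runtime with WHATWG streams (Workers, Node 18+). The names pump, drainSlowly, and demo are illustrative, and the 5 ms delay is an assumed stand-in for a slow mobile client. It counts how far the producer gets ahead of the reader:

```typescript
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Producer: awaits every write, so it is paced by whoever reads the other end.
async function pump(
  tokens: Iterable<string>,
  writable: WritableStream<Uint8Array>,
  onWritten: () => void = () => {},
): Promise<void> {
  const writer = writable.getWriter();
  try {
    for (const token of tokens) {
      // Stays pending until the stream has accepted the chunk.
      await writer.write(encoder.encode(token));
      onWritten();
    }
    await writer.close();
  } catch (err) {
    await writer.abort(err).catch(() => {});
    throw err;
  }
}

// Consumer: sleeps before each read to imitate a slow client.
async function drainSlowly(
  readable: ReadableStream<Uint8Array>,
  delayMs: number,
  onRead: () => void = () => {},
): Promise<string> {
  const reader = readable.getReader();
  let text = "";
  for (;;) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
    onRead();
  }
  return text + decoder.decode();
}

// Runs both sides and records the largest gap between chunks written and chunks read.
async function demo(): Promise<{ text: string; maxLag: number }> {
  const { readable, writable } = new TransformStream<Uint8Array, Uint8Array>();
  const tokens = Array.from({ length: 20 }, (_, i) => `t${i} `);
  let written = 0;
  let read = 0;
  let maxLag = 0;
  const [, text] = await Promise.all([
    pump(tokens, writable, () => {
      written++;
      maxLag = Math.max(maxLag, written - read);
    }),
    drainSlowly(readable, 5, () => {
      read++;
    }),
  ]);
  return { text, maxLag };
}
```

With the await in place, the gap should stay within a chunk or two no matter how slow the reader is. Remove the await inside pump and every write is queued immediately, so the gap grows with the length of the response, which is exactly the ballooning buffer described above.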

Putting Next on Workers with OpenNext

The blog and landing are Next.js. To run them on Cloudflare Workers we use OpenNext's Cloudflare adapter. It transforms the build output into something a worker can execute, and static assets ride on Cloudflare's cache.

The crux is what can be made static. This blog's posts and changelog freeze markdown into HTML at build time, so nothing touches the filesystem at runtime. As a result, post pages are pure static assets served instantly from the edge cache.
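
One way to enforce that boundary, assuming the Next.js App Router and a route such as app/blog/[slug]/page.tsx, is to enumerate every post at build time and reject unknown slugs instead of rendering them on demand. This is a sketch under those assumptions: the POSTS map is a hypothetical stand-in for whatever module a build step generates from markdown, and the page component itself is omitted.

```typescript
// Hypothetical output of the build step: markdown already rendered to HTML
// and bundled as data, so no filesystem read is needed at request time.
const POSTS: Record<string, { title: string; html: string }> = {
  "streaming-on-workers": {
    title: "Streaming LLM responses reliably on Cloudflare Workers",
    html: "<p>Rendered at build time.</p>",
  },
};

// Next.js route segment config: slugs not returned by generateStaticParams
// produce a 404 rather than a runtime render.
export const dynamicParams = false;

// Called by Next.js during the build to decide which pages to prerender.
export function generateStaticParams(): { slug: string }[] {
  return Object.keys(POSTS).map((slug) => ({ slug }));
}

// Pure in-memory lookup used by the page component.
function getPost(slug: string): { title: string; html: string } | undefined {
  return POSTS[slug];
}
```

The design choice is that everything the page needs is decided at build time, so the worker never has to render a post or touch a filesystem it does not have.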

Observability

Debugging at the edge is hard. Workers die quickly and stack traces are short. We attach a trace ID to every turn and ship structured logs asynchronously inside waitUntil — without blocking the streaming response itself, but still recording what happened.
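
The pattern can be sketched as a small per-turn logger. Everything here is illustrative: createTurnLogger, the record fields, and the ship callback (which in practice might POST to a log collector) are assumptions, and WaitUntilContext is just the slice of the Workers ExecutionContext that the sketch needs.

```typescript
// Minimal slice of ExecutionContext used by this sketch.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical transport: receives one JSON line per log record.
type Ship = (line: string) => Promise<void>;

function createTurnLogger(
  ctx: WaitUntilContext,
  ship: Ship,
  traceId: string = crypto.randomUUID(), // one ID per turn
) {
  return {
    traceId,
    // Returns immediately; shipping happens in the background.
    log(eventName: string, fields: Record<string, unknown> = {}): void {
      const record = { ...fields, traceId, event: eventName, ts: Date.now() };
      // Hand the promise to waitUntil instead of awaiting it, so the
      // streaming response is never blocked. A failed shipment is
      // swallowed rather than allowed to break the turn.
      ctx.waitUntil(ship(JSON.stringify(record)).catch(() => {}));
    },
  };
}
```

Inside a fetch handler this would be created once per request, for example const logger = createTurnLogger(ctx, shipToCollector), and called at points such as first token and stream close. Because every record carries the same traceId, the short-lived worker's events can be stitched back together afterwards.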


Streaming LLMs at the edge sits in "it works, but tread carefully" territory. Get worker lifetimes, backpressure, and the static/dynamic boundary right, and you can serve smooth responses on infrastructure that's faster and cheaper than containers. For us, that trade was the right one.
