Cloudflare Workers AI brownouts: the 7-minute window we see before you do.
Why the fast-path for Cloudflare Workers AI model @cf/baai/bge-m3 degrades narrowly and often — and how our community + SDK telemetry surfaces it an average of 7 minutes before the status page acknowledges the drift.
By CheckUpstream Team
Cloudflare Workers AI is, on paper, one of the most elegant edge-AI
surfaces available. Drop @cf/baai/bge-m3 into your Worker and get
embeddings at a sub-50ms p50, with worldwide distribution included.
In practice, the fast path on specific models — notably @cf/baai/bge-m3
and @cf/meta/llama-3.1-8b-instruct — degrades narrowly and often.
Our engine has tracked 63 distinct Cloudflare incidents in 2025 so
far. Of those, 22 were scoped specifically to Workers AI. Of those
22, eleven were sub-component outages that never rolled up to a
full-service Cloudflare status banner.
That's the "brownout" shape: a narrow sub-service degrades for 5–15 minutes, the community notices immediately, and the official status page either never posts or posts late.
What the brownout looks like in the engine
On 2025-11-04 at roughly 14:07 UTC the engine saw:
- T+0:00 — Three community posts on Bluesky + HN in a 90-second window: "bge-m3 inference calls returning 502 on Workers."
- T+0:01 — Aggregated cloudflare_radar signal went yellow on the "North America" region for Workers AI.
- T+0:03 — Our scope extractor identified @cf/baai/bge-m3 as the specific affected component from the incident-update text.
- T+0:03 — The community composite score crossed our PROMOTION_THRESHOLD (0.6) — incident promoted, fan-out analyzer started.
- T+0:03 — Customer projects with a Workers AI + bge-m3 dependency got their impact-verdict analysis: 17 projects degraded, 41 operational, 3 down.
- T+0:07 — Cloudflare's status page posted: "Investigating elevated errors on Workers AI."
- T+0:18 — Cloudflare recovered.
Seven minutes of lead. For a RAG pipeline that's re-embedding user queries on every request, seven minutes is the difference between a clean fall-through to cached embeddings and an SLO breach.
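That "fall-through to cached embeddings" path is worth making concrete. Below is a minimal sketch of the idea in TypeScript: try the live model, and on any upstream failure serve the last known vector for that query instead of failing the request. The names here (embedWithFallback, EmbeddingCache, the embed function parameter) are illustrative, not Workers AI SDK APIs.

```typescript
type Vector = number[];

// Tiny in-memory TTL cache keyed by the raw query text.
class EmbeddingCache {
  private store = new Map<string, { vec: Vector; at: number }>();
  constructor(private ttlMs: number) {}

  get(text: string): Vector | undefined {
    const hit = this.store.get(text);
    if (!hit || Date.now() - hit.at > this.ttlMs) return undefined;
    return hit.vec;
  }

  set(text: string, vec: Vector): void {
    this.store.set(text, { vec, at: Date.now() });
  }
}

// Try the live embedding call; on failure, fall through to the cached
// vector (undefined if the query was never seen or the entry is stale).
async function embedWithFallback(
  text: string,
  cache: EmbeddingCache,
  embed: (text: string) => Promise<Vector>,
): Promise<Vector | undefined> {
  try {
    const vec = await embed(text);
    cache.set(text, vec); // refresh the cache on every success
    return vec;
  } catch {
    return cache.get(text);
  }
}
```

In a real Worker the embed parameter would wrap your AI binding call, and the cache would live in KV or the Cache API rather than in-process memory; the structure of the fall-through is the same.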
Why it matters more than it looks
Workers AI brownouts are particularly dangerous because the surface is so easy to adopt:
- No self-hosted infra. Almost every team using it has adopted it as a hosted primitive with no thought given to failover.
- The Workers AI SDK does not expose circuit-breaker helpers. If the upstream is returning 502s, your code keeps making requests.
- Embeddings are often on the hot path of request handling (semantic search, RAG lookup, recommendation scoring) — not in a background job where the latency can be absorbed.
The net effect: a 7-minute narrow brownout on @cf/baai/bge-m3
manifests as 7 minutes of failed requests in a product where
embeddings are load-bearing, and almost nothing happens gracefully
unless the developer already planned for this.
The shape of a good defense
The fix class is almost always the same:
- Circuit-break the dependency. When the upstream is failing
beyond some threshold, stop trying and fall through to a
degraded-but-working path.
p-retry + opossum or the Nova circuit-breaker package are both cheap to add.
- Cache aggressively on the read path. If an embedding for a query has been computed in the last hour, use it. Embeddings don't change, but the cache layer often doesn't exist because nobody thought to build it.
- Track the sub-service, not the service. "Cloudflare is up" is a useless signal if your code only uses Workers AI. Our engine treats Workers AI, R2, D1, Durable Objects, KV, and the Worker runtime as independent signals that happen to share a parent.
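To make the circuit-breaker item concrete, here is a dependency-free sketch of the core mechanism (opossum packages the same idea with more features: half-open probes, metrics, event hooks). The class name, thresholds, and state model below are illustrative choices, not a specific library's API.

```typescript
type BreakerState = "closed" | "open";

// Minimal consecutive-failure circuit breaker: after `threshold`
// failures in a row it opens, and while open it returns the fallback
// immediately instead of hitting the failing upstream.
class CircuitBreaker<T> {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";

  constructor(
    private threshold: number, // consecutive failures before opening
    private resetMs: number,   // how long to stay open before probing again
  ) {}

  async call(fn: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.state === "open") {
      if (Date.now() - this.openedAt < this.resetMs) {
        return fallback(); // fast-fail: don't touch the upstream at all
      }
      this.state = "closed"; // reset window elapsed: allow a probe
      this.failures = 0;
    }
    try {
      const result = await fn();
      this.failures = 0; // any success resets the failure count
      return result;
    } catch {
      if (++this.failures >= this.threshold) {
        this.state = "open";
        this.openedAt = Date.now();
      }
      return fallback();
    }
  }
}
```

Wrapping the embedding call in breaker.call(() => embed(query), () => cachedVector) turns a 7-minute brownout from seven minutes of 502s into seven minutes of slightly stale answers.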
The broader pattern
Brownouts are not a Cloudflare problem. They're the default shape for modern composable platforms. AWS Lambda has sub-region brownouts. Vercel's Fluid Compute has had bailout-to-Node brownouts. OpenAI has model-specific tier-2 brownouts. The status pages of these platforms — by design — consolidate to the parent service.
Our job is to look at the sub-service level, correlate across four signal streams, and emit a narrow, specific verdict — preferably faster than the vendor's own status page would. Seven minutes is the median for Workers AI. For the full stack of tracked services, the median is six minutes.
Wondering which of your upstream deps have a sub-service you'd be blind to? Drop your repo URL into checkupstream.com/audit and the engine will return every mapped service plus the specific Workers AI / Lambda / model / region components your code actually uses.