Solution Engineering — Data Federation

Data Federation Tradeoffs

Why storefront pages fetch the CMS before commerce, how caching makes that cheap, and when it's worth reordering the calls.

Remember waterfalls?

You ask for one thing. Then, only once that comes back, you ask for the next thing. Then the next. Each request politely waits in line for the one ahead of it, even when it has no reason to.

Think of three people in a coffee queue where nobody's order depends on anyone else's. They still wait their turn — not because they have to, but because of where they're standing. That's a waterfall.

On a single page, in a single component tree, waterfalls are easy to create by accident. You await something, and everything below that line now lives after it in time — whether it needed to or not.

This post is about one specific waterfall. The one that shows up on almost every storefront built on a federation layer like the Alokai Middleware. It's small. It's defensible. And it's worth understanding precisely, because the moment one assumption changes, it stops being small.

Let's look at a product listing page.


1 The Page

Here's the shape of a category page. I'll strip it down to the part that matters:

export default connectCmsPage(async function CategoryPage(props) {
  const { categoryData, productCatalog } = await buildCategoryPageData(props);
  return <CategoryGrid category={categoryData} products={productCatalog} />;
});

Two things are happening here, and they're easy to read right past.

The first is connectCmsPage. It's a higher-order component — a wrapper. Before your CategoryPage body ever runs, the wrapper awaits the CMS. It fetches the page definition (getPage, or getPersonalizedPage when personalization is on), maps the locale, sets up preview-mode rerendering, wraps everything in the live-preview shell. All of that cross-cutting machinery lives in one place, and every page type — category, product, home, a plain CMS page — gets it for free by wrapping.

The second is buildCategoryPageData. Inside it, the commerce data is fetched in parallel, the way you'd want:

const [categoryData, productCatalog] = await Promise.all([
  getCategory(categoryId),
  sdk.unified.searchProducts(searchProductsQuery),
]);

So getCategory and searchProducts race each other. Good. Nobody's waiting on nobody.

Except — look at where the two pieces sit relative to each other. The CMS fetch is in the wrapper. The commerce fetch is in the body. And the body is the wrapper's child. A child can't start until its parent has resolved.

Figure: the commerce Promise.all is nested under the awaited CMS call, so it can't start early.
connectCmsPage (wrapper)
  await getPage (CMS): everything waits on this
  CategoryPage body → buildCategoryPageData
    Promise.all
      getCategory
      searchProducts

So even though the product search doesn't need a single byte from the CMS, it can't begin until the CMS call is done. The products wait on the CMS — not because they depend on it, but because of where they live in the tree.

TTFB = getPage + max(getCategory, searchProducts)

It would be tempting to call this a bug. It isn't. It's a tradeoff someone made on purpose, and to know whether it's the right one, you have to know how much that first hop actually costs.


2 Sometimes the Hop Is Free

Here's the thing about the CMS call: content doesn't change very often. A category's merchandised layout (the hero, the banners, the promo blocks) gets edited by a marketer maybe a few times a week. Which means getPage is almost always a cache hit. Something close and warm. Around ten milliseconds.

So picture the timeline with real numbers. CMS is a cache hit at ~10ms. The product search is the slow one, because it's a real query against a search engine — call it ~200ms.

Figure: cached CMS timeline (0–400ms scale); the green CMS sliver barely registers. getPage (cache hit) ~10ms, then getCategory ~120ms and searchProducts ~200ms in parallel. TTFB ≈ 210ms.
👍
The waterfall is real — but the cost of it is ~10ms. Nobody can feel ten milliseconds, and you wouldn't rewrite a wrapper every page depends on to claw it back. Here, the design is just correct.
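To make the arithmetic concrete, here is a minimal sketch of the TTFB formula in code. It uses the same illustrative numbers as the timeline above, and it assumes the three calls are the only contributors to TTFB (render time and network overhead are ignored). The function names are hypothetical, not part of any SDK.

```typescript
// Illustrative latencies in milliseconds (the example numbers from this post).
interface Latencies {
  cms: number; // getPage or getPersonalizedPage
  getCategory: number;
  searchProducts: number;
}

// CMS-first: the wrapper awaits the CMS, then the body runs both commerce calls in parallel.
function ttfbCmsFirst(l: Latencies): number {
  return l.cms + Math.max(l.getCategory, l.searchProducts);
}

// Hypothetical alternative: all three calls start at the same time.
function ttfbParallel(l: Latencies): number {
  return Math.max(l.cms, l.getCategory, l.searchProducts);
}

const cachedCms: Latencies = { cms: 10, getCategory: 120, searchProducts: 200 };

console.log(ttfbCmsFirst(cachedCms)); // 210
console.log(ttfbParallel(cachedCms)); // 200
```

The gap between the two results is the price of the waterfall. With a cached CMS it is the ten milliseconds discussed above.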

3 Where "Cheap" Comes From

I said getPage is "almost always a cache hit" and waved at "something warm." Let me not wave. Which cache, and why it's warm, is the whole ballgame — because personalization is about to switch some of these off, and you want to know exactly which lights go dark.

There isn't one cache in front of the CMS call. There are four, stacked, each catching what the one above it missed:

🌐 Layer 1 · CDN edge: the whole page. HTML cached for ~15s. A hit here means the render never even runs.
🛰️ Layer 2 · CDN edge: middleware responses. Out of the box, responses ship a ~5-minute cache default and a deploy rotates the keys, so a CDN can serve repeat reads without much staleness. It's just a default; tune the TTL or turn it off if your needs differ.
🔥 Layer 3 · Redis: whatever you tell it to. Explicit, shared across instances, invalidated by tag the instant content changes.
⚛️ Layer 4 · React cache(): one render. Just stops the same render from fetching the same thing twice.
Figure: a request falls through the layers until something is warm, or all the way to the origin.
Visitor request
→ Layer 1 · CDN edge, full HTML (s-maxage 15s). HIT: serve cached page, render skipped. MISS: SSR render.
→ Layer 4 · React cache(), per-render dedup.
→ Layer 3 · Redis getOrSetCache, TTL + tag invalidation, shared across instances. HIT: ~10ms, warm. MISS: continue.
→ Layer 2 · CDN edge, middleware response (~5-min cache default, configurable). HIT: ~10ms, warm. MISS: continue.
→ Origin · SAP commerce, full round-trip (though the CMS's own delivery API is itself a CDN; see below).

And there's a layer you don't even run: the CMS's own delivery API is already a CDN. Contentful's Content Delivery API is, in their words, "available via a globally distributed content delivery network (CDN)" that purges on publish; Contentstack serves its CDA from edge caches too, purging "only the changed content" when you publish. So for published pages, that "origin" at the bottom of the ladder is rarely a cold read — it's another edge hit, and it's very likely the real reason getPage is cheap in this demo even with no Redis installed. The documented exceptions line up exactly with the slow case: preview (a separate, non-CDN endpoint) and per-user content (nothing shared to cache) don't get that benefit.

The layer that makes "~10ms getPage" cheap on your own infrastructure — once you outgrow the provider's CDN, or go per-user — is Layer 3, Redis, and it's the one people forget. Alokai's Redis integration is an SDK module — sdk.redis.getOrSetCache(key, fetcher, { tags }). Unlike the edge caches, it's explicit (you wrap the call) and tag-based (you invalidate by meaning, not by clock). Wrap the CMS read in it and the first render populates Redis; every later render — across every SSR instance — reads from Redis until a CMS publish webhook invalidates the page: tag. You drop exactly the page that changed, the instant it changes.
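Here is a rough sketch of the read-through, tag-invalidated behavior described above. It is not Alokai's implementation: the TagCache class is a hypothetical in-memory stand-in, its getOrSetCache method only mirrors the (key, fetcher, { tags }) shape quoted in this post, and invalidateTag, the cache key, and the tag name are assumptions made for illustration.

```typescript
type Fetcher<T> = () => Promise<T>;

// Hypothetical in-memory stand-in for a shared, tag-invalidated cache.
class TagCache {
  private entries = new Map<string, { value: unknown; tags: string[] }>();

  // Return the cached value if present; otherwise run the fetcher and store the result.
  async getOrSetCache<T>(key: string, fetcher: Fetcher<T>, opts: { tags: string[] }): Promise<T> {
    const hit = this.entries.get(key);
    if (hit !== undefined) return hit.value as T;
    const value = await fetcher();
    this.entries.set(key, { value, tags: opts.tags });
    return value;
  }

  // Drop every entry carrying the tag (what a CMS publish webhook would trigger).
  invalidateTag(tag: string): void {
    this.entries.forEach((entry, key) => {
      if (entry.tags.includes(tag)) this.entries.delete(key);
    });
  }
}

async function demo(): Promise<string[]> {
  const cache = new TagCache();
  let originCalls = 0;
  // Stand-in for the CMS read; each origin call returns a new "version".
  const fetchPage = async (): Promise<string> => {
    originCalls += 1;
    return `page v${originCalls}`;
  };
  const read = () =>
    cache.getOrSetCache("cms:/category/shoes", fetchPage, { tags: ["page:/category/shoes"] });

  const first = await read(); // miss: goes to the origin
  const second = await read(); // hit: served from the cache
  cache.invalidateTag("page:/category/shoes"); // simulated publish
  const third = await read(); // miss again: refetched after invalidation
  return [first, second, third];
}

demo().then((result) => console.log(result.join(", "))); // page v1, page v1, page v2
```

The sketch skips TTLs and does nothing about concurrent misses for the same key, both of which a real shared cache would need to handle.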

💡
So "CMS reads are cheap" doesn't mean "the CMS is fast." It means there's a warm, shared, tag-invalidated Redis cache behind the federation layer. The cheapness is infrastructure you stood up — not a property of the CMS.

Which is exactly the setup for what comes next. Because the moment you personalize, you start switching these layers off, from the top down.


4 Sometimes the Hop Is the Whole Story

Now turn on personalization. But "personalization" isn't one thing — and the difference is the whole story for caching.

🎫 Segment / "experiences": still cached. A bounded set of variants. The cache key is the variant combination, shared by everyone in that segment.
👤 True 1:1 / per-user: can't be cached. Effectively unique per visitor. Nothing shared to cache, so every request is an origin round-trip.

This isn't a hand-wave — it's how the products actually work. Contentful's edge rendering says the list of assigned Experiences "can serve as a cache key so that subsequent visitors can be performantly served the same combination of Experience content." Contentstack likewise "caches the personalized web page against the request URL and the applied variants." The key is the segment, not the person — so a thousand shoppers in the same audience share one cached page.

So the real lever isn't "personalized or not." It's the cardinality of the cache key. Anonymous is one key. Segmented is N keys — fine while N stays small. Per-user is roughly one key per visitor, and that is the only case where the cache genuinely can't help. (Watch for combinatorial blow-up, though: stack enough audiences × geo × A/B tests and N quietly explodes, thinning the traffic per variant until the hit rate sags back toward the 1:1 cost.)
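The cardinality argument can be sketched in a few lines. This is an illustration only: the function names, the key format, and the example dimensions are assumptions, not how Contentful, Contentstack, or Alokai actually build their cache keys.

```typescript
// Build a deterministic cache key from the variant combination, never from the visitor.
function variantCacheKey(path: string, variants: Record<string, string>): string {
  const parts = Object.keys(variants)
    .sort() // stable order, so the same combination always yields the same key
    .map((name) => `${name}=${variants[name]}`);
  return parts.length > 0 ? `${path}?${parts.join("&")}` : path;
}

// Upper bound on distinct keys: the product of the option counts of every dimension.
function keyCardinality(dimensions: Record<string, string[]>): number {
  return Object.keys(dimensions).reduce((n, name) => n * dimensions[name].length, 1);
}

console.log(variantCacheKey("/c/shoes", { geo: "de", audience: "vip" }));
// /c/shoes?audience=vip&geo=de

console.log(
  keyCardinality({
    audience: ["new", "returning", "vip", "b2b"],
    geo: ["de", "fr", "uk", "us", "pl"],
    abTest: ["a", "b"],
  }),
); // 40
```

Four audiences, five regions, and one A/B test already split the traffic for a single page across up to 40 cache entries, which is the combinatorial blow-up to watch for.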

The expensive case below is that last one — true 1:1. When the page is a function of this user's identity, cart, and history, there's nothing shared to cache, so getPersonalizedPage becomes an origin round-trip with a session attached. Call it ~180ms, and it sits fully in front of the commerce calls.

🧊 Cached CMS: ~10ms. The hop is a rounding error. Leave it alone.
👤 Per-user (1:1) CMS: ~180ms. Same code, but now it's almost half your TTFB.

Figure: true 1:1 personalization, same scale as before (0–400ms), but now the CMS bar dominates. getPersonalizedPage ~180ms, then getCategory ~120ms and searchProducts ~200ms in parallel. TTFB ≈ 380ms.

Same code. Same tree. Same waterfall. But the cost went from ~10ms to ~180ms, because the assumption underneath it — "the CMS call is basically free" — quietly stopped being true. The page didn't change. The cache behavior did.

Figure: true 1:1 personalization switches the shared layers off, top-down, and you fall through to the origin.
Per-user request
→ Layer 1 · edge HTML: DARK
→ Layer 2 · edge data: DARK
→ Layer 3 · Redis (shared): MOSTLY DARK (per-user = sliver of hit rate)
→ Origin round-trip · ~180ms

The per-user page can't sit on a shared edge — layer 1, dark. The per-user data can't either — layer 2, dark. And Redis can't share a per-user entry; at best it caches per user, a sliver of the hit rate — layer 3, mostly dark. You fall all the way through to the origin. That isn't a metaphor; it's the literal mechanism.

🎫
Crucially, this only happens at true 1:1. With segment or "experiences" personalization, the cache key is the variant combo — so these layers stay lit and every shopper in a segment shares the cached page. Lights only go fully dark when the page becomes unique per person.
🪧
The waterfall didn't get worse. The thing it was hiding behind got slower. A design that's obviously fine and one that's obviously wasteful can be the exact same code — separated only by a config flag you flipped three sprints ago.

And there's a sharper point hiding here, one that should make you more cautious about parallelizing, not less. On a personalized page you may not even know you're fetching the right products yet. With B2B accounts, contract pricing, or customer-group catalogs, a commerce call that carries a session can return different prices (and a different set of visible products) per shopper. If the personalized CMS is what decides the merchandising, then firing searchProducts in parallel risks fetching the wrong catalog, not merely a catalog you discard. Now you're not paying for wasted work, you're paying for incorrect work, plus the refetch, which spends the exact latency you parallelized to save. (In the code we looked at, the product query is built from props, not the CMS payload, so this only bites once personalization actually reshapes the product context. But the moment it does, it's the best argument on the page for staying CMS-first.)


5 What Federation Could Do Instead

The federation layer's whole job is to sit between the storefront and a pile of backends — CMS here, commerce there, search somewhere else — and decide how to talk to all of them. Which means it's exactly the layer that could fetch them together. If the three independent calls ran in one batch — by hoisting the commerce promises above the CMS await, or by adding a middleware endpoint that orchestrates all three server-side — you'd get this:

Figure: all three in one batch (0–400ms scale); the CMS bar now hides underneath the product search. getPersonalizedPage ~180ms, getCategory ~120ms, and searchProducts ~200ms all start together. TTFB ≈ 200ms.

Before (serial): 380ms. The CMS hop stands in front of everything.
After (parallel): 200ms. ~180ms saved. The hop hides under the search.
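The "hoist the commerce promises above the CMS await" variant might look roughly like this. It is a sketch under stated assumptions: loadCategoryPage, the CmsPage and CommerceData shapes, and both fetcher parameters are hypothetical stand-ins, not the real connectCmsPage or buildCategoryPageData.

```typescript
interface CmsPage {
  notFound: boolean;
  title: string;
}

interface CommerceData {
  category: string;
  products: string[];
}

async function loadCategoryPage(
  fetchCms: () => Promise<CmsPage>,
  fetchCommerce: () => Promise<CommerceData>,
): Promise<{ page: CmsPage; commerce: CommerceData } | null> {
  // Start the commerce work first, without awaiting it.
  const commercePromise = fetchCommerce();
  // Attach a no-op handler so a rejection is not reported as unhandled
  // if the CMS short-circuits and the result is never awaited.
  commercePromise.catch(() => undefined);

  const page = await fetchCms();
  if (page.notFound) {
    // Short-circuit: the commerce call has already been fired and its work is wasted.
    return null;
  }

  // Both calls have been in flight together; commerce errors surface here.
  const commerce = await commercePromise;
  return { page, commerce };
}

// Usage with stub fetchers that record when they start.
const startOrder: string[] = [];
loadCategoryPage(
  async () => {
    startOrder.push("cms");
    return { notFound: false, title: "Shoes" };
  },
  async () => {
    startOrder.push("commerce");
    return { category: "shoes", products: ["sku-1", "sku-2"] };
  },
).then((result) => console.log(startOrder.join(","), result?.commerce.products.length));
// commerce,cms 2
```

Note what the early return costs: on a notFound page the commerce backend has already done the work, which is the wasted-load trade-off this ordering brings with it.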

So why isn't it always done this way? Because that saving is free only in latency. It is not free in everything else.


6 The Honest Ledger

Here's the whole trade laid out, because the latency win is real but it is not the only column.

| Dimension | Parallelizing wins | Parallelizing costs |
| --- | --- | --- |
| TTFB, cached CMS | ~10ms saved, i.e. nothing | Complexity for no real payoff |
| TTFB, uncached / personalized | The whole CMS hop (~180ms) | |
| Preview / personalization machinery | | connectCmsPage centralizes preview rerender, the personalized→default fallback, locale mapping. Parallelizing means duplicating that per route or threading un-awaited promises through the wrapper |
| Consistency | | Every page fetches CMS the same way today. The PLP would now diverge from the pure CMS page, which must stay CMS-first because its SKUs come out of the CMS payload (a real dependency) |
| CMS short-circuit | | Today, CMS can return notFound, or a fully custom override, and you skip the commerce fetch. Fetch in parallel and you've already fired searchProducts for a page you might not render |
| Wasted backend load | | On that 404 / override, the product search ran for nothing |
| Correctness under personalization | | If pricing or catalog visibility depends on the session/CMS context, a parallel searchProducts can fetch the wrong products, forcing a refetch that erases the latency win |
| Cache granularity (merged-endpoint variant) | One page, one cache key | The merged response inherits the shortest TTL (volatile price/stock), so you lose the cheap, independent, long-lived CMS cache (e.g. a tag-invalidated Redis entry that outlives the commerce data) |
| Failure isolation | | CMS failure and commerce failure are handled separately today. Merge them and you need explicit partial-failure handling or it's all-or-nothing |
| Reusability | Fix it once in the shared wrapper, every page benefits | Only PLP-type pages qualify; the pure CMS page can't be parallelized at all |
🧭
One row wins clearly — TTFB when the CMS call is slow. Every other row is neutral-to-negative. The CMS-first waterfall isn't laziness; it's the rent you pay for putting preview, personalization, locale, and short-circuiting all in one reusable place. When the CMS is a cache hit, that rent is ten milliseconds.

7 The Principle

A design that's correct under an assumption is only as correct as the assumption. The CMS-first ordering is right because CMS reads are cheap and cacheable — and we now know "cheap" means a warm, shared, tag-invalidated cache is sitting behind the federation layer. So the thing to actually do is not "parallelize everything" and not "leave it alone forever." It's: measure the assumption.

🎛️ Lever 1 (cheaper, try first): warm the cache. Is Redis configured? Warm at p95? Tag-invalidated on publish? Far less invasive than touching the wrapper.
🔀 Lever 2 (only if still slow): reorder fetches. Hoist commerce above the CMS await, or add an orchestration endpoint, once personalization keeps the hop slow.

A cold or absent Redis layer will make the CMS hop look expensive no matter how you order the calls — and so will a page that's truly per-user when it didn't need to be. Check the cache-key cardinality before you blame the waterfall: if the page is segmented, it's cacheable and the hop should be cheap; if it's genuinely 1:1, no ordering trick brings the shared caches back. Only once the cache is warm, the page is as shared as it can be, and the hop is still slow does reordering start to pay. Watch getPage / getPersonalizedPage latency at p75 and p95, and let the numbers tell you which world you're in.
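If your tracing tool doesn't already report percentiles, a nearest-rank calculation over raw samples is enough to see which world you're in. This is a minimal sketch; the sample values below are invented for illustration, and nearest-rank is only one of several common percentile definitions.

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Invented getPage latencies: mostly warm cache hits, with two origin round-trips.
const getPageSamplesMs = [8, 9, 10, 11, 12, 12, 14, 15, 180, 190];

console.log(percentile(getPageSamplesMs, 75)); // 15
console.log(percentile(getPageSamplesMs, 95)); // 190
```

A p75 near the cache-hit cost with a p95 near the origin cost suggests the cache is mostly working and the tail is misses. If both sit near the origin cost, the cache is cold, absent, or keyed too finely.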

🎯
The waterfall was never the problem. The unexamined assumption was.
An app that fetches everything up front does the most work. A federation layer that knows what each page actually depends on does the least.

8 Tips — Do's & Don'ts

Most cache problems come down to something small — a single header or key that quietly flips a shareable response into a per-user one. Here's what keeps the layers warm, and what silently turns them off.

| ✅ Do | ❌ Don't | Why it matters |
| --- | --- | --- |
| Set cookies on a separate, uncached request, and strip Set-Cookie from pages you want shared-cached. | Return Set-Cookie on a page you expect the CDN to cache. | Many CDNs treat Set-Cookie as a cache-killer by default and return a BYPASS on every request. |
| Keep Vary minimal (e.g. Accept-Encoding); use Cache-Control: private for genuinely personal pages. | Add Vary: Cookie (or Vary: Authorization). | Every unique cookie value becomes its own cache entry, a well-known hit-rate killer. |
| Key shared pages on the segment / variant combo, and route authenticated traffic to no-store (the demo already does this for /cart, /checkout, /my-account). | Put session ids or auth tokens into the URL or cache key. | A shared cache must not store responses to requests that carry an Authorization header unless the response explicitly allows it (e.g. public or s-maxage); per-user keys leave roughly one entry per visitor, so the hit rate collapses. |
| Strip or normalize non-semantic params (utm_*, fbclid, gclid) and round volatile inputs before they reach the key. | Let unbounded query strings flow into the cache key untouched. | Each unique string is a separate entry. Contentful notes even exact geo coordinates "can't take advantage of our caching layer", so round them to 2–3 decimals. |
| Invalidate by tag / surrogate key: Redis getOrSetCache({ tags }) plus the CMS's purge-on-publish. | Rely on short TTLs alone to paper over staleness. | Tag purge drops exactly what changed; Contentful and Contentstack purge only the changed entry on publish, leaving the rest warm. |
| Add stale-while-revalidate (and stale-if-error) so refreshes happen in the background. | Ship hot pages as max-age=0, must-revalidate with no SWR. | Otherwise every expiry blocks a real user on the origin instead of serving a slightly stale response instantly. |
| Prefer bounded segments / "experiences" so the page stays shared-cacheable, and keep the variant space small. | Make a page truly 1:1 unless it has to be, or let audiences × geo × tests multiply unchecked. | Segment variants share one cached entry (key = variant combo); true per-user, or a combinatorial blow-up of variants, forfeits the shared layers. |
| Serve shoppers from the delivery API (CDN-backed); keep preview for editors only. | Route live shopper traffic through the Preview API. | Preview is a separate, non-CDN endpoint (preview.contentful.com) and won't cache. It's for drafts, not scale. |
| Verify real cache status (X-Cache: HIT/MISS, Age) at p95 in production. | Assume "it's cached" because you set a header. | An upstream cookie or Vary can silently bypass the cache; CDNs expose HIT/MISS headers precisely so you can confirm. |
| Tie the cache-busting id to the release (GIT_SHA) so deploys rotate keys cleanly. | Reuse a static busting id across deploys. | A stale id serves yesterday's federated data after a release; a per-release id invalidates it the moment you ship. |
🧪
The through-line: a cache is only as good as its key. Almost every "why isn't this cached?" comes down to something — a cookie, a token, a stray query param, a Vary — sneaking per-user identity into a key that should have been shared.
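As an illustration of the "strip, normalize, round" advice, here is a minimal key-normalization sketch. The function name, the lat/lon parameter names, and the two-decimal rounding are assumptions chosen for the example; adapt the rules to whatever your CDN or cache layer actually keys on.

```typescript
// Tracking params that should never influence the cache key (utm_* is handled by prefix).
const TRACKING_PARAMS = new Set(["fbclid", "gclid"]);

function normalizeCacheKey(rawUrl: string): string {
  const url = new URL(rawUrl);
  const kept: [string, string][] = [];

  url.searchParams.forEach((value, name) => {
    // Drop non-semantic tracking params entirely.
    if (name.startsWith("utm_") || TRACKING_PARAMS.has(name)) return;
    // Round volatile coordinates so nearby visitors share an entry
    // (assumes the values are numeric; hypothetical param names).
    if (name === "lat" || name === "lon") {
      kept.push([name, Number(value).toFixed(2)]);
      return;
    }
    kept.push([name, value]);
  });

  // Sort by name so param order in the URL does not create duplicate entries.
  kept.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  const query = kept
    .map(([name, value]) => `${encodeURIComponent(name)}=${encodeURIComponent(value)}`)
    .join("&");
  return query ? `${url.pathname}?${query}` : url.pathname;
}

console.log(
  normalizeCacheKey(
    "https://shop.example/c/shoes?utm_source=mail&sort=price&gclid=abc&lat=52.520008&lon=13.404954",
  ),
);
// /c/shoes?lat=52.52&lon=13.40&sort=price
```

Two visitors arriving from different campaigns, a few hundred meters apart, now resolve to the same key instead of two cold entries.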

9 What Not to Cache in Commerce

Some commerce data should never sit in a shared cache — not because of latency, but because caching it is wrong: it serves one shopper another's data, oversells stock you don't have, or quotes a price the customer was never entitled to. Alokai's own guidance splits the offenders into two buckets — time-sensitive and session-specific data — and both are commerce-specific.

| ❌ Never shared-cache | Commerce risk if you do | ✅ Do instead |
| --- | --- | --- |
| Cart contents | The classic leak: shopper B is served shopper A's cached cart, or a stale total. A privacy breach, not just a bug. | private, no-store; fetch the cart client-side after load. |
| Checkout & payment pages | Contain addresses, payment details, and user-specific tax/shipping. Caching them risks exposing data and breaking PCI compliance. | private, no-store; always origin. (The demo marks /checkout exactly this way.) |
| Account, order history, wishlist | Pure per-user PII; one cache key per shopper at best, cross-user leakage at worst. | no-store; render from the session on the client. |
| Live stock / inventory | Over-cache and you show "in stock" on a sold-out item: overselling, cancellations, angry customers. | Very short TTL, or fetch fresh at the PDP/cart; don't bake stock into long-cached HTML. |
| Customer-group / contract (B2B) pricing & promotions | The wrong shopper sees a price or discount they were never entitled to. A margin problem and, sometimes, a legal one. | Cache the shared shell with base price; resolve the entitled price per-segment key or client-side. |
| Personalized recommendations / recently viewed | Per-user by definition; a shared cache serves the last visitor's "Products you may like." | Defer to a client-side fetch with a skeleton to hold the layout. |
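One way to keep that split honest is to decide the Cache-Control header from the route in a single place. The sketch below is an assumption-laden example: the route prefixes are the ones this post says the demo marks no-store, the 15-second s-maxage matches the edge HTML layer described earlier, and the stale-while-revalidate and stale-if-error values are placeholders to tune. The function name is hypothetical.

```typescript
// Routes that carry per-shopper data and must never reach a shared cache.
const NO_STORE_PREFIXES = ["/cart", "/checkout", "/my-account"];

function cacheControlFor(pathname: string): string {
  // Match the exact route or anything nested under it, but not lookalikes such as /cartoons.
  const isPersonal = NO_STORE_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(`${prefix}/`),
  );
  if (isPersonal) return "private, no-store";

  // Shared shell: short edge TTL, refreshed in the background, served stale on origin errors.
  return "public, s-maxage=15, stale-while-revalidate=60, stale-if-error=300";
}

console.log(cacheControlFor("/checkout/payment")); // private, no-store
console.log(cacheControlFor("/category/shoes"));
// public, s-maxage=15, stale-while-revalidate=60, stale-if-error=300
```

A default-deny variant (no-store unless a route is explicitly listed as shareable) is the safer design if new per-user routes get added often.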

And here's the part that's really about federation, not just caching: the cleanest rule isn't to maintain a per-endpoint blocklist — it's to not federate session- or time-sensitive data into the cached page at all. Keep the SSR payload identical for every shopper, and pull the personal, volatile bits in a second step on the client (Alokai's docs show exactly this — a useEffect fetch behind a skeleton). The shared shell caches beautifully; the per-user fragment never touches the cache. The same logic kills the tempting "one merged endpoint for the whole page" idea: fuse volatile price/stock onto stable content and the combined entry inherits the shortest TTL, so you've cached the slow-changing content for as long as the fastest-changing field allows — which is to say, barely.

🛒
Cache the page everyone shares; federate-in what's personal, late. A storefront isn't one document — it's a stable, shareable shell with a few volatile, per-user holes punched in it. Cache the shell hard; fill the holes per request.

Built on the Alokai Middleware. The numbers here are illustrative — yours are in your traces. Go look at them.