Engineering Insights

The Cache Feedback Gap: Why Your Prefetcher Doesn't Learn

Jonathan Corners | January 2026

Server push and predictive caching have existed for years. Why do caches still waste bandwidth on data nobody uses?

The One-Way Street

HTTP/2 Server Push was supposed to revolutionize web performance. The server, knowing what the client would need next, could proactively push resources before they were requested. CSS files. JavaScript bundles. Critical fonts.

The problem? The server never found out if any of it was used.

Server: "Here's bundle.js, you'll need it"
Client: *already has bundle.js cached*
Server: *pushes 500KB anyway*
Client: *discards it*
Server: *does the same thing next time*

This is why HTTP/2 Server Push is effectively dead. Chrome removed support in 2022 and never enabled push in its HTTP/3 stack. The feature that was supposed to eliminate round-trips became a bandwidth-wasting liability.

The core issue wasn’t the concept of proactive delivery. It was the absence of feedback.


The Prefetch Graveyard

Server Push isn’t alone in this graveyard. Consider how prefetching works across the stack:

CDN Prefetching: CDNs analyze access patterns and speculatively warm edge caches. But when they prefetch assets that never get requested, the only signal is a cache eviction log that nobody reads. The prefetcher doesn’t adapt.

GraphQL DataLoader: DataLoader brilliantly batches requests within a single execution context. But it doesn’t learn across requests. If 90% of users who fetch an author also fetch their books, DataLoader won’t prefetch books proactively. It waits to be asked.

Database Query Caching: Query caches store results for repeated queries. But they don’t anticipate related queries. Fetching a user doesn’t pre-warm the cache for that user’s orders, even if the access pattern is predictable.

Browser Resource Hints: <link rel="prefetch"> tells the browser what to fetch next. But the browser doesn’t report back whether those resources were actually used. The developer flies blind.

All of these systems share a common flaw: they’re open-loop. They make predictions. They don’t learn from outcomes.


What’s Missing: The Feedback Loop

Imagine if your prefetcher worked like a recommendation system. Every prediction would have an identifier. Every outcome would be tracked. The system would learn which predictions were useful and which were waste.

This is the insight behind the HINT protocol.

The missing piece is a standardized feedback signal that reports:

  • HIT: The prefetched data was used. Latency was saved.
  • PARTIAL_HIT: Some of the prefetched data was used.
  • MISS: The data wasn’t prefetched, and was subsequently needed.
  • EVICTED_UNUSED: The data was prefetched, cached, and evicted without ever being accessed.
  • STALE_HIT: The prefetched data was used but was stale.
  • ERROR: The prefetch operation failed.

With this taxonomy, a Predictive Caching Engine (PCE) can correlate every proactive decision with its outcome. The EVICTED_UNUSED signal is particularly valuable—it identifies pure waste.
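
For concreteness, here is a rough TypeScript sketch of that taxonomy and the per-prediction record a client might keep. The type names are illustrative, not taken from the spec; the fields mirror the feedback example later in this post.

type PrefetchOutcome =
  | "HIT"
  | "PARTIAL_HIT"
  | "MISS"
  | "EVICTED_UNUSED"
  | "STALE_HIT"
  | "ERROR";

// One feedback record ties a single prediction to its observed outcome.
interface OutcomeReport {
  prediction_id: string;
  outcome: PrefetchOutcome;
  latency_saved_ms?: number; // present for HIT and PARTIAL_HIT
}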


The HINT Approach

HINT introduces a three-message pattern:

1. Hint (Client → Server)

The client declares what it anticipates needing:

{
  "scope": { "tenant": "app-123" },
  "ttl": 30000,
  "anticipated_fields": ["user.profile", "user.orders"]
}

This is a hint, not a demand. The server is free to ignore it, partially fulfill it, or use it to inform broader caching decisions.
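
In TypeScript terms, the hint above might be typed roughly as follows. This is a sketch covering only the fields shown in this post, not the full message schema.

interface HintMessage {
  scope: Record<string, string>; // bounds the hint's effects, e.g. { tenant: "app-123" }
  ttl: number;                   // how long the hint stays relevant, in milliseconds
  anticipated_fields: string[];  // field paths the client expects to need
  exclude?: string[];            // fields that must never be prefetched (negative prefetch, below)
}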

2. Proactive Payload (Server → Client)

The server prefetches the anticipated data and tags it with a stable prediction identifier:

{
  "prediction_id": "pred-abc-123",
  "proactive": {
    "user.profile": { ... },
    "user.orders": [ ... ]
  }
}

The prediction_id is deterministically derived from the hint, ensuring that replays produce identical identifiers for correlation.
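
One way a server could implement that derivation, sketched here with Node's crypto module and a naive key-sorting canonicalization. The actual derivation rules and identifier format belong to the spec; this only illustrates the determinism.

import { createHash } from "node:crypto";

// Sort object keys recursively so the same hint always serializes the same way.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    return Object.fromEntries(
      Object.keys(obj).sort().map((k) => [k, canonicalize(obj[k])])
    );
  }
  return value;
}

// Identical hints (including replays) hash to identical prediction identifiers.
function predictionIdFor(hint: object): string {
  const digest = createHash("sha256")
    .update(JSON.stringify(canonicalize(hint)))
    .digest("hex");
  return `pred-${digest.slice(0, 12)}`;
}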

3. Feedback Signal (Client → Server)

The client reports what happened:

{
  "outcomes": [{
    "prediction_id": "pred-abc-123",
    "outcome": "HIT",
    "latency_saved_ms": 45
  }]
}

Now the server knows. The prediction worked. Latency was saved. This pattern should be reinforced.
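
On the client side, this reporting is naturally batched (see the design principles later): outcomes accumulate locally and are flushed periodically rather than per request. A minimal sketch, with an illustrative feedback endpoint:

// Buffer outcomes locally and flush them as one feedback message.
const pending: OutcomeReport[] = [];

function recordOutcome(report: OutcomeReport): void {
  pending.push(report);
}

async function flushFeedback(endpoint: string): Promise<void> {
  if (pending.length === 0) return;
  const batch = pending.splice(0, pending.length);
  await fetch(endpoint, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ outcomes: batch }),
  });
}

// e.g. setInterval(() => flushFeedback("/hint/feedback"), 5000);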


Budget Gating: Prefetch with Guardrails

Predictive caching without constraints is a denial-of-service vector. A malicious or buggy client could hint at terabytes of data.

HINT addresses this with budget gating. Before executing a proactive plan, the PCE checks against configurable limits:

  • Concurrency: Maximum simultaneous prefetch operations
  • Byte quota: Maximum bytes that can be prefetched per request
  • CPU time: Maximum computation for pre-computation tasks
  • Cost: Maximum monetary cost (for cloud resources)
  • Energy: Maximum energy budget (for edge devices)

If a budget is exceeded, the plan is downgraded (fewer fields) or canceled entirely. The client receives a signal indicating budget constraints, allowing it to adjust future hints.
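
A simplified sketch of what this gating could look like inside a PCE, checking only the concurrency and byte-quota dimensions. The types and logic here are illustrative, not part of the spec.

interface Budgets {
  maxConcurrent: number; // simultaneous prefetch operations
  maxBytes: number;      // bytes prefetched per request
}

interface ProactivePlan {
  fields: { path: string; estimatedBytes: number }[];
}

// Keep fields until the byte quota is exhausted; cancel the plan entirely if
// nothing fits or the concurrency limit is already reached.
function gatePlan(
  plan: ProactivePlan,
  budgets: Budgets,
  inFlight: number
): ProactivePlan | null {
  if (inFlight >= budgets.maxConcurrent) return null;
  let total = 0;
  const kept = plan.fields.filter((f) => {
    if (total + f.estimatedBytes > budgets.maxBytes) return false;
    total += f.estimatedBytes;
    return true;
  });
  return kept.length > 0 ? { fields: kept } : null;
}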

Negative prefetch is equally important. Hints can explicitly exclude fields:

{
  "anticipated_fields": ["user.*"],
  "exclude": ["user.password_hash", "user.payment_methods"]
}

This prevents sensitive data from being proactively cached, even if the pattern would otherwise match.
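
A rough sketch of how a PCE might expand a wildcard like user.* and then apply exclusions; the matching rules shown here are illustrative.

// Expand anticipated patterns against known field paths, then drop anything
// matching an exclusion, so excluded data is never proactively cached.
function resolveFields(
  known: string[],
  anticipated: string[],
  exclude: string[] = []
): string[] {
  const matches = (pattern: string, field: string) =>
    pattern.endsWith(".*")
      ? field.startsWith(pattern.slice(0, -1))
      : field === pattern;
  return known.filter(
    (field) =>
      anticipated.some((p) => matches(p, field)) &&
      !exclude.some((p) => matches(p, field))
  );
}

// resolveFields(["user.profile", "user.password_hash"], ["user.*"], ["user.password_hash"])
//   → ["user.profile"]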


Where This Matters

Closed-loop predictive caching delivers outsized value in specific contexts:

High-Latency Links: Satellite connections, intercontinental requests, and mobile networks where round-trip times are measured in hundreds of milliseconds. Every avoided round-trip is meaningful.

Expensive Backends: LLM APIs charging per token. Database queries against large datasets. Third-party services with rate limits. Prefetching the right data is valuable; prefetching the wrong data is costly.

Multi-Tenant Systems: SaaS platforms where access patterns vary by tenant. Scope isolation ensures that one tenant’s hints don’t pollute another’s cache. Feedback enables per-tenant model adaptation.

Compliance-Sensitive Environments: Healthcare, finance, and government systems where data residency matters. HINT’s scope boundaries and negative prefetch support enforcement of data handling policies.


The Standard

HINT is designed as an open, transport-agnostic standard. It works over HTTP (via headers), GraphQL (via extensions), and gRPC (via metadata). The message formats are JSON-serializable with well-defined semantics.
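
Over HTTP, for example, a client might attach the hint as a request header on an ordinary fetch. The header name below is illustrative, not mandated by the spec.

// The server is free to ignore the hint entirely; the request succeeds either way.
async function fetchWithHint(url: string, hint: HintMessage): Promise<Response> {
  return fetch(url, {
    headers: { "x-hint": JSON.stringify(hint) },
  });
}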

The full specification is available at /hint/. Key design principles:

  1. Correlation is deterministic: The prediction_id is derived from the hint content, enabling reliable matching of outcomes to predictions.

  2. Feedback is batched: Clients aggregate outcomes and send them periodically, not per-request, to minimize overhead.

  3. Scope is first-class: Every hint includes a scope definition that bounds its effects. This enables multi-tenant safety and privacy preservation.

  4. Budgets are mandatory: Implementations must enforce budgets. Unbounded prefetching is not compliant.

  5. Adaptation is optional: The protocol specifies feedback but not how to use it. Simple implementations can log outcomes. Sophisticated ones can train ML models.


Closing the Loop

The cache feedback gap has persisted because feedback was treated as an afterthought. Server Push didn’t include it. Prefetch APIs don’t report outcomes. DataLoader doesn’t learn across contexts.

HINT makes feedback a first-class protocol concern. Every prediction gets an identifier. Every outcome gets reported. The system improves.

If you’re building caching infrastructure, consider whether you’re operating open-loop or closed-loop. The difference is the difference between a system that guesses and one that learns.


HINT is an open standard developed by Voxell. The specification is available at /hint/. Voxell products including ARC and ART implement HINT natively.

Author

Jonathan Corners - Founder, Voxell. I build GPU-native infrastructure for real-time AI systems.

If you're working on latency + consistency problems, I'd like to hear about it.
