Learning to Evict from Key-Value Cache

Executive Summary

The growing size of Large Language Models (LLMs) makes efficient inference challenging, primarily due to the memory demands of the autoregressive Key-Value (KV) cache. Existing eviction or compression methods reduce cost but rely on heuristics, such as recency or past attention scores, which serve only as indirect proxies for a token’s future utility and introduce computational overhead. We reframe KV cache eviction as a reinforcement learning (RL) problem: learning to rank tokens by their predicted usefulness for future decoding. To this end, we introduce KV Policy (KVP), a framework of…
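To make the contrast concrete, here is a minimal sketch of the heuristic baselines the summary mentions (recency and past attention scores) as score-based eviction under a fixed cache budget. It is not the KVP method. The function names, the `(key, value)` tuple representation of the cache, and the `budget` parameter are all assumptions chosen for illustration. A learned ranker, as the summary describes, would in principle supply the `scores` argument in place of these heuristics.

```python
from typing import List, Sequence, Tuple

# Hypothetical cache entry: one (key, value) pair per cached token.
KVEntry = Tuple[str, str]


def recency_scores(num_tokens: int) -> List[float]:
    """Recency heuristic: a newer token (larger position) gets a higher score."""
    return [float(i) for i in range(num_tokens)]


def attention_scores(attn_history: Sequence[Sequence[float]]) -> List[float]:
    """Past-attention heuristic: total attention each token received so far.

    Each row holds the attention weights of one decoding step over the tokens
    cached at that step, so later rows may be longer than earlier ones.
    """
    totals = [0.0] * max(len(step) for step in attn_history)
    for step in attn_history:
        for i, weight in enumerate(step):
            totals[i] += weight
    return totals


def evict(cache: List[KVEntry], scores: Sequence[float], budget: int) -> List[KVEntry]:
    """Keep the `budget` highest-scoring entries, preserving token order."""
    if budget >= len(cache):
        return list(cache)
    ranked = sorted(range(len(cache)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:budget])  # restore positional order of the survivors
    return [cache[i] for i in keep]


cache = [("k0", "v0"), ("k1", "v1"), ("k2", "v2"), ("k3", "v3")]
attn_history = [[0.6, 0.1, 0.3], [0.5, 0.1, 0.2, 0.2]]

kept_by_attention = evict(cache, attention_scores(attn_history), budget=2)
kept_by_recency = evict(cache, recency_scores(len(cache)), budget=2)
```

Both heuristics score tokens using only what has already happened, which is why the summary calls them indirect proxies for a token's future utility. In this toy input they disagree: the attention heuristic keeps tokens 0 and 2, while the recency heuristic keeps tokens 2 and 3.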

Original Article

This post was automatically curated from RSS. Published on 2026-02-23T17:00:38.980Z.

Written by Cui
