MIT researchers developed Attention Matching, a KV cache compaction technique that compresses LLM memory by 50x in seconds — ...
Nvidia's KV Cache Transform Coding (KVTC) compresses LLM key-value cache by 20x without model changes, cutting GPU memory ...
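Transform coding in general means projecting data into a basis where most of the energy concentrates in few coefficients, then quantizing those coefficients. The sketch below illustrates that generic idea on a toy KV cache tensor using an SVD basis and int8 quantization; it is an assumption-laden illustration of the concept, not Nvidia's actual KVTC algorithm, and all array shapes and the rank `r` are made up for the example.

```python
import numpy as np

# Generic transform-coding sketch for a KV cache slice, NOT Nvidia's KVTC:
# 1) transform into a low-rank basis (here: SVD), 2) quantize coefficients.
rng = np.random.default_rng(0)
kv = rng.standard_normal((128, 64)).astype(np.float32)  # (tokens, head_dim)

# Transform step: keep the top-r singular directions as the coding basis.
U, S, Vt = np.linalg.svd(kv, full_matrices=False)
r = 16
coeffs = U[:, :r] * S[:r]   # (128, r) coefficients in the reduced basis
basis = Vt[:r]              # (r, 64) shared decoding basis

# Quantization step: scale coefficients into int8 range.
scale = np.abs(coeffs).max() / 127.0
q = np.round(coeffs / scale).astype(np.int8)

# Decode: dequantize and invert the transform.
kv_hat = (q.astype(np.float32) * scale) @ basis

# Compression ratio counts the int8 coefficients, the basis, and the scale.
ratio = kv.nbytes / (q.nbytes + basis.nbytes + 4)
err = np.linalg.norm(kv - kv_hat) / np.linalg.norm(kv)
```

Real schemes get far higher ratios than this toy example because LLM KV activations are much more compressible than the random data used here, and because they add entropy coding and tuned per-channel quantization on top of the transform.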
The US Court of Appeals for the Federal Circuit, addressing the issue of whether certain factual and legal conclusions relating to obviousness were supported by substantial evidence, held that the ...