DeepSeek V4 attention: how it handles longer context (English video)
I made a video for this. Here is the YouTube link.

1. Long context challenges
2. From MLA, DSA to HCA/CSA
3. MLA: low-rank KV cache
   Purpose: reduce KV cache size
   Method: compress KV into a low-