26 posts in total
2026
DeepSeek V4 attention: how it handles longer context (English video)
Rotary Position Embedding (RoPE) deep dive
vLLM-Omni deep dive
Pallas examples by Sharad Vikram (Pallas author)
jax.jit, torch.compile & CUDA graph
KV cache in sliding-window attention
XLA02 - shapes, layout & tiling
XLA01 - architecture & workflows
Knowledge Distillation 101
GPU mode - lecture 2 - CUDA 101