24 posts in total
2026
vLLM-Omni deep dive
Pallas examples by Sharad Vikram (Pallas author)
jax.jit, torch.compile & CUDA graph
KV cache in sliding-window attention
XLA02 - shapes, layout & tiling
XLA01 - architecture & workflows
Knowledge Distillation 101
GPU mode - lecture2 - CUDA 101
Pallas 101 - multi-backend kernel for JAX
5D parallelism in LLM training