Archives

07-06

Ragged Paged Attention on TPU (Video + Blog)

06-29

DSpark: DeepSeek's new speculative decoding work (Video + Blog)

04-27

DeepSeek V4 attention: how it handles longer context (Video + Blog)

04-19

Rotary Position Embedding (RoPE) deep dive

04-04

vLLM-Omni deep dive

03-08

Pallas examples by Sharad Vikram (Pallas author)

03-07

jax.jit, torch.compile & CUDA graph

03-02

KV cache in sliding-window attention

02-26

XLA02 - shapes, layout & tiling

02-25

XLA01 - architecture & workflows