5D parallelism in LLM training Source: The Ultra-scale Playbook 0. High-level overview Targeted at large-scale training (e.g., 512 GPUs) Trade-offs among the following factors: memory usage (params, optimizer states, gradients), compute ef 2026-02-07 #Training
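A rough back-of-envelope sketch of how parallelism degrees divide static memory, to ground the trade-off listed above. The degrees tp/pp/zero_dp, the 16 bytes/param mixed-precision assumption, and the example 512-GPU layout are illustrative assumptions, not numbers from the playbook.

```python
# Back-of-envelope sketch: per-GPU static memory under hypothetical
# tensor/pipeline/data-parallel degrees. Assumes bf16 params + grads and
# fp32 Adam states (16 bytes/param) and ZeRO-3-style sharding across the
# data-parallel group; numbers are illustrative only.

def static_memory_per_gpu_gb(n_params, tp=1, pp=1, zero_dp=1,
                             bytes_per_param=16):
    """n_params: total parameter count of the model.
    tp, pp: tensor/pipeline parallel degrees (shard the parameters).
    zero_dp: data-parallel degree over which param/grad/optimizer
             states are additionally sharded (ZeRO-3-style)."""
    params_per_gpu = n_params / (tp * pp)
    bytes_total = params_per_gpu * bytes_per_param / zero_dp
    return bytes_total / 1e9

# 70B model on 512 GPUs split as tp=8, pp=8, dp=8 (hypothetical layout)
print(f"{static_memory_per_gpu_gb(70e9, tp=8, pp=8, zero_dp=8):.1f} GB/GPU")
```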
Memory usage breakdown during training 1. Memory Composition Model Parameters Intermediate Activations (Forward pass): used to calculate gradients during the backward pass Gradients (Backward pass) Optimizer States 2. Static Memory (Weigh 2026-01-25 #Training
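A worked arithmetic example of the static portion of that breakdown, assuming a common mixed-precision Adam setup; this particular byte breakdown is an assumption, since the note's own numbers are truncated above.

```python
# Static memory per parameter for a typical mixed-precision Adam setup
# (weights + gradients + optimizer states); activations are extra and
# depend on batch size and sequence length.
BYTES = {
    "bf16 weights":        2,
    "bf16 gradients":      2,
    "fp32 master weights": 4,
    "fp32 Adam momentum":  4,
    "fp32 Adam variance":  4,
}
per_param = sum(BYTES.values())          # 16 bytes/param
print(per_param, "bytes/param")
print(f"7B model static memory ≈ {7e9 * per_param / 1e9:.0f} GB")  # ≈ 112 GB
```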
JAX 101 Given the length of the official JAX tutorial, this note distills the core concepts, providing a quick reference after reading the original tutorial. High-level JAX stack Source: Yi Wang’s LinkedIn 2025-12-22 #JAX #TPU
Jeff Dean & Gemini team QA at NeurIPS ‘25 Q1: Are we running out of pretraining data? Are we hitting the scaling-law wall? I don’t quite buy it. Gemini only uses a portion of the video data to train. We spent plenty of time on filtering the r 2025-12-05 #meetup #LLM #Gemini
Pytorch Conference & Ray Summit 2025 summary 1. Overall: Many inference talks, but even more RL talks. RL RL101 3 RL challenges: training collapse, slow training, hardware errors New frameworks / APIs: Tinker, SkyRL, Slime, SGLang’s Slime-based fr 2025-11-11 #RL #LLM inference #meetup #Training
Intro to PPO in RL 1. From Rewards to Optimization In RL, an agent interacts with an environment by observing a state s_t, taking an action a_t, and receiving a reward r_t. In the context of LLMs, the state is the previous tokens, wh 2025-11-09 #RL #Training
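A minimal sketch of PPO's clipped surrogate objective to accompany that setup; this is the standard formula rather than anything taken verbatim from the note, and the toy log-probs and advantages are made up.

```python
import numpy as np

# PPO clipped surrogate: the policy ratio is clipped to [1-eps, 1+eps]
# so an update can't move too far from the policy that produced the rollout.
def ppo_clipped_loss(logp_new, logp_old, advantages, eps=0.2):
    ratio = np.exp(logp_new - logp_old)              # pi_new / pi_old per token
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))  # negate to maximize surrogate

# toy usage with made-up per-token log-probs and advantages
logp_old = np.log(np.array([0.30, 0.10, 0.55]))
logp_new = np.log(np.array([0.35, 0.08, 0.60]))
adv      = np.array([1.0, -0.5, 2.0])
print(ppo_clipped_loss(logp_new, logp_old, adv))
```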
Truncated Importance Sampling (TIS) in RL This blog is based on Feng Yao (UCSD PhD student)’s work; I added some background and explanations to make it easier to understand. Slides: On the rollout-training mis 2025-11-08 #RL
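A small sketch of the truncated importance weights used to correct the rollout/training policy mismatch that the slides discuss; the cap value and exactly where the weight enters the loss are assumptions here, not details from the blog.

```python
import numpy as np

# Truncated importance sampling: reweight each token by pi_train / pi_rollout,
# but cap the ratio so a few tokens with huge ratios can't blow up the variance.
def tis_weights(logp_train, logp_rollout, cap=2.0):
    ratio = np.exp(logp_train - logp_rollout)   # per-token importance ratio
    return np.minimum(ratio, cap)               # truncate at the cap

# the truncated weight then multiplies the per-token policy-gradient term
logp_rollout = np.array([-1.2, -0.7, -2.3])   # e.g., probs logged by the rollout engine
logp_train   = np.array([-1.0, -0.9, -1.1])   # probs recomputed by the trainer
print(tis_weights(logp_train, logp_rollout))
```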
Speculative decoding 02 Speaker: Lily Liu Works at OpenAI; graduated from UC Berkeley in early 2025 vLLM speculative decoding TL 1. Why is LLM generation slow? GPU memory hierarchy. A100 example: SRAM is super fast (19 TB 2025-09-19 #LLM inference #vLLM
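A toy sketch of the draft-then-verify loop behind speculative decoding, since the excerpt above cuts off at the memory-bandwidth motivation. The models here are hypothetical greedy token generators and the accept rule is simplified to exact-match verification, not vLLM's actual implementation.

```python
# Draft-then-verify: a cheap draft model proposes k tokens; the target model
# checks them (in the real system, in a single batched forward pass) and keeps
# the agreeing prefix, so each expensive target pass yields >= 1 token.
def speculative_step(prefix, draft_model, target_model, k=4):
    # 1) draft model proposes k tokens autoregressively
    draft, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_model(ctx)
        draft.append(t)
        ctx.append(t)
    # 2) target model verifies (simulated position by position for clarity)
    accepted, ctx = [], list(prefix)
    for t in draft:
        if target_model(ctx) == t:          # agreement -> accept draft token
            accepted.append(t)
            ctx.append(t)
        else:                               # first disagreement -> take target's token, stop
            accepted.append(target_model(ctx))
            break
    return accepted

# toy models: draft guesses "previous + 1"; target disagrees once the context grows
draft  = lambda ctx: ctx[-1] + 1
target = lambda ctx: ctx[-1] + 1 if len(ctx) < 6 else ctx[-1] + 2
print(speculative_step([1, 2, 3], draft, target, k=4))   # [4, 5, 6, 8]
```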
vLLM 05 - vLLM multi-modal support Speaker: Roger Wang 1. Overview Large multi-modal models (LMMs): most SOTA LMMs leverage a language model backbone with an encoder for a non-text modality, e.g., LLaVA, Qwen VL, Qwen 2025-06-06 #LLM inference #vLLM
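A minimal shape-level sketch of the LLaVA-style pattern described above: a non-text encoder's outputs are projected into the language model's embedding space and concatenated with text-token embeddings. The dimensions and the single linear projector are illustrative assumptions.

```python
import numpy as np

d_vision, d_model, vocab = 1024, 4096, 32000
rng = np.random.default_rng(0)

image_patches = rng.normal(size=(256, d_vision))      # output of a vision encoder
projector     = rng.normal(size=(d_vision, d_model))  # learned in the real model
image_embeds  = image_patches @ projector             # (256, d_model)

token_ids   = np.array([1, 42, 7])                    # text prompt tokens
embed_table = rng.normal(size=(vocab, d_model))
text_embeds = embed_table[token_ids]                  # (3, d_model)

# the LM backbone then runs on the concatenated multimodal sequence
lm_input = np.concatenate([image_embeds, text_embeds], axis=0)
print(lm_input.shape)                                 # (259, 4096)
```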
Perplexity DeepSeek MoE Speaker: Lequn Chen Sources https://www.perplexity.ai/hub/blog/lower-latency-and-higher-throughput-with-multi-node-deepseek-deployment https://github.com/ppl-ai/pplx-kernels 1. Setup Multiple nodes t 2025-05-16 #MoE #LLM inference
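To ground what the multi-node dispatch/combine kernels in the linked pplx-kernels repo have to move around, here is a tiny sketch of generic top-k MoE routing; the gating math is the textbook formulation, not Perplexity's actual kernels, and all sizes are made up.

```python
import numpy as np

# Top-k MoE routing: a gate scores every expert per token, the top-k experts
# are selected, and softmax weights over those experts are used to combine
# their outputs. Across nodes, the chosen (token, expert) pairs are what the
# dispatch/combine collectives have to ship.
def route_tokens(hidden, gate_w, top_k=2):
    logits = hidden @ gate_w                               # (tokens, n_experts)
    topk_idx = np.argsort(logits, axis=-1)[:, -top_k:]     # chosen experts per token
    topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
    weights = np.exp(topk_logits)
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over chosen experts
    return topk_idx, weights

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 16))      # 4 tokens, hidden dim 16
gate_w = rng.normal(size=(16, 8))      # 8 experts
idx, w = route_tokens(hidden, gate_w)
print(idx)   # expert ids each token is dispatched to (possibly on other nodes)
print(w)     # combine weights used when gathering expert outputs back
```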