gdymind's blog

Memory usage breakdown during training

1. Memory Composition: Model Parameters; Intermediate Activations (from the forward pass, used to calculate gradients during the backward pass); Gradients (backward pass); Optimizer States. 2. Static Memory (Weigh…
2026-01-25
#Training

JAX 101

Given the length of the official JAX tutorial, this note distills the core concepts, providing a quick reference after reading the original tutorial. High-level JAX stack. Source: Yi Wang's LinkedIn…
2025-12-22
#JAX #TPU

Jeff Dean & Gemini team Q&A at NeurIPS '25

Q1: Are we running out of pretraining data? Are we hitting the scaling law wall? I don't quite buy it. Gemini only uses a portion of the video data for training. We spent plenty of time on filtering the r…
2025-12-05
#meetup #LLM #Gemini

PyTorch Conference & Ray Summit 2025 summary

1. Overall: Many inference talks, but more RL talks. RL: RL 101; 3 RL challenges: training collapse, slow training, hardware errors; new frameworks / APIs: Tinker, SkyRL, Slime, SGLang's Slime-based fr…
2025-11-11
#RL #LLM inference #meetup #Training

Intro to PPO in RL

1. From Rewards to Optimization: In RL, an agent interacts with an environment by observing a state, taking an action, and receiving a reward. In the context of LLMs, the state is the previous tokens, wh…
2025-11-09
#RL #Training

Truncated Importance Sampling (TIS) in RL

Truncated Importance Sampling (TIS). This blog is based on Feng Yao (UCSD PhD student)'s work. I added some background and explanations to make it easier to understand. Slides: On the rollout-training mis…
2025-11-08
#RL

Speculative decoding 02

Speaker: Lily Liu, working at OpenAI, graduated from UC Berkeley in early 2025. vLLM speculative decoding TL. 1. Why is LLM generation slow? GPU memory hierarchy. A100 example: SRAM is super fast (19 TB…
2025-09-19
#LLM inference #vLLM

vLLM 05 - vLLM multi-modal support

Speaker: Roger Wang. 1. Overview: Large multi-modal models (LMMs). Most SOTA large multimodal models leverage a language model backbone with an encoder for a non-text modality, e.g., LLaVA, Qwen VL, Qwen…
2025-06-06
#LLM inference #vLLM

Perplexity DeepSeek MoE

Speaker: Lequn Chen. Sources: https://www.perplexity.ai/hub/blog/lower-latency-and-higher-throughput-with-multi-node-deepseek-deployment, https://github.com/ppl-ai/pplx-kernels. 1. Setup: Multiple nodes t…
2025-05-16
#MoE #LLM inference

MoE history and OpenMoE

Intro: This article is compiled from a livestream. The guest speaker is Fuzhao Xue, a Google DeepMind Senior Research Scientist and the author of OpenMoE. Main research areas: Gemini pretraining, Model A…
2025-04-25
#MoE #LLM inference
