gdymind's blog

JAX 101

Given the length of the official JAX tutorial, this note distills the core concepts, providing a quick reference after reading the original tutorial. High-level JAX stack. Source: Yi Wang's LinkedIn
2025-12-22
#JAX #TPU

Jeff Dean & Gemini team QA at NeurIPS ‘25

Q1: Are we running out of pretraining data? Are we hitting the scaling-law wall? I don't quite buy it. Gemini only uses a portion of the video data for training. We spent plenty of time on filtering the r
2025-12-05
#meetup #LLM #Gemini

Pytorch Conference & Ray Summit 2025 summary

1. Overall: Many inference talks, but more RL talks. RL: RL 101. 3 RL challenges: training collapses, slow training, hardware errors. New frameworks / APIs: Tinker, SkyRL, Slime, SGLang's Slime-based fr
2025-11-11
#RL #LLM inference #meetup #Training

Intro to PPO in RL

1. From Rewards to Optimization: In RL, an agent interacts with an environment by observing a state $s_t$, taking an action $a_t$, and receiving a reward $r_t$. In the context of LLMs, the state is the previous tokens, wh
2025-11-09
#RL #Training

Truncated Importance Sampling (TIS) in RL

Truncated Importance Sampling (TIS). This blog is from Feng Yao (UCSD PhD student)'s work. I added some background and explanations to make it easier to understand. Slides: On the rollout-training mis
2025-11-08
#RL

speculative decoding 02

Speaker: Lily Liu, working at OpenAI; graduated from UC Berkeley in early 2025. vLLM speculative decoding TL 1. Why is LLM generation slow? GPU memory hierarchy. A100 example: SRAM is super fast (19 TB
2025-09-19
#LLM inference #vLLM

vLLM 05 - vLLM multi-modal support

Speaker: Roger Wang. 1. Overview: large multi-modal models (LMMs). Most SOTA Large Multimodal Models leverage a language-model backbone with an encoder for a non-text modality, e.g., LLaVA, Qwen VL, Qwen
2025-06-06
#LLM inference #vLLM

Perplexity DeepSeek MoE

Speaker: Lequn Chen. Sources: https://www.perplexity.ai/hub/blog/lower-latency-and-higher-throughput-with-multi-node-deepseek-deployment https://github.com/ppl-ai/pplx-kernels 1. Setup: Multiple nodes t
2025-05-16
#MoE #LLM inference

MoE history and OpenMoE

Intro: This article is compiled from a livestream. The guest speaker is Fuzhao Xue, a Google DeepMind Senior Research Scientist and the author of OpenMoE. Main research areas: Gemini pretraining, Model A
2025-04-25
#MoE #LLM inference

vLLM 04 - vLLM v1 version

Official V1 blog: https://blog.vllm.ai/2025/01/27/v1-alpha-release.html Why V1? V0 is slow: CPU overhead is high. V0 is hard to read and develop; e.g., the V0 scheduler is 2k LOC, V1 is 800 LOC. V0 code decou
2025-04-18
#LLM inference #vLLM
