gdymind's blog

vLLM 01 - P/D disaggregation

Why P/D disaggregation?
Initial scheduler logic in vLLM: prioritize prefill for good throughput.
Problem: prefill may slow down other requests’ decode.
How to mix P and D together? Well, even thei
2025-03-28
#LLM inference #vLLM
