vLLM 01 - P/D disaggregation

Why P/D disaggregation?
Initial scheduler logic in vLLM: prioritize prefill for good throughput
Problem: prefill may slow down other requests’ decode
How to mix P and D together? Well, even thei

2025-03-28 #LLM inference #vLLM
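To make the problem above concrete, here is a minimal toy simulation of a prefill-first scheduler. It is only a sketch under simplifying assumptions, not vLLM's actual scheduler: the names `Request`, `simulate_prefill_first`, and `max_inter_token_gap` are hypothetical, time is in abstract units, a prefill runs one request at a time and produces its first token, and one decode step emits one token for every running request.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical request model for this toy simulation."""
    name: str
    arrival: float        # time at which the request reaches the scheduler
    prefill_cost: float   # time needed to prefill the whole prompt
    total_tokens: int     # tokens to generate, including the first one
    token_times: list = field(default_factory=list)


def simulate_prefill_first(requests, decode_step_cost=1.0):
    """Run a scheduler that always serves a waiting prefill before any decode."""
    pending = sorted(requests, key=lambda r: r.arrival)
    running = []
    now = 0.0
    while pending or running:
        arrived = [r for r in pending if r.arrival <= now]
        if arrived:
            # Prefill has priority: every running decode waits for it.
            req = arrived[0]
            pending.remove(req)
            now += req.prefill_cost
            req.token_times.append(now)  # prefill yields the first token
            if len(req.token_times) < req.total_tokens:
                running.append(req)
        elif running:
            # No prefill is waiting: one decode step for all running requests.
            now += decode_step_cost
            for r in running:
                r.token_times.append(now)
            running = [r for r in running
                       if len(r.token_times) < r.total_tokens]
        else:
            # Idle: jump ahead to the next arrival.
            now = pending[0].arrival
    return requests


def max_inter_token_gap(req):
    """Largest gap between two consecutive tokens of one request."""
    times = req.token_times
    return max(b - a for a, b in zip(times, times[1:]))


# Request "a" is already decoding when "b" arrives with a long prompt.
a = Request("a", arrival=0.0, prefill_cost=2.0, total_tokens=5)
b = Request("b", arrival=3.0, prefill_cost=10.0, total_tokens=2)
simulate_prefill_first([a, b])
print(a.token_times)           # token timestamps for "a"
print(max_inter_token_gap(a))  # stall caused by "b"'s prefill
```

In this toy run, request `a` normally receives one token per time unit, but its decode is paused for the whole duration of `b`'s prefill. That stall in inter-token latency is the interference that motivates separating prefill from decode.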