# Feature Support


The feature support principle of vLLM Ascend is to stay aligned with vLLM. We are also actively collaborating with the community to accelerate feature support.

You can check the support status of the vLLM V1 Engine. Below is the feature support status of vLLM Ascend:

| Feature | Status | Next Step |
|---------|--------|-----------|
| Chunked Prefill | 🟢 Functional | Functional, see detailed note: Chunked Prefill |
| Automatic Prefix Caching | 🟢 Functional | Functional, see detailed note: vllm-ascend#732 |
| LoRA | 🟢 Functional | vllm-ascend#396, vllm-ascend#893 |
| Prompt adapter | 🔴 No plan | This feature has been deprecated by vLLM. |
| Speculative decoding | 🟢 Functional | Basic support |
| Pooling | 🟢 Functional | CI needed and adapting more models; V1 support relies on vLLM support. |
| Enc-dec | 🟡 Planned | vLLM should support this feature first. |
| Multi Modality | 🟢 Functional | Tutorial, optimizing and adapting more models |
| LogProbs | 🟢 Functional | CI needed |
| Prompt logProbs | 🟢 Functional | CI needed |
| Async output | 🟢 Functional | CI needed |
| Multi step scheduler | 🔴 Deprecated | vllm#8779, replaced by the vLLM V1 Scheduler |
| Best of | 🔴 Deprecated | vllm#13361 |
| Beam search | 🟢 Functional | CI needed |
| Guided Decoding | 🟢 Functional | vllm-ascend#177 |
| Tensor Parallel | 🟢 Functional | Make TP > 4 work with graph mode |
| Pipeline Parallel | 🟢 Functional | Write official guide and tutorial. |
| Expert Parallel | 🟢 Functional | Dynamic EPLB support. |
| Data Parallel | 🟢 Functional | Data Parallel support for Qwen3 MoE. |
| Prefill Decode Disaggregation | 🚧 WIP | Working on 1P1D and xPyD. |
| Quantization | 🟢 Functional | W8A8 available; working on support for more quantization methods (W4A8, etc.) |
| Graph Mode | 🔵 Experimental | Experimental, see detailed note: vllm-ascend#767 |
| Sleep Mode | 🟢 Functional | |

- 🟢 Functional: Fully operational, with ongoing optimizations.
- 🔵 Experimental: Experimental support; interfaces and functions may change.
- 🚧 WIP: Under active development, will be supported soon.
- 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
- 🔴 No plan / Deprecated: No plan, or deprecated by vLLM.
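As a quick illustration, several of the features above correspond to standard vLLM engine flags. The command below is a sketch only, assuming a vLLM Ascend deployment that passes these upstream vLLM CLI options through unchanged; the model name is a placeholder, and whether each flag is functional on Ascend is governed by the table above.

```shell
# Hypothetical example: serve a model with Tensor Parallel,
# Chunked Prefill, and Automatic Prefix Caching enabled.
# These are standard vLLM CLI options; Ascend-side behavior
# follows the support status listed in the table above.
vllm serve Qwen/Qwen2.5-7B-Instruct \
  --tensor-parallel-size 2 \
  --enable-chunked-prefill \
  --enable-prefix-caching
```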