NB: this is correct as of 02/05/2026 – things are changing literally from day to day, so your mileage may vary
Problem: vLLM nightly – tool calling broken and looping in self-hosted RedHatAI/Qwen3.6-35B-A3B-NVFP4
Broken tool calls
- Disable speculative decoding (MTP). There are a few vLLM regression bugs that seem to be triggered by streaming, which in turn is triggered by MTP even when the LLM client has streaming off
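As a rough sketch, disabling MTP just means launching the server without any speculative-decoding configuration. The exact flags below are illustrative assumptions (recent vLLM builds take speculative settings via `--speculative-config`; the tool-calling flags are the standard vLLM ones, and the parser choice is a guess for this model family), not a tested recipe:

```shell
# Sketch: launch with tool calling enabled but WITHOUT speculative decoding (MTP) –
# simply omit --speculative-config so no draft/MTP path is active.
# The tool-call parser name is an assumption; check your vLLM version's docs.
vllm serve RedHatAI/Qwen3.6-35B-A3B-NVFP4 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```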
Looping
- Set a higher temperature (1.0)
- Manually set the `--kv-cache-dtype` to `fp8` – I usually leave this as `auto`, but it was probably defaulting to `bf16`; setting it to `fp8` seems to have resolved the looping, when `presence_penalty` and `repetition_penalty` did not
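A minimal sketch of the server side of these settings (`--kv-cache-dtype` is a real vLLM engine flag; the model name is carried over from above, and note that temperature is a per-request client parameter, not a server flag):

```shell
# Sketch: pin the KV cache to FP8 instead of leaving it on "auto"
# (which here appeared to fall back to bf16).
# Temperature (1.0) is set by the client on each request, not at launch.
vllm serve RedHatAI/Qwen3.6-35B-A3B-NVFP4 \
  --kv-cache-dtype fp8
```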