PSA: Hard lessons learned with Qwen3.6-35B-A3B

NB: this is correct as of 02/05/2026 – things are changing literally day to day, so your mileage may vary

Problem: on vLLM nightly, tool calling is broken and generation loops when self-hosting RedHatAI/Qwen3.6-35B-A3B-NVFP4

Broken tool calls

  • Disable speculative decoding (MTP). There are a few vLLM regression bugs that seem to be triggered by streaming, which in turn is triggered by MTP even when the LLM client has streaming turned off
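As a sketch of what that looks like at launch time: the post doesn't include its exact serve command, so the flags below are assumptions; the point is simply that no speculative-decoding configuration is passed, so MTP stays off.

```shell
# Hypothetical launch command (not from the post) – note there is
# no speculative-decoding / MTP configuration here at all, which is
# the workaround: speculative decoding stays disabled by default.
vllm serve RedHatAI/Qwen3.6-35B-A3B-NVFP4 \
  --enable-auto-tool-choice
```

The tool-calling flag is also an assumption about the setup; the relevant part is what's absent, not what's present.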

Looping

  • Set a higher temperature (1.0)
  • Manually set --kv-cache-dtype to fp8
    • I usually leave this as auto, but it was probably defaulting to bf16; setting it to fp8 seems to have resolved the looping, when presence_penalty and repetition_penalty did not
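Putting the anti-looping settings together, a launch sketch might look like the following; only the --kv-cache-dtype change comes from the notes above, the rest of the command shape is assumed. Temperature is a per-request sampling parameter, so it's set client-side rather than on the server.

```shell
# Sketch, assuming a plain `vllm serve` deployment.
# Force the KV cache to fp8 instead of relying on `auto`,
# which was likely resolving to bf16:
vllm serve RedHatAI/Qwen3.6-35B-A3B-NVFP4 \
  --kv-cache-dtype fp8

# Then send temperature=1.0 in each request, e.g. (hypothetical endpoint):
# curl http://localhost:8000/v1/chat/completions \
#   -H 'Content-Type: application/json' \
#   -d '{"model": "RedHatAI/Qwen3.6-35B-A3B-NVFP4",
#        "temperature": 1.0,
#        "messages": [{"role": "user", "content": "hello"}]}'
```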