At the time of writing, the two main approaches to speculative decoding with Qwen are:
1. Using layers from the model itself. This is only supported in specifically trained models such as Qwen 3.6, and is the default approach recommended by Qwen. In vLLM:
--speculative-config '{"method":"mtp","num_speculative_tokens": 2}'
2. Using a "drafter" model. In this case a much smaller … Continue reading Speculative decoding with Qwen 3.6-35B-A3B
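Both approaches follow the same draft-and-verify pattern. A cheap component proposes a few tokens, and the full model checks them, keeping the longest matching prefix. The excerpt is cut off before the mechanics are described, so what follows is only a generic, greedy-decoding sketch of that pattern. It is not vLLM's implementation or Qwen's MTP method. The names speculative_decode, greedy_decode and the toy models are hypothetical. The draft function stands in for either the model's own MTP layers or a separate drafter model, and num_speculative_tokens reuses the vLLM parameter name only for readability.

```python
from typing import Callable, List

# Hypothetical "model": maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain autoregressive greedy decoding: one model call per new token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, num_speculative_tokens: int = 2) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies them."""
    tokens = list(prompt)
    end = len(prompt) + max_new_tokens
    while len(tokens) < end:
        # Draft phase: the cheap model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(num_speculative_tokens, end - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # Verify phase: a real engine scores every proposed position in one
        # forward pass of the target; this sketch calls it once per position.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if expected != tok:
                # First mismatch: keep the target's own token, drop the rest of the draft.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the verification pass yields one bonus token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    # A bonus token can overshoot the budget, so trim to the requested length.
    return tokens[:end]


if __name__ == "__main__":
    # Toy deterministic "models" over a 7-token vocabulary (purely illustrative).
    target = lambda p: (sum(p) * 31 + len(p)) % 7
    draft = lambda p: target(p) if len(p) % 3 else 0  # usually agrees with the target
    out = speculative_decode(target, draft, [1, 2], max_new_tokens=10, num_speculative_tokens=2)
    # With greedy verification, the output matches the target's own greedy output.
    assert out == greedy_decode(target, [1, 2], 10)
    print(out)
```

Every token the loop appends is one the target itself would have chosen, so the output is identical to plain greedy decoding. The speedup comes from the target verifying several drafted tokens per pass, which pays off only when the draft agrees with it often.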
Troubleshooting System Prompts in AI-Powered SIEM Solutions
The last three months have been frantic in Cybersift development - mainly due to the inclusion of generative AI into the SIEM. The backend currently powering the LLM chat interface on our SIEM is based on the excellent PydanticAI library. I like the approach taken by this library - while it helps reduce the boilerplate … Continue reading Troubleshooting System Prompts in AI-Powered SIEM Solutions