Speculative decoding with Qwen 3.6-35B-A3B

At the time of writing, the two main approaches to speculative decoding with Qwen are:

1. Using layers from the model itself. This is only supported in specifically trained models such as Qwen 3.6, and is the default approach recommended by Qwen. In vLLM: --speculative-config '{"method":"mtp","num_speculative_tokens": 2}'
2. Using a "drafter" model. In this case a much smaller … Continue reading Speculative decoding with Qwen 3.6-35B-A3B
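The vLLM flag quoted in the excerpt is a JSON object passed as a single shell argument, which is easy to mis-quote by hand. As a minimal sketch, the snippet below builds the same flag programmatically. The helper names (speculative_config_flag, build_serve_command) and the MODEL_ID placeholder are hypothetical and not part of vLLM; only the "method" and "num_speculative_tokens" keys and their values come from the excerpt.

```python
import json
import shlex


def speculative_config_flag(method: str = "mtp", num_speculative_tokens: int = 2) -> str:
    """Return the --speculative-config flag with its JSON value shell-quoted.

    Defaults mirror the values quoted in the excerpt (method "mtp", 2 tokens).
    """
    config = {"method": method, "num_speculative_tokens": num_speculative_tokens}
    # json.dumps guarantees valid JSON; shlex.quote wraps it so the shell
    # passes it through as one argument.
    return "--speculative-config " + shlex.quote(json.dumps(config))


def build_serve_command(model_id: str, flag: str) -> str:
    """Assemble a `vllm serve` command line (hypothetical helper).

    model_id is a placeholder: substitute the identifier of the model you serve.
    """
    return f"vllm serve {shlex.quote(model_id)} {flag}"


if __name__ == "__main__":
    flag = speculative_config_flag()
    print(build_serve_command("MODEL_ID", flag))
```

Changing num_speculative_tokens here only changes the generated string; which value works best depends on the model and workload.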

Troubleshooting System Prompts in AI-Powered SIEM Solutions

The last three months have been frantic at Cybersift development - mainly due to the integration of generative AI into the SIEM. The backend currently powering the LLM chat interface on our SIEM is built on the excellent PydanticAI library. I like the approach taken by this library - while it helps reduce the boilerplate … Continue reading Troubleshooting System Prompts in AI-Powered SIEM Solutions