
Hacker News

artur44 5 days ago | on: Qwen3-Omni-Flash-2025-12-01：a next-generation nati...

True, but even with native audio-token models you still need to split the model’s output channels. Reasoning/internal tokens shouldn't go into the audio stream; only user-facing content should be emitted as audio. The principle is the same whether the last step is TTS or audio-token generation.

regularfry 4 days ago

There's an assumption there that the audio stream contains an equivalent of the <think>/</think> tokens. There's every reason to think it should, but without seeing the tokeniser config it's a bit of a guess.
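A minimal Python sketch of the channel split discussed above, assuming (as regularfry points out, this is unconfirmed without the tokeniser config) that the model emits literal <think> and </think> markers as single, whole tokens. The function name split_channels and the marker strings are hypothetical; a real model may use different markers, split them across tokens, or have none in its audio vocabulary. Only the user-facing channel would be forwarded to TTS or to the audio-token decoder.

```python
from typing import Iterable, List, Tuple

# Hypothetical markers: assumes the tokeniser emits each as one whole token.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_channels(tokens: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Route a token stream into (reasoning, user_facing) channels.

    Tokens between THINK_OPEN and THINK_CLOSE go to the reasoning channel;
    everything else goes to the user-facing channel, which is the only one
    that would be passed on to TTS / audio-token generation.
    """
    reasoning: List[str] = []
    user_facing: List[str] = []
    in_think = False
    for tok in tokens:
        if tok == THINK_OPEN:
            in_think = True
            continue  # the marker itself is never spoken
        if tok == THINK_CLOSE:
            in_think = False
            continue
        (reasoning if in_think else user_facing).append(tok)
    return reasoning, user_facing


if __name__ == "__main__":
    stream = ["<think>", "User", " wants", " the", " time", "</think>", "It's", " noon", "."]
    internal, spoken = split_channels(stream)
    print("internal:", "".join(internal))
    print("spoken:", "".join(spoken))
```

The same routing works whether the final stage is a separate TTS engine or native audio tokens; what changes is only what consumes the user-facing channel. If the markers are not whole tokens, the router would need to buffer partial matches, which this sketch does not attempt.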
