AI voice is catching up fast! Qwen3-TTS, the open-source text-to-speech model from Qwen.ai, is out, and it's a big deal!
Three details in the latest release really change the conversation around AI voice:
- 1st: ~97 ms end-to-end streaming latency => That's no longer "fast TTS", it's interactive.
- 2nd: voice cloning from ~3 seconds of reference audio => That dramatically lowers the cost of personalization.
- 3rd: 10-language support with cross-lingual cloning => This is subtle but important: a voice cloned from a sample in one language can speak the others.
Put together, this starts to look like a real shift.
AI voice used to be a premium, closed, API-only feature, but it's becoming infrastructure. And once voice becomes infrastructure, a lot of products that currently default to text will quietly start talking.
The fact that this is open source is huge: it puts pressure on closed providers like ElevenLabs and accelerates experimentation.
We're probably going to see a wave of real-time voice agents that don't feel like demos anymore!
Have a great week!
Aymen