It says MIT license, but the README has a separate section on prohibited use that may add restrictions that make it non-free? Not sure of the legal implications here.
Oh this is sweet, thanks for sharing! I've been a huge fan of Kokoro and even set up my own fully-local voice assistant [1]. Will definitely give Pocket TTS a go!
I echo this. For a TTS system to be in any way useful outside the tiny population of the world that speaks exclusively English, it must be multilingual and dynamically switch between languages pretty much per word.
I love that everyone is making their own TTS model, since they are not as expensive to train as many other models. There are also plenty of different architectures.
Just made it an MCP server so Claude can tell me when it's done with something :)
https://github.com/Marviel/speak_when_done
[1] https://github.com/acatovic/ova
For voice cloning, Pocket TTS is walled, so I can't tell.
Cool tech demo though!
Another recent example: https://github.com/supertone-inc/supertonic
https://huggingface.co/spaces/Supertone/supertonic-2
It seems like it is being trained by one person, and it is surprisingly natural for such a small model.
I remember when TTS always meant the most robotic, barely comprehensible voices.
https://www.reddit.com/r/LocalLLaMA/comments/1qcusnt/soprano...
https://huggingface.co/ekwek/Soprano-1.1-80M