What have you tried so far? Tortoise, for example?
I tried Bark, SpeechT5, and MMS, and looked at ElevenLabs and Silero: the last two because they are options you can enable in Oobabooga, the first three because they are on Hugging Face.
I have used all of the above. In my experience, ElevenLabs is the most natural-sounding (and the easiest to use), with the open-source alternatives (kind of) close behind it.
Unfortunately, ElevenLabs’ code is proprietary, so there’s a bit of a compromise there (unless you want to use one of the open-source alternatives you mentioned). To your point though, the open-source options aren’t the most user-friendly.
TTS has definitely been neglected compared with the other new tech in this wave of AI development, but I think it’s only a matter of time before new options emerge as startups and other projects take flight this year and next. It will be a crucial area to nail for immersive video game dialogue, so I’m sure someone will come up with a new platform or approach. Fingers crossed they make it open-source.
For now, my suggestion is sticking to whatever TTS workflow works best with your current tech stack until something new comes out.
If you end up finding something worth sharing, let us know! I’m very curious to see how audio and speech synthesis develop alongside all of the other FOSAI tech we’ve been seeing.
Well, I tried Tortoise TTS today and got a bit further than with the others, but it still doesn't work for me. I almost have it working, but figuring out the API and playing the audio from a conda environment inside a distrobox container, just to shield my system from the outdated dependencies the project uses, may prove to be too much for my skills. The documentation for offline execution is crap.
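One way to sidestep audio playback inside the container might be to write the generated speech to a WAV file in a directory the host can also see, then play it with whatever player the host already has. Below is a minimal sketch of that idea using only the Python standard library. The `write_wav` helper is hypothetical, the 24000 Hz default is an assumption about what the TTS model outputs, and the sine tone just stands in for real model output. Distrobox normally shares your home directory with the host, but check that for your setup.

```python
import math
import struct
import wave
from pathlib import Path


def write_wav(path, samples, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file.

    `sample_rate` defaults to 24000 Hz, which is an assumption here;
    use whatever rate your TTS model actually produces.
    """
    # Clamp each sample, scale it to a signed 16-bit integer, and pack it
    # as little-endian bytes.
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return Path(path)


if __name__ == "__main__":
    # Stand-in for TTS output: one second of a 440 Hz tone.
    rate = 24000
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    out = write_wav(Path.home() / "tts_test.wav", tone, rate)
    print(f"Wrote {out}; try playing it from the host side.")
```

If the model hands back a tensor or array rather than a list, converting it to a flat list of floats first should be enough for this helper.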
I'm actually getting further into these configurations by keeping a WizardLM 30B GGML model running in instruct mode the whole time and asking it questions. It can take in most error output from a terminal and give almost-useful advice in many cases. That 30B GGML setup, with 10 CPU threads and 20 layers offloaded to a 3080Ti-16GB, is very close to the speed of a Llama 2 7B running on just the GPU. It only crashes if I feed it more text than would fit on a single page of a PDF. My machine has 32GB of system memory, and I think I need to upgrade to the 64GB maximum. As far as I have seen, a 7B model lies half the time, a 13B lies 20% of the time, and my 30B lies around 10% of the time at 4-bit. With a ton of extra RAM I want to see how much better a 30B is at 8-bit, or whether a 70B is feasible and maybe closes the gap.
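For anyone wondering whether 64GB would be enough, here is a rough back-of-the-envelope sketch of how much memory the weights alone take at different quantization levels, and how a layer split like the one above divides them between GPU and system RAM. Everything in it is an approximation: it uses the nominal parameter counts (30B, 70B), ignores quantization overhead, the context cache, and runtime buffers, and the 60-layer figure for a 30B model is an assumption you should check against your actual model. The function names are made up for illustration.

```python
def weight_memory_gb(n_params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    billions of params * bits per weight / 8 bits per byte.
    Real quantized files are somewhat larger because of per-block scales.
    """
    return n_params_billion * bits_per_weight / 8


def split_by_layers(total_gb, gpu_layers, total_layers):
    """Rough split of weight memory when only some layers are offloaded.

    Assumes every layer is the same size, which is only approximately true.
    Returns (gb_on_gpu, gb_in_system_ram).
    """
    gpu_gb = total_gb * gpu_layers / total_layers
    return gpu_gb, total_gb - gpu_gb


if __name__ == "__main__":
    for name, params, bits in [
        ("7B at 4-bit", 7, 4),
        ("30B at 4-bit", 30, 4),
        ("30B at 8-bit", 30, 8),
        ("70B at 4-bit", 70, 4),
    ]:
        print(f"{name}: ~{weight_memory_gb(params, bits):.1f} GB of weights")

    # 20 of an assumed 60 layers on the GPU, as in the setup described above.
    gpu_gb, cpu_gb = split_by_layers(weight_memory_gb(30, 4), 20, 60)
    print(f"30B at 4-bit, 20/60 layers offloaded: ~{gpu_gb:.1f} GB GPU, ~{cpu_gb:.1f} GB RAM")
```

By this estimate a 30B model at 8-bit needs roughly 30 GB for weights and a 70B at 4-bit roughly 35 GB, so both would be tight or impossible on 32GB of system memory but should fit in 64GB with part of the model offloaded to the GPU.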
Really appreciate the info and insights. Helps me adjust and test my benchmarks a ton. It’s remarkable what we’re able to do with consumer hardware now. It’s exciting to imagine where we’ll be at even a year from now!
Let us know if you find a better setup and workflow in the future. Sounds pretty effective though. Curious to see how it powers up for you throughout the rest of the year.
Thanks again. All this info is very helpful for others looking to get something similar running.