this post was submitted on 01 Aug 2023

Free Open-Source Artificial Intelligence

Hi,

I got the whisper_stt extension running in Oobabooga and it (kinda) works. However, it seems really, really bad at understanding my speech, and recognition has been spotty at best.

I saw some YouTube tutorials where it seemed to have no problem understanding speech, even with quite a strong accent, and in my own experience it performs nowhere near as well as shown there.

So, are there things I can do to improve its performance? Or might the YouTube tutorials have been edited to give a wrong impression, and spotty performance is what to expect?

I'm very happy with silero_tts, and if I can get speech-to-text to work at the same level, I'd be a happy camper already.

Edit: It seems to be a memory problem. The extension interface lets me select several models: tiny, small, base, medium, ... If I choose the tiny or small model, it works, but with the poor results I mentioned above. If I select the medium model, I get an OOM error, something like: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 11.99 GiB total capacity; 11.14 GiB already allocated; 0 bytes free; 11.22 GiB reserved in total by PyTorch). It looks to me as if the language model reserves all of my VRAM (12 GB) and leaves none for the extension. Is there a way to tweak that?
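To make the numbers in that error easier to read, here is a minimal sketch that pulls the figures out of the message and does the arithmetic. The function name `summarize_oom` and the regular expression are illustrative assumptions, not part of PyTorch or the webui, and the pattern only targets the message format quoted above.

```python
import re

# Matches the GiB figures in the OOM message format quoted in the post.
# Other PyTorch versions may word the message differently.
OOM_PATTERN = re.compile(
    r"(?P<total>[\d.]+) GiB total capacity; "
    r"(?P<allocated>[\d.]+) GiB already allocated; "
    r".*?(?P<reserved>[\d.]+) GiB reserved in total by PyTorch"
)


def summarize_oom(message):
    """Return the memory figures from an OOM message, plus two differences."""
    match = OOM_PATTERN.search(message)
    if match is None:
        raise ValueError("message does not look like the expected OOM format")
    total = float(match["total"])
    allocated = float(match["allocated"])
    reserved = float(match["reserved"])
    return {
        "total_gib": total,
        "allocated_gib": allocated,
        "reserved_gib": reserved,
        # Held by PyTorch's caching allocator but not backing live tensors.
        "cached_unused_gib": round(reserved - allocated, 2),
        # Not reserved by this PyTorch process (CUDA context, other programs).
        "outside_pytorch_gib": round(total - reserved, 2),
    }


MESSAGE = (
    "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate "
    "16.00 MiB (GPU 0; 11.99 GiB total capacity; 11.14 GiB already allocated; "
    "0 bytes free; 11.22 GiB reserved in total by PyTorch)"
)

if __name__ == "__main__":
    for key, value in summarize_oom(MESSAGE).items():
        print(f"{key}: {value}")
```

Read this way, about 11.14 of 11.99 GiB is already backing tensors in the same process, which is consistent with the language model filling the card before Whisper gets a chance to load.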

Edit 2:

OK, so if I use a smaller language model (like a 6B model), the medium Whisper model seems to work perfectly fine, so it is probably a memory issue. I have already tried starting with the command flag "--gpu-memory" (with values 5, 8, and 10), which doesn't seem to do anything. Are there other ways to manage memory?
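A rough way to think about the budget, as a sketch: subtract Whisper's approximate footprint from the card's total and see what is left for the language model. The per-model figures below are the approximate VRAM requirements listed in the openai/whisper README; the 1 GB `overhead_gb` default and the helper name `llm_budget_gb` are assumptions made for illustration only.

```python
# Approximate VRAM needs per Whisper model size, in GB, as listed in the
# openai/whisper README. Real usage varies with precision and audio length.
WHISPER_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def llm_budget_gb(total_vram_gb, whisper_model, overhead_gb=1.0):
    """Estimate the VRAM left for the language model.

    overhead_gb is an assumed allowance for the CUDA context, the desktop,
    and other processes; it is a guess, not a measured value.
    """
    return total_vram_gb - WHISPER_VRAM_GB[whisper_model] - overhead_gb


if __name__ == "__main__":
    for size in WHISPER_VRAM_GB:
        print(f"{size:>6}: {llm_budget_gb(12, size):.1f} GB left for the LLM")
```

Under these assumptions, a 12 GB card running the medium Whisper model leaves roughly 6 GB for the language model, which would fit with a smaller model working where a larger one runs out of memory.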

Blaed, 2 years ago (edited):

You could try reducing your memory overhead by going down to a 3B-parameter model. If you want to avoid that, maybe experiment with different models in both GPTQ and GGML formats?
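To get a feel for why parameter count and quantization matter, here is a back-of-the-envelope sketch of weight memory alone. The helper `weight_memory_gb` is hypothetical, the model sizes are illustrative, and the estimate ignores the KV cache, activations, and per-format overhead, so real usage will be higher.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Estimate memory for model weights only, in GB (10**9 bytes).

    This ignores the KV cache, activations, and quantization metadata,
    so treat the result as a lower bound rather than a measurement.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Illustrative sizes and bit widths; not measurements of specific models.
    for params in (3, 6, 13):
        for bits in (16, 8, 4):
            size = weight_memory_gb(params, bits)
            print(f"{params}B at {bits}-bit: ~{size:.1f} GB of weights")
```

By this estimate, a 6B model at 16-bit already needs about 12 GB for weights, while the same model at 4-bit needs about 3 GB. GGML models run through llama.cpp can also keep part of the model in system RAM, which may free up further VRAM at the cost of speed.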

If you're willing to spend a few dollars an hour, you could drastically increase overall memory and power and see if you can get it running on a rented GPU through something like vast.ai or runpod.ai. Might be worth exploring for any test of yours that might need extra oomph.

Given time, I think many of these models will become easier to run as new optimization and runtime methods begin to emerge.