My senior does not wake up thinking about phoneme models. Fair enough. What they do think about, apparently every few weeks, is: “Can we turn this paragraph into audio?”

Somewhere between the third and fourth TTS request in a single month, the free-tier internet started giving me the cold shoulder. You know the look—polite HTTP errors, surprise limits, and the emotional equivalent of “we’re not angry, we’re just disappointed.”

Enough. It was time to host my own chaos.

Fork in the road

I pointed my terminal at psiTTSa—an offline text-to-speech stack you can run at home (FastAPI, Piper, optional pyttsx3, MP3s, queue, cancel, the whole polite engineering package). Then I forked it and did a light rewrite so it wouldn’t look like a photocopy. Same DNA, different haircut. If you squint, it’s a cousin.

The README promises optional GPU joy. My server read that sentence and laughed quietly to itself, because my box has no GPU. We are on a strict CPU diet, thank you very much. Piper still runs; it just does its little marathon instead of a sprint.
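For the curious, the CPU-only path boils down to two command-line hops: Piper turns text into a WAV, and ffmpeg squeezes that WAV into an MP3. What follows is a minimal sketch of that idea, not the actual code in psiTTSa or my fork. The function names, file names, and the MP3 quality setting are my own assumptions; it also assumes `piper` and `ffmpeg` are on your PATH and that you have already downloaded a voice model.

```python
import subprocess
from pathlib import Path


def piper_command(model_path: str, wav_path: str) -> list[str]:
    """Build the Piper CLI call. Piper reads the text to speak from stdin."""
    return ["piper", "--model", model_path, "--output_file", wav_path]


def ffmpeg_mp3_command(wav_path: str, mp3_path: str) -> list[str]:
    """Build the ffmpeg call that encodes a WAV file to MP3 with libmp3lame."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", wav_path,
        "-codec:a", "libmp3lame",
        "-q:a", "4",              # VBR quality; an arbitrary middle-of-the-road pick
        mp3_path,
    ]


def synthesize(text: str, model_path: str, out_dir: str, stem: str = "speech") -> Path:
    """Hypothetical helper: text -> WAV (Piper) -> MP3 (ffmpeg), all on CPU."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    wav = out / f"{stem}.wav"
    mp3 = out / f"{stem}.mp3"

    # check=True raises CalledProcessError if either tool exits non-zero.
    subprocess.run(piper_command(model_path, str(wav)),
                   input=text.encode("utf-8"), check=True)
    subprocess.run(ffmpeg_mp3_command(str(wav), str(mp3)), check=True)
    return mp3
```

Calling something like `synthesize("Hello from the CPU diet.", "en_US-lessac-medium.onnx", "out")` should leave an `out/speech.mp3` behind, provided that model file actually exists on your machine. Keeping the command builders separate from the `subprocess` calls makes them easy to check without running either tool.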

The Indonesian voice support group

Here’s where Piper and I had an awkward conversation.

Indonesian? Technically available. Natural? Debatable. And the catalogue feels less like a choir and more like a solo act—one voice, doing its best, while English gets a whole boy band of options.

So for anything that had to sound good, the pragmatic move was: English voices with variety. For Indonesian, you get loyalty points for honesty, not Polyglot of the Year.
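In code, that policy is not much more than a lookup table plus a rotation. Here is a rough sketch of how it could look, assuming a hypothetical `VoicePicker` class of my own invention. The voice names are illustrative and should be checked against the Piper voice catalogue before you rely on them.

```python
from itertools import cycle

# Illustrative voice names (assumptions); verify them against the Piper catalogue.
VOICES = {
    "en": ["en_US-lessac-medium", "en_US-amy-medium", "en_US-ryan-high"],
    "id": ["id_ID-news_tts-medium"],  # the solo act
}


class VoicePicker:
    """Hands out voices per language, rotating through whatever is available."""

    def __init__(self, voices: dict[str, list[str]]):
        # One endless iterator per language; a single voice just repeats.
        self._cycles = {lang: cycle(names) for lang, names in voices.items()}

    def pick(self, lang: str) -> str:
        if lang not in self._cycles:
            raise KeyError(f"no voices configured for language {lang!r}")
        return next(self._cycles[lang])


picker = VoicePicker(VOICES)
english_run = [picker.pick("en") for _ in range(4)]    # wraps back to the first voice
indonesian_run = [picker.pick("id") for _ in range(2)]  # same voice both times
```

Round-robin is just one way to get variety; picking at random or letting the caller name a voice would work equally well.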

Scope creep was not invited

Important disclaimer for anyone expecting a saga: this whole thing took about one hour. Not a quarter. Not a roadmap. A fun-sized project between meetings, the engineering equivalent of building a treehouse and claiming you’re now in real estate.

Would I production-harden this for a Fortune 500 launch? No. Would I do it again to stop clicking through yet another “try again tomorrow” banner? Absolutely.

Lessons learned

  1. Boss requests scale faster than API generosity. Who knew.
  2. Self-hosting is just trauma bonding with ffmpeg.
  3. GPUs are optional; coping skills are not.
  4. One Indonesian Piper voice is still one more friend than zero—just don’t ask it to host a podcast duel against five English narrators.

If you’re in the same boat: fork something solid, embrace the CPU life, and remember—when the queue lights up, it’s not personal. It’s just Tuesday… and also maybe Thursday… and definitely the week after that.