Finrandojin
Hi everyone,
I'm a long-time reader and dev. I've tried most TTS services and programs that convert books to audio and just couldn't find one that satisfied me. I wanted something that felt more like a directed performance and less like a flat narrator reading a spreadsheet, so I built Alexandria.
It is 100% free and open source. It runs locally on your own hardware, so there are no character limits, no subscriptions, and no one is looking over your shoulder at what you're generating.
Audio Sample: https://vocaroo.com/1cG82gVS61hn (Uses the built-in Sion LoRA)
GitHub Repository: https://github.com/Finrandojin/alexandria-audiobook/
The Feature Set:
Natural Non-Verbal Sounds
Unlike most tools that just skip over emotional cues or use tags like [gasp], the scripting engine in Alexandria actually writes out pronounceable vocalizations. It can handle things like gasps, laughter, sighs, crying, and heavy breathing. Because it uses Qwen3-TTS, it doesn't treat these as "tags" but as actual audio to be performed alongside the dialogue.
LLM-Powered Scripting
The tool uses a local LLM to parse your manuscript into a structured script. It identifies the different speakers and narration automatically. It also writes specific "vocal directions" for every line so the delivery matches the context of the scene.
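For anyone curious what a "structured script" could look like in practice, here is a minimal Python sketch. The field names (`speaker`, `text`, `direction`), the `NARRATOR` label, and the JSON layout are assumptions made for illustration, not Alexandria's actual schema; check the repository for the real format.

```python
import json
from dataclasses import dataclass


@dataclass
class ScriptLine:
    speaker: str    # character name, or "NARRATOR" for narration (assumed convention)
    text: str       # the words to perform, including written-out vocalizations
    direction: str  # free-text vocal direction for the delivery


def parse_script(raw_json: str) -> list[ScriptLine]:
    """Turn a JSON array (e.g. an LLM reply) into a list of ScriptLine objects."""
    lines = []
    for entry in json.loads(raw_json):
        lines.append(
            ScriptLine(
                speaker=entry["speaker"].strip(),
                text=entry["text"].strip(),
                # A missing direction falls back to an empty string.
                direction=entry.get("direction", "").strip(),
            )
        )
    return lines


# Hypothetical LLM output for two lines of a scene.
example = """[
  {"speaker": "NARRATOR", "text": "The door creaked open.", "direction": "low and tense"},
  {"speaker": "Mara", "text": "Hah... who's there?", "direction": "shaking with fear"}
]"""

script = parse_script(example)
for line in script:
    print(f"{line.speaker} ({line.direction}): {line.text}")
```

Keeping the direction as its own field is what would let an editor change only the delivery and regenerate a line without touching the text.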
Advanced Voice System
- Custom Voices: Includes 9 high-quality built-in voices with full control over emotion, tone, and pacing.
- Cloning: You can clone a voice from any 5 to 15 second audio clip.
- LoRA Training: Includes a pipeline to train permanent, custom voice identities from your own datasets.
- Voice Design: You can describe a voice in plain text, like "a deep male voice with a raspy, tired edge," and generate it on the fly.
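Since cloning wants a 5 to 15 second clip, it can be handy to check a reference file before feeding it in. The sketch below uses only Python's standard `wave` module and assumes the clip is an uncompressed WAV file; the function names and the silent test clips are made up for the example and are not part of Alexandria.

```python
import os
import tempfile
import wave

MIN_SECONDS = 5.0   # lower bound from the feature list above
MAX_SECONDS = 15.0  # upper bound from the feature list above


def clip_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str) -> bool:
    """True if the clip length falls inside the 5 to 15 second window."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS


def write_silence(path: str, seconds: int, rate: int = 16000) -> None:
    """Write a mono 16-bit silent WAV file, used here only as test input."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)


tmp_dir = tempfile.mkdtemp()
good_clip = os.path.join(tmp_dir, "good.wav")
short_clip = os.path.join(tmp_dir, "short.wav")
write_silence(good_clip, 8)
write_silence(short_clip, 2)

print(is_usable_reference(good_clip), is_usable_reference(short_clip))
```

For MP3 or other compressed formats you would need a different reader, since `wave` only handles WAV.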
Production Editor
Full control over the final output. You can review and edit lines and change the delivery instructions. If a specific "gasp" or "laugh" doesn't sound right, you can regenerate the line or try a different instruction like "shaking with fear" or "breathless and exhausted."
Local and Private
Everything runs via Qwen3-TTS on your own machine. Your stories stay private and you never have to worry about a "usage policy" flagging your content.
Export Options
You can export as a single MP3 or as a full Audacity project. The Audacity export puts every character on their own track, with a label for every line of dialogue, so you can see on the timeline what is being said and search it for specific dialogue. That makes it easy to add background music or fine-tune the timing between lines.
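To give a rough idea of what per-line labels look like on the Audacity side: Audacity can import plain-text label tracks where each row is a start time, an end time, and the label text, separated by tabs, with times in seconds. The sketch below builds one such label body per speaker. It is not how Alexandria writes its project files; the `SpokenLine` type and the timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpokenLine:
    speaker: str
    text: str
    start: float  # seconds from the start of the timeline
    end: float    # seconds from the start of the timeline


def label_tracks(lines: list[SpokenLine]) -> dict[str, str]:
    """Group lines by speaker and build one Audacity label file body per speaker."""
    rows_by_speaker: dict[str, list[str]] = {}
    for line in lines:
        # Tabs and newlines would break the row format, so flatten them to spaces.
        text = line.text.replace("\t", " ").replace("\n", " ")
        row = f"{line.start:.6f}\t{line.end:.6f}\t{text}"
        rows_by_speaker.setdefault(line.speaker, []).append(row)
    return {
        speaker: "\n".join(rows) + "\n"
        for speaker, rows in rows_by_speaker.items()
    }


tracks = label_tracks([
    SpokenLine("NARRATOR", "The door creaked open.", 0.0, 1.5),
    SpokenLine("Mara", "Who's there?", 1.5, 3.25),
])
for speaker, body in tracks.items():
    print(f"--- {speaker} ---")
    print(body, end="")
```

Each body can be saved as a `.txt` file and loaded through Audacity's label import, which gives you searchable text on the timeline.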
Supported configurations
| GPU | OS | Status | Driver Requirement | Notes |
|---|---|---|---|---|
| NVIDIA | Windows | Full support | Driver 550+ (CUDA 12.8) | Flash attention included for faster encoding |
| NVIDIA | Linux | Full support | Driver 550+ (CUDA 12.8) | Flash attention + triton included |
| AMD | Linux | Full support | ROCm 6.3 | ROCm optimizations applied automatically |
| AMD | Windows | CPU only | N/A | GPU acceleration is not supported — the app runs in CPU mode. For GPU acceleration with AMD, use Linux |
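If you want to check which row applies to your machine from a script, the table boils down to a small lookup. This is only an illustrative sketch: the `expected_mode` helper and its return values are invented here, and the app does its own detection. `platform.system()` is the standard-library call that reports the OS name.

```python
import platform

# (GPU vendor, OS) -> mode, transcribed from the table above.
SUPPORT_TABLE = {
    ("nvidia", "windows"): "gpu",
    ("nvidia", "linux"): "gpu",
    ("amd", "linux"): "gpu",
    ("amd", "windows"): "cpu",  # no GPU acceleration for AMD on Windows
}


def expected_mode(gpu_vendor: str, os_name: str) -> str:
    """Return "gpu", "cpu", or "unlisted" for a vendor and OS combination."""
    key = (gpu_vendor.strip().lower(), os_name.strip().lower())
    return SUPPORT_TABLE.get(key, "unlisted")


# platform.system() returns names such as "Windows" or "Linux".
print(expected_mode("AMD", platform.system()))
```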
I'm around to answer any technical questions or help with setup if anyone runs into issues.