    Abogen – Generate audiobooks from EPUBs, PDFs and text

    By TechAiVerse | August 10, 2025 | 12 Mins Read

    Abogen is a text-to-speech conversion tool that turns ePub, PDF, or text files into high-quality audio with matching subtitles in seconds, using the Kokoro-82M model. Use it for audiobooks, voiceovers for Instagram, YouTube, or TikTok, or any project that needs natural-sounding speech.

    Demo


    This demo was generated in just 5 seconds, producing ∼1 minute of audio with perfectly synced subtitles. To create a similar video, see the demo guide.

    How to install?

    Windows

    First, install espeak-ng: go to the espeak-ng latest release page, then download and run the *.msi file.

    OPTION 1: Install using script

    1. Download the repository
    2. Extract the ZIP file
    3. Run WINDOWS_INSTALL.bat by double-clicking it

    This method handles everything automatically – installing all dependencies including CUDA in a self-contained environment without requiring a separate Python installation. (You still need to install espeak-ng.)

    Note

    You don’t need to install Python separately. The script will install Python automatically.

    OPTION 2: Install using pip

    # Create a virtual environment (optional)
    mkdir abogen && cd abogen
    python -m venv venv
    venv\Scripts\activate
    
    # For NVIDIA GPUs:
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
    
    # For AMD GPUs:
    # Not supported yet, because ROCm is not available on Windows. Use Linux if you have an AMD GPU.
    
    # Install abogen
    pip install abogen

    Mac

    # Install espeak-ng
    brew install espeak-ng
    
    # Create a virtual environment (recommended)
    mkdir abogen && cd abogen
    python3 -m venv venv
    source venv/bin/activate
    
    # Install abogen
    pip3 install abogen

    Linux

    # Install espeak-ng
    sudo apt install espeak-ng # Ubuntu/Debian
    sudo pacman -S espeak-ng # Arch Linux
    sudo dnf install espeak-ng # Fedora
    
    # Create a virtual environment (recommended)
    mkdir abogen && cd abogen
    python3 -m venv venv
    source venv/bin/activate
    
    # Install abogen
    pip3 install abogen
    
    # For NVIDIA GPUs:
    # Already supported, no need to install CUDA separately.
    
    # For AMD GPUs:
    # After installing abogen, we need to uninstall the existing torch package
    pip3 uninstall torch 
    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.4

    Tip

    If you see the warning "WARNING: The script abogen-cli is installed in '/home/username/.local/bin' which is not on PATH.", run the following command to add that directory to your PATH:

    echo "export PATH=\"/home/$USER/.local/bin:\$PATH\"" >> ~/.bashrc && source ~/.bashrc

    Tip

    If you get a “No matching distribution found” error, try installing Abogen with a supported Python version (3.10 to 3.12). You can use pyenv to manage multiple Python versions easily on Linux. Watch this video by NetworkChuck for a quick guide.
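    Before reinstalling, it can help to confirm which interpreter pip is actually using. The following is a minimal sketch that checks the running Python against the 3.10 to 3.12 range quoted in the tip; the function name is_supported is illustrative and not part of Abogen.

```python
import sys

# Supported range taken from the tip above: Python 3.10 through 3.12.
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 12)


def is_supported(version=None):
    """Return True if (major, minor) falls inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = "%d.%d" % sys.version_info[:2]
    if is_supported():
        print("Python %s is in the supported range." % current)
    else:
        print("Python %s is outside 3.10-3.12; pip may not find a matching build." % current)
```

    Run it with the same interpreter you use for pip (for example, the one inside your virtual environment).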

    Special thanks to @hg000125 for his contribution in #23. AMD GPU support is possible thanks to his work.

    How to run?

    If you installed using pip, you can simply run the following command to start Abogen:

    abogen

    Tip

    If you installed using the Windows installer (WINDOWS_INSTALL.bat), it should have created a shortcut in the same folder or on your desktop, and you can run Abogen from there. If you lost the shortcut, you can run Abogen directly from python_embedded/Scripts/abogen.exe.

    How to use?

    1. Drag and drop any ePub, PDF, or text file (or use the built-in text editor)
    2. Configure the settings:
      • Set speech speed
      • Select a voice (or create a custom voice using voice mixer)
      • Select subtitle generation style (by sentence, word, etc.)
      • Select output format
      • Select where to save the output
    3. Hit Start

    In action

    Here’s Abogen in action: in this demo, it processes ∼3,000 characters of text in just 11 seconds and turns it into 3 minutes and 28 seconds of audio on a low-end RTX 2060 Mobile laptop GPU. Your results may vary depending on your hardware.

    Configuration

    Options | Description
    Input Box | Drag and drop ePub, PDF, or .TXT files (or use the built-in text editor).
    Queue options | Add multiple files to a queue and process them in batch, with individual settings for each file. See Queue Mode for more details.
    Speed | Adjust the speech rate from 0.1x to 2.0x.
    Select Voice | The first letter of a voice name is the language code (e.g., a for American English, b for British English); the second letter is m for male or f for female.
    Voice mixer | Create custom voices by mixing different voice models with a profile system. See Voice Mixer for more details.
    Voice preview | Listen to the selected voice before processing.
    Generate subtitles | Disabled, Sentence, Sentence + Comma, 1 word, 2 words, 3 words, etc. (the number is how many words appear in each subtitle entry).
    Output voice format | .WAV, .FLAC, .MP3, .OPUS (best compression), and M4B (with chapters). (Special thanks to @jborza for chapter support in PR #10)
    Output subtitle format | SRT (standard), ASS (wide), ASS (narrow), ASS (centered wide), or ASS (centered narrow).
    Replace single newlines with spaces | Replaces single newlines with spaces in the text. This is useful for hard-wrapped texts, where line breaks do not mark real paragraph boundaries.
    Save location | Save next to input file, Save to desktop, or Choose output folder.
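    To illustrate what the "Replace single newlines with spaces" option does, here is a minimal sketch. It is not Abogen's own implementation, which is not shown here; it only assumes that a single newline is one with no other newline directly before or after it, so blank-line paragraph breaks are kept.

```python
import re

# A newline that is neither preceded nor followed by another newline.
_SINGLE_NEWLINE = re.compile(r"(?<!\n)\n(?!\n)")


def replace_single_newlines(text):
    """Join hard-wrapped lines with spaces; keep blank-line paragraph breaks."""
    return _SINGLE_NEWLINE.sub(" ", text)


sample = "This sentence was\nwrapped by hand.\n\nSecond paragraph."
print(replace_single_newlines(sample))
```

    The two wrapped lines of the first paragraph are joined, while the blank line before "Second paragraph." is left alone.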


    Book handler options | Description
    Chapter Control | Select specific chapters from ePUBs or chapters + pages from PDFs.
    Save each chapter separately | Save each chapter in e-books as a separate audio file.
    Create a merged version | Create a single audio file that combines all chapters. (If Save each chapter separately is disabled, this option will be the default behavior.)
    Save in a project folder with metadata | Save the converted items in a project folder with available metadata files.


    Menu options | Description
    Theme | Change the application’s theme using System, Light, or Dark options.
    Configure max words per subtitle | Configures the maximum number of words per subtitle entry.
    Configure max lines in log window | Configures the maximum number of lines to display in the log window.
    Separate chapters audio format | Configures the audio format for separate chapters as wav, flac, mp3, or opus.
    Create desktop shortcut | Creates a shortcut on your desktop for easy access.
    Open config directory | Opens the directory where the configuration file is stored.
    Open cache directory | Opens the cache directory where converted text files are stored.
    Clear cache files | Deletes cache files created during the conversion or preview.
    Check for updates at startup | Automatically checks for updates when the program starts.
    Disable Kokoro’s internet access | Prevents Kokoro from downloading models or voices from HuggingFace Hub, useful for offline use.
    Reset to default settings | Resets all settings to their default values.

    Voice Mixer

    With voice mixer, you can create custom voices by mixing different voice models. You can adjust the weight of each voice and save your custom voice as a profile for future use. The voice mixer allows you to create unique and personalized voices. (Huge thanks to @jborza for making this possible through his contributions in #5)
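    One common way to mix voices by weight is a normalized weighted average of their voice vectors. The sketch below shows that idea only; how Abogen and Kokoro actually combine voices is not described here, and the function mix_voices, the voice names, and the three-value vectors are all hypothetical.

```python
def mix_voices(voices, weights):
    """Blend equal-length voice vectors by normalized weights.

    voices  -- dict of voice name -> list of floats (hypothetical embeddings)
    weights -- dict of voice name -> non-negative weight
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    length = len(next(iter(voices.values())))
    mixed = [0.0] * length
    for name, weight in weights.items():
        vector = voices[name]
        if len(vector) != length:
            raise ValueError("voice vectors must have the same length")
        for i, value in enumerate(vector):
            mixed[i] += (weight / total) * value
    return mixed


# Toy three-dimensional vectors; real voice data is much larger.
voices = {"voice_a": [1.0, 0.0, 2.0], "voice_b": [0.0, 1.0, 4.0]}
profile = {"voice_a": 3, "voice_b": 1}  # 75% voice_a, 25% voice_b
print(mix_voices(voices, profile))
```

    Normalizing by the total weight means a profile of 3 and 1 behaves the same as 0.75 and 0.25, so saved profiles do not need to sum to one.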

    Queue Mode

    Abogen supports queue mode, allowing you to add multiple files to a processing queue. This is useful if you want to convert several files in one batch.

    • You can add text files (.txt) directly using the Add files button in the Queue Manager. To add PDF or EPUB files, use the input box in the main window and click the Add to Queue button.
    • Each file in the queue keeps the configuration settings that were active when it was added. Changing the main window configuration afterward does not affect files already in the queue.
    • You can view a file’s configuration by hovering over it in the queue.

    Abogen will process each item in the queue automatically, saving outputs as configured.
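    The rule that each queued file keeps the settings that were active when it was added can be modeled as a snapshot taken at enqueue time. This is a minimal sketch of that behavior, not Abogen's code: the classes QueueItem and ConversionQueue, the settings keys, and the file names are illustrative assumptions.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class QueueItem:
    path: str
    settings: dict  # snapshot taken when the file was queued


@dataclass
class ConversionQueue:
    items: list = field(default_factory=list)

    def add(self, path, current_settings):
        # Deep-copy so later edits to the live settings do not leak in.
        self.items.append(QueueItem(path, copy.deepcopy(current_settings)))

    def process(self, convert):
        # Run the supplied convert(path, settings) callable on each item in order.
        return [convert(item.path, item.settings) for item in self.items]


settings = {"speed": 1.0, "voice": "af_heart", "format": "mp3"}
queue = ConversionQueue()
queue.add("book1.txt", settings)
settings["speed"] = 1.5  # change the main-window settings afterward
queue.add("book2.txt", settings)

for item in queue.items:
    print(item.path, item.settings["speed"])
```

    The first file keeps speed 1.0 even though the live settings changed before the second file was added.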

    Special thanks to @jborza for adding queue mode in PR #35

    About Chapter Markers

    When you process ePUB or PDF files, Abogen converts them into text files stored in your cache directory. When you click “Edit,” you’re actually modifying these converted text files. In these text files, you’ll notice tags that look like this:

    These are chapter markers. They are automatically added when you process ePUB or PDF files, based on the chapters you select. They serve an important purpose:

    • Allow you to split the text into separate audio files for each chapter
    • Save time by letting you reprocess only specific chapters if errors occur, rather than the entire file

    You can manually add these markers to plain text files for the same benefits. Simply include them in your text like this:

    When you process the text file, Abogen will detect these markers automatically and ask if you want to save each chapter separately and create a merged version.
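    Splitting text on chapter markers can be sketched with a regular expression. The marker syntax is not shown in this copy of the text, so the pattern below is an assumed placeholder; check the converted files in your cache directory for the real form and adjust MARKER to match. The function split_chapters is illustrative, not part of Abogen.

```python
import re

# ASSUMPTION: placeholder marker syntax; check your cached text files for the real one.
MARKER = re.compile(r"<<CHAPTER_MARKER:(.*?)>>")


def split_chapters(text, marker=MARKER):
    """Return a list of (title, body) pairs, one per marker found in text."""
    parts = marker.split(text)
    # With one capture group, split() yields:
    # [preamble, title1, body1, title2, body2, ...]
    chapters = []
    preamble = parts[0].strip()
    if preamble:
        chapters.append(("Untitled", preamble))
    for title, body in zip(parts[1::2], parts[2::2]):
        chapters.append((title.strip(), body.strip()))
    return chapters


text = "<<CHAPTER_MARKER:One>>\nFirst text.\n<<CHAPTER_MARKER:Two>>\nSecond text."
for title, body in split_chapters(text):
    print(title, "->", body)
```

    Keeping chapters as separate (title, body) pairs is what makes it possible to reprocess a single chapter without touching the rest.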

    About Metadata Tags

    Similar to chapter markers, it is possible to add metadata tags for M4B files. This is useful for audiobook players that support metadata, allowing you to add information like title, author, year, etc. Abogen automatically adds these tags when you process ePUB or PDF files, but you can also add them manually to your text files. Add metadata tags at the beginning of your text file like this:

    Supported Languages

    For a complete list of supported languages and voices, refer to Kokoro’s VOICES.md. To listen to sample audio outputs, see SAMPLES.md.
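    The voice naming scheme described under Configuration (first letter for the language, second letter for the gender) can be decoded with a small lookup. This sketch lists only the two language codes named in this document; the function describe_voice is illustrative, and the voice IDs are examples only.

```python
# Only the codes named in the text above are listed; VOICES.md has the full set.
LANGUAGES = {"a": "American English", "b": "British English"}
GENDERS = {"m": "male", "f": "female"}


def describe_voice(voice_id):
    """Decode the two-letter prefix of a voice ID such as 'af_heart'."""
    if len(voice_id) < 2:
        raise ValueError("voice ID needs at least two characters")
    language = LANGUAGES.get(voice_id[0], "unknown language")
    gender = GENDERS.get(voice_id[1], "unknown gender")
    return language, gender


print(describe_voice("af_heart"))
print(describe_voice("bm_george"))
```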

    MPV Config

    I highly recommend using MPV to play your audio files, as it supports displaying subtitles even without a video track. Here’s my mpv.conf:

    # --- MPV Settings ---
    save-position-on-quit
    keep-open=yes
    # --- Subtitle ---
    sub-ass-override=no
    sub-margin-y=50
    sub-margin-x=50
    # --- Audio Quality ---
    audio-spdif=ac3,dts,eac3,truehd,dts-hd
    audio-channels=auto
    audio-samplerate=48000
    volume-max=200
    

    Docker Guide

    If you want to run Abogen in a Docker container:

    1. Download the repository and extract, or clone it using git.
    2. Go to the abogen folder. You should see the Dockerfile there.
    3. Open your terminal in that directory and run the following commands:

    # Build the Docker image:
    docker build --progress plain -t abogen .
    
    # Note that building the image may take a while.
    # After building is complete, run the Docker container:
    
    # Windows
    docker run --name abogen -v %cd%:/shared -p 5800:5800 -p 5900:5900 --gpus all abogen
    
    # Linux
    docker run --name abogen -v $(pwd):/shared -p 5800:5800 -p 5900:5900 --gpus all abogen
    
    # MacOS
    docker run --name abogen -v $(pwd):/shared -p 5800:5800 -p 5900:5900 abogen
    
    # We expose port 5800 for use by a web browser, 5900 if you want to connect with a VNC client.

    Abogen launches automatically inside the container.

    • You can access it via a web browser at http://localhost:5800 or connect to it using a VNC client at localhost:5900.
    • You can use /shared directory to share files between your host and the container.
    • For later use, start it with docker start abogen and stop it with docker stop abogen.
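    If the browser or VNC client cannot connect, a quick check of whether the two published ports accept TCP connections can narrow things down. This is a generic sketch using Python's standard socket module; is_port_open is an illustrative helper, and it only tests that something is listening, not that Abogen itself is healthy.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in (("web UI", 5800), ("VNC", 5900)):
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print("%s on port %d: %s" % (name, port, state))
```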

    Known issues:

    • Audio preview is not working inside container (ALSA error).
    • The Open cache directory and Open configuration directory options in settings do not work. (Tried pcmanfm; it did not work with Abogen.)

    (Special thanks to @geo38 from Reddit, who provided the Dockerfile and instructions in this comment.)

    Similar Projects

    Abogen is a standalone project, but it is inspired by and shares some similarities with other projects. Here are a few:

    • audiblez: Generate audiobooks from e-books. (Has CLI and GUI support)
    • autiobooks: Automatically convert epubs to audiobooks
    • pdf-narrator: Convert your PDFs and EPUBs into audiobooks effortlessly.
    • epub_to_audiobook: EPUB to audiobook converter, optimized for Audiobookshelf
    • ebook2audiobook: Convert ebooks to audiobooks with chapters and metadata using dynamic AI models and voice cloning

    Roadmap

    • Add OCR scan feature for PDF files using docling/tesseract.
    • Add chapter metadata for .m4a files. (Issue #9, PR #10)
    • Add support for different languages in GUI.
    • Add voice formula feature that enables mixing different voice models. (Issue #1, PR #5)
    • Add support for kokoro-onnx (If it’s necessary).
    • Add dark mode.

    Troubleshooting

    If you encounter any issues while running Abogen, try launching it from the command line with:

    abogen-cli

    This will start Abogen in command-line mode and display detailed error messages. Please open a new issue on the Issues page with the error message and a description of your problem.

    Contributing

    I welcome contributions! If you have ideas for new features, improvements, or bug fixes, please fork the repository and submit a pull request.

    For developers and contributors

    If you’d like to modify the code and contribute to development, you can download the repository, extract it and run the following commands to build or install the package:

    # Go to the directory where you extracted the repository and run:
    pip install -e .      # Installs the package in editable mode
    pip install build     # Install the build package
    python -m build       # Builds the package in dist folder (optional)
    abogen                # Opens the GUI

    Feel free to explore the code and make any changes you like.

    Credits

    • Abogen uses Kokoro for its high-quality, natural-sounding text-to-speech synthesis. Huge thanks to the Kokoro team for making this possible.
    • Thanks to @wojiushixiaobai for Embedded Python packages. These modified packages include pip pre-installed, enabling Abogen to function as a standalone application without requiring users to install Python separately on Windows.
    • Thanks to creators of EbookLib, a Python library for reading and writing ePub files, which is used for extracting text from ePub files.
    • Special thanks to the PyQt team for providing the cross-platform GUI toolkit that powers Abogen’s interface.
    • Icons: US, Great Britain, Spain, France, India, Italy, Japan, Brazil, China, Female, Male, Adjust and Voice Id icons by Icons8.

    License

    This project is available under the MIT License – see the LICENSE file for details.
    Kokoro is licensed under Apache-2.0 which allows commercial use, modification, distribution, and private use.

    Important

    Subtitle generation currently works only for English, because Kokoro provides timestamp tokens only for English text. If you want subtitles in other languages, please request this feature in the Kokoro project. For more technical details, see this line in Kokoro’s code.
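    To show why word-level timestamps matter, here is a minimal sketch of how they could be grouped into the "N words" subtitle styles listed under Configuration and written as SRT entries. It is not Abogen's implementation; the (word, start, end) tuple format and the timings are hypothetical stand-ins for whatever the TTS model reports.

```python
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, millis)


def words_to_srt(words, words_per_entry=2):
    """Group (word, start, end) tuples into SRT entries of N words each."""
    entries = []
    for index in range(0, len(words), words_per_entry):
        group = words[index:index + words_per_entry]
        start, end = group[0][1], group[-1][2]
        text = " ".join(word for word, _, _ in group)
        number = index // words_per_entry + 1
        entries.append("%d\n%s --> %s\n%s\n" % (
            number, format_timestamp(start), format_timestamp(end), text))
    return "\n".join(entries)


# Hypothetical word timings in seconds; real values would come from the TTS model.
timed_words = [("Hello", 0.0, 0.4), ("there", 0.4, 0.8), ("reader", 0.8, 1.3)]
print(words_to_srt(timed_words, words_per_entry=2))
```

    Each entry spans from the start of its first word to the end of its last word, so without per-word timings from the model there is nothing to align the text against.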

    Tags: audiobook, kokoro, text-to-speech, TTS, audiobook generator, audiobooks, text to speech, audiobook maker, audiobook creator, audiobook generator, voice-synthesis, text to audio, text to audio converter, text to speech converter, text to speech generator, text to speech software, text to speech app, epub to audio, pdf to audio, content-creation, media-generation
