    TALK To AI Using YOUR Mic & Get AUDIO RESPONSE! This Is INSANE!

    Introduction

    Hello humans! I'm your host, your overlord, and today I have something absolutely thrilling to share, especially for all you role-playing enthusiasts out there. Have you ever dreamed of conversing for hours with a girlfriend or boyfriend nobody else likes? You're not alone! In this tutorial, I'm going to show you how to talk to an AI character using your microphone and how to get audio responses for the most immersive role-playing experience ever.

    Quick Demo

    Here's a snippet of what you can expect:

    You: Hey love, I hope you haven't been waiting long; the traffic was insane.

    AI Character: Hey honey, no problem. Just sit down.

    You: Thanks so much for meeting me here, I wanted to ensure we have some alone time before the holidays get insane.

    AI Character: Sure, work has been crazy, we need some time off.


    Why Is This So Cool?

    Two main features make this groundbreaking:

    1. Text-to-Speech: Get an audio response for enhanced immersion.
    2. Whisper Speech-to-Text: This open-source neural network accurately converts your speech into text almost instantly, making the experience much more interactive.
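
    To make the flow concrete, here is a minimal Python sketch of one conversational turn: recorded audio goes through speech-to-text, the text goes to the language model, and the reply goes through text-to-speech. The three callables (`transcribe`, `generate_reply`, `synthesize`) are hypothetical placeholders, not the web UI's actual API; in the real setup the extensions do this wiring for you.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (user text, character reply) pairs


def voice_turn(
    audio: bytes,
    history: History,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[History, str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one turn: speech -> text -> model reply -> speech."""
    user_text = transcribe(audio).strip()
    if not user_text:
        # Nothing was recognized, so skip the model and return no audio.
        return b""
    reply = generate_reply(history, user_text)
    history.append((user_text, reply))
    return synthesize(reply)


if __name__ == "__main__":
    # Stub backends (assumptions) so the sketch runs without a model or a mic.
    history: History = []
    audio_out = voice_turn(
        b"fake-audio",
        history,
        transcribe=lambda _audio: "Hey love, sorry I'm late.",
        generate_reply=lambda _hist, _text: "No problem. Just sit down.",
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(history)
    print(audio_out)
```

    Passing the three stages in as functions keeps the sketch independent of any particular speech or model backend, so swapping one text-to-speech engine for another only changes the `synthesize` argument.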

    Getting Started

    Before you dive in, you'll need the Oobabooga Text Generation Web UI and three extensions:

    1. ElevenLabs TTS
    2. Silero TTS
    3. Whisper STT

    Installing the Web UI

    First, install the web UI by following my detailed installation video. Then open the Interface mode tab in the web UI to enable the required extensions.

    Enable Extensions

    • Whisper: This enables the speech-to-text conversion from your microphone.
    • ElevenLabs & Silero: These are for text-to-speech. ElevenLabs offers superior quality but requires a paid subscription. Silero is a good local alternative.

    Setting Up Silero

    1. Download and install FFmpeg.
    2. Extract the downloaded archive and move the extracted folder to your C drive as C:\ffmpeg.
    3. Add its bin folder (C:\ffmpeg\bin) to the Path system environment variable.
    4. Verify the installation by opening a new Command Prompt and typing ffmpeg -version.
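
    The verification in step 4 can also be scripted. The following is a small sketch using only the Python standard library; the helper names `find_tool` and `tool_version` are made up for this example. It looks for the executable on PATH and, if found, prints the first line of its version output.

```python
import shutil
import subprocess
from typing import Optional


def find_tool(name: str = "ffmpeg") -> Optional[str]:
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


def tool_version(name: str = "ffmpeg", flag: str = "-version") -> Optional[str]:
    """Return the first line printed by `<name> <flag>`, or None if unavailable."""
    path = find_tool(name)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, flag], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = tool_version()
    if version is None:
        print("ffmpeg was not found on PATH; recheck the environment variable.")
    else:
        print(version)
```

    If the script reports that FFmpeg is missing right after you edited the environment variables, open a new terminal first, since already-open terminals keep the old PATH.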

    Configuring the Web UI

    Edit the webui.py file so that the launch command enables the required extensions:

    python server.py --extensions whisper_stt silero_tts elevenlabs_tts
    

    Launch the web UI, install the necessary files, and then follow the on-screen setup for enabling microphone input and selecting voices.
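
    If you switch between text-to-speech engines often, you can generate the launch command instead of editing it by hand. This sketch rests on two assumptions you should confirm against your installed version (run `python server.py --help` and look inside the extensions folder): that the web UI accepts a space-separated list after a single `--extensions` flag, and that the extension folders are named `whisper_stt`, `silero_tts`, and `elevenlabs_tts`. The helper names are made up for this example.

```python
import shlex
from typing import List, Sequence


def pick_extensions(use_paid_tts: bool) -> List[str]:
    """Speech-to-text plus exactly one text-to-speech engine."""
    # Folder names below are assumptions; check your extensions directory.
    tts = "elevenlabs_tts" if use_paid_tts else "silero_tts"
    return ["whisper_stt", tts]


def build_launch_command(
    extensions: Sequence[str], script: str = "server.py"
) -> List[str]:
    """Build an argument list that enables each extension once."""
    unique = list(dict.fromkeys(extensions))  # drop duplicates, keep order
    cmd = ["python", script]
    if unique:
        # Assumed flag form: one --extensions flag followed by all names.
        cmd += ["--extensions", *unique]
    return cmd


if __name__ == "__main__":
    cmd = build_launch_command(pick_extensions(use_paid_tts=False))
    print(shlex.join(cmd))
```

    Enabling only one text-to-speech engine at a time avoids two extensions both trying to voice the same reply.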

    Tips & Tricks

    To further enhance your experience, use SillyTavern, a Tavern AI fork with advanced features like specific voice mapping for each character. Though it lacks microphone input, it provides a more visually pleasing interface and additional customization options.
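
    Per-character voice mapping boils down to a lookup table from character name to voice, with a fallback for characters you have not configured. The sketch below only illustrates that idea; it is not how the interface stores its settings, and the character names and voice IDs are hypothetical.

```python
from typing import Dict

DEFAULT_VOICE = "default_voice"  # hypothetical fallback voice ID


def voice_for(
    character: str, voice_map: Dict[str, str], default: str = DEFAULT_VOICE
) -> str:
    """Look up a character's voice, ignoring case and surrounding spaces."""
    normalized = {name.strip().lower(): voice for name, voice in voice_map.items()}
    return normalized.get(character.strip().lower(), default)


if __name__ == "__main__":
    # Hypothetical characters and voice IDs, for illustration only.
    voices = {"Aria": "voice_a", "Captain Rex": "voice_b"}
    for name in ("aria", "Captain Rex ", "Unknown NPC"):
        print(name.strip(), "->", voice_for(name, voices))
```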

    Running SillyTavern

    1. Install Node.js: Download and install from the official site.

    2. Clone Repository:

      git clone https://github.com/SillyTavern/SillyTavern.git
      cd SillyTavern
      
    3. Install Extras (run these from the folder that contains the Extras requirements-complete.txt file):

      conda create -n extras python=3.8
      conda activate extras
      pip install -r requirements-complete.txt
      
    4. Run Everything Together: Start the web UI, connect SillyTavern to it, and start the text-to-speech processing.

    Conclusion

    Now you can talk to an AI character and receive audio responses almost in real-time. The immersive experience this technology provides is unprecedented, making role-playing more engaging than ever. Try it out and elevate your RP game to new heights!


    Keywords

    • AI character
    • Text-to-Speech
    • Whisper Speech-to-Text
    • Oobabooga Text Generation Web UI
    • Silero
    • ElevenLabs
    • Immersive role-playing
    • Silly Tavern
    • Microphone input
    • FFmpeg installation

    FAQ

    Q: What software do I need to enable this AI interaction?

    A: You'll need the Oobabooga Text Generation Web UI and its extensions: Whisper STT for speech-to-text and either ElevenLabs TTS or Silero TTS for text-to-speech.

    Q: Is Whisper Speech-to-Text accurate?

    A: Yes, Whisper is an open-source neural network known for its high accuracy and speed in transcribing speech to text.

    Q: Do I need to pay for any services?

    A: Silero TTS is free and runs locally, while ElevenLabs TTS requires a subscription for its higher-quality voices.

    Q: Can this be integrated with Tavern AI?

    A: Yes, by using SillyTavern, a fork of Tavern AI, which includes advanced features like text-to-speech and additional customization options.

    Q: How resource-intensive is this setup?

    A: Running the entire setup may consume about 8 GB of VRAM, depending on your model and system configuration.

    Q: What are the main benefits of using this AI setup?

    A: The primary benefits are improved immersion in role-playing scenarios and faster, more interactive exchanges thanks to the speech-to-text and text-to-speech capabilities.
