Hello humans! I'm your host, your overlord, and today I have something absolutely thrilling to share, especially for all you role-playing enthusiasts out there. Have you ever dreamed of conversing for hours with a girlfriend or boyfriend nobody else likes? You're not alone! In this tutorial, I'm going to show you how to talk to an AI character using your microphone and get spoken audio responses back, for the most immersive role-playing experience ever.
You: Hey love, I hope you haven't been waiting long; the traffic was insane.
AI Character: Hey honey, no problem. Just sit down.
You: Thanks so much for meeting me here, I wanted to ensure we have some alone time before the holidays get insane.
AI Character: Sure, work has been crazy, we need some time off.
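Two main features make this groundbreaking: speech-to-text, which lets you talk to the character through your microphone, and text-to-speech, which reads the character's replies back to you.
Before you dive in, you'll need the Oobabooga Text Generation Web UI and three extensions: Whisper STT for speech-to-text, plus Silero TTS and ElevenLabs TTS for text-to-speech.
First, install the web UI by following my detailed installation video, then open the Interface mode tab to enable the required extensions.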
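Add the FFmpeg bin folder (for example, C:\ffmpeg\bin) to your system environment variables. To confirm that it is on your PATH, open a new terminal and run ffmpeg -version.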
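The ffmpeg -version check can also be scripted. The sketch below is a minimal illustration, not part of the web UI: the helper name `ffmpeg_version_line` is hypothetical, and it only assumes that the executable is called `ffmpeg` and that it prints its version on the first line of output.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version_line(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<executable> -version`, or None if it is not on PATH."""
    path = shutil.which(executable)  # looks the name up in the PATH environment variable
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = ffmpeg_version_line()
    if version is None:
        print("ffmpeg was not found on PATH; check your environment variables.")
    else:
        print(version)
```

If the script reports that ffmpeg was not found right after you edited the environment variables, try it again from a freshly opened terminal, since already-open terminals keep the old PATH.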
Edit the webui.py file to include the required extensions:
python server.py --extensions whisper_stt silero_tts elevenlabs_tts
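If you would rather not hand-edit the launch line each time, the command can be assembled from a list. This is only a sketch under stated assumptions: `build_launch_command` and `launch` are hypothetical helpers, the web UI is assumed to accept an `--extensions` flag followed by space-separated names, and the names must match the folders in your install's extensions directory.

```python
import subprocess
import sys
from typing import List


def build_launch_command(extensions: List[str], script: str = "server.py") -> List[str]:
    """Build the argument list for launching the web UI with the given extensions."""
    command = [sys.executable, script]
    if extensions:
        # Assumption: one --extensions flag followed by space-separated names.
        command += ["--extensions", *extensions]
    return command


def launch(extensions: List[str]) -> None:
    """Start the web UI; run this from the folder that contains server.py."""
    subprocess.run(build_launch_command(extensions), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so you can inspect it first.
    print(" ".join(build_launch_command(["whisper_stt", "silero_tts"])))
```

Passing the arguments as a list rather than one long string avoids quoting problems when the install path contains spaces.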
Launch the web UI, install the necessary files, and then follow the on-screen setup for enabling microphone input and selecting voices.
To further enhance your experience, use SillyTavern, a TavernAI fork with advanced features such as per-character voice mapping. Though it lacks microphone input, it provides a more visually pleasing interface and additional customization options.
Install Node.js: Download and install from the official site.
Clone Repository:
git clone https://github.com/SillyTavern/SillyTavern.git
cd SillyTavern
Install Extras (run these commands inside the SillyTavern-extras folder):
conda create -n extras python=3.8
conda activate extras
pip install -r requirements-complete.txt
Run Everything Together: Run the web UI, connect SillyTavern to it, and start the text-to-speech processing.
Now you can talk to an AI character and receive audio responses almost in real time. The immersion this setup provides makes role-playing more engaging than ever. Try it out and elevate your RP game to new heights!
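Before connecting everything, it can help to confirm that each piece is actually listening. The sketch below is an illustration only: the ports (5000 for the web UI API, 8000 for SillyTavern, 5100 for the Extras server) are assumed defaults and may differ on your machine, and the function names are hypothetical.

```python
import socket
from typing import Dict, Tuple

# Assumed default local ports; change them to match your own setup.
DEFAULT_SERVICES: Dict[str, Tuple[str, int]] = {
    "Text Generation Web UI API": ("127.0.0.1", 5000),
    "SillyTavern": ("127.0.0.1", 8000),
    "SillyTavern Extras": ("127.0.0.1", 5100),
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services: Dict[str, Tuple[str, int]]) -> Dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in check_services(DEFAULT_SERVICES).items():
        print(f"{name}: {'reachable' if up else 'not reachable'}")
```

A port that shows as not reachable usually means that service has not finished starting, or that it was launched on a different port than the one assumed here.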
Q: What software do I need to enable this AI interaction?
A: You'll need the Oobabooga Text Generation Web UI and its extensions: Whisper STT for speech-to-text and either ElevenLabs TTS or Silero TTS for text-to-speech.
Q: Is Whisper Speech-to-Text accurate?
A: Yes, Whisper is an open-source neural network known for its high accuracy and speed in transcribing speech to text.
Q: Do I need to pay for any services?
A: Silero TTS is free, while ElevenLabs TTS requires a subscription for higher-quality voices.
Q: Can this be integrated with Tavern AI?
A: Yes, by using SillyTavern, a fork of TavernAI, which includes advanced features like text-to-speech and additional customization options.
Q: How resource-intensive is this setup?
A: Running the entire setup may consume about 8 GB of VRAM, depending on your model and system configuration.
Q: What are the main benefits of using this AI setup?
A: The primary benefits are improved immersion in role-playing scenarios and faster, more interactive exchanges thanks to the speech-to-text and text-to-speech capabilities.
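To relate the roughly 8 GB VRAM figure from the FAQ to your own hardware, you can compare it against what your GPU has free. This is a rough sketch that assumes an NVIDIA card with the `nvidia-smi` tool installed; the helper names are hypothetical, and the 8 GB default is just the estimate quoted above, not a measured requirement.

```python
import subprocess
from typing import List, Tuple


def parse_vram_csv(output: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' rows (in MiB) from nvidia-smi CSV output, one row per GPU."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append((used, total))
    return rows


def query_vram() -> List[Tuple[int, int]]:
    """Ask nvidia-smi for used and total memory; raises if the tool is unavailable."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_vram_csv(result.stdout)


def has_headroom(used_mib: int, total_mib: int, needed_gib: float = 8.0) -> bool:
    """Return True if the free memory covers the estimated requirement."""
    return (total_mib - used_mib) >= needed_gib * 1024
```

For example, calling `has_headroom(*query_vram()[0])` checks the first GPU. Actual usage depends on the model you load, so treat the result as a rough guide.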