Introduction

Hello humans! I'm your host, your overlord, and today I have something absolutely thrilling to share, especially for all you role-playing enthusiasts out there. Have you ever dreamed of conversing for hours with a girlfriend or boyfriend nobody else likes? You're not alone! In this tutorial, I'm going to show you how to talk to an AI character using your microphone and how to get audio responses back, for the most immersive role-playing experience ever.

Quick Demo

Here's a snippet of what you can expect:

You: Hey love, I hope you haven't been waiting long; the traffic was insane.

AI Character: Hey honey, no problem. Just sit down.

You: Thanks so much for meeting me here, I wanted to ensure we have some alone time before the holidays get insane.

AI Character: Sure, work has been crazy, we need some time off.

Why Is This So Cool?

Two main features make this groundbreaking:

Text-to-Speech: Get an audio response for enhanced immersion.
Whisper Speech-to-Text: This open-source neural network converts your speech into text accurately and almost instantly, making the experience much more interactive.
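
To make the flow concrete, here is a minimal Python sketch of one voice turn: microphone audio goes through speech-to-text, the text goes to the character model, and the reply goes through text-to-speech. All three stage functions and the `VoiceTurn` type are hypothetical stand-ins invented for illustration, not the extensions' real code; in an actual setup Whisper, the language model, and Silero or ElevenLabs would sit behind them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """Result of one spoken exchange with the character."""
    user_text: str
    reply_text: str
    reply_audio: bytes


def run_voice_turn(
    mic_audio: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> VoiceTurn:
    """Chain speech-to-text, text generation, and text-to-speech."""
    user_text = transcribe(mic_audio).strip()
    reply_text = generate_reply(user_text)
    reply_audio = synthesize(reply_text)
    return VoiceTurn(user_text, reply_text, reply_audio)


# Hypothetical stand-ins so the sketch runs without any model installed.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are already text


def fake_generate_reply(prompt: str) -> str:
    return f"Hey honey, no problem. You said: {prompt}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")  # pretend these bytes are a WAV file


turn = run_voice_turn(
    b" the traffic was insane ",
    fake_transcribe,
    fake_generate_reply,
    fake_synthesize,
)
print(turn.user_text)
print(turn.reply_text)
```

Passing the stages in as plain callables keeps the loop the same whichever speech or voice engine ends up behind it.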

Getting Started

Before you dive in, you'll need the Oobabooga Text Generation Web UI and three extensions:

ElevenLabs TTS
Silero TTS
Whisper STT

Installing the Web UI

First, install the web UI by following my detailed installation video, then open the Interface mode tab to enable the required extensions.

Enable Extensions

Whisper: This enables the speech-to-text conversion from your microphone.
ElevenLabs & Silero: These are for text-to-speech. ElevenLabs offers superior quality but requires a paid subscription. Silero is a good local alternative.

Setting Up Silero

Download and install FFmpeg.
Extract the downloaded archive and place the folder on your C drive as C:\ffmpeg.
Add its bin folder (C:\ffmpeg\bin) to the Path system environment variable.
Verify the installation by opening a new Command Prompt and typing ffmpeg -version.
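
The same check can be done from Python, which is handy because the web UI itself runs on Python and will look for FFmpeg on the same PATH. This is a small sketch using only the standard library; the helper name `ffmpeg_version_line` is made up for this example.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version_line(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line printed by `<executable> -version`.

    Returns None if the executable is not found on PATH or prints nothing.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


line = ffmpeg_version_line()
if line is None:
    print("ffmpeg was not found on PATH; re-check the environment variable.")
else:
    print(line)
```

If this prints the "not found" message right after you edited the environment variable, close and reopen the terminal first, since already-open windows keep the old PATH.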

Configuring the Web UI

Edit the webui.py file so that the launch command includes the required extensions:

python server.py --extensions whisper_stt silero_tts elevenlabs_tts
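
As a rough sketch of how a launch line like this is assembled, the Python helper below builds the command from a list of extension folder names. It assumes the flag is spelled `--extensions` and takes space-separated names; confirm that with `python server.py --help` for your version. The helper name `build_launch_command` is hypothetical, and nothing here actually starts the server.

```python
import shlex
from typing import List, Sequence


def build_launch_command(extensions: Sequence[str]) -> List[str]:
    """Build the argument list for starting the web UI with extensions.

    Assumption: the server accepts `--extensions` followed by one or more
    extension folder names. Check `python server.py --help` to confirm.
    """
    cmd = ["python", "server.py"]
    if extensions:
        cmd.append("--extensions")
        cmd.extend(extensions)
    return cmd


# Speech-to-text plus one local text-to-speech engine.
command = build_launch_command(["whisper_stt", "silero_tts"])
print(shlex.join(command))
```

Loading only one text-to-speech extension at a time is a reasonable default here, since each one tries to voice the same reply.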

Launch the web UI, install the necessary files, and then follow the on-screen setup for enabling microphone input and selecting voices.

Tips & Tricks

To further enhance your experience, use SillyTavern, a Tavern AI fork with advanced features like specific voice mapping for each character. Though it lacks microphone input, it provides a more visually pleasing interface and additional customization options.
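
Per-character voice mapping boils down to a lookup table from character name to voice ID, with a fallback for characters that have no entry. The Python sketch below illustrates the idea only; the character names, voice IDs, and the `voice_for` helper are all hypothetical and are not taken from the front end's actual configuration format.

```python
from typing import Dict

# Hypothetical voice IDs; real ones depend on the TTS engine in use.
DEFAULT_VOICE = "narrator_voice"

CHARACTER_VOICES: Dict[str, str] = {
    "Aria": "female_voice_1",
    "Marcus": "male_voice_2",
}


def voice_for(character: str,
              voices: Dict[str, str] = CHARACTER_VOICES,
              default: str = DEFAULT_VOICE) -> str:
    """Pick the voice mapped to a character, or the default if unmapped."""
    return voices.get(character.strip(), default)


print(voice_for("Aria"))
print(voice_for("Unknown Stranger"))
```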

Running SillyTavern

Install Node.js: Download and install from the official site.

Clone Repository:

git clone https://github.com/SillyTavern/SillyTavern.git
cd SillyTavern

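Install Extras: From inside your copy of the SillyTavern Extras repository, create a conda environment and install the requirements: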

conda create -n extras python=3.8
conda activate extras
pip install -r requirements-complete.txt

Run Everything Together: Start the web UI, connect SillyTavern to it, and start the text-to-speech processing.

Conclusion

Now you can talk to an AI character and receive audio responses almost in real time. The immersion this adds makes role-playing more engaging than ever. Try it out and elevate your RP game to new heights!

Keywords

AI character
Text-to-Speech
Whisper Speech-to-Text
Oobabooga Text Generation Web UI
Silero
ElevenLabs
Immersive role-playing
SillyTavern
Microphone input
FFmpeg installation

FAQ

Q: What software do I need to enable this AI interaction?

A: You'll need the Oobabooga Text Generation Web UI and its extensions: Whisper STT for speech-to-text and either ElevenLabs TTS or Silero TTS for text-to-speech.

Q: Is Whisper Speech-to-Text accurate?

A: Yes, Whisper is an open-source neural network known for its high accuracy and speed in transcribing speech to text.

Q: Do I need to pay for any services?

A: While Silero TTS is free, ElevenLabs TTS requires a subscription for higher-quality voices.

Q: Can this be integrated with Tavern AI?

A: Yes, by using SillyTavern, a fork of Tavern AI, which includes advanced features like text-to-speech and additional customization options.

Q: How resource-intensive is this setup?

A: Running the entire setup may consume about 8 GB of VRAM, depending on your model and system configuration.

Q: What are the main benefits of using this AI setup?

A: The primary benefits are improved immersion in role-playing scenarios and faster, more interactive exchanges thanks to the speech-to-text and text-to-speech capabilities.
