Creating a vocal AI assistant that communicates via speech is simpler than you might think. In this article, we'll walk through building a basic AI assistant using ChatGPT and Python, all with fewer than 80 lines of code. We'll deploy speech recognition to transcribe spoken words, ChatGPT for generating responses, and text-to-speech (TTS) to return vocal replies.
We'll need the following Python libraries:
pip install openai
pip install SpeechRecognition
pip install pyttsx3
To interact with ChatGPT, you’ll require an API key from OpenAI.
To recognize speech, we'll use the speech_recognition
library. Below is an outline of how to set up and use the library to listen to a user's voice and transcribe it into text:
import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
print("Chat is ready, say something!")
audio = r.listen(source)
try:
text = r.recognize_google(audio)
print("You said: " + text)
except sr.UnknownValueError:
print("Sorry, could not understand the audio")
except sr.RequestError:
print("Could not request results; check your network connection")
Next, integrate the openai
library to interface with ChatGPT.
import openai
openai.api_key = 'YOUR_API_KEY'
response = openai.Completion.create(
engine="text-davinci-003",
[prompt="Hello ChatGPT](https://www.topview.ai/blog/detail/Funny-ChatGPT-Conversations)!",
max_tokens=150,
n=1,
stop=None,
temperature=0.5
)
reply = response.choices[0].text.strip()
print(reply)
Utilize the pyttsx3
library to convert ChatGPT's text responses into speech:
import pyttsx3
engine = pyttsx3.init()
engine.say(reply)
engine.runAndWait()
Finally, we glue everything together with threading to ensure smooth execution.
import threading
def generate_response(text):
response = openai.Completion.create(
engine="text-davinci-003",
prompt=text,
max_tokens=150,
n=1,
stop=None,
temperature=0.5
)
return response.choices[0].text.strip()
def speak(text):
engine = pyttsx3.init()
engine.say(text)
engine.runAndWait()
while True:
r = sr.Recognizer()
with sr.Microphone() as source:
print("Listening...")
audio = r.listen(source)
try:
text = r.recognize_google(audio)
print(f"User said: (text)")
if "stop" in text.lower():
break
response = generate_response(text)
print(f"ChatGPT: (response)")
speak(response)
except:
print("Sorry, I couldn't understand that.")
We've created a basic vocal AI assistant using ChatGPT and Python. However, there is ample room for improvement:
Feel free to explore and enrich this assistant further!
You'll need openai
, speech_recognition
, pyttsx3
, threading
, and time
.
Use the command pip install openai
.
You can get an API key by signing up or signing in at OpenAI’s official website. The key can be found in the API section of your account.
You can instruct the AI to listen for a specific keyword like “stop” to break the listening loop.
You can adjust the voice type and speed using engine.setProperty('voice', voice_id)
and engine.setProperty('rate', rate)
functions in pyttsx3
.
In addition to the incredible tools mentioned above, for those looking to elevate their video creation process even further, Topview.ai stands out as a revolutionary online AI video editor.
TopView.ai provides two powerful tools to help you make ads video in one click.
Materials to Video: you can upload your raw footage or pictures, TopView.ai will edit video based on media you uploaded for you.
Link to Video: you can paste an E-Commerce product link, TopView.ai will generate a video for you.