ad
ad

How to create good multilingual voice-AI assistants (Vapi x GPT-4o)

People & Blogs


Introduction

The rising popularity of voice AI assistants has led many to wonder if multilingual voice assistants are feasible and how to optimize them effectively. The answer is a resounding yes. As of today, major languages and various accents are supported, making it possible to create sophisticated multilingual voice assistants. While their performance has historically been subpar, the release of GPT-4o has significantly enhanced their effectiveness, resulting in more natural-sounding outputs.

In this article, we will guide you step-by-step through the process of creating a high-quality multilingual voice assistant using Vapi in conjunction with GPT-4o, and explore optimization techniques for both the model and the implementation.

Step 1: Creating Your Multilingual Assistant

To start, navigate to Vapi and create a new assistant. For this example, we will name our assistant "Ava" for customer support purposes.

Translating Prompts

The first crucial step is translating the assistant’s prompt into the desired language. For this tutorial, we’ll be using German as an example. Tools like ChatGPT can help with translations effectively.

Choosing the Model

Select the appropriate model for your assistant. While various options are available, GPT-4o is preferred due to its superior performance despite slightly increased latency compared to GPT-3.5. Reducing the temperature setting can also help minimize latency.

Transcription Options

Next, choose a transcriber. Vapi offers two main options—DeepGram and TalkScriber. For this example, we’ll use DeepGram's Nova 2, which supports numerous languages and accents, including different variations of German. TalkScriber supports even more languages but typically results in higher latency, so we recommend sticking with DeepGram.

Voice Configuration

The voice settings are perhaps the most critical part of creating a multilingual assistant. Vapi supports multilingual voices with Azure and 11 Labs.

Azure Voices

When using Azure, latency is reasonable, but these voices can sound robotic, particularly in certain languages. Adjust settings for background sound and speed as necessary, and experiment with filler injections and back-channeling to find what works best for your assistant.

Eleven Labs Voices

11 Labs offers a wider selection but often introduces more latency with the multilingual model. You can optimize streaming latency by selecting the appropriate settings and voice from the 11 Labs library. Remember, certain voices can understand and respond in multiple languages, which can be beneficial if you want to maintain flexibility.

Step 2: Setting Up Telephony

Multilingual voice assistants are often used in phone interactions. As such, consider your telephony setup. Vapi currently supports Twilio and Vage for phone numbers. If you're in Europe, for instance, it's recommended to purchase numbers under the Ireland tab to reduce latency.

Vage has been suggested to have less latency than Twilio, thanks to advanced routing techniques, so it might be worth exploring.

Conclusion and Demo

Having created your multilingual assistant, you can conduct a demo to assess its performance. Here’s a quick demonstration using a German-speaking assistant that reflects the optimization steps detailed above.

The assistant performs well, with only minor pronunciation errors in certain contexts, but overall the quality is acceptable for practical use.

If you have any queries, feel free to leave a comment, and we will be happy to assist.


Keywords


FAQ

1. Can I create voice assistants in languages other than English?
Yes, you can create multilingual voice assistants in many languages, including accents.

2. What is the best model to use for a multilingual assistant?
GPT-4o is recommended for its superior performance, though GPT-3.5 is also an effective option with less latency.

3. Which transcriber should I choose for multilingual assistants?
DeepGram is recommended for its balance of quality and latency. TalkScriber offers more languages but with higher latency.

4. Are Azure voices better than Eleven Labs?
Azure voices perform well in terms of latency but can sound robotic. Eleven Labs provides more natural-sounding voices but may introduce increased latency.

5. How can I optimize my assistant for phone interactions?
Utilize Twilio or Vage for phone number integration and choose numbers that are geographically closer to reduce latency.