Google has garnered extensive knowledge in building AI applications like Google Photos, Search, Gmail, and Maps. This expertise is now available to developers through Google Cloud's AI APIs. These APIs enable developers to identify images, transcribe audio, and understand the context of communications. While each API is powerful on its own, combining them can yield even more dynamic and robust applications.
In this article, we'll explore how to integrate several APIs to extract sentiment from spoken words. Not only will we analyze the sentiment, but we will also create the audio from scratch. In this demonstration, we will use three APIs: the Text-to-Speech API, the Speech-to-Text API, and the Natural Language API. Let's walk through this process in Python using a Jupyter Notebook.
## Setup
We start by installing the library dependencies required for the Text-to-Speech, Speech-to-Text, and Natural Language APIs. We then define a few global configuration values used throughout the sample.
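As a sketch, the setup cell might look like the following. The package names are the official Google Cloud client libraries; the specific configuration values are illustrative assumptions, not the article's originals:

```python
# Install the client libraries (run once in the notebook):
#   !pip install google-cloud-texttospeech google-cloud-speech google-cloud-language

# Global configuration shared by the API calls below.
# These exact values are illustrative assumptions.
LANGUAGE_CODE = "en-US"        # language for both synthesis and transcription
SAMPLE_RATE_HERTZ = 16000      # sample rate of the generated LINEAR16 audio
AUDIO_FILENAME = "speech.wav"  # local file used to pass audio between steps
```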
## Function Definitions
Our code is organized into three functions, each pertaining to one of the three APIs we employ:
The first function synthesizes audio files. Although we could retrieve this from a cloud storage bucket, we leverage the Text-to-Speech API to generate audio from scratch.
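A minimal sketch of such a function, assuming LINEAR16 (WAV) output and a neutral voice (the article's exact parameters are not shown):

```python
def synthesize_text(text: str,
                    language_code: str = "en-US",
                    sample_rate_hertz: int = 16000) -> bytes:
    """Generate LINEAR16 (WAV) audio bytes for `text` via the Text-to-Speech API."""
    # Imported inside the function so this cell loads even before
    # the client library is installed.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code=language_code,
            ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.LINEAR16,
            sample_rate_hertz=sample_rate_hertz,
        ),
    )
    return response.audio_content
```

Calling the function requires authenticated Google Cloud credentials; the returned bytes can be written to a `.wav` file or played back directly in the notebook.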
### Synthesize Audio
Next, we use the Speech-to-Text API to transcribe this audio back to text. Here’s where our global settings from the beginning are applied to fine-tune the API's operation.
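A sketch of the transcription step, assuming the same LINEAR16 encoding and sample rate chosen during synthesis (these parameters are assumptions mirroring the setup above):

```python
def transcribe_audio(audio_bytes: bytes,
                     language_code: str = "en-US",
                     sample_rate_hertz: int = 16000) -> str:
    """Transcribe LINEAR16 audio bytes back to text via the Speech-to-Text API."""
    from google.cloud import speech  # deferred import, as above

    client = speech.SpeechClient()
    response = client.recognize(
        config=speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=sample_rate_hertz,
            language_code=language_code,
        ),
        audio=speech.RecognitionAudio(content=audio_bytes),
    )
    # Join the top alternative of each recognized segment.
    return " ".join(r.alternatives[0].transcript for r in response.results)
```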
### Transcribe Audio to Text
Finally, we utilize the Natural Language API to isolate key entities and extract sentiment from the transcribed text. This part of the process will reveal whether the text is generally positive, negative, or neutral, along with identifying key entities.
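A sketch of this final step, combining the Natural Language API's document-level sentiment analysis with entity-level sentiment (the function shape is an assumption, not the article's exact code):

```python
def analyze_text(text: str):
    """Return (document sentiment, entities with per-entity sentiment)
    from the Natural Language API."""
    from google.cloud import language_v1  # deferred import, as above

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    # Overall sentiment of the whole text (score in [-1, 1]).
    sentiment = client.analyze_sentiment(
        request={"document": document}
    ).document_sentiment
    # Key entities, each carrying its own sentiment score.
    entities = client.analyze_entity_sentiment(
        request={"document": document}
    ).entities
    return sentiment, entities
```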
### Analyze Text for Sentiment and Entities
## Execution
Running the code in our notebook, we first see a rendered audio file that we can play back. Let's listen to the audio:
> "Hey, I want to tell you that your employee Janus was super helpful today."
Underneath the audio playback control, we see the transcribed text along with sentiment annotations. The sentiment score for each sentence is printed, while entity-level sentiment is indicated by underlining each entity's characters: X for negative, tildes (~) for neutral, and plus signs (+) for positive.
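The article's exact rendering code isn't shown; a minimal pure-Python sketch of this underline scheme might look like the following, where the entity list and the 0.25 score thresholds are illustrative assumptions:

```python
def underline_entities(text: str, entities: list[tuple[str, float]]) -> str:
    """Render `text` with a second line marking each entity's span:
    '+' for positive sentiment, 'X' for negative, '~' for neutral.
    `entities` is a list of (entity name, sentiment score) pairs."""
    marks = [" "] * len(text)
    for name, score in entities:
        start = text.find(name)
        if start == -1:
            continue  # entity not present verbatim in the text
        # Thresholds are an assumption; the API returns scores in [-1, 1].
        symbol = "+" if score > 0.25 else "X" if score < -0.25 else "~"
        for i in range(start, start + len(name)):
            marks[i] = symbol
    return text + "\n" + "".join(marks)

print(underline_entities("Janus was super helpful", [("Janus", 0.8)]))
```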
You've now seen how easy it is to combine these APIs. This approach can be extended to build systems that handle voice calls, transcribe them, and analyze the results seamlessly. For example, integrating the Translate API could transform your audio streams into multiple languages, improving accessibility and global reach. With just a few lines of code, you can enable voice control for various systems, and domain-specific models let you go even further.
Explore Google Cloud's homepage and try these APIs today!
## FAQ

- Q1: What are the main APIs used in this article?
- Q2: Can I create audio files from scratch using Google Cloud's APIs?
- Q3: How can I transcribe audio to text?
- Q4: What kind of sentiment analysis can I perform on text data?
- Q5: Is it easy to combine multiple Google Cloud APIs for complex tasks?