    Combining AI APIs to Work Together

    Google has built up extensive experience applying AI in products such as Google Photos, Search, Gmail, and Maps. That expertise is now available to developers through Google Cloud's AI APIs, which can identify the contents of images, transcribe audio, and interpret the meaning of text. Each API is useful on its own, but combining them makes richer and more robust applications possible.

    In this article, we'll explore how to integrate several APIs to extract sentiment from spoken words. Not only will we analyze the sentiment, but we will also create the audio from scratch. In this demonstration, we will use three APIs: the Text-to-Speech API, the Speech-to-Text API, and the Natural Language API. Let's walk through this process in Python using a Jupyter Notebook.

    Step-by-Step Guide

    Setup

    We start by installing the client libraries for the Text-to-Speech, Speech-to-Text, and Natural Language APIs. We then define a few global settings that the rest of the sample relies on.
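As a rough sketch of what such a setup cell might look like: the package names below are the standard Python client libraries on PyPI, while the setting names and values are assumptions chosen for illustration and are not taken from the original notebook.

```python
# In a notebook cell, the client libraries can be installed with:
#   %pip install google-cloud-texttospeech google-cloud-speech google-cloud-language

# Hypothetical global settings shared by the later steps.
# Names and values are illustrative assumptions only.
LANGUAGE_CODE = "en-US"     # BCP-47 tag used for both the voice and the recognizer
SAMPLE_RATE_HERTZ = 16000   # sample rate for the synthesized and transcribed audio
```

Keeping the language and sample rate in one place helps ensure the audio produced by one API matches what the next API is told to expect.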

    Function Definitions

    Our code is organized into three functions, each pertaining to one of the three APIs we employ:

    1. Text-to-Speech API
    2. Speech-to-Text API
    3. Natural Language API

    The first function synthesizes an audio file. Although we could load an existing recording from a Cloud Storage bucket, we use the Text-to-Speech API to generate the audio from scratch.

    Synthesize Audio
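A minimal sketch of such a function, assuming the `google-cloud-texttospeech` client library and default application credentials. The function name, parameters, and default values are hypothetical, not the original notebook's code.

```python
def synthesize_audio(text, language_code="en-US", sample_rate_hertz=16000):
    """Return LINEAR16 audio bytes (with a WAV header) for the given text."""
    # Imported lazily so this cell still loads where the library is not installed.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(language_code=language_code),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.LINEAR16,
            sample_rate_hertz=sample_rate_hertz,
        ),
    )
    # The synthesized audio is returned as raw bytes.
    return response.audio_content
```

In a notebook, the returned bytes could be played back with `IPython.display.Audio(data=audio_bytes)`.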

    Next, we use the Speech-to-Text API to transcribe this audio back to text. This is where the global settings defined during setup are applied to configure the API's behavior.

    Transcribe Audio to Text
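A minimal sketch of the transcription step, assuming the `google-cloud-speech` client library and audio short enough for synchronous recognition (roughly one minute or less). The function name and defaults are hypothetical.

```python
def transcribe_audio(audio_bytes, language_code="en-US", sample_rate_hertz=16000):
    """Return the transcript of LINEAR16 audio bytes via the Speech-to-Text API."""
    # Imported lazily so this cell still loads where the library is not installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hertz,  # should match the audio's sample rate
        language_code=language_code,
    )
    audio = speech.RecognitionAudio(content=audio_bytes)
    response = client.recognize(config=config, audio=audio)
    # Each result covers a consecutive portion of the audio; keep the top alternative.
    return " ".join(r.alternatives[0].transcript.strip() for r in response.results)
```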

    Finally, we use the Natural Language API to identify key entities in the transcribed text and extract sentiment from it. This step reveals whether the text as a whole is positive, negative, or neutral, and how each entity is regarded.

    Analyze Text for Sentiment and Entities
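A minimal sketch of the analysis step, assuming the `google-cloud-language` client library. The function name and the shape of the returned dictionary are hypothetical choices made for illustration.

```python
def analyze_text(text):
    """Return overall, per-sentence, and per-entity sentiment for the text."""
    # Imported lazily so this cell still loads where the library is not installed.
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    sentiment = client.analyze_sentiment(request={"document": document})
    entities = client.analyze_entity_sentiment(
        request={
            "document": document,
            # UTF32 offsets line up with Python string indices.
            "encoding_type": language_v1.EncodingType.UTF32,
        }
    )
    return {
        "score": sentiment.document_sentiment.score,
        "sentences": [(s.text.content, s.sentiment.score) for s in sentiment.sentences],
        # One (begin_offset, length, score) tuple per entity mention.
        "entities": [
            (m.text.begin_offset, len(m.text.content), e.sentiment.score)
            for e in entities.entities
            for m in e.mentions
        ],
    }
```

Sentiment scores range from -1.0 (negative) to 1.0 (positive), with values near zero indicating neutral text.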

    Execution

    Running the code in our notebook, we first see a rendered audio file that we can play back. Let's listen to the audio:

    Hey, I want to tell you that your employee Janus was super helpful today
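The three steps chain together naturally: text goes in, audio comes out, the audio is transcribed, and the transcript is analyzed. The sketch below shows that flow with the API calls passed in as plain callables. The `run_pipeline` name and the stand-in lambdas are assumptions used so the example runs without credentials; in practice each callable would wrap the corresponding API.

```python
def run_pipeline(text, synthesize, transcribe, analyze):
    """Chain synthesis, transcription, and analysis, returning every stage's output."""
    audio = synthesize(text)        # text -> audio bytes
    transcript = transcribe(audio)  # audio bytes -> text
    analysis = analyze(transcript)  # text -> sentiment data
    return {"audio": audio, "transcript": transcript, "analysis": analysis}


# Stand-ins that mimic the shape of each step without calling any API.
sample = "Hey, I want to tell you that your employee Janus was super helpful today"
result = run_pipeline(
    sample,
    synthesize=lambda t: t.encode("utf-8"),
    transcribe=lambda b: b.decode("utf-8"),
    analyze=lambda t: {"score": 0.0},
)
print(result["transcript"])
```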
    

    Underneath the audio playback control, we see the transcribed text along with sentiment annotations. The sentiment of each phrase is printed, while the sentiment of each entity is shown by underlining its characters (X for negative, tildes for neutral, and pluses for positive).
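One way to produce this kind of underline is sketched below. The function names, the `(begin_offset, length, score)` span format, and the 0.25 threshold separating neutral from positive or negative are all assumptions for illustration; the original notebook may draw the line differently.

```python
def sentiment_marker(score, threshold=0.25):
    """Map a sentiment score in [-1, 1] to an underline character."""
    if score > threshold:
        return "+"
    if score < -threshold:
        return "X"
    return "~"


def underline_entities(text, spans, threshold=0.25):
    """Return the text plus a second line marking each (begin, length, score) span."""
    marks = [" "] * len(text)
    for begin, length, score in spans:
        marker = sentiment_marker(score, threshold)
        # Clamp to the text length in case a span runs past the end.
        for i in range(begin, min(begin + length, len(text))):
            marks[i] = marker
    return text + "\n" + "".join(marks).rstrip()


print(underline_entities("Janus was super helpful", [(0, 5, 0.9)]))
```

Here the entity "Janus" occupies the first five characters and has a strongly positive score, so it is underlined with five plus signs.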

    Practical Applications

    You've now seen how little code it takes to combine these APIs. The same approach can be extended to systems that receive voice calls, transcribe them, and analyze the results. For example, adding the Translation API would let you translate the transcribed text into multiple languages, which helps with accessibility and global reach. With just a few lines of code, you can also enable voice control for various systems, and go further with domain-specific models.
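As a sketch of the translation idea, the function below fans a transcript out to several target languages. It assumes the `google-cloud-translate` library's basic (`translate_v2`) client when no translator is supplied; the function name and the optional `translate_fn` hook are hypothetical.

```python
def translate_transcript(text, target_languages, translate_fn=None):
    """Return a {language_code: translated_text} mapping for the transcript."""
    if translate_fn is None:
        # Imported lazily so this cell still loads where the library is not installed.
        from google.cloud import translate_v2 as translate

        client = translate.Client()

        def translate_fn(value, language):
            return client.translate(value, target_language=language)["translatedText"]

    return {language: translate_fn(text, language) for language in target_languages}


# Example with a stand-in translator so no API call is made.
demo = translate_transcript("hello", ["es", "fr"], translate_fn=lambda t, lang: f"{lang}:{t}")
print(demo)
```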

    Explore Google Cloud's homepage and try these APIs today!


    Keywords

    • Google Cloud
    • AI APIs
    • Text-to-Speech
    • Speech-to-Text
    • Natural Language
    • Sentiment Analysis
    • Python
    • Jupyter Notebook

    FAQ

    Q1: What are the main APIs used in this article?

    • A1: The main APIs discussed are the Text-to-Speech API, Speech-to-Text API, and Natural Language API.

    Q2: Can I create audio files from scratch using Google Cloud's APIs?

    • A2: Yes, you can use the Text-to-Speech API to synthesize audio files from text.

    Q3: How can I transcribe audio to text?

    • A3: You can use the Speech-to-Text API to convert audio files to text.

    Q4: What kind of sentiment analysis can I perform on text data?

    • A4: The Natural Language API can be used to determine if the sentiment of the text is positive, negative, or neutral.

    Q5: Is it easy to combine multiple Google Cloud APIs for complex tasks?

    • A5: Yes, with minimal lines of code, you can integrate multiple APIs to perform complex workflows, such as combining speech synthesis, transcription, and sentiment analysis.
