Let's build a Text to Music Generation App using Generative AI

Introduction

In this guide, we will develop a text-to-music generation application using Meta's AudioCraft library, specifically the MusicGen model. The application takes a text prompt from the user and generates a matching music clip. In this step-by-step tutorial, we will use Streamlit for a user-friendly interface and write functions for model loading, music generation, audio saving, and file downloading. Let’s dive in!

Prerequisites

Before we get started, make sure you have the necessary libraries installed. Clone the AudioCraft GitHub repository and install the requirements.

git clone https://github.com/facebookresearch/audiocraft.git
cd audiocraft
pip install -e .
pip install streamlit

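Note: AudioCraft's README lists the Python and PyTorch versions it supports (Python 3.9 and PyTorch 2.x at the time of writing), so check them before installing. Installing into a fresh virtual environment helps avoid dependency conflicts.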
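
Before moving on, it can help to confirm that the environment actually has what the app imports. The following is a minimal sketch using only the standard library; the module list and the `(3, 9)` minimum version are assumptions based on the imports used in this tutorial and on AudioCraft's README, and `check_environment` is a hypothetical helper, not part of any library.

```python
import importlib.util
import sys

# Module names this tutorial's code imports; adjust if your setup differs.
REQUIRED_MODULES = ["streamlit", "torch", "torchaudio", "audiocraft"]


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 9)):
    """Return a report of the Python version and which modules can be found."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

Any entry reported as MISSING points to a package that still needs to be installed before running the app.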

Setting Up the Project

  1. Open VS Code (or any other editor) and create a new file named app.py.

  2. Import Necessary Libraries:

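    import streamlit as st
    import os
    import torch
    import numpy as np
    import base64
    from audiocraft.models import MusicGen
    
  3. Load the MusicGen Model:

    Create a function to load the pre-trained MusicGen model. The st.cache_resource decorator keeps the loaded model in memory, so it is not reloaded every time Streamlit reruns the script.

    @st.cache_resource
    def load_model():
        # Downloads the checkpoint on first use, then reuses the cached copy.
        model = MusicGen.get_pretrained("facebook/musicgen-small")
        return model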
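
Caching the loader matters because Streamlit reruns the whole script on every interaction. The sketch below illustrates the idea with the standard library's `functools.lru_cache` as a stand-in; it is an analogy only, not how Streamlit implements `st.cache_resource`, and `load_fake_model` is a hypothetical placeholder for the real model load.

```python
import functools

load_calls = 0  # counts how often the expensive loader actually runs


@functools.lru_cache(maxsize=1)
def load_fake_model():
    """Hypothetical stand-in for an expensive model load."""
    global load_calls
    load_calls += 1
    return {"name": "fake-model"}


first = load_fake_model()
second = load_fake_model()
print(load_calls)        # 1
print(first is second)   # True
```

The second call returns the same object without running the loader again, which is the behavior the app relies on for the MusicGen model.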
    

Creating the Streamlit Interface

  1. Set Up the Streamlit App:

    Define the layout and page configuration for the application.

    st.set_page_config(page_title="Music Gen", page_icon="🎵")
    st.title("Text to Music Generation")
    
    with st.expander("See Explanation"):
        st.write("""
        This app is a music generation application built using Meta's AudioCraft library and
        it can generate music based on your natural language description.
        """)
    
  2. Get User Input:

    Add a text area for user prompts and a slider to select the audio duration.

    description = st.text_area("Enter your description:")
    duration = st.slider("Select time duration (seconds)", 2, 20, 5)
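
The text area can return an empty or whitespace-only string, so it may be worth cleaning the inputs before handing them to the model. Here is a minimal sketch, assuming the same 2 to 20 second range as the slider; `normalize_inputs` is a hypothetical helper introduced for illustration.

```python
def normalize_inputs(description, duration, min_seconds=2, max_seconds=20):
    """Return a cleaned (description, duration) pair, or None if the prompt is empty."""
    text = " ".join(description.split())  # collapse whitespace and trim the ends
    if not text:
        return None
    # Clamp the duration to the same range the slider exposes.
    seconds = max(min_seconds, min(max_seconds, int(duration)))
    return text, seconds
```

In the app, a `None` result could be used to skip generation until the user has typed a real prompt.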
    

Implement Music Generation Functionality

  1. Generate Music from Text:

    Create functions to generate music based on user input.

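    def generate_music_tensors(description, duration):
        model = load_model()
        # MusicGen takes sampling options via set_generation_params, not generate().
        model.set_generation_params(
            use_sampling=True,
            top_k=50,
            duration=duration,
        )
        # generate() returns a batch of waveforms shaped [batch, channels, samples].
        output = model.generate([description])
        return output[0]
    
    def save_audio(samples):
        import torchaudio  # local import so this helper works on its own

        sample_rate = 32000  # MusicGen's output sample rate
        save_path = "audio_output/"
        os.makedirs(save_path, exist_ok=True)
        audio_path = f"{save_path}audio.wav"
        # torchaudio expects a CPU tensor shaped [channels, samples].
        torchaudio.save(audio_path, samples.detach().cpu(), sample_rate)
        return audio_path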
    
  2. File Downloading:

    Implement a helper function to allow users to download the generated audio file.

    def get_binary_file_downloader_html(bin_file, file_label):
        with open(bin_file, "rb") as f:
            data = f.read()
        b64 = base64.b64encode(data).decode()
        href = f'<a href="data:application/octet-stream;base64,{b64}" download="{file_label}">Download your audio</a>'
        return href
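
Because the download helper embeds the whole file in the page as base64, it is useful to know roughly how large the clip and the link will be. The sketch below works through the arithmetic with the standard library only. The 4 bytes per sample and single channel are assumptions for illustration (the real file also carries a small header), and the function names are hypothetical.

```python
import base64


def expected_num_samples(duration_seconds, sample_rate=32000):
    """Number of samples per channel for a clip of the given length."""
    return int(duration_seconds * sample_rate)


def base64_length(num_bytes):
    """Length of the standard padded base64 encoding of num_bytes bytes."""
    return 4 * ((num_bytes + 2) // 3)


def build_data_uri(data, mime="application/octet-stream"):
    """Embed raw bytes in a data URI, the same form the download link uses."""
    return f"data:{mime};base64,{base64.b64encode(data).decode()}"


samples = expected_num_samples(5)   # 5-second clip at 32 kHz
payload_bytes = samples * 4         # assumption: mono, 4 bytes per sample
print(samples)                      # 160000
print(base64_length(payload_bytes)) # 853336
```

Base64 adds about one third to the payload size, so long clips make for large pages. Streamlit also ships a built-in `st.download_button` widget, which may be a simpler option than a hand-built link.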
    

Integrate Everything

Use the above functions in the main application logic, handling the user input and generating the appropriate output.

if description and duration:
    music_tensor = generate_music_tensors(description, duration)
    audio_file_path = save_audio(music_tensor)
    download_link = get_binary_file_downloader_html(audio_file_path, "Generated_Audio.wav")
    st.markdown(download_link, unsafe_allow_html=True)
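
Note that the integration above writes every result to the same audio_output/audio.wav path, so each new prompt overwrites the previous file. One possible way around this, sketched under the assumption that a name derived from the prompt is acceptable, is to hash the prompt and duration into the file name; `output_path_for` and the `musicgen_` prefix are hypothetical choices for illustration.

```python
import hashlib
import os


def output_path_for(description, duration, directory="audio_output"):
    """Build a deterministic file path from the prompt and duration."""
    key = f"{description.strip()}|{int(duration)}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:12]  # short, stable identifier
    return os.path.join(directory, f"musicgen_{digest}.wav")
```

Since generation uses sampling, the same prompt can produce different audio on each run; with this naming scheme a repeated prompt would still overwrite its own earlier file, while different prompts would no longer collide.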

Running the Application

Run your Streamlit application using the command:

streamlit run app.py

Conclusion

After implementing the above code, your application will be capable of generating music based on text prompts. With this functionality, you can experiment with various musical genres, styles, and prompts, giving rise to unique audio outputs.

Keywords

  • Music Generation
  • Generative AI
  • AudioCraft
  • MusicGen Model
  • Streamlit
  • Text Prompt
  • Audio Output

FAQ

Q: What is the MusicGen model?
A: MusicGen is an AI model developed by Meta that generates music from natural language descriptions.

Q: How do I run the application?
A: After creating the app.py file and adding the necessary code, run the command streamlit run app.py in your terminal.

Q: Can I customize the duration of the generated audio?
A: Yes, you can use the slider in the Streamlit app to select the audio duration between 2 and 20 seconds.

Q: What kind of music can I generate?
A: You can input any description or genre, and the MusicGen model will generate music based on your input.

Q: Is the generated music free to use?
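A: It depends on the license. AudioCraft's code is released under the MIT license, while the MusicGen model weights are released under CC-BY-NC 4.0, which restricts commercial use. Check the current license terms in the AudioCraft repository and Meta’s policies before using generated music in a product.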