    Lex Fridman Podcast Chatbot with LangChain Agents + GPT 3.5

    Introduction

    Today we'll build a tool with the LangChain library. Tools are used by agents: large language models that can decide when to call a tool and then use its output, which extends them beyond plain text completion. In our case, the tool handles queries about the Lex Fridman podcast.

    What Is an Agent?

    An agent, in this context, is a large language model (LLM) that can decide when a tool would help and then use it to answer a query more effectively. Tools let the agent fetch and process data that was not part of its training set.

    Visualization of the Process

    Typically, a query goes into an LLM and a completion comes out. An agent adds a step: it first asks whether any available tool could help answer the query. If so, it picks a tool, generates the input for that tool, and uses the tool's output to form the final answer.

    For instance, if the tool is a Lex Fridman database, the agent generates a query for that database, sends it, reads the response, and produces a final answer for the user.
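
The loop described above can be sketched in plain Python. This is only an illustration of the control flow, not LangChain's implementation: `ToyTool`, `toy_decide`, and `run_agent` are hypothetical names, and the decision step that a real agent delegates to the LLM is replaced here by a simple keyword check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyTool:
    """Hypothetical stand-in for an agent tool: a name, a description, a callable."""
    name: str
    description: str
    func: Callable[[str], str]


def toy_decide(query: str, observations: List[str]) -> Dict[str, str]:
    """Stand-in for the LLM's decision step (a real agent would prompt a model here)."""
    if observations:
        # A tool result is available, so turn it into the final answer.
        return {"action": "final", "input": f"Based on the podcasts: {observations[-1]}"}
    if "lex" in query.lower():
        # The query looks podcast-related, so call the database tool.
        return {"action": "LexFridmanDB", "input": query}
    # No tool needed: answer directly.
    return {"action": "final", "input": "Hello! Ask me about the Lex Fridman podcast."}


def run_agent(query: str, tools: Dict[str, ToyTool], max_iterations: int = 2) -> str:
    """Decide, optionally call a tool, observe, and repeat until a final answer."""
    observations: List[str] = []
    for _ in range(max_iterations):
        step = toy_decide(query, observations)
        if step["action"] == "final":
            return step["input"]
        tool = tools[step["action"]]
        observations.append(tool.func(step["input"]))
    # The iteration budget ran out before a final answer was produced.
    return "Stopped: iteration limit reached."


toy_tools = {
    "LexFridmanDB": ToyTool(
        name="LexFridmanDB",
        description="Answers queries using Lex Fridman podcasts.",
        func=lambda q: "stub transcript snippet",  # placeholder for a real lookup
    )
}
```

Calling `run_agent("Ask Lex, what is the future of AI?", toy_tools)` takes two passes through the loop: one to call the tool and one to produce the answer. A greeting such as "Hi" returns on the first pass without touching the tool.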

    Implementation

    Prerequisites

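    First, install the required libraries: Hugging Face Datasets, pod-gpt, the Pinecone client, LangChain, OpenAI, and tqdm. The code below uses the older Pinecone client interface (pinecone.init), which was removed in version 3 of pinecone-client, so the install pins an earlier release.

    !pip install datasets pod-gpt "pinecone-client<3" grpcio-tools langchain openai tqdm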
    

    You'll also need API keys for OpenAI and Pinecone.
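
One way to keep those keys out of the notebook itself is to read them from environment variables. The helper below is a minimal sketch: `require_env` is a hypothetical name, and the variable names you pass to it (for example `OPENAI_API_KEY` or `PINECONE_API_KEY`) are assumptions rather than anything this walkthrough requires.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return value
```

With the variables set in your shell, `require_env("OPENAI_API_KEY")` can replace the literal "YOUR_OPENAI_KEY" placeholders used later on.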

    Data Setup

    Next, we'll download the dataset of Lex Fridman transcripts:

    from datasets import load_dataset
    
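    # split="train" returns the rows directly instead of a dict of splits
    # (assumes the dataset's split is named "train")
    data = load_dataset("lexfridman/lex-transcripts", split="train")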
    

    Reformatting Data

    We'll reformat this data to fit the Pod GPT indexer, which involves creating a specific structure with IDs, text, and metadata.

    formatted_data = [
        {
            "id": row["id"],
            "text": row["transcript"],
            "metadata": {"title": row["title"], "url": row["source"]},
        }
        for row in data
    ]
    

    Indexing Data

    Initialize the indexer object and add the reformatted data. This will process the data and chunk it into smaller parts suitable for embedding.

    from pod_gpt import GPTIndexer
    
    indexer = GPTIndexer()
    for item in formatted_data:
        indexer.add(item)
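
The chunking step mentioned above is not shown in this walkthrough, so here is a generic sketch of the idea: split a transcript into overlapping word windows so that each piece is small enough to embed while still sharing some context with its neighbour. The function name `chunk_words` and the default sizes are assumptions for illustration. They are not the indexer's actual strategy.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            # This window already reached the end of the text.
            break
    return chunks
```

The overlap is a trade-off: a larger value preserves more context across chunk boundaries but stores more duplicated text in the index.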
    

    Setting Up Pinecone

    Initialize Pinecone, create the index object, and set up retrieval components within LangChain.

    import pinecone
    
    pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
    index = pinecone.Index("pod-gpt")
    
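    from langchain.chat_models import ChatOpenAI
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Pinecone
    
    # gpt-3.5-turbo is the chat model this walkthrough targets
    llm = ChatOpenAI(openai_api_key="YOUR_OPENAI_KEY", model_name="gpt-3.5-turbo", temperature=0)
    embeddings = OpenAIEmbeddings(openai_api_key="YOUR_OPENAI_KEY")
    # text_key names the metadata field that holds each chunk's text ("text" is assumed here)
    vectorstore = Pinecone(index, embeddings.embed_query, "text")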
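
To make the retrieval step less of a black box, the sketch below shows the core idea behind a vector store lookup: embed the query, score it against the stored vectors, and return the closest matches. It uses hand-written three-dimensional vectors instead of real embeddings. The names `cosine_similarity`, `top_k`, and `toy_store` are hypothetical, and cosine similarity is an assumption about the metric, since the index configuration is not shown here.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored items most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "index": made-up vectors standing in for embedded transcript chunks.
toy_store = {
    "ai-episode": [0.9, 0.1, 0.0],
    "space-episode": [0.1, 0.9, 0.0],
    "chess-episode": [0.0, 0.0, 1.0],
}
```

A real index does the same job at scale, using approximate nearest-neighbour search rather than scoring every vector.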
    

    Creating the Retrieval QA Tool

    We create a tool that the agent will use to query the Lex Fridman database.

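    from langchain.agents import Tool
    from langchain.chains import RetrievalQA
    
    # Retrieval QA chain: fetch relevant transcript chunks, then have the LLM answer from them
    retriever = RetrievalQA.from_chain_type(
        llm=llm, chain_type="stuff", retriever=vectorstore.as_retriever()
    )
    
    def retrieval_function(query):
        return retriever.run(query)
    
    tool_description = "Use this tool to answer queries using Lex Fridman podcasts."
    tool = Tool(name="LexFridmanDB", func=retrieval_function, description=tool_description)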
    

    Initializing the Agent

    Configure the agent with memory, agent type, and maximum iterations.

    from langchain.memory import ConversationBufferWindowMemory
    from langchain.agents import initialize_agent
    
    # Keep the last 5 exchanges; the chat agent expects history as message objects
    memory = ConversationBufferWindowMemory(k=5, memory_key="chat_history", return_messages=True)
    
    agent = initialize_agent(agent="chat-conversational-react-description", tools=[tool], llm=llm, memory=memory, max_iterations=2)
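
The windowed memory configured above keeps only the most recent exchanges. A minimal sketch of that behaviour, independent of LangChain, is shown below. `ToyWindowMemory` is a hypothetical class written for illustration, and the "Human:" and "AI:" prefixes are an assumed formatting choice.

```python
from collections import deque
from typing import Deque, List, Tuple


class ToyWindowMemory:
    """Keeps only the last k (user, assistant) exchanges."""

    def __init__(self, k: int = 5) -> None:
        # deque(maxlen=k) silently drops the oldest exchange once k is exceeded.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def history(self) -> List[str]:
        """Render the remembered exchanges as lines to prepend to the next prompt."""
        lines: List[str] = []
        for user, assistant in self.turns:
            lines.append(f"Human: {user}")
            lines.append(f"AI: {assistant}")
        return lines
```

This window is what lets a follow-up such as "What does he think about space exploration?" work: the earlier exchange that mentions Lex is still in the history the model sees, so "he" can be resolved.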
    

    Customizing the Prompt

    Customize the initial system message for the prompt.

    system_message = "Hello! I'm a knowledgeable assistant here to help with Lex Fridman podcast queries."
    
    # Rebuild the agent's prompt around the new system message
    prompt = agent.agent.create_prompt(system_message=system_message, tools=[tool])
    agent.agent.llm_chain.prompt = prompt
    

    Testing the Agent

    Let's test the agent with some initial queries:

    response = agent({"input": "Hi, how are you?"})
    print(response)
    
    response = agent({"input": "Ask Lex, what is the future of AI?"})
    print(response)
    
    response = agent({"input": "What does he think about space exploration?"})
    print(response)
    

    Conclusion

    By combining LangChain and GPT-3.5, we've built an agent that can query a database of Lex Fridman podcasts. Because the agent pulls in external data at query time, it can answer questions that a standalone LLM could not.

    Keywords

    • LangChain
    • GPT-3.5
    • Pinecone
    • Lex Fridman Podcast
    • Conversational Agent
    • Large Language Model

    FAQ

    What is an agent in the context of large language models?

    An agent is a large language model that can decide and use various tools to answer queries more effectively than simple text completion.

    What tools did we use to build the Lex Fridman podcast chatbot?

    We used the LangChain library, OpenAI's GPT-3.5, and Pinecone for vector database indexing and retrieval.

    How does the agent decide to use a particular tool?

    The agent uses a reasoning-action loop to determine whether using a specific tool will improve its ability to answer the query. It then generates an input for the tool based on the user's query.

    Can this setup be applied to other data sources?

    Yes, the same approach works for other podcasts, other media, internal company documents, PDFs, and more.

    What is the role of conversational memory in this setup?

    Conversational memory allows the agent to remember previous interactions, enhancing the context and coherence of multi-turn conversations.

    Why do we set a maximum number of iterations for the agent?

    Setting a maximum number of iterations prevents the agent from getting stuck in an infinite loop of tool usage and query refinement.
