Good morning, folks. My name is Aaron Friel, and today I'm talking to you about building AI-powered applications here at Pulumi. This workshop discusses the target audience, those familiar with Pulumi, and interested in AI generative applications, large language models, etc.
I'll present Pulumi AI and Pulumi Copilot's building process, key learnings from creating the AI team a year ago, and the fundamentals of AI applications. I aim to make this educational and helpful for everyone.
Pulumi is an infrastructure as code tool, offering a declarative way of specifying desired configurations in a cloud provider. This ensures consistent and repeatable application instances across different environments. Pulumi distinguishes itself by letting you use familiar programming languages such as Python, JavaScript, TypeScript, Go, etc., along with existing tooling like code completion and inline documentation.
Pulumi supports over 150 cloud providers, including Kubernetes, and offers both open-source and commercial SaaS solutions. Key benefits include creating reusable workflows and collaborating freely with teams.
AI applications require solid fundamentals, like a traditional full-stack application but with added complexity. Managing this complexity using Pulumi simplifies building production-grade AI applications. For example, creating an EKS cluster within AWS can be streamlined using Pulumi, reducing manual orchestration into six lines of code.
In March 2023, OpenAI launched the GPT 3.5 API, which enabled creating applications resembling the ChatGPT functionality. Pulumi AI was launched on April 17th, roughly six weeks later. The critical learning involved transitioning from merely wrapping GPT to creating retrieval-augmented generation (RAG) applications.
A typical first AI app involves a static site using an AI API with streaming enabled for seamless user experience. A standard interaction involves a user prompt, system prompt, and assistant response. RAG applications, however, introduce embedding and vector databases to find the appropriate context for user queries, enhancing the accuracy and reliability of responses generated by the AI model.
Establish early with developer APIs from providers like OpenAI or Anthropic. Identify your application requirements, especially if it’s latency-sensitive.
Choose frameworks aligning with your team’s current skills. Utilize tools like Langchain, Pinecone, Versell AI SDK, and Azure AI search. Pinecone integration provides a reproducible setup for creating RAG applications efficiently.
Selecting the right model, like GPT-4 or Claude 3.5, depends on the task at hand. Evaluate using benchmarks like MMLU or tools like LLMc AIs Arena. Performance measurement focuses on metrics like time to first token (TTFT) and latency, balancing them against the quality trade-offs.
Pulumi Copilot added capabilities like answering or processing user queries into functional actions. Using Pulumi for our deployment and management process, secret management, and developer environment setup significantly enhanced our deployment efficiency.
Pulumi Copilot development involved leveraging internal tools and feedback loops, adapting to fulfill user needs effectively.
Q1: What is Pulumi?
A: Pulumi is an infrastructure as code tool that allows specifying cloud configurations using familiar programming languages and existing tooling.
Q2: How does RAG (Retrieval-Augmented Generation) work?
A: RAG uses embeddings that convert text into searchable vectors. These vectors help find relevant documents or context to supplement user queries, improving AI's response accuracy.
Q3: What are some practical AI development tools?
A: Tools like Langchain, Pinecone, Azure AI Search, and the Versell AI SDK are recommended for building and managing AI applications.
Q4: How do you measure AI model performance?
A: Performance measurement focuses on metrics like time to first token, latency, and the accuracy of responses using benchmarks like MMLU, guided by tools like LLMc AIs Arena.
Q5: What frameworks should we use for AI application development?
A: It depends on your team's current skill set. For Python developers, Langchain is suitable, while for Node.JS, the Versell AI SDK is recommended.
Q6: Why is embedding crucial in RAG applications?
A: Embedding helps in converting textual context into vectors, making it easier to search and find relevant information which augments the AI's response accuracy.
In addition to the incredible tools mentioned above, for those looking to elevate their video creation process even further, Topview.ai stands out as a revolutionary online AI video editor.
TopView.ai provides two powerful tools to help you make ads video in one click.
Materials to Video: you can upload your raw footage or pictures, TopView.ai will edit video based on media you uploaded for you.
Link to Video: you can paste an E-Commerce product link, TopView.ai will generate a video for you.