I'm absolutely delighted and thrilled to be with you this afternoon. My name is Phil Tee, and I'm listed as the CEO of Moogsoft. I had a funny experience coming into the conference: while traveling on a plane, I was working on my laptop, sharing all the deep trade secrets of Moogsoft. A guy next to me noticed and asked if I was with Moogsoft. I confirmed and humorously explained that I founded the company by unblocking the first toilet there.
He then asked if we were exhibiting at re:Invent. I confirmed, noting the importance of Amazon and AWS as critical go-to-market partners for Moogsoft. I ended up boasting about the size of our booth so that he would not underestimate our presence.
Moving on, I'm here to talk about the journey of Moogsoft, founded eight years ago in London, where we've been pioneering AI in service assurance. Service assurance has been around for a long time, but with the seismic shift of workloads moving to the cloud, AI has become essential to manage the increasing complexity. Today, I'll discuss how we began focusing on false incidents and what happens when you introduce metrics into that equation, combining AI, metrics, and logs.
Back in the 1980s, we dealt with a different environment dominated by mainframes. These systems were highly integrated, with low event rates and low change rates. However, as we transitioned through the decades, distributed computing and client-server models complicated things. The introduction of virtualization, and now containerization and microservices, has added further layers of complexity.
This has led us to a state where our digital infrastructure generates vast amounts of data – billions of metrics, events, and logs. Maintaining service quality in this complicated and ever-changing environment is daunting, but it must be done given the high demands of end-users.
The most significant challenges stem from this complexity and constant change. Traditional monitoring systems are often ineffective in this new world, leading to frequent systemic failures and poor service quality. Traditional approaches, such as rules-based monitoring, are rigid and often fail to detect incidents effectively.
Moogsoft advocates for AI as the solution to overcome these challenges. AI makes it possible to combine system state and measurements without static analysis or manual configuration. Unlike traditional rules-based monitoring, AI adapts, scales, and provides more precise incident detection. AI enables continuous service assurance, leveraging machine learning to analyze and respond to changes dynamically.
In practice, AI enhances observability by turning raw metrics into meaningful data. Metrics on their own can be overwhelming and lack context, but AI can identify anomalies from metrics and logs and correlate them into actionable alerts and incidents.
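To make the idea of correlating anomalies into incidents concrete, here is a minimal sketch in Python. It is not Moogsoft's algorithm: it assumes a simple time-window grouping, and the `Anomaly` and `Incident` types, their fields, and the `max_gap_seconds` parameter are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Anomaly:
    timestamp: float  # seconds since some reference point
    source: str       # hypothetical: host or service that emitted it
    message: str


@dataclass
class Incident:
    anomalies: list = field(default_factory=list)


def correlate_by_time(anomalies, max_gap_seconds=60.0):
    """Group anomalies into incidents when each one arrives within
    max_gap_seconds of the previous anomaly in the same incident."""
    incidents = []
    for anomaly in sorted(anomalies, key=lambda a: a.timestamp):
        if incidents:
            last_seen = incidents[-1].anomalies[-1].timestamp
            if anomaly.timestamp - last_seen <= max_gap_seconds:
                # Close enough in time: treat it as part of the same incident.
                incidents[-1].anomalies.append(anomaly)
                continue
        # Otherwise start a new incident.
        incidents.append(Incident(anomalies=[anomaly]))
    return incidents


events = [
    Anomaly(0.0, "db-1", "latency spike"),
    Anomaly(30.0, "api-1", "error rate up"),
    Anomaly(50.0, "web-1", "timeouts"),
    Anomaly(500.0, "cache-1", "evictions high"),
    Anomaly(520.0, "api-2", "slow responses"),
]
incidents = correlate_by_time(events)
```

With the sample events, the first three anomalies fall into one incident and the last two into another. A real system would likely also weigh topology or textual similarity; time proximity alone is only the simplest possible signal.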
To maintain high service quality, Moogsoft developed an approach that pushes AI to the edge, where metrics data is collected and analyzed in real time. This method reduces data overload and ensures relevant data is surfaced and acted upon quickly. By integrating supervised machine learning at the core with closed-loop automation, AI Ops facilitates continuous assurance and self-healing systems.
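As a rough illustration of analyzing a metric in real time at the point of collection, the sketch below keeps a running mean and variance (Welford's online algorithm) and flags values that deviate strongly from them. This is an assumption about what a lightweight edge model could look like, not a description of Moogsoft's implementation; the class name, `threshold`, and `warmup` are hypothetical.

```python
import math


class StreamingAnomalyDetector:
    """Flags values far from the running mean, using constant memory."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # deviations (in std units) that count as anomalous
        self.warmup = warmup        # samples to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def observe(self, value):
        """Return True if value is anomalous against the data seen so far,
        then fold the value into the running statistics."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) > self.threshold * std:
                is_anomaly = True
        # Welford's update: no history of past samples is stored.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_anomaly


detector = StreamingAnomalyDetector()
samples = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 50]
flags = [detector.observe(s) for s in samples]
```

Only the final sample (50) is flagged here. Because the detector stores three numbers per metric rather than the full series, only the flagged values need to leave the collection point, which is one way to read the "reduces data overload" claim. Note that this sketch also folds anomalous values into the model; a production design might choose to exclude them.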
In conclusion, integrating AI into observability is vital for navigating the dense complexity of modern digital infrastructures. Combining AI, metrics, and logs allows for better service assurance, continuous monitoring, and improved overall service quality.
Q: What is Moogsoft's approach to handling the vast amounts of metric data?
A: Moogsoft uses AI to push intelligence to the edge, where live models of metrics are built to detect anomalies in real time.
Q: How does Moogsoft differentiate important metrics from noise?
A: Moogsoft applies algorithmic techniques such as information theory to measure the informational content of metrics and effectively filter out noise.
Q: What kind of challenges does AI face in the realm of digital infrastructure?
A: One significant challenge is the volume and variety of data generated, requiring sophisticated algorithms and continuous learning to provide useful insights.
Q: How does AI Ops help in ensuring continuous service assurance?
A: AI Ops uses AI and machine learning for real-time incident detection, correlating anomalies, and driving automated remediation, thus facilitating continuous assurance.
Q: Are humans still needed in operations with the advent of AI Ops?
A: Yes, humans play a crucial role in higher-level decision-making and dealing with more complex issues that AI cannot solve autonomously.
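The answer above about using information theory to separate important metrics from noise can be illustrated with Shannon entropy. The following is a minimal sketch that assumes a simple histogram-based estimate; the talk does not specify which measure Moogsoft uses, and the function name and `bins` parameter are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=10):
    """Estimate the entropy (in bits) of a metric by bucketing its values
    into equal-width bins and measuring how spread out the buckets are."""
    if not values:
        return 0.0
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat-line metric carries no information.
        return 0.0
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


flat = shannon_entropy([5.0] * 100)
spread = shannon_entropy([0, 1, 2, 3], bins=4)
```

A metric that never changes scores zero, while one spread evenly over four bins scores two bits. How such a score should be turned into a keep-or-drop decision is a separate design choice that this sketch does not address.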