PagerDuty's Digital Ops Update Promotes Proactivity with Advanced ML, Automation

PagerDuty is adding new capabilities for event management, incident response, and AIOps. The new features, alongside advanced analytics, aim to promote a “proactive” approach to digital ops. 


PagerDuty is adding new capabilities for event management, incident response, and AIOps, as well as advanced insights and analytics that enable a proactive approach to digital ops. 


The features aim to help enterprises lower costs and reduce incidents. The release also applies machine learning and automation to break down the complexity of managing digital operations, reducing interruptions and minimizing the time it takes to resolve issues.


PagerDuty's latest platform release empowers teams in both large enterprises and high-growth disruptors to prevent incidents that cause customer dissatisfaction and negative business impact, so they can confidently scale services and accelerate initiatives that capitalize on strong consumer demand for digital services.


"We now live, work, and learn primarily online. Digital is the new operating system, and operations teams are now on the frontline that keeps businesses running as they manage the technologies that deliver the customer experience and revenue," said Jennifer Tejada, CEO at PagerDuty. "These teams deserve a cloud-native, real-time platform designed for unpredictable emergent work that automates in service of people. PagerDuty is the modern platform for action in the digital default world."

In a recent blog post, PagerDuty DevOps Advocate Mandi Walls explained why the company calls automation the “key to modern IT.”

Now that even many basic applications have complex ecosystems, sprawling dependency chains, and fast-evolving platforms, old processes can no longer deal with components in a timely manner, whether things are running smoothly or something has gone wrong. Automating tasks during an incident response workflow can save time and help your team deal with the demands of modern system architectures.


We can use automation in many different areas to help our teams with three major pain points:


Reducing toil
Reducing mistakes
Keeping pace with fast-moving development cycles


Let’s take a closer look at each of these points.


Reducing Toil
Toil is how we describe boring, repetitive tasks that can make up a significant part of the work day in large environments, such as deploying new instances, installing updates, and configuring connections to various services. All of these things need to be done—and done correctly—for our environment to continue serving our users. They often aren’t particularly challenging, but if you are prepping a new environment of even several dozen instances, creating and configuring them all manually can be a mind-numbing exercise.


Toil contributes to burnout and employee disengagement, so it’s important to minimize it as much as we can. When we automate basic tasks, especially ones we will perform over and over again, we free up time for other, more challenging work.
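To make the toil example concrete, here is a minimal Python sketch that produces specifications for several dozen instances from a single template instead of configuring each one by hand. The template fields, the `build_instance_specs` helper, and the naming scheme are hypothetical illustrations, not part of any PagerDuty or cloud-provider API; in practice the resulting specs would be handed to whatever provisioning tool a team uses.

```python
from copy import deepcopy

# Hypothetical template: every instance in the environment shares these settings.
BASE_TEMPLATE = {
    "image": "app-server-v1",
    "size": "medium",
    "services": ["logging", "metrics", "database"],
}


def build_instance_specs(prefix: str, count: int, template: dict) -> list[dict]:
    """Return one spec per instance, each a copy of the template with a unique name."""
    specs = []
    for i in range(1, count + 1):
        spec = deepcopy(template)  # copy so instances never share mutable lists
        spec["name"] = f"{prefix}-{i:03d}"
        specs.append(spec)
    return specs


if __name__ == "__main__":
    specs = build_instance_specs("web", 36, BASE_TEMPLATE)
    print(len(specs), specs[0]["name"], specs[-1]["name"])
```

Because every spec comes from the same template, a change to `BASE_TEMPLATE` applies to all future instances at once, and the deep copy keeps one instance's edits from leaking into the others.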


Reducing Mistakes
Repetitive tasks can also lead to mistakes. When a task has many steps or complex commands, it’s easy for things to be missed or entered incorrectly. When we create a piece of automation, whether with a tool or a script, we have the opportunity to preserve those tasks in their correct form for future use.


For example, maybe we have some applications that run in virtual machines in a public cloud. Automation can leverage APIs in our environment, which allows our team to always deploy new instances using the same configuration options as the existing systems. This helps ensure everything matches and meets our requirements. We don’t have to create a complex set of documentation to outline which things to select or click on in a graphical user interface; our team instructions simply rely on the automation and the API it integrates with to produce what we need.
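The idea of letting automation and an API, rather than click-by-click documentation, define new instances can be sketched as follows. This is a minimal illustration under stated assumptions: `CloudClient`, `create_instance`, the baseline fields, and the set of allowed overrides are all hypothetical stand-ins for whatever provider SDK a team actually uses, and `FakeClient` exists only so the sketch runs without a real cloud account.

```python
from typing import Protocol

# Hypothetical baseline matching the existing production systems.
BASELINE = {"image": "app-server-v1", "size": "medium", "region": "us-east", "encrypted": True}
# Only these fields may differ between instances; everything else is locked.
ALLOWED_OVERRIDES = {"name", "size"}


class CloudClient(Protocol):
    """Hypothetical interface for a cloud provider's instance-creation API."""

    def create_instance(self, config: dict) -> str: ...


def deploy_instance(client: CloudClient, **overrides) -> str:
    """Merge overrides onto the baseline, rejecting any field that must stay fixed."""
    disallowed = set(overrides) - ALLOWED_OVERRIDES
    if disallowed:
        raise ValueError(f"cannot override locked fields: {sorted(disallowed)}")
    config = {**BASELINE, **overrides}
    return client.create_instance(config)


class FakeClient:
    """In-memory stand-in used to exercise deploy_instance without a real cloud."""

    def __init__(self) -> None:
        self.created: list[dict] = []

    def create_instance(self, config: dict) -> str:
        self.created.append(config)
        return f"instance-{len(self.created)}"


if __name__ == "__main__":
    client = FakeClient()
    print(deploy_instance(client, name="web-001"))
    try:
        deploy_instance(client, name="web-002", encrypted=False)
    except ValueError as err:
        print(err)
```

Rejecting changes to locked fields, rather than silently accepting them, is what keeps every new instance matching the existing systems.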


Keeping Pace With Fast-Moving Development Cycles
All parts of the software development lifecycle are seeing the use of more and more automation. For example, the use of Continuous Integration/Continuous Delivery methodologies pretty much requires every step to be heavily automated to keep up with the pace of changes.


Putting changes through a continuous pipeline and into production—whether they are new features, bug fixes, or operational changes—will have your team looking at automating the delivery of files, the updating of configurations, and the deployment of new resources without manual intervention to avoid holding back progress.
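The continuous-pipeline idea, in which each change moves through automated steps without manual intervention, can be reduced to a toy runner. The stage names and the `run_pipeline` function below are hypothetical; real CI/CD systems add parallelism, retries, and artifact handling, but the core rule sketched here is the same: run every step automatically and stop promoting a change as soon as one step fails.

```python
from typing import Callable

# A stage is a name plus a function that returns True on success.
Stage = tuple[str, Callable[[dict], bool]]


def run_pipeline(change: dict, stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run stages in order; stop at the first failure. Returns (succeeded, stages_attempted)."""
    attempted = []
    for name, step in stages:
        attempted.append(name)
        if not step(change):
            return False, attempted
    return True, attempted


# Hypothetical stages mirroring the kinds of work the excerpt mentions.
def run_tests(change: dict) -> bool:
    return change.get("tests_pass", False)


def update_config(change: dict) -> bool:
    change["config_version"] = change.get("config_version", 0) + 1
    return True


def deploy(change: dict) -> bool:
    change["deployed"] = True
    return True


STAGES: list[Stage] = [("test", run_tests), ("configure", update_config), ("deploy", deploy)]
```

A failing test stage stops the run before configuration or deployment touches anything, which is how automation keeps pace without letting broken changes through.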

PagerDuty’s Platform To Promote Intelligence, Automation

The new capabilities in PagerDuty’s Event Intelligence (the event management and AIOps solution within the company’s platform) are as follows:


Intelligent recommendations: New machine learning capabilities in PagerDuty's Event Intelligence offering automatically identify the noisiest services and provide recommendations to reduce noise so teams can focus on the incidents that matter.

Change impact mapping: By linking changes in a customer's software deployment pipeline with incidents in its digital operations, PagerDuty allows teams to quickly find and resolve an incident's root cause. This new capability integrates change events from code repositories via new integrations with GitHub, Puppet, and Evolven.


Dynamic service dependencies: New capabilities in PagerDuty's Service Directory, which provides a single view of a customer's entire digital operations, dynamically identify dependencies between people, changes, incidents, and services in real time. They then apply machine learning to automatically keep a company's service directory up to date, preventing redundant work between teams and surfacing recommendations for automating incident response that don't require manual steps or advanced skills.


Flexible automation controls: Applying AI and automation to something as critical as a company's digital operations requires complete trust. The platform now includes flexible automation controls that keep a human in control at all times. Teams can pause incident notifications to give systems a chance to auto-remediate before a responder steps in, and push-button automation lets them run automated response plays and monitor the results.
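Change impact mapping, described above, rests on a simple intuition: a change deployed to a service shortly before an incident on that service is a likely suspect. The sketch below expresses that intuition with a fixed lookback window. It is not PagerDuty's implementation; the `ChangeEvent` and `Incident` types, the `suspect_changes` function, the 60-minute window, and the example data are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeEvent:
    service: str
    summary: str
    at: datetime


@dataclass
class Incident:
    service: str
    title: str
    started: datetime


def suspect_changes(incident: Incident, changes: list[ChangeEvent],
                    window: timedelta = timedelta(minutes=60)) -> list[ChangeEvent]:
    """Return changes to the incident's service within `window` before it started, newest first."""
    candidates = [
        c for c in changes
        if c.service == incident.service
        and incident.started - window <= c.at <= incident.started
    ]
    return sorted(candidates, key=lambda c: c.at, reverse=True)


if __name__ == "__main__":
    t0 = datetime(2020, 9, 1, 12, 0)
    changes = [
        ChangeEvent("checkout", "Deploy v2.3", t0 - timedelta(minutes=20)),
        ChangeEvent("checkout", "Raise request timeout", t0 - timedelta(hours=3)),
        ChangeEvent("search", "Deploy v1.8", t0 - timedelta(minutes=10)),
    ]
    incident = Incident("checkout", "Elevated checkout errors", t0)
    for change in suspect_changes(incident, changes):
        print(change.summary)
```

Filtering by service and then sorting newest first puts the most recent related change at the top of a responder's list; a production system would also follow service dependencies rather than matching on a single service name.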
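The notification-pause control can be illustrated in the same hedged way: hold a new alert for a grace period, give automated remediation a chance to clear it, and page a human only if the problem is still open when the pause expires. The `Alert` type, the `should_page` function, and the five-minute default are hypothetical; they show the control flow the feature describes, not how PagerDuty configures it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alert:
    service: str
    triggered: datetime
    resolved: Optional[datetime] = None  # set when auto-remediation (or anything else) clears it


def should_page(alert: Alert, now: datetime, pause: timedelta = timedelta(minutes=5)) -> bool:
    """Page a responder only if the alert is still unresolved after the pause has elapsed."""
    if alert.resolved is not None and alert.resolved <= now:
        return False  # already cleared, for example by an automated fix
    return now - alert.triggered >= pause  # inside the pause, let automation try first
```

A scheduler would call `should_page` periodically for each open alert; keeping the pause as an explicit parameter is one way to leave humans in charge of how long automation gets before someone is woken up.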

One high-profile customer, Zoom, described how PagerDuty’s automation has supported the company’s explosive growth. 


As Eric Yuan, CEO and founder of Zoom, explains, "We have experienced unprecedented growth in the last six months, requiring us to scale our service without compromising the great video experience our customers expect from Zoom. We could not have done this without PagerDuty underpinning us, and this new platform release will only make things easier for our digital teams."


In addition to the platform release, PagerDuty is simplifying pricing, execs said.