Top Down, Bottom Up or a Bit of Both? Process and Deployment Considerations for AIOps
AIOps – a must-have rather than a nice-to-have
Where IT is concerned, there’s no longer a valid business case for the old argument of “doing more with less.” The stakes are too high given the tightly connected global economy, the 24/7 speed of business, digital security threats, and their corresponding data protection regulations. On top of that, the shift to hybrid operations has provided valuable flexibility but multiplied potential failure points. Put simply, it’s no longer a question of if your organization needs to fully optimize its IT production environments, but why haven’t you optimized them already?
The only hitch is that effective IT management takes work. Even when nothing is breaking and your data centres aren’t being battered by hurricanes or holiday-driven demand spikes, software always needs to be updated or patched, security certificates need reissuing, and the interns have forgotten their passwords again. But since you can’t simply hire your way to seamless IT operations, you need to make them less reliant on human intervention. And artificial intelligence is the way to make them more autonomous.
Integrating AI into IT operations, or ITOps, creates “AIOps.” This technique leverages the power of sophisticated algorithms to capture human insights into how your whole IT estate behaves – not just when everything is running smoothly, but also which behaviours are early warnings of potential crashes. AIOps can go beyond detecting and diagnosing IT problems to proactively solving or even preventing them, closing the loop without requiring a human to step in.
Quantifying the value of AIOps
Compelling evidence for the value of AIOps is out there. According to a recent Forrester Total Economic Impact study, Digitate’s AIOps technology makes IT operations teams about 60% more efficient – a result of the teams’ increased productivity and ability to scale. The study concluded that a typical company with a small, 10-person ITOps team could save $1.4 million in labour costs (contract or permanent) over a three-year period. For a large enterprise, that figure could be multiplied around 25-50 times.
To take one real-world example, retail giant Walgreens has 9,000 stores and 4,500 call centre agents at four locations. During the COVID pandemic, the company would experience sporadic spikes in demand for vaccinations as the number of cases rose and fell. Supported by Digitate’s AIOps technology, Walgreens was able to determine when those spikes were most likely to happen and adjust store hours and staffing accordingly.
In addition, AIOps enabled Walgreens to optimize its Salesforce usage and automate the resolution of IT tickets. As a direct result, Digitate was responsible for resolving approximately 31% of Walgreens’ total IT tickets, along with successfully monitoring and managing 95% of all IT events since deployment.
Clear definitions: The key to successful AIOps implementation
Making the commitment to implement AIOps requires a strategic plan of action, of course. So it’s important to establish the rationale and context in which AIOps will be deployed. What problems are to be addressed? Is there a focus on specific areas or will there be a more holistic strategy? You need to define these requirements clearly, right from the start.
Typically, the first steps required in order to implement an AIOps solution are:
People: It’s important to assemble a project team to agree on the scope of work, set the criteria for potential vendors, and map out the entire engagement and deployment project. Identify a platform owner and executive sponsors, supported by strong Intelligent Automation (IA) architects and IA delivery leads. Key deliverables at this stage include:
Assessing the maturity of your current ITOps and IT production environment.
Assessing the most recurring issues.
Building a business case and defining a clear path to ROI.
Process: This is often the most difficult step for organizations because IT support usually relies on the “tribal knowledge” of the IT support team. These team members may belong to other organizations, for example, a System Integrator, which could mean the knowledge of the IT support function is not documented locally. Successful implementation requires the team to first:
Document each Standard Operating Procedure (SOP) that describes how IT support is provided. This is critical because AIOps tools need to be “educated” on how to perform support tasks.
Define and describe the organization’s most critical data flows. For example, what is normal and what is not for each observable element (such as an IT service, traffic volume, or component state)?
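One way to make “normal” concrete for a single observable element is a simple statistical baseline. The sketch below is only an illustration: it assumes hourly traffic-volume samples and a threshold of three standard deviations, and the function name, sample values, and threshold are all hypothetical rather than taken from any particular AIOps product.

```python
from statistics import mean, stdev


def is_abnormal(history, observed, max_deviations=3.0):
    """Return True if `observed` falls outside the baseline built from `history`.

    The baseline is the mean of the history, plus or minus
    `max_deviations` times its sample standard deviation.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any different value counts as abnormal.
        return observed != baseline
    return abs(observed - baseline) > max_deviations * spread


# Hypothetical hourly request counts for one IT service.
hourly_traffic = [980, 1010, 995, 1005, 990, 1020, 1000]

print(is_abnormal(hourly_traffic, 1015))  # close to the usual range
print(is_abnormal(hourly_traffic, 1900))  # far outside the usual range
```

In practice, each critical data flow would likely need its own definition of normal (seasonality, business hours, planned changes), which is why documenting these flows up front matters.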
Technology: Selecting the right solution from the right technology partner is a hugely significant decision, given the importance of the task at hand, the significant investment in resources, time, and money, and the assumed longevity of the relationship with the vendor. Typical considerations here include:
Listing the specific challenges and tangible deliverables.
Balancing short-term and long-term needs, informed by a cost-benefit/ROI analysis.
Qualities such as scalability, platform flexibility, and ease of use.
Whether to opt for best-of-breed point solutions or a single, unified IA platform that can handle both vertical and horizontal data flows. (I recommend the unified approach, which will facilitate integration points and the adoption of ML algorithms.)
Budget: Beyond the licensing, hardware, delivery, installation, and training costs associated with the platform of choice, the team should also consider wider organizational implications, such as change management. For example, they may need to retrain people whose tasks are now handled by IA so that they can be redeployed elsewhere.
Top-down or bottom-up?
To actually deploy AIOps, there are two general reference models, which we refer to as Bottom-Up or Top-Down deployment. To better understand how these models are applied, Figure 1 below shows possible data flows for an enterprise with a typical technology stack, including ERP and other business applications, with a standard IT maintenance team.
Figure 1: An example of organizational data flow with a typical technology stack
The vertical dimension represents the technical layers needed to sustain a specific solution. The bottom and most fundamental layer is the hardware layer or infrastructure. Above that is the operating system that manages the communication and relationships of applications and hardware.
Above that lies the application layer, representing the actual business applications an organization might use – for example, an ERP suite, CRM system, email, website software, and databases, plus all the middleware or integration tools that connect them. The top layer illustrates the horizontal flow of data from one solution (column) to another.
During each transition, this data can trigger actions or decisions – or become enriched for future steps. All these layers, both horizontal and vertical, are constantly communicating among themselves, to keep the whole data flow running smoothly.
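The relationship between the vertical layers and the horizontal data flow can be sketched as a small data model. This is a minimal illustration under stated assumptions: the class, the solution names, and the component names are hypothetical and do not come from Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Solution:
    """One column of the stack: an application and the layers beneath it."""
    application: str
    operating_system: str
    infrastructure: str


# Hypothetical solutions; all names are illustrative only.
erp = Solution("ERP suite", "linux-cluster-a", "datacentre-1")
crm = Solution("CRM system", "linux-cluster-b", "datacentre-1")
email = Solution("Email", "windows-farm", "datacentre-2")

# A horizontal data flow is an ordered path across solutions.
order_to_invoice = [crm, erp, email]


def flow_depends_on(flow, component):
    """Return the applications in the flow that sit on the given component."""
    return [
        s.application
        for s in flow
        if component in (s.application, s.operating_system, s.infrastructure)
    ]


print(flow_depends_on(order_to_invoice, "datacentre-1"))
```

A model like this makes it possible to ask which steps of a business data flow are exposed when a lower-layer component degrades.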
The choice of Bottom-Up or Top-Down deployment can be affected by a number of factors. For example:
What is the organization’s operational maturity? Are all stakeholders completely ready for a change? Have they successfully captured and prioritized their entire ITOps processes? Are all their SOPs documented?
What are the immediate versus longer-term organizational needs? Are there specific areas that they need to address right away? Or are the needs more holistic?
How fast is an enterprise looking to transform? Depending on the size, nature and structure of an organization, it might not be realistic to achieve complete transformation at the same time, globally.
What is the overall production environment architecture? What are the most problematic IT solutions and is any major change happening in production?
What is the architecture for IT support tools, for example, monitoring, messaging, and ticket management?
Who owns production support knowledge? How available is this knowledge?
What is the driver of this transformation?
Based on the answers to these questions, alongside other considerations and rationale, the appropriate deployment model can be selected. Each method has its own benefits and challenges and is best suited to specific scenarios.
Bottom-up deployment model
Deploying AIOps via the “Bottom-Up” model means applying it at the foundational IT infrastructure layer and across all SOPs within that framework. This type of deployment has a longer lead time. However, once all the SOPs have been learned, AIOps can handle any number of typical situations that may arise operationally on a daily basis. Once the SOP learning is in place, AIOps can look at data flows and at how the organization manages master data, and then start applying organizational use cases to the situations it identifies as actionable.
This methodology requires a bigger investment in the beginning, and it has a slower ROI, but it creates a very solid base that provides broader business improvements over time.
Achieving effective autonomous IT operation support requires the automation of around 80% of all ITOps SOPs, which means achieving the following Intelligent Automation (IA) index target percentages:
50% of total tickets resolved by IA
95% of total alerts managed by IA
80% of non-ticket support activities resolved by IA
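These index targets can be tracked with a small calculation. The sketch below is an assumption-laden illustration, not a vendor formula: it treats each index as the share of a workload handled by IA, and the category names and monthly counts are hypothetical.

```python
from dataclasses import dataclass

# Target percentages quoted in the list above.
TARGETS = {
    "tickets": 50.0,
    "alerts": 95.0,
    "non_ticket_activities": 80.0,
}


@dataclass
class Workload:
    """Counts for one category of support work over a reporting period."""
    total: int
    handled_by_ia: int

    def index(self) -> float:
        """Share of the workload handled by IA, as a percentage."""
        if self.total == 0:
            return 0.0
        return 100.0 * self.handled_by_ia / self.total


def gaps_to_target(workloads):
    """Return, per category, how many percentage points are still missing."""
    return {
        name: max(0.0, TARGETS[name] - workload.index())
        for name, workload in workloads.items()
    }


# Hypothetical monthly figures, for illustration only.
month = {
    "tickets": Workload(total=2000, handled_by_ia=620),
    "alerts": Workload(total=40000, handled_by_ia=38000),
    "non_ticket_activities": Workload(total=500, handled_by_ia=400),
}

for name, gap in gaps_to_target(month).items():
    print(f"{name}: {month[name].index():.1f}% (gap to target: {gap:.1f} points)")
```

With these made-up figures, alerts and non-ticket activities already meet their targets, while tickets remain 19 points short.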
Based on our experience, this requires a minimum of 500 IA use cases to be deployed. So, if 50 are deployed each month, deployment takes 10 months, plus two months to set up the program, for a total of 12 months. This is very fast compared with the average of two to three years.
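The timeline arithmetic above can be checked, and re-run for other deployment rates, with a short calculation. The function name and the slower 30-per-month scenario are hypothetical; only the 500 use cases, 50 per month, and two months of setup come from the text.

```python
import math


def program_duration_months(use_cases, deployed_per_month, setup_months):
    """Months needed to set up the program and deploy all use cases."""
    if deployed_per_month <= 0:
        raise ValueError("deployed_per_month must be positive")
    # A partly used final month still counts as a full month.
    deployment_months = math.ceil(use_cases / deployed_per_month)
    return setup_months + deployment_months


# Figures from the text: 500 use cases, 50 per month, two months of setup.
print(program_duration_months(500, 50, 2))  # 12

# Hypothetical slower pace of 30 use cases per month.
print(program_duration_months(500, 30, 2))  # 19
```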
Top-down deployment model
In the “Top-Down” model, AIOps is applied to the most critical business data flows first, and the others are then automated one by one. This approach, while providing a faster ROI, is usually a response to a specific problem that an organization has identified. Once that problem is solved, it might create the illusion that the broader IA journey is no longer needed.
To avoid such a problem, a top-down model requires a carefully planned architecture to fit all data flow requirements into one single IA solution and an equally well-planned deployment strategy, so that each deployment improves the overall Intelligent Automation indexes. Organizations must consider all data flows, not just one, along with having an excellent understanding of just how the different end-to-end data flows connect with each other. While this can create short-term business value, benefits, and ROI, it might also be more expensive in the longer term.
The best of both worlds?
While these two deployment models outlined are very much “horses for courses,” dependent on the reasoning and needs of an organization, they are not necessarily mutually exclusive. As Boston Consulting Group (BCG) stated in its October 2020 report, AI is a Powerful Weapon in the Fight Against IT Problems, “by prioritizing use cases, you can start reaping the benefits of AI quickly — in as little as three months if you know how you want to use AI and can access the relevant data. Contrast that with an all-encompassing ‘big-bang’ approach, where you may wait two years for a grand unveiling.”
BCG goes on to assert that “by prioritizing high-value use cases, you visibly demonstrate the benefits of AI” in the short term by tackling immediate challenges, which “helps build support and funding for a continuing effort and for the necessary changes to processes and organization. This kind of progressive approach also lets you deploy your target operating model in a gradual, value-driven way. Use cases and operating models developed in parallel and in sync.”
This “hybrid” approach, in which organizations realize value by triaging immediate key problem areas through top-down quick fixes while simultaneously committing to a bottom-up approach to AIOps deployment, can be a very good option if it is carefully planned.
CIOs are under constant pressure to provide good news to their bosses and boards of directors, and IT is all too often the favourite target. In such environments, a quick win to solve an immediate issue can spur a commitment to more major changes. A hybrid approach can be a perfect compromise if it is properly planned, explained, and executed.
AIOps delivers proven benefits. Customer satisfaction increases as mean time to recovery (MTTR) and incident management improve. Operational resources are used more efficiently, and overall operating costs decrease. Intelligent observation instantaneously flags, and can even pre-empt, potential operational problems. Employee satisfaction can also improve, thanks to the automation of lower-value and often tedious tasks, allied to greater control of operations and empowerment to focus on higher value-add work.
The key to unlocking all of this value is ensuring that the deployment of AIOps is optimized right from day one. The team needs to create an objective view of organizational needs that can prioritize focus areas and choose the correct path to intelligent automation.
The AIOps journey is a necessary path and organizations must plan how to make it a wanted one, too. Implementing IA at scale is akin to hiking a mountain; the challenge can be great but the rewards and satisfaction are well worth the time and effort.