In a digitized economy, IT service slowdowns can mean the difference between making a little money or a lot, between losing customers or retaining them, and between DevOps and business processes that are rusty and slow or well-oiled and speedy.
A new class of IT monitoring and analytics software, AIOps, is primed to address this challenge. We recently discussed how AIOps software lets you see the overall health scores for all your core and edge IT systems to make the best decisions fast. We’ll now explore how Dell’s CloudIQ AIOps software shows you unusual activity so you can take proactive action to manage and protect the storage devices where your data lives.
Acting on the Unusual with Performance Anomaly Detection
Storage systems encounter constantly changing workloads, so performance varies with the type of workload: write-heavy (data snapshots), read-heavy (retrieving cold data), and high- or low-bandwidth functions. Each workload type may be affected differently by conditions such as work overload, hardware malfunction, or a virus or cyberattack, to name a few. Ultimately, storage performance anomalies significantly impact application performance.
Technical Solution: Performance Anomaly Detection
Three main factors distinguish CloudIQ Performance Anomaly Detection:
- Accurately detecting performance impact in storage systems is challenging due to ever-changing workload patterns. A patented performance impact algorithm gives CloudIQ an edge by detecting performance impacts on workloads whose characteristics remain static within a particular range for at least an hour. It therefore ignores transient performance impacts on workloads that change only over a brief period, and accurately detects persistent performance impacts: the impacts that matter to the business.
- The solution finds performance impacts for different workload types using different performance metrics, such as read/write percentage and bandwidth. Using what data scientists call a “Little’s Law and bucketing” approach, the algorithm rebuilds the model every day to learn the drift in performance and keep its predictions accurate.
- CloudIQ uses a unique model based on IOPS (input/output operations per second) and latency for each workload type, and displays the performance impact on a simple graph showing its time, duration and size. For a performance-impacted region, you can see the top three possible causes and resource contention analyses.
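To make the three points above more concrete, here is a minimal Python sketch of the general idea. It is not CloudIQ's patented algorithm, whose details are not public. It assumes a hypothetical `Sample` record taken every five minutes, illustrative bucket thresholds, a median-latency baseline per workload bucket, and a "twice the baseline" rule for flagging an impact. The `outstanding_io` helper shows one plausible use of Little's Law (average number in the system equals arrival rate times time in the system); how CloudIQ actually applies it is an assumption here.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class Sample:
    """One fixed-interval performance sample (hypothetical schema)."""
    minute: int            # start of the interval, in minutes
    iops: float            # I/O operations per second
    latency_ms: float      # average latency in milliseconds
    read_pct: float        # share of reads, 0-100
    bandwidth_mbps: float  # throughput in MB/s


def bucket(s: Sample) -> tuple[str, str]:
    """Map a sample to a coarse workload bucket (thresholds are illustrative)."""
    if s.read_pct >= 70:
        mix = "read-heavy"
    elif s.read_pct <= 30:
        mix = "write-heavy"
    else:
        mix = "mixed"
    bandwidth = "high-bw" if s.bandwidth_mbps >= 100 else "low-bw"
    return (mix, bandwidth)


def outstanding_io(s: Sample) -> float:
    """Little's Law, L = lambda * W: IOPS times latency in seconds."""
    return s.iops * (s.latency_ms / 1000.0)


def learn_baselines(history: list[Sample]) -> dict[tuple[str, str], float]:
    """Rebuild the per-bucket baseline (median latency) from recent history."""
    by_bucket: dict[tuple[str, str], list[float]] = {}
    for s in history:
        by_bucket.setdefault(bucket(s), []).append(s.latency_ms)
    return {b: median(values) for b, values in by_bucket.items()}


def persistent_impacts(samples, baselines, factor=2.0, min_minutes=60, step=5):
    """Return (start, end) minute ranges where the workload stays in one
    bucket and latency exceeds factor x baseline for at least min_minutes.
    Assumes samples are contiguous and sorted, one every `step` minutes."""
    runs = []
    start = end = run_bucket = None
    for s in samples:
        b = bucket(s)
        base = baselines.get(b)
        impacted = base is not None and s.latency_ms > factor * base
        # Close the current run when the impact ends or the workload changes.
        if start is not None and (not impacted or b != run_bucket):
            if end - start >= min_minutes:
                runs.append((start, end))
            start = None
        if impacted:
            if start is None:
                start, run_bucket = s.minute, b
            end = s.minute + step
    if start is not None and end - start >= min_minutes:
        runs.append((start, end))
    return runs


# Yesterday: a steady read-heavy, low-bandwidth workload at 1 ms latency.
history = [Sample(m, 2000, 1.0, 80, 16) for m in range(0, 120, 5)]
baselines = learn_baselines(history)


def latency_at(minute: int) -> float:
    """Synthetic data: a 10-minute blip, then a 70-minute slowdown."""
    return 5.0 if 15 <= minute < 25 or 40 <= minute < 110 else 1.0


today = [Sample(m, 2000, latency_at(m), 80, 16) for m in range(0, 120, 5)]
impacts = persistent_impacts(today, baselines)
print(impacts)
```

In this synthetic run, the 10-minute blip is ignored and only the 70-minute slowdown is reported, which mirrors the "persistent, not transient" behavior described above. Calling `learn_baselines` again on each day's data is a simple stand-in for rebuilding the model daily to follow drift.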
Performance Analytics for Other Types of Systems
CloudIQ provides Performance Anomaly Detection for key server performance indicators, such as CPU and memory utilization and power, as well as for data protection appliances, IP network switches and SAN switches. It also monitors activity on those devices that could lead to anomalies, such as incoming and outgoing read/write throughput, errors, link resets, congestion spread and more.
Performance Impact Analysis for those systems is also on the horizon.
What Users Say About CloudIQ’s Impact
“It makes it insanely easy to get alerts and analytics from servers, storage and networks that I have deployed to multiple clients without having to log into each client individually – the single pane of glass is very helpful. I also have the CloudIQ app on my phone, so I get push notifications as soon as something happens and know when an alert is generated before the client notices if they notice at all.” – Senior Systems Engineer, Service Provider
Seeing More Clearly
Having a way to see unusual infrastructure behavior and what will happen if you’re not proactive is essential. AIOps can have a long-lasting impact on your whole business. A survey of CloudIQ users shows that CloudIQ enables IT teams to resolve infrastructure issues 2X to 10X faster¹ and saves them one workday per week on average.¹
¹ Based on a Dell Technologies survey of CloudIQ users conducted May through June 2021. Actual results may vary.