When was the last time operations came to a halt due to unplanned server downtime? Do you remember the moments when you had to work extra hours to make up for the lost productivity? Well, those are the times we point fingers at the Infrastructure team for the unexpected server failures or for not being better prepared.
Organizations have tried to solve this issue by moving from a corrective maintenance approach to a preventative one: instead of reacting to a problem, they proactively replace components before the end of their full lifetime. The drawback is poor asset utilization, because components are retired while they still have useful life left.
This is where predictive maintenance analytics comes in. It helps organizations strike a balance, achieving high asset utilization and savings in operational costs, plus a boost in productivity with just-in-time component replacements.
How does Predictive Maintenance work?
Customer servers: A Predictive Maintenance case study
Now the question is: how do you implement predictive analytics in this business operation? To predict probable downtimes for a customer’s servers, we used Analance Advanced Analytics for data modeling and Analance Business Intelligence for reporting, dashboarding, and alerts.
We first monitored the customer’s server utilization and laid out a solution plan: forecast key system metrics such as CPU, memory, and disk utilization, then combine the forecasted values in a multiclass classification analysis to predict possible server downtimes.
We also defined server utilization thresholds. If usage exceeded the set threshold, auto alerts would be sent to the required stakeholders for corrective action.
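A threshold check of this kind can be sketched in a few lines of Python. This is only an illustration: the threshold values, the metric names, and the `Alert` class are hypothetical, and in the actual solution the alerts were configured in Analance rather than hand-coded.

```python
from dataclasses import dataclass

# Hypothetical thresholds (percent utilization); real values would be customer-specific
THRESHOLDS = {"cpu": 85.0, "memory": 90.0, "disk": 80.0}


@dataclass
class Alert:
    metric: str
    value: float
    threshold: float

    def message(self) -> str:
        return f"{self.metric} at {self.value:.1f}% exceeds threshold {self.threshold:.1f}%"


def check_thresholds(reading: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return one Alert per metric whose reading is strictly above its threshold."""
    return [
        Alert(metric, value, thresholds[metric])
        for metric, value in reading.items()
        if metric in thresholds and value > thresholds[metric]
    ]


# Made-up reading: only CPU is above its threshold (disk is exactly at the limit)
alerts = check_thresholds({"cpu": 91.2, "memory": 72.0, "disk": 80.0})
for alert in alerts:
    print(alert.message())  # prints "cpu at 91.2% exceeds threshold 85.0%"
```

Whether a reading exactly at the limit should trigger an alert is a design choice; this sketch only alerts when the value is strictly above the threshold.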
The Solution Plan
We had data from an ELK stream, with indexes dedicated to capturing the system’s key metric values in real time. We sourced this data through Analance’s Elasticsearch connector, which makes live streaming data available inside the platform.
Data source – Metricbeat, ELK stream
Connector – Analance Elasticsearch Live
The data held real-time values for all the major system metrics, such as CPU, memory (RAM), inbound and outbound traffic, and number of processes.
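To give a feel for the preparation step, here is a minimal sketch of how one nested metric document could be flattened into a row for modeling. The field paths follow Metricbeat-style naming but are assumptions (the actual index mapping is not shown in this post), and `FIELDS`, `get_path`, and `flatten` are hypothetical helper names. In our case the Analance connector handled the data access.

```python
from typing import Any, Optional

# Assumed Metricbeat-style field paths; the real index mapping may differ
FIELDS = {
    "cpu_pct": "system.cpu.total.norm.pct",
    "memory_pct": "system.memory.actual.used.pct",
    "net_in_bytes": "system.network.in.bytes",
    "net_out_bytes": "system.network.out.bytes",
}


def get_path(doc: dict, dotted: str) -> Optional[Any]:
    """Walk a nested dict along a dotted path; return None if any key is missing."""
    node: Any = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def flatten(doc: dict) -> dict:
    """Turn one nested document into a flat row keyed by short metric names."""
    row = {"timestamp": doc.get("@timestamp")}
    for name, path in FIELDS.items():
        row[name] = get_path(doc, path)
    return row


# Made-up sample document for illustration only
sample = {
    "@timestamp": "2021-01-01T00:00:00Z",
    "system": {
        "cpu": {"total": {"norm": {"pct": 0.42}}},
        "memory": {"actual": {"used": {"pct": 0.63}}},
    },
}
row = flatten(sample)
print(row)
```

Metrics that are absent from a document come back as `None`, so downstream steps can decide whether to drop or impute them.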
Running the forecast
Once the data was prepared, the next step was to forecast the key metrics chosen for the prediction. There were two options to run this forecasting in Analance: use any of the 41 prebuilt machine learning algorithms or add a custom script through Jupyter which could be integrated in Analance.
Since we needed complex multivariate forecasting, which was not part of the prebuilt set, we decided to write the scripts in Python.
We used the VAR (vector autoregression) model from statsmodels.tsa.vector_ar.var_model for the multivariate forecasting analysis, working in Jupyter notebooks, which was the most convenient environment for the required modules. Once the script was written and verified, we checked that its performance met our requirements. The forecast accuracy was good and was accepted.
The scripts were then imported into Analance Advanced Analytics using the Custom Notebooks option, so that the output of the forecasting model could serve as the data source for the next steps.
Integrated Jupyter scripts in Analance
We forecasted the CPU, Disk, and Memory utilization and monitored for possible surges. These forecasted values were then combined into a single dataset and used as input for the outage classification.
Analance Advanced Analytics has eight prebuilt algorithms for classification. We ran the classification model with all the available algorithms in an ensemble mode and checked results to find the best performing model for this scenario.
The Multiclass Random Forest algorithm had the highest accuracy and the best Recall, F1 score, and Precision compared to the other classification models.
Analance Multiclass classification model results
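As an illustration of the classification step outside Analance, here is a minimal sketch that uses scikit-learn's RandomForestClassifier as a stand-in for the prebuilt Multiclass Random Forest. The data is synthetic, and the labeling rule with its 75% and 90% cut-offs is a hypothetical assumption, not the customer's actual maintenance rules.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic forecasted utilization (percent) for CPU, disk, and memory
X = rng.uniform(0, 100, size=(600, 3))


def label(row: np.ndarray) -> str:
    """Hypothetical labeling rule based on the highest of the three metrics."""
    peak = row.max()
    if peak >= 90:
        return "Outage"
    if peak >= 75:
        return "Warning"
    return "Normal"


y = np.array([label(row) for row in X])

# Hold out a quarter of the rows, keeping the class proportions the same
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
# Per-class precision, recall, and F1 score on the held-out rows
print(classification_report(y_test, y_pred))
```

Comparing several classifiers then comes down to repeating the fit and report for each candidate on the same train and test split.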
Setting up custom alerts
The algorithm would classify the utilization metric values as Normal (Green), Warning (Yellow), or Outage (Red) based on the forecasted values and the maintenance rules.
When the model predicts a Warning (Yellow) or an Outage (Red), Service Managers are alerted in real time about server conditions. These smart alerts let Service Managers act quickly to prevent outages and avoid unexpected downtime.
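The alerting rule itself is simple to express. The sketch below turns a batch of predicted classes into alert messages for the Warning and Outage cases only. The server names, the `Prediction` type, and the message format are hypothetical, and the real alerts were delivered through Analance.

```python
from typing import Iterable, NamedTuple

# Color codes for the three predicted classes
COLORS = {"Normal": "Green", "Warning": "Yellow", "Outage": "Red"}
ALERTING_CLASSES = {"Warning", "Outage"}


class Prediction(NamedTuple):
    server: str
    predicted_class: str


def build_alerts(predictions: Iterable[Prediction]) -> list:
    """Return one alert message per Warning or Outage prediction; skip Normal."""
    return [
        f"[{COLORS[p.predicted_class]}] {p.server}: predicted {p.predicted_class}"
        for p in predictions
        if p.predicted_class in ALERTING_CLASSES
    ]


# Made-up predictions for illustration
predictions = [
    Prediction("app-01", "Normal"),
    Prediction("app-02", "Warning"),
    Prediction("db-01", "Outage"),
]
for line in build_alerts(predictions):
    print(line)
# prints:
# [Yellow] app-02: predicted Warning
# [Red] db-01: predicted Outage
```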
Visualizing insights at a glance
The solution goes beyond real-time predictions and alerts. This model can be seamlessly deployed into the Analance Business Intelligence module to visualize data models through a live dashboard, showing all the current levels and predicted values of key metrics.
A real-time server performance dashboard with defined red and yellow threshold levels
Analance live dashboards empower Service Engineers and Managers with real-time server performance metrics. They can compare predictions against actual performance to ensure everything is up and running optimally.
Predictive maintenance is the next best form of maintenance as it:
Reduces unplanned downtime
Reduces maintenance costs
Increases overall equipment effectiveness
Guarantees on-time delivery
Promotes a stress-free environment
Analance can help you solve your toughest business operational challenges too. Find out how you can benefit from predictive analytics. Book a demo today!
About The Author
Salma Aziz, MSc
Salma Aziz leads the go-to-market strategy and collaborates with the product, sales, solutions, and marketing teams to help realize how solutions designed by Ducen accelerate business transformation.
The Ducen blog is a platform for challenging your perspectives and intelligence. It is a way for us to keep learning and to share that knowledge with our industry. Improve your industry intelligence with DucenIQ.