Analytics and Data Science in Telecoms:
Network Congestion Forecasting
In the epilogue to “As You Like It” Shakespeare wrote, “If it be true that good wine needs no advertisement, ’tis true that a good play needs no epilogue; yet good advertisement adds to the popularity of good wine, and good plays prove the better by the help of good epilogues.” Following in his footsteps, this post intends to give a ‘good advertisement’ to the work being done at Sonalake in the field of data analytics for telecom service providers (telcos).
The telecoms industry, with its huge volumes of rich data, presents a ripe target for advanced analytics. Because we combine deep telecoms domain knowledge with broad experience of building analytics software across verticals, it is a natural space for us to contribute our expertise. It also presents an opportunity to leverage our VisiMetrix executive dashboard solution, which is already deployed within several telcos.
In this series of posts, I will describe some of the research areas we are exploring at Sonalake. This first post concentrates on congestion forecasting.
Network congestion is a widespread problem for telcos because it diminishes quality of service (QoS) for users. Telcos monitor congestion round the clock to highlight QoS issues so that they can be addressed promptly. They also address issues preemptively where they can, but the tools for flagging such issues in advance are limited. Most rely on the judgement of experienced operators and, to our knowledge, a completely automated solution does not exist.
We recently partnered with Vodafone Ireland to determine whether we could automatically predict 4G congestion. The motivations were:
- To help determine where network investment can have the greatest impact
- To upgrade cells only where necessary, since cell congestion affects operational costs and upgrades are expensive
- To develop the more sophisticated, automated prediction methods that growing data volumes and the introduction of 5G clearly demand
Before we partnered, Vodafone used cell-based KPIs with predefined thresholds. By tracking when each KPI breached its threshold, they were able to identify congested cells across the network. The graph below shows the KPIs and their thresholds for a cell.
Fig 1: KPIs with thresholds
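The threshold-based approach can be illustrated with a minimal sketch. The KPI names, readings and thresholds here are hypothetical, and the sketch assumes that a breach means a reading above the threshold. None of these details come from the operator's actual configuration.

```python
from typing import Dict, List


def count_breaches(samples: List[float], threshold: float) -> int:
    """Count the samples that exceed the threshold (assumes higher is worse)."""
    return sum(1 for value in samples if value > threshold)


def breaches_per_kpi(
    cell_kpis: Dict[str, List[float]], thresholds: Dict[str, float]
) -> Dict[str, int]:
    """Return the number of threshold breaches for each KPI of one cell."""
    return {
        name: count_breaches(series, thresholds[name])
        for name, series in cell_kpis.items()
    }


# Hypothetical readings for a single cell; the names and values are made up.
cell_kpis = {
    "kpi_a": [62.0, 71.5, 88.0, 93.2, 79.9, 91.0],
    "kpi_b": [40.0, 55.0, 83.0, 97.0, 61.0, 88.0],
    "kpi_c": [0.1, 0.3, 0.7, 0.9, 0.4, 0.8],
}
thresholds = {"kpi_a": 85.0, "kpi_b": 90.0, "kpi_c": 0.6}

breaches = breaches_per_kpi(cell_kpis, thresholds)
print(breaches)
```

A KPI for which low values are the problem (throughput, for example) would need the comparison reversed.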
So, ours was a two-pronged problem:
- Firstly, we needed to predict these KPIs for all the cells in their entire 4G network, covering many thousands of cells across Ireland
- Secondly, we had to find a way to combine the congestion from all three KPIs and rank cells from the most to least congested
Divide and Conquer
For the first problem, the challenge was to come up with a KPI-agnostic method that could predict each of the KPIs. We needed this because, although we were predicting congestion for the 4G network, our long-term goal is to extend the method to 5G and to other new KPIs. To this end, we used a decomposition technique and tuned it for telecoms data. This involved experimenting with many techniques and creating methods that automatically find the right decomposition parameters, which are otherwise chosen by hand. Though it was applied to the three KPIs shown above, we also tried it on several other typical telecoms KPIs and found the results satisfactory. The method decomposes a KPI into seasonal, trend and noise components, as shown in the following graphs. The seasonal component is the periodic daily or weekly pattern that repeats itself. The trend is the overall level of the time series (or signal). The noise (or remainder), as the name suggests, is the random component of the signal.
At first sight, decomposition seems to aggravate the problem because it adds new components to our model. In practice it simplifies prediction, because it separates the underlying processes into components that are each easier to predict.
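To make the idea of decomposition concrete, here is a minimal sketch of a classical additive decomposition (centred moving average for the trend, per-phase averages for the seasonal pattern). It is not the tuned technique described above, which is not detailed in this post. The function name, the odd-period restriction and the synthetic data are all assumptions made for illustration.

```python
from statistics import mean
from typing import List, Tuple


def decompose_additive(
    series: List[float], period: int
) -> Tuple[List[float], List[float], List[float]]:
    """Split a series into trend + seasonal + remainder (additive model).

    Assumes an odd period so that a centred window spans exactly one cycle.
    """
    if period % 2 == 0 or len(series) < 2 * period:
        raise ValueError("need an odd period and at least two full cycles")
    n, half = len(series), period // 2
    # Trend: centred moving average; windows are truncated at the edges.
    trend = [
        mean(series[max(0, i - half): min(n, i + half + 1)]) for i in range(n)
    ]
    # Seasonal: average detrended value per phase, from full-window points only.
    by_phase: List[List[float]] = [[] for _ in range(period)]
    for i in range(half, n - half):
        by_phase[i % period].append(series[i] - trend[i])
    phase_means = [mean(values) for values in by_phase]
    offset = mean(phase_means)  # centre the pattern so it averages to zero
    pattern = [m - offset for m in phase_means]
    seasonal = [pattern[i % period] for i in range(n)]
    # Remainder: whatever the trend and seasonal components do not explain.
    remainder = [series[i] - trend[i] - seasonal[i] for i in range(n)]
    return trend, seasonal, remainder


# Synthetic daily KPI: a rising level plus a weekly pattern, with no noise.
weekly = [-3.0, -1.0, 0.0, 2.0, 4.0, 1.0, -3.0]
series = [50.0 + 0.5 * i + weekly[i % 7] for i in range(28)]

trend, seasonal, remainder = decompose_additive(series, period=7)
print([round(s, 2) for s in seasonal[:7]])
```

By construction the three components always add back up to the original series; the truncated windows only make the trend (and hence the remainder) less reliable in the first and last few points.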
We then set about predicting each of these components separately. We tried a variety of techniques, ranging from neural networks to time-series forecasting, and developed our own custom methods for some components. For each component, we then selected the method that was most accurate across the entire network.
The graph below shows how we predict each signal component for a cell separately using the method best suited to each, and add them together to get the predicted KPI.
Fig 3: Predicting each component separately
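The "predict each component, then add" step can be sketched as follows. The three forecasters are deliberately simple stand-ins (a straight-line fit for the trend, a repeated last cycle for the seasonal part, the mean for the remainder) and are assumptions for illustration, not the methods selected in the project.

```python
from statistics import mean
from typing import List


def forecast_trend(trend: List[float], horizon: int) -> List[float]:
    """Extend the trend with a least-squares straight line."""
    n = len(trend)
    x_bar, y_bar = (n - 1) / 2, mean(trend)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(trend))
    variance = sum((x - x_bar) ** 2 for x in range(n))
    slope = covariance / variance
    return [y_bar + slope * (n + step - x_bar) for step in range(horizon)]


def forecast_seasonal(seasonal: List[float], period: int, horizon: int) -> List[float]:
    """Repeat the last observed cycle, keeping the phase aligned."""
    last_cycle = seasonal[len(seasonal) - period:]
    return [last_cycle[step % period] for step in range(horizon)]


def forecast_remainder(remainder: List[float], horizon: int) -> List[float]:
    """Predict the noise by its historical mean (typically close to zero)."""
    return [mean(remainder)] * horizon


def forecast_kpi(
    trend: List[float],
    seasonal: List[float],
    remainder: List[float],
    period: int,
    horizon: int,
) -> List[float]:
    """Forecast each component separately and add the forecasts together."""
    parts = zip(
        forecast_trend(trend, horizon),
        forecast_seasonal(seasonal, period, horizon),
        forecast_remainder(remainder, horizon),
    )
    return [t + s + r for t, s, r in parts]


# Hypothetical components for one cell: two weeks of daily values.
weekly_pattern = [-3.0, -1.0, 0.0, 2.0, 4.0, 1.0, -3.0]
trend_part = [50.0 + 0.5 * i for i in range(14)]
seasonal_part = [weekly_pattern[i % 7] for i in range(14)]
remainder_part = [0.0] * 14

predicted = forecast_kpi(trend_part, seasonal_part, remainder_part, period=7, horizon=7)
print([round(p, 2) for p in predicted])
```

Keeping one forecaster per component means any of them can be swapped for a more accurate method without touching the others.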
Some of the methods used for prediction performed poorly when averaged across the whole network but were accurate when dealing with congested cells. Therefore, they might be useful in the future to further refine our predictions.
Double, double toil and trouble!
Having predicted the KPIs, we could readily determine the number of threshold breaches. But the story got more complex from here: awaiting us was the arduous task of combining these breaches across the three KPIs.
From here we had to identify a measure that:
- Combines the congestion in the three KPIs in a way that allows cells to be ranked
- Either introduces an error that is minuscule compared to the prediction error, or mitigates that prediction error
After a great deal of work, we fashioned the silver bullet for both these problems. The measure separated non-congested cells from congested ones. It was also KPI-agnostic and could include any number of KPIs.
Fig 4: Percentile scores histogram
Finally, to rank the cells from most to least congested, we used percentile scores of this measure. The histogram above shows the number of cells above each percentile. We chose a 97th-percentile cut-off, which classed 183 cells across the whole 4G network as congested. Other cut-offs can be set in the same way, with cells above the chosen score categorized as congested (e.g., a cut-off at the 98th percentile gives 122 congested cells).
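Percentile-based ranking with a cut-off can be sketched as below. The congestion scores are placeholder numbers, since the combined measure itself is not described in this post. The definition of the percentile score (the share of cells scoring at or below a given cell) is also an assumption.

```python
from bisect import bisect_right
from typing import Dict, List


def percentile_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Percentile score per cell: % of cells whose score is at or below its own."""
    ordered = sorted(scores.values())
    total = len(ordered)
    return {
        cell: 100.0 * bisect_right(ordered, value) / total
        for cell, value in scores.items()
    }


def congested_cells(scores: Dict[str, float], cutoff: float) -> List[str]:
    """Cells whose percentile score exceeds the cut-off, most congested first."""
    percentiles = percentile_scores(scores)
    flagged = [cell for cell in scores if percentiles[cell] > cutoff]
    return sorted(flagged, key=lambda cell: scores[cell], reverse=True)


# Hypothetical network of 200 cells with made-up congestion scores.
scores = {f"cell_{i:03d}": float(i) for i in range(200)}

top_97 = congested_cells(scores, cutoff=97.0)
top_98 = congested_cells(scores, cutoff=98.0)
print(len(top_97), len(top_98), top_97[0])
```

Raising the cut-off shrinks the list of congested cells, which is the trade-off the operator tunes when choosing how many cells to investigate.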
When the hurlyburly’s done
Once we had our predictions and ranking measure sorted, we started the task of measuring accuracy. This is complicated, not only because accuracy changes with the percentile cut-off, but also because two kinds of error affect it:
- Errors in prediction when compared to the actual values
- Errors in the ranking measure
We analysed the two kinds of error separately, using a similar methodology for each.
For the first one, at each percentile cut-off we chose the following accuracy measure:
The measure takes into account the difference between the predicted list of congested cells and the same list but from the actual data observed over the same time period. The histogram below shows the accuracy for different cut-offs: