It’s hard to imagine a modern society without telecoms networks. So, on the rare occasion when these networks go down, the impact is huge and keenly felt, both by the operator itself and by the public/consumer.
But the factors leading to such outages are on the increase. Extreme weather events, human error, cybersecurity breaches, and plain old negligence during maintenance or construction work are common causes of network outages. Telecoms networks are also increasingly likely to be attacked by hackers, of both the criminal and the terrorist varieties. A recent analysis [1] noted twelve cyber attacks on telcos in 2024, some of which affected more than one operator. In July, for example, four networks in France suffered a major assault [2] timed to coincide with the Paris Olympics.
For operators, an outage means lost revenue from calls not carried, increased churn among dissatisfied customers, and reputational damage. A half-day outage in November 2023 cost Australian telco Optus USD 40m in direct lost revenues [3], with knock-on effects on the financial results and share price of parent company SingTel. CEO Kelly Bayer Rosmarin resigned after the outage.
A crucial component in the 5G network, which differs from its predecessors in being completely based on IP networking standards, is the IP Multimedia Subsystem (IMS). This is an architectural framework that allows the network to provide the services that originated in the circuit-switched network but remain an important part of users’ expectations, such as voice and video calls and text messages. If the IMS goes down, the network simply can’t deliver these services, even if other components are still working.
Historically, network operators have adopted a two-pronged approach to protection: Geographic Redundancy, whereby multiple sites across the network are capable of handling its full capacity, protecting against localised outages or site-specific failures; and Disaster Recovery (DR), intended to handle more catastrophic situations in which either the core network fails or multiple sites fail simultaneously.
However, the traditional approach, of building a DR site within the network operator’s own premises, comes with prohibitive costs. Operators need to invest heavily in infrastructure, including hardware, Heating, Ventilation and Air Conditioning (HVAC), software licenses, and operational staff – all for a site which will only be used in emergencies, and won’t generate any revenue under normal circumstances.
Happily, for network operators, a cloud-based DR solution offers a compelling alternative, enabling them to reduce costs while ensuring service continuity. Public cloud is not only a new technology model but also a new business model, which enables scalability without a big upfront cost.
The public cloud has emerged as a modern and efficient solution for IMS disaster recovery, largely because IMS is less intensive on the data plane and because cloud-native technologies are increasingly being adopted in telecommunications. Unlike traditional infrastructure, which requires significant upfront CAPEX for hardware and network elements, the public cloud offers a cost-efficient alternative that can dynamically scale resources based on network traffic demand. This scalability allows operators to respond quickly to demand and handle outages while minimizing unnecessary expenses during normal operation.
With cloud-native IMS solutions, disaster recovery systems can scale dynamically in response to network traffic needs. By leveraging threshold-based autoscaling triggers and continuous monitoring of hardware and software KPIs, operators can scale resources in seconds, not hours, ensuring the network meets real-time traffic demands. This elasticity ensures the system can handle sudden increases in traffic, such as emergency communications, while keeping the operational footprint minimal in regular conditions to reduce costs.
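To make the idea of a threshold-based trigger concrete, here is a minimal sketch in Python. It is not taken from any particular IMS product or cloud platform: the `ScalingPolicy` class, the `desired_replicas` function, and every threshold value are hypothetical and chosen only for illustration. The sketch assumes a single utilization KPI (for example, average CPU load across the IMS instances) and resizes the deployment proportionally whenever that KPI leaves a tolerated band.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical thresholds; real values depend on the deployment."""
    scale_up_threshold: float = 0.70    # above this utilization, add capacity
    scale_down_threshold: float = 0.30  # below this utilization, remove capacity
    target_utilization: float = 0.50    # utilization aimed for after resizing
    min_replicas: int = 2               # small standby footprint in normal times
    max_replicas: int = 500             # upper bound on the scaled-out footprint


def desired_replicas(current: int, utilization: float, policy: ScalingPolicy) -> int:
    """Return the replica count suggested by a threshold-based trigger."""
    # Inside the band between the two thresholds: leave the deployment alone.
    if policy.scale_down_threshold <= utilization <= policy.scale_up_threshold:
        return current
    # Outside the band: resize so the same total load lands near the target.
    needed = math.ceil(current * utilization / policy.target_utilization)
    # Never go below the standby footprint or above the configured maximum.
    return max(policy.min_replicas, min(policy.max_replicas, needed))


if __name__ == "__main__":
    policy = ScalingPolicy()
    # Four instances running at 75% utilization trigger a scale-up.
    print(desired_replicas(4, 0.75, policy))
```

The gap between the two thresholds acts as a dead band, so small fluctuations around normal load do not cause the footprint to grow and shrink repeatedly. A production system would typically combine several KPIs and add cool-down periods.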
Operators can realize significant Total Cost of Ownership (TCO) savings by adopting a cloud-based DR strategy. For instance, when there is no disaster, a small-footprint IMS consumes minimal cloud resources. In the event of a disaster, the IMS can scale up rapidly to handle peak loads. Since operators only pay for full capacity during the outage, the cloud model ensures cost efficiency, especially compared to traditional on-premises deployments, which require continuous investment in real estate, computing, networking, HVAC, and other infrastructure, regardless of usage.
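The shape of this cost comparison can be sketched with a simple model. The functions below are illustrative only: the function names, the pay-per-vCPU-hour pricing assumption, and the straight-line amortization of on-premises CAPEX are all assumptions rather than figures or methods from any operator or cloud provider. Any numbers passed in are placeholders to be replaced with real quotes.

```python
HOURS_PER_YEAR = 365 * 24


def cloud_dr_annual_cost(standby_vcpus: float, peak_vcpus: float,
                         disaster_hours: float, price_per_vcpu_hour: float) -> float:
    """Yearly cost of a cloud DR site billed per vCPU-hour (assumed pricing model)."""
    if not 0 <= disaster_hours <= HOURS_PER_YEAR:
        raise ValueError("disaster_hours must lie within one year")
    normal_hours = HOURS_PER_YEAR - disaster_hours
    # Small footprint most of the year, full capacity only while the disaster lasts.
    vcpu_hours = standby_vcpus * normal_hours + peak_vcpus * disaster_hours
    return vcpu_hours * price_per_vcpu_hour


def on_prem_dr_annual_cost(capex: float, amortization_years: float,
                           annual_opex: float) -> float:
    """Yearly cost of an on-premises DR site, paid whether or not it is used."""
    if amortization_years <= 0:
        raise ValueError("amortization_years must be positive")
    # Straight-line amortization of hardware and facilities, plus running costs
    # such as HVAC, licenses, and staff.
    return capex / amortization_years + annual_opex


def annual_saving(cloud_cost: float, on_prem_cost: float) -> float:
    """Positive when the cloud DR site is cheaper over the year."""
    return on_prem_cost - cloud_cost
```

The model makes the main lever visible: the cloud bill is dominated by the standby footprint, because the peak term is multiplied only by the hours the disaster actually lasts, whereas the on-premises cost does not depend on usage at all.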
Adopting a cloud-based approach to DR involves several sequential steps.
Option 1. A full mesh connection between the Call Session Control Function (CSCF) components across both networks, which is a cleaner technical approach.
Option 2. Breakout via the Interconnection Border Control Function (IBCF), which is simpler and better suited to brownfield deployments, where integration and service parity are harder to ensure. With this approach, however, specific provisions must be made for re-routing subscribers to the DR IMS in case of a disaster.
When planning an IMS disaster recovery (DR) solution, operators must balance technical and business factors to ensure their DR strategy meets operational needs without unnecessary complexity or cost. Here are the key considerations:
Designing an effective disaster recovery IMS solution requires careful consideration of both technical and commercial aspects. The solution must be scalable, cost-effective, and robust enough to handle critical workloads, especially during crises when operational continuity is paramount.
At ng-voice, we have developed a truly cloud-native Hyperscale IMS Solution that is infrastructure-agnostic, cost-efficient and highly automated, designed for exceptional flexibility and scalability. Our containerized network functions can operate with minimal resource usage, serving tens of thousands of subscribers on as few as tens of virtual CPUs (vCPUs), and can scale automatically to thousands of vCPUs within minutes. This ensures operators are well prepared for any disaster scenario, delivering operational continuity and peace of mind without the burden of large upfront investments or ongoing operational costs.
The integration of cloud-native technologies allows operators to leverage the full potential of public cloud infrastructure, reducing capital expenditures while enhancing service availability. Our solution not only meets the immediate needs of disaster recovery but also positions operators to adapt swiftly to future challenges and opportunities.
1 “12 recent cyber-attacks on the telco sector”, Wisdiam, 31 July 2024.
2 “French internet cables cut in act of sabotage that caused outages across country”, The Register, 29 July 2024.
3 “Optus network crash cost the company $40M”, Light Reading, 23 February 2024.