Network Resilience Will Determine Business Continuity – A CIO Checklist

In the era of the distributed enterprise, network architecture models have become more complex. They have also introduced more points of failure.  Opengear’s CTO Marcio Saito shares tips with IDN for developing an agile, responsive lifeline in 2020.


Marcio Saito, Opengear

Today, we’re discovering that network resilience has never been more critical for business continuity.


A recent outage caused by a fiber cut on CenturyLink's Level 3 network disrupted Merrill Lynch's brokerage business, illustrating how vulnerable networks are at a time of unprecedented use of data-rich applications.


And, of course, thousands of companies of all sizes are looking for ways to keep systems up and workforces operating during the pandemic.


The lesson is that any network can suffer disruptions, whether from technology failures or natural events, despite the use of highly scalable cloud services and backup components across the core network infrastructure.


This is why ensuring end-to-end network resilience should be top of mind for every CTO and CIO.

End-to-End Network Resilience – A Comprehensive Solution Beyond Redundancy

Many companies reinforce their infrastructure with extra components to ensure redundancy in case something fails. To do this, organizations may purchase additional parts like backup generators or cooling units, they may host and run apps in multiple locations, or they may set up a secondary data center, colocation or hybrid cloud environment as a failover.


While redundancy is important, it may not be enough if a non-redundant element fails or if network management tools cannot reach a remote location. This is why it is important to go beyond redundancy and look at the resilience of both the data plane and the management plane of the network.


End-to-end network resilience is the ability to resume normal operations quickly after a network outage, and to prevent failures through visibility into all equipment in a data center or edge site. This requires a specific set of network resilience tools and processes that provide capabilities such as always-on remote monitoring and management, a separate connectivity path for management, continued Internet connectivity during an ISP outage, and minimal need for human intervention.


During a crisis, network resilience is even more vital. Engineers may be unable to physically access core or edge infrastructure, shifting usage patterns may strain networks, and a surge in security threats targeting users may put the primary network at risk.

An Enterprise Checklist for End-to-End Network Resilience   

To help enterprises of all sizes maximize uptime, the checklist below outlines the critical lifelines needed for end-to-end network resilience.


#1. Provide a dedicated network management plane

If user data traffic and control commands travel the same network routes as management traffic, engineers and certain automated tools may be cut off or slowed by congestion when the primary production network is disrupted. A separate management connection to device console ports, known in IT as out-of-band (OOB) management, lets engineers and management tools reach any core or edge site in the network, regardless of location or the state of the production network. This enables real-time remediation of issues and better visibility into device status to prevent failures.
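To make the idea concrete, here is a minimal Python sketch, assuming each device has two hypothetical management addresses: one reached over the production network (in-band) and one reached through a console server on the separate OOB link. The `ManagementPaths` record, the `choose_path` and `tcp_reachable` helpers, the SSH port and the example addresses are illustrative assumptions, not part of any Opengear product.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ManagementPaths:
    """Hypothetical record of the two ways a device can be managed."""
    device: str
    in_band: str      # management address reached over the production network
    out_of_band: str  # console-server address reached over the separate OOB link


def tcp_reachable(address: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection (SSH by default, an assumption) can be opened."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


def choose_path(paths: ManagementPaths,
                is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Prefer the in-band path; fall back to OOB when the production network is down."""
    if is_reachable(paths.in_band):
        return paths.in_band
    if is_reachable(paths.out_of_band):
        return paths.out_of_band
    return None  # unreachable on both planes: escalate to on-site support
```

A management tool would call `choose_path(paths, tcp_reachable)` before connecting. A `None` result only occurs when both planes are down, which is exactly the situation a dedicated management plane is meant to make rare.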


#2. Utilize a secure method of wireless connectivity for failover

A backup connection should take over seamlessly to prevent disruptions, and it should scale alongside the business. Plain old telephone lines are neither scalable nor efficient for serving geographically dispersed locations, because they require local, on-site maintenance to manage, configure and troubleshoot.


A wireless LTE cellular connection, on the other hand, offers scalable, reliable link diversity for failover, and it allows new sites to be configured remotely. When setting up a backup cellular network, organizations should include automatic failover capabilities as well as security controls that gate traffic and provide device visibility. These controls help ensure the backup link does not become another attack vector into business systems.
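Automatic failover is often built as a small state machine fed by periodic health probes of the primary link. The Python sketch below assumes hypothetical thresholds (three failed probes to fail over, five good probes to fail back) so that a single dropped probe does not make the link flap, plus a hypothetical port allowlist that gates which traffic may use the cellular link. Cellular routers implement this in their own firmware; the names and values here are assumptions for illustration only.

```python
from dataclasses import dataclass

PRIMARY = "primary"
CELLULAR = "cellular"

# Hypothetical gate: only management protocols may use the backup link.
ALLOWED_ON_BACKUP = {22, 443, 161}  # SSH, HTTPS, SNMP


@dataclass
class FailoverController:
    fail_threshold: int = 3     # consecutive failed probes before failing over (assumed)
    recover_threshold: int = 5  # consecutive good probes before failing back (assumed)
    active: str = PRIMARY
    _failures: int = 0
    _successes: int = 0

    def observe(self, primary_ok: bool) -> str:
        """Record one health-probe result for the primary link; return the active link."""
        if primary_ok:
            self._failures = 0
            self._successes += 1
            if self.active == CELLULAR and self._successes >= self.recover_threshold:
                self.active = PRIMARY
        else:
            self._successes = 0
            self._failures += 1
            if self.active == PRIMARY and self._failures >= self.fail_threshold:
                self.active = CELLULAR
        return self.active


def permitted(link: str, dest_port: int) -> bool:
    """Allow all traffic on the primary link, but only allowlisted ports on cellular."""
    return link == PRIMARY or dest_port in ALLOWED_ON_BACKUP
```

Requiring more consecutive successes to fail back than failures to fail over is a common hysteresis choice: traffic stays on the backup until the primary has been stable for a while.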


#3. Automate common labor-intensive tasks for network monitoring and preventative measures

In addition to being decoupled from the data and control planes, many aspects of network management should be automated, and admins should have access to a centralized, easy-to-use system for monitoring network nodes.


Tasks that can be automated include zero-touch provisioning (ZTP) and configuration of a separate management network, continuous event collection with automated analysis and alerts, and keeping image, script and configuration files up to date wherever they are needed. Automating these tasks can significantly reduce costs and human error, standardize network management configurations across the organization, and help ensure networks can be managed remotely.
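As one example of continuous event collection with automated analysis, the following Python sketch raises an alert when a single device reports a burst of events (for instance, link-down messages) within a sliding time window. The threshold, window length and device names are assumptions, and the sketch assumes each device's timestamps arrive in order; a production system would also deduplicate alerts and persist state.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque


class EventAlerter:
    """Alert when one device logs `threshold` events within `window_s` seconds."""

    def __init__(self, threshold: int = 3, window_s: float = 60.0):
        self.threshold = threshold
        self.window_s = window_s
        # device name -> timestamps of its events still inside the window
        self._recent: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def ingest(self, device: str, timestamp: float) -> bool:
        """Record one event; return True if the device has crossed the alert threshold."""
        q = self._recent[device]
        q.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while q and timestamp - q[0] > self.window_s:
            q.popleft()
        return len(q) >= self.threshold
```

Keeping a separate window per device means one noisy edge site cannot trigger alerts for, or hide problems at, another.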


#4. Ensure data-rich applications are supported, even if they are located far away from core data center infrastructure

IDC predicts that in 2025, nearly 30% of the data used in our business and personal lives will be processed in real time, while IoT devices are expected to create over 90 ZB of data across the globe. And with 97.2% of executives indicating that their companies are investing in Big Data and AI initiatives, it's no surprise that the global big data and business analytics market is forecast to exceed $274 billion by 2022, or that global spending on smart city projects will reach over $1 trillion by 2024.


Picture a multitude of sophisticated IoT devices at the edge that must respond in real time, whether for critical tasks such as those performed by self-driving vehicles or for intelligently sifting through mountains of information before it is sent back to the core data center for further analytics. These applications are far more data-intensive than, say, a sensor that simply reads temperatures, and high latency will be unacceptable at both the consumer and enterprise level.


Consider the following example. Suppose a surveillance system must monitor 10,000 cameras and detect when a police car crosses a city intersection. Streaming video 24/7 from every camera to a central system would be impractical. Instead, an advanced pattern-detection algorithm with a fast reaction time must run on the edge device, so that only the relevant data is sent back to the cloud for big data analytics.
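A rough calculation shows why edge filtering matters in this example. The Python sketch below assumes a per-camera bitrate of 4 Mbit/s, 50 detections per camera per day and about 2 KB of metadata per detection; these figures are illustrative assumptions, not from the article or any study. `detect` stands in for whatever hypothetical pattern-detection model runs on the edge device.

```python
from typing import Callable, Iterable, Iterator, Tuple

SECONDS_PER_DAY = 86_400


def stream_all_bytes_per_day(cameras: int, bitrate_bps: int) -> float:
    """Bytes per day if every camera streams video continuously to a central system."""
    return cameras * bitrate_bps * SECONDS_PER_DAY / 8  # bits -> bytes


def events_only_bytes_per_day(cameras: int, events_per_camera: int,
                              bytes_per_event: int) -> int:
    """Bytes per day if edge devices send only detection events upstream."""
    return cameras * events_per_camera * bytes_per_event


def edge_filter(frames: Iterable[Tuple[float, bytes]],
                detect: Callable[[bytes], bool]) -> Iterator[float]:
    """Run the (hypothetical) detector locally; yield timestamps of matching frames only."""
    for timestamp, frame in frames:
        if detect(frame):
            yield timestamp


if __name__ == "__main__":
    full = stream_all_bytes_per_day(10_000, 4_000_000)   # assumed 4 Mbit/s per camera
    edge = events_only_bytes_per_day(10_000, 50, 2_000)  # assumed 50 events x 2 KB each
    print(f"stream everything: {full / 1e12:.0f} TB/day")
    print(f"edge events only:  {edge / 1e9:.0f} GB/day")
```

With these assumed numbers, streaming everything would move about 432 TB per day, while sending only detection events would move about 1 GB per day. The exact figures depend entirely on the assumptions, but the gap of several orders of magnitude is the point.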


According to a March 2020 study by Grand View Research, the edge computing market will be worth $43.4 billion by 2027. As adoption of technologies like IoT, rich media content, VR/AR and AI increases, organizations will need more edge computing infrastructure to support data-intensive processes locally. These edge resources will need efficient methods of continuous provisioning, configuration, monitoring and remediation to remain resilient. This means engineers and management tools will need to operate in concert and have remote connectivity to all devices.


Now that more tools for network automation are available and infrastructure is moving back to the edge to accommodate data-intensive processes, engineers should think about how they can extend management tools and systems at a central location to reach remote infrastructure.

Ensure Business Continuity Today and Future-Proof for Tomorrow

As network architecture models become more complex, they introduce more points of failure. For instance, SD-WAN configurations used to connect to the cloud involve more software stacks and therefore more potential disruptions, such as problematic firmware updates. In addition, the need for edge support is growing, and infrastructure transformations or app migrations can leave organizations vulnerable to downtime if the network has no secondary support.


In this environment, it is essential to have an alternate lifeline if a primary network fails. And it’s become just as critical to separate network management from data and control traffic. Ideally, implementing and managing this lifeline should be a standardized, yet highly flexible process that enables automation and comprehensive visibility at the core or edge.


It's not just about today, or even tomorrow. For years to come, these capabilities will ensure continuity and prevent the deadlocks that could paralyze admins and automated tools or, worse, leave them unable to respond as a problem rapidly escalates.


As CTO of Opengear, Marcio Saito is responsible for product and technology strategy. He has held several executive-level positions in global technology companies. At Cyclades, he was a pioneer in the open source software movement and helped establish the concept of out-of-band management for data center infrastructure. Later, as VP of Strategy for Avocent, he managed product and engineering teams.