SDN control – 3 rules for successful deployment
Software Defined Networking (SDN) and its control solutions are an architectural component of the future plans of many Communications Service Providers (CSPs) and, in some cases, of their current network implementations. The history of SDN control is littered with the software debris of cancelled open-source projects (taking open-source control software and deploying it in a network that carries real, revenue-generating customer traffic has not proven realistic) and with continuous standards development in search of the golden set of capabilities that will allow plug-and-play solutions with an off-the-shelf controller ‘in the future’. The take-away from these activities is not that SDN controllers are impossible to build, or that they must wait for complex standardization to complete. Rather, the message is that implementing SDN control is hard, and successful implementations need to work through the challenges.
What makes SDN control so difficult to implement?
CSPs are converging on a version of the control architecture shown below in Figure 1. At the lowest layer, attached to the photonic hardware, the Open Line System (OLS) is managed by an OLS controller. This controller is very similar to an existing network management system (NMS), with two main differences:
- The northbound API of the OLS controller is a standard Transport API (TAPI) interface, and
- The transponders that attach to the OLS are managed via NETCONF and OpenConfig APIs, and are (often) managed from the Optical Domain controller rather than the OLS controller.
The implications of this seemingly simple distribution of functionality are profound. The first implication is that all the attributes of a classic NMS solution (identified as ‘Standard Tools’ in Figure 1) must be available at both the Optical Domain controller (which manages the open transponders) and the OLS controller (which manages the line system). This architecture, and the requirements derived from it, are the main reason many SDN control solutions have failed: real network solutions need high-availability, geo-redundant platforms with a broad set of visualization, performance monitoring (PM) collection, and alarm reporting and correlation tools for network operations. These capabilities are challenging to build at scale and with the reliability required for the revenue-generating infrastructure that underpins most CSP networks.
Figure 1: Open Optical Control Architecture
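The division of responsibilities described above can be sketched as a small data model. This is only an illustration: the class names, field names and vendor labels are hypothetical, and the only details taken from the text are that each OLS controller exposes TAPI northbound and that open transponders are reached via NETCONF and OpenConfig, typically from the Optical Domain controller.

```python
from dataclasses import dataclass, field


@dataclass
class Transponder:
    name: str
    vendor: str
    # Per the text, open transponders are managed via NETCONF and OpenConfig.
    management: str = "NETCONF/OpenConfig"


@dataclass
class OLSController:
    name: str
    vendor: str
    # Per the text, the OLS controller exposes TAPI on its northbound side.
    northbound_api: str = "TAPI"


@dataclass
class OpticalDomainController:
    name: str
    line_systems: list = field(default_factory=list)
    transponders: list = field(default_factory=list)

    def managed_elements(self):
        """Return (element name, interface used to reach it) pairs."""
        via_ols = [(ols.name, ols.northbound_api) for ols in self.line_systems]
        direct = [(t.name, t.management) for t in self.transponders]
        return via_ols + direct


# Hypothetical deployment: two line systems and one open transponder.
domain = OpticalDomainController(
    name="domain-1",
    line_systems=[
        OLSController("ols-a", "vendor-a"),
        OLSController("ols-b", "vendor-b"),
    ],
    transponders=[Transponder("xpdr-1", "vendor-c")],
)

for element, interface in domain.managed_elements():
    print(f"{element}: {interface}")
```

Even in this toy form, the Optical Domain controller touches two different interface types and several vendors, which is why it cannot be treated as the simpler of the two controllers.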
The initial approach used in some lab evaluations has been to ‘leave the NMS as is’ to operate as an OLS controller and to build (or buy) a new control solution only for the Optical Domain controller. The selling proposition of this solution is that it is ‘lowest risk’, since only the northbound API of the existing controller (typically an NMS) changes, and the Optical Domain controller will be a new platform that is ‘simple’ to integrate. The reality is that the new Optical Domain controller has all the platform and functional complexity of the existing NMS, and the CSP is faced with doing extra OSS integration with the new Optical Domain controller, interfacing through a legacy NMS not built for open or modern APIs, and defining new operational workarounds for the limitations of both the legacy and new controllers. The limitations of this approach have become evident and, as a result, it has largely been abandoned by the industry.
If the legacy NMS approach did not work, what are the ‘Rules’ for a successful controller solution?
Rule 1: The Optical Domain controller must have the same platform capabilities and system tools as the OLS controller.
Rule 1 ensures that the essential set of platform and operational capabilities is available as open line systems and open transponders are introduced in the network. To ensure a reliable network, the OLS and Optical Domain controllers must have the same (or better) level of visibility in an open system as previously existed in a vendor-integrated solution. They must also be built on high-availability, geo-redundant platforms which deliver all the required NMS tools (remember that the open transponders are usually attached to the Optical Domain controller) as well as all the open APIs required for system integration, alarm correlation and automation. An Optical Domain controller is not a ‘simplified’ version of the OLS controller – if anything, it requires a superset of the capabilities of this platform, as it manages one or more OLSs from different vendors as well as one or more vendors’ open transponders.
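Rule 1 amounts to a superset check, which can be sketched in a few lines. The capability labels below are hypothetical shorthand for the tools named in this article, not a standardized list, and the two candidate controllers are invented for illustration.

```python
# Hypothetical capability labels, loosely based on the tools named in the text.
OLS_CONTROLLER = {
    "high_availability",
    "geo_redundancy",
    "visualization",
    "pm_collection",
    "alarm_reporting",
    "alarm_correlation",
    "open_apis",
}


def missing_capabilities(domain_caps, ols_caps):
    """Capabilities the OLS controller has that the domain controller lacks."""
    return sorted(ols_caps - domain_caps)


def satisfies_rule_1(domain_caps, ols_caps):
    """True when the domain controller offers a superset of the OLS controller's tools."""
    return not missing_capabilities(domain_caps, ols_caps)


# A 'simplified' domain controller versus one with a full (and extended) tool set.
simplified = OLS_CONTROLLER - {"geo_redundancy", "pm_collection"}
full = OLS_CONTROLLER | {"multi_vendor_transponder_management"}

print(missing_capabilities(simplified, OLS_CONTROLLER))
print(satisfies_rule_1(full, OLS_CONTROLLER))
```

A check like this could serve as a first-pass screen when comparing candidate platforms, before any deeper evaluation of how well each capability is implemented.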
Rule 2: Artificial Intelligence/Machine Learning (AI/ML) applications need to scale and access cloud resources as well as the network hardware.
Rule 2 flows from the drive to reduce opex and automate routine manual tasks in the network. The simplest way to do this is via AI/ML applications that run in the background and ensure the network is performing the way the CSP has specified. There are many types of AI/ML applications, and some are implemented in network controllers today. However, many applications will require large bursts of compute, storage and memory to perform their function, e.g., a path computation engine (PCE) recomputing global restoration paths for multi-failure scenarios using multiple constraints and accommodating the optical reach capabilities of multiple vendors’ transponders. It is possible to build all this power into the Optical Domain controller; however, a more scalable solution is a cloud-based application framework connected in real time to the network hardware. This provides the best of both worlds: scalable off-box processing of intermittent application demands, based on telemetry streamed from the network. This separation of compute-intensive and operational workloads also ensures that process or container issues in the application space do not impact the performance of the real-time controllers managing the network.
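To give a sense of the kind of computation involved, here is a deliberately small sketch of restoration path computation: a shortest-path search that skips failed links and rejects any result longer than the transponder's reach. The topology, link lengths and reach figures are invented, and a production PCE would weigh many more constraints across many more demands, which is where the bursts of compute come from.

```python
import heapq


def restoration_path(links, src, dst, failed, reach_km):
    """Shortest surviving path by length, subject to a transponder reach limit.

    links: dict mapping (a, b) node pairs to length in km (bidirectional).
    failed: iterable of (a, b) pairs that are down, in either orientation.
    reach_km: maximum end-to-end length the transponder can span.
    Returns (path, length_km), or None if no feasible path exists.
    """
    down = {frozenset(link) for link in failed}
    graph = {}
    for (a, b), km in links.items():
        if frozenset((a, b)) in down:
            continue
        graph.setdefault(a, []).append((b, km))
        graph.setdefault(b, []).append((a, km))

    best = {src: 0.0}
    queue = [(0.0, src, [src])]
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            # If the shortest surviving path exceeds the reach,
            # every longer path does too, so there is nothing feasible.
            return (path, dist) if dist <= reach_km else None
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, km in graph.get(node, []):
            candidate = dist + km
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour, path + [neighbour]))
    return None


# Hypothetical four-node topology, lengths in km.
links = {
    ("A", "B"): 400,
    ("B", "D"): 500,
    ("A", "C"): 700,
    ("C", "D"): 800,
    ("B", "C"): 200,
}

print(restoration_path(links, "A", "D", failed=[], reach_km=2000))
print(restoration_path(links, "A", "D", failed=[("D", "B")], reach_km=2000))
print(restoration_path(links, "A", "D", failed=[("B", "D"), ("B", "C")], reach_km=1200))
```

Repeating a search like this for every demand and every combination of failures, with per-vendor reach values, is an intermittent but heavy workload, which is the argument for running it off-box rather than on the real-time controller.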
Rule 3: The network architecture must ensure transparent information flow.
The use of APIs is critical in an automated network solution based on Optical Domain and OLS controllers to ensure a solid and viable path to a multi-layer optimized solution. Multi-layer optimization will require AI/ML to derive the IP traffic pattern over the optical network (not all transponders will have Link Layer Discovery Protocol (LLDP) snooping or another hardware mechanism for sampling IP flows). The analytics models and algorithms to derive this information will receive network information via firewalled APIs, process the data in the cloud, and return operational insights and recommended optimizations to the operations team. Fully seamless and transparent information flow through the Optical Domain controller APIs to cloud resources over firewalled APIs will deliver a long-term solution for control evolution in a multi-vendor, multi-layer network.
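The round trip described above (network data out through an API, analysis in the cloud, recommendations back to operations) can be sketched as follows. Everything here is a stand-in: the record fields, the utilization threshold and the recommendation format are hypothetical, JSON serialization merely represents the API boundary, and a simple average takes the place of the analytics models the text refers to.

```python
import json
from statistics import mean


def export_telemetry(samples):
    """Controller side: serialize telemetry samples for the API boundary."""
    return json.dumps(samples)


def recommend(payload, threshold=0.8):
    """Cloud side: flag links whose mean utilization exceeds the threshold."""
    per_link = {}
    for sample in json.loads(payload):
        per_link.setdefault(sample["link"], []).append(sample["utilization"])
    return [
        {
            "link": link,
            "mean_utilization": round(mean(values), 2),
            "action": "review capacity",
        }
        for link, values in sorted(per_link.items())
        if mean(values) > threshold
    ]


# Hypothetical samples as they might be exposed by a domain controller API.
samples = [
    {"link": "A-B", "utilization": 0.90},
    {"link": "A-B", "utilization": 0.86},
    {"link": "C-D", "utilization": 0.40},
    {"link": "C-D", "utilization": 0.50},
]

for item in recommend(export_telemetry(samples)):
    print(item)
```

The point of the sketch is the shape of the flow rather than the analysis itself: if the controller APIs withhold or distort the underlying data, no amount of cloud-side processing can recover it.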
The journey
Successful SDN control implementation in optical networks is not a simple or short journey. Shortcuts using legacy NMS, open controllers and force-fitting APIs have not been successful. Choosing the right long-term control architecture, based on platforms built with a full understanding of both the short- and long-term requirements, will pay off. Not only will the network be opened to multiple optical vendors, but there will also be longer-term operational cost savings as AI/ML is leveraged both within and across multiple domains to optimize the entire network infrastructure.