How quickly can you troubleshoot a routing issue?
Routing needs no introduction. Almost every organization in the world that needs connectivity depends on BGP or IGP routing for data delivery. Every organization with a routed network will have to deal with routing issues at some point. And they can be hard to troubleshoot.
Most routing-related trouble tickets are closed either within a couple of minutes, if the issue resolves by itself, or after several hours, if the NOC team has to troubleshoot and find the root cause. Now imagine how hard troubleshooting can be in a service provider network that serves hundreds of customers over thousands of routing paths.
Routing issues are hard to troubleshoot for a variety of reasons, primarily the dynamic nature of IP routing. For example, let’s say a customer has complained about intermittent connectivity to a critical application. In the normal course of troubleshooting, a NOC team would figure out where the application is hosted, determine the path taken by traffic from the customer’s source to the destination, find the problematic path, link, or node, resolve the issue, and restore connectivity. But in the real world, troubleshooting routing issues is never that easy.
Troubleshooting in hours
To address an intermittent service delivery issue, troubleshooting might begin by trying to determine traffic paths using traceroute. But this won’t work if the issue was reported after the routes converged. Traceroute shows the current routing path, but it keeps no record of the paths taken before convergence, so there is no information about the path that experienced intermittent connectivity. Because the problem can’t be recreated, the ticket is usually marked as “No Problem Found” and closed without resolution.
Now, say the issue persists. By running a traceroute, the NOC team can find the current path from the source to the destination and all the hops along it. But which node, link, or path is the root cause of the problem? Discovering this requires connecting to and checking each router hop along the path for performance issues, until the problematic node and link are found. It could easily take a couple of hours or more to check all the routers along the path and get everything back up and running. This would be a ticket closed with a resolution, but possibly with an SLA violation.
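The hop-by-hop check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the HopStats record, the first_suspect_hop function, the thresholds, and the sample addresses are all hypothetical, and in practice the per-hop numbers would have to be collected by logging in to each router or probing each hop, which is where the hours go.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HopStats:
    """Hypothetical per-hop measurements gathered along a traced path."""
    hop: int          # position in the path, starting at 1
    address: str      # router address reported for this hop
    loss_pct: float   # packet loss observed at this hop, in percent
    rtt_ms: float     # round-trip time to this hop, in milliseconds


def first_suspect_hop(
    hops: List[HopStats],
    max_loss_pct: float = 1.0,       # assumed threshold, tune per network
    max_rtt_jump_ms: float = 50.0,   # assumed threshold, tune per network
) -> Optional[HopStats]:
    """Return the first hop whose loss or RTT increase exceeds a threshold."""
    prev_rtt = 0.0
    for h in hops:
        if h.loss_pct > max_loss_pct or (h.rtt_ms - prev_rtt) > max_rtt_jump_ms:
            return h
        prev_rtt = h.rtt_ms
    return None


# Made-up sample data using documentation address ranges.
hops = [
    HopStats(1, "192.0.2.1", 0.0, 1.2),
    HopStats(2, "192.0.2.5", 0.0, 3.0),
    HopStats(3, "198.51.100.9", 12.5, 80.0),
    HopStats(4, "203.0.113.7", 12.0, 82.0),
]
suspect = first_suspect_hop(hops)
print(suspect.hop if suspect else "no suspect hop")
```

Even with a helper like this, the result is only as good as the measurements, and it says nothing about a path that is no longer in use.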
Troubleshooting in minutes
Troubleshooting routing issues can be quick and easy with real-time route analytics and the right tool: Blue Planet® Route Optimization and Assurance (ROA), which captures real-time and historical routing data and provides analytics and reports.
Real-time route analytics records all live IGP and BGP routing events. These are used to build and maintain an always-current model of the network control plane. The data is displayed in real time as a live topology map of the network and is stored for troubleshooting and historical analysis. This means that, even if the routing issue was reported after the routes converged, Blue Planet ROA can play back the routing events from the captured routing history and find the problematic link(s), so that only the affected routers and links need troubleshooting. This not only reduces the MTTR, but also ensures that routing-related trouble tickets are always closed with a resolution.
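To make the idea of playing back routing history concrete, here is a minimal Python sketch. It is not how Blue Planet ROA is implemented. It assumes a simplified, hypothetical event log in which each LinkEvent records a link coming up with a cost or going down, and it uses a plain shortest-path computation as a stand-in for IGP route selection. All names (LinkEvent, topology_at, shortest_path, path_at) are made up for illustration.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LinkEvent:
    """Hypothetical recorded routing event for one link."""
    ts: float              # time of the event, in seconds
    a: str                 # router at one end of the link
    b: str                 # router at the other end
    cost: Optional[int]    # link metric, or None when the link went down


def topology_at(events: List[LinkEvent], t: float) -> Dict[Tuple[str, str], int]:
    """Replay events up to time t and return the links that were up."""
    links: Dict[Tuple[str, str], int] = {}
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.ts > t:
            break
        key = tuple(sorted((ev.a, ev.b)))
        if ev.cost is None:
            links.pop(key, None)   # link down: remove it from the model
        else:
            links[key] = ev.cost   # link up or metric change
    return links


def shortest_path(links, src: str, dst: str) -> Optional[List[str]]:
    """Dijkstra over an undirected link map; returns the node list or None."""
    adj: Dict[str, List[Tuple[str, int]]] = {}
    for (a, b), cost in links.items():
        adj.setdefault(a, []).append((b, cost))
        adj.setdefault(b, []).append((a, cost))
    best = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == dst:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in adj.get(node, []):
            nd = dist + cost
            if nd < best.get(nbr, float("inf")):
                best[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in best:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def path_at(events: List[LinkEvent], src: str, dst: str, t: float):
    """Path the traffic would have taken at time t, per the recorded history."""
    return shortest_path(topology_at(events, t), src, dst)


# Made-up history: link B-D fails at t=100 and recovers at t=160.
events = [
    LinkEvent(0, "A", "B", 10), LinkEvent(0, "B", "D", 10),
    LinkEvent(0, "A", "C", 20), LinkEvent(0, "C", "D", 20),
    LinkEvent(100, "B", "D", None), LinkEvent(160, "B", "D", 10),
]
for t in (50, 120, 200):
    print(t, path_at(events, "A", "D", t))
```

In this made-up history, a traceroute run after t=160 would show the same path as before the failure, while the replay at t=120 reveals the detour through C and points at the B-D link as the one to investigate.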
This content was originally published on the Packet Design blog and has been updated since the acquisition by Blue Planet.