End-to-end service troubleshooting scenario

Purpose

This procedure shows how to troubleshoot service issues.

In this scenario, a service is experiencing problems.

View service health summary
 

The Service Health dashlet in the Network Health dashboard uses KPIs to show service states.

The Affected Services KPI indicates that there are several unhealthy services in the network.

The Service Configuration Health dashlet indicates that three of the services are misaligned from the templates used to create them.

Click the Misaligned Services circle.

The Service Management, Services view opens, filtered to show the misaligned services.


Return to the Network Map and Health dashboard and click the Affected Services circle.

The Services data page appears, filtered to show the services with at least one affected object. You can change the default filter if needed, for example, to focus on services with more affected objects. On the Services data page, we can see that the Operational State of the service we're investigating is disabled.
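To illustrate what the Affected Services filter does conceptually, the following Python sketch keeps only the services with at least a given number of affected objects and lists the most affected first. The `Service` record, its field names, and the `min_affected` parameter are hypothetical; they are not part of the NSP data model or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    # Hypothetical service record; these fields are illustrative, not NSP's schema.
    name: str
    operational_state: str  # e.g. "enabled" or "disabled"
    affected_objects: int   # count of unhealthy sites, endpoints, tunnel bindings


def filter_affected(services: List[Service], min_affected: int = 1) -> List[Service]:
    """Keep services with at least `min_affected` affected objects,
    listing the most affected first."""
    matches = [s for s in services if s.affected_objects >= min_affected]
    return sorted(matches, key=lambda s: s.affected_objects, reverse=True)


if __name__ == "__main__":
    inventory = [
        Service("vpls-100", "enabled", 0),
        Service("epipe-200", "disabled", 2),
        Service("vprn-300", "enabled", 1),
    ]
    for svc in filter_affected(inventory):
        print(svc.name, svc.operational_state, svc.affected_objects)
```

Raising `min_affected` to 2 corresponds to narrowing the default filter to services with more affected objects.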

Let’s open the Object Troubleshooting dashboard to get more details. Select the affected service and choose Table row actions, View in Object Troubleshooting.

The Object Troubleshooting dashboard opens, filtered to show the service we’re investigating.

Explore the Object Troubleshooting dashboard
 

Let’s take a look at the Troubleshooting Summary Board. The Service Overview and Current Health Summary dashlets show information similar to what we saw in the Network Health dashboard. The dashboard also includes health summaries for the sites, endpoints, and tunnel bindings. Here, we can see a problem with one site and one endpoint; the tunnel bindings look healthy.


Scroll down to the service map, select the link, and click Details to see more information in the service summary panel.


Select Show in multi-layer map from the More menu in the service summary panel. The multi-layer map shows the health of the service, the tunnel, and the IGP and physical layers.


From the Object Troubleshooting dashboard, we can also create an OAM test suite or select View OAM test results from the More menu. OAM testing can provide valuable information about traffic flow.


Also from the More menu, we can add the service to the Watchlist. Adding an object to the Watchlist lets us navigate directly to it later. Choose Add to Watchlist.

Click the Watchlist icon to view the Watchlist.


You can click CHANGE TARGET in the Object Troubleshooting dashboard to troubleshoot other services or objects.


Expand the Sites, Tunnel Bindings, and Endpoints dashlets in the Service Inventory area to see details about service components.


Select a faulty endpoint and choose View in Event Timeline from the Table row actions menu to view a history of events for the service.


Click an event to see the actions and alarms associated with it.

Investigate service alarms from the News Feed

Another way to investigate alarm details is to start from the News Feed, which shows unacknowledged root cause alarms live, as they occur. The News Feed displays each alarm's severity and number of impacts; the available cross-launch options depend on the alarm, but all alarms can cross-launch to the Current Alarm List.

 

From the News Feed, select the alarm and choose View in Current Alarms from the More menu.

The Current Alarms list opens, with the alarm selected.


From the Current Alarms list, you can choose View Impacts from the alarm's table row actions menu. In this case, the alarm has no impacts.


Use the drop-down menu at the top of the view to switch to the Top Unhealthy NEs or Top Problems views to see more information about problems in the network.

View service provisioning details

Back in the Object Troubleshooting dashboard, we can also launch Service Management to check how the service is provisioned.

 

From the Service Overview dashlet, click View in Service Management.

Service Management opens, filtered to show the service in question.


From here, we can open the service for editing to verify that its provisioning is correct. Click Table row actions, Edit Service.


In the Edit form, we can see that all the mandatory fields for the service are populated and the administrative state is unlocked as expected.

We’ll scroll further down to look at the sites.

Everything looks good for both sites: the administrative state is unlocked, and the inner and outer VLAN tags are present and correct.

Scrolling further down to the service tunnels, we see that both are unlocked and provisioned with a source and a destination.

The provisioning looks good on the NSP side, so we’ll close the Edit form.


Next, we can confirm that the provisioning is also correct on the NEs by running an audit config, which compares the configuration in the NSP with the configuration present on the NE.

From the Service Management, Services view, select the service and click Table row actions, Audit config.

The audit operation has found a misaligned attribute: according to the NSP configuration, the state of this SAP should be enable, but the actual value on the NE is disable.
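Conceptually, an audit is a comparison between the intended configuration stored in the NSP and the actual configuration read from the NE. The Python sketch below shows that idea under stated assumptions: configurations are modeled as hypothetical dictionaries keyed by object path, and `audit`, `Misalignment`, and the attribute names are illustrative only, not NSP functions or schema.

```python
from typing import Dict, List, NamedTuple, Optional

# Hypothetical config model: object path -> {attribute name: value}.
Config = Dict[str, Dict[str, str]]


class Misalignment(NamedTuple):
    path: str
    attribute: str
    intended: str            # value stored in the NSP
    actual: Optional[str]    # value found on the NE, or None if absent


def audit(intended: Config, actual: Config) -> List[Misalignment]:
    """Report every NSP-intended attribute whose NE value differs."""
    found = []
    for path, attrs in intended.items():
        ne_attrs = actual.get(path, {})
        for attribute, want in attrs.items():
            have = ne_attrs.get(attribute)
            if have != want:
                found.append(Misalignment(path, attribute, want, have))
    return found


if __name__ == "__main__":
    nsp = {"site-A/sap-1": {"admin-state": "enable", "vlan": "100"}}
    ne = {"site-A/sap-1": {"admin-state": "disable", "vlan": "100"}}
    for m in audit(nsp, ne):
        # Prints: site-A/sap-1 admin-state: expected enable, found disable
        print(f"{m.path} {m.attribute}: expected {m.intended}, found {m.actual}")
```

This sketch only checks attributes the NSP side models; extra attributes present on the NE are ignored, which keeps the result focused on intent versus reality.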


Now that we have verified that there is a mismatch between NSP provisioning and the NE, we can use the align function to push the configuration to the network.

From the Services tab, click Table row actions, Align, Push to network, and confirm.

When the alignment operation is complete, the Alignment State shows Aligned. Within a minute or two, the operational state should change to enabled, and the service should be working again.
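The Push to network direction of an align can be pictured as overwriting the NE's values with the NSP's intended values, after which a fresh comparison finds nothing to report. The following Python sketch models that under assumptions: the dictionary config model and the `misaligned`, `push_to_network`, and `alignment_state` helpers are hypothetical and do not represent how the NSP implements alignment. It also does not model the operational state, which depends on the network itself.

```python
import copy
from typing import Dict, List, Tuple

# Hypothetical config model: object path -> {attribute name: value}.
Config = Dict[str, Dict[str, str]]


def misaligned(intended: Config, actual: Config) -> List[Tuple[str, str]]:
    """List (path, attribute) pairs where the NE differs from the NSP."""
    return [
        (path, attr)
        for path, attrs in intended.items()
        for attr, value in attrs.items()
        if actual.get(path, {}).get(attr) != value
    ]


def push_to_network(intended: Config, actual: Config) -> Config:
    """Return a copy of the NE config with every NSP-intended value applied."""
    pushed = copy.deepcopy(actual)
    for path, attrs in intended.items():
        pushed.setdefault(path, {}).update(attrs)
    return pushed


def alignment_state(intended: Config, actual: Config) -> str:
    return "Misaligned" if misaligned(intended, actual) else "Aligned"


if __name__ == "__main__":
    nsp = {"site-A/sap-1": {"admin-state": "enable"}}
    ne = {"site-A/sap-1": {"admin-state": "disable"}}
    print(alignment_state(nsp, ne))  # Misaligned
    ne = push_to_network(nsp, ne)
    print(alignment_state(nsp, ne))  # Aligned
```

The push returns a new dictionary rather than mutating its input, so the before and after states can be compared side by side.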