Executive Summary
This document assesses the service degradation reported on 1/28/26.
Timeline of issue and root cause:
At 2:11 AM CT, the NOC of Ribbon, our Session Border Controller vendor, received an alarm indicating that one of the servers (PSX) at the IAD data center was unreachable. Initial checks showed that the PSX application was not functioning correctly. TeligentIP support also confirmed (3:11) that CPU was overutilized. We were unable to validate the PSX state at that time because SSH connectivity to the PSX was not possible. Once access was restored, we observed that several PSX application processes had terminated and could not be restarted. To recover the system, we performed a full PSX platform reboot, after which the PSX successfully reconnected to all devices and resumed normal call processing (5:50). Shortly afterward (6:19), new alarms indicated that the PSX was failing to register with the Ribbon Application Management Platform (RAMP). Because RAMP stores the license data, which the PSX periodically validates despite retaining a local copy, this registration failure required investigation. Ribbon NOC worked with their GPS team and identified a missing SQL table entry that defines the PSX role. After the appropriate entry was added, the PSX successfully re-registered with RAMP (8:25).
However, later troubleshooting determined that this re-registration caused the PSX to switch from node-locked licensing to domain-locked licensing. Because no domain-locked licenses were available, the licensed Calls-Per-Second (CPS) for the PSX at IAD dropped from 100 to 0. The issue was not immediately visible because the ORD PSX continued handling traffic until it reached capacity. As endpoints began registering and call volume increased during business hours, the ORD PSX became overloaded, creating a bottleneck that led to registration failures and call processing issues. We began to see service degradation at approximately 8:40. TeligentIP engaged Ribbon NOC at approximately 8:55 and started troubleshooting. Ribbon NOC identified the licensing mode change as the root cause. After Ribbon switched the IAD PSX licensing back to node-locked (11:25), the combined CPS load across both PSXs was approximately 900 (11:30). Within minutes the load began to normalize, dropping below 500 (11:44) and then gradually to around 200 (12:10), where it stabilized. Call attempts subsequently decreased, and all operational statistics returned to levels consistent with normal activity for this time of day (12:30).
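To make the intervals in the timeline easier to read, the following minimal sketch computes elapsed time between selected timestamps. It assumes all times are Central Time on 1/28/26 and can be read as 24-hour values in ascending order (the report marks only the first entry as AM). The event labels and the helper names `minutes_between` and `format_duration` are illustrative, not part of any tooling used during the incident.

```python
from datetime import datetime

# Selected timestamps from the timeline, read as 24-hour Central Time
# (assumption: every entry falls on the same day, in ascending order).
EVENTS = {
    "alarm": "02:11",          # PSX unreachable alarm at IAD
    "reregistered": "08:25",   # PSX re-registered with RAMP
    "degradation": "08:40",    # service degradation first seen
    "noc_engaged": "08:55",    # TeligentIP engaged Ribbon NOC
    "license_fixed": "11:25",  # IAD PSX switched back to node-locked
    "normal": "12:30",         # statistics back to normal levels
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end, both given as HH:MM."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def format_duration(minutes: int) -> str:
    """Format a minute count as 'Hh MMm'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins:02d}m"


if __name__ == "__main__":
    pairs = [
        ("reregistered", "degradation"),
        ("degradation", "license_fixed"),
        ("degradation", "normal"),
        ("alarm", "normal"),
    ]
    for start, end in pairs:
        elapsed = minutes_between(EVENTS[start], EVENTS[end])
        print(f"{start} -> {end}: {format_duration(elapsed)}")
```

Under these assumptions, degradation was first seen 15 minutes after the re-registration, the licensing fix came 2h 45m after degradation began, and statistics returned to normal 3h 50m after degradation began.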
Corrective Actions:
TeligentIP will work with Ribbon to update the PSX licensing model to prevent unintended licensing mode changes and avoid similar incidents in the future. Additionally, the Ribbon NOC has enhanced internal documentation to streamline troubleshooting and reduce time to resolution.
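As an illustration of the kind of safeguard these corrective actions aim at, the sketch below compares a PSX license snapshot taken before a maintenance action with one taken after it, and flags a licensing mode change or a drop in licensed CPS. It is a hypothetical example only: `LicenseState` and `license_findings` are invented names, and the sketch does not use or represent any Ribbon API or the way license data is actually retrieved from the PSX or RAMP.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LicenseState:
    """Hypothetical snapshot of a PSX license state (not a Ribbon data model)."""
    site: str
    mode: str          # e.g. "node-locked" or "domain-locked"
    licensed_cps: int  # licensed Calls-Per-Second


def license_findings(before: LicenseState, after: LicenseState) -> List[str]:
    """Return human-readable findings if the mode changed or CPS dropped."""
    findings = []
    if after.mode != before.mode:
        findings.append(
            f"{after.site}: licensing mode changed from "
            f"{before.mode} to {after.mode}"
        )
    if after.licensed_cps < before.licensed_cps:
        findings.append(
            f"{after.site}: licensed CPS dropped from "
            f"{before.licensed_cps} to {after.licensed_cps}"
        )
    return findings


if __name__ == "__main__":
    # Values taken from this incident: IAD went from node-locked / 100 CPS
    # to domain-locked / 0 CPS after the RAMP re-registration.
    before = LicenseState(site="IAD", mode="node-locked", licensed_cps=100)
    after = LicenseState(site="IAD", mode="domain-locked", licensed_cps=0)
    for finding in license_findings(before, after):
        print(finding)
```

Run against the values from this incident, a check of this kind would have produced two findings immediately after the 8:25 re-registration, before business-hours traffic exposed the problem.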