Roller Network AS11170 will be updating our routing policy to reject any IPv4 or IPv6 prefix with a BGP RPKI validation result of “invalid” on both the peering and transit borders of our network. We’ve been running RPKI validation internally for a while with the “bgp bestpath prefix-validate allow-invalid” setting configured. This routing policy change will simply remove this line from our BGP address family configurations.
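For readers unfamiliar with how a prefix ends up “invalid”, here is a minimal sketch of route origin validation in the style of RFC 6811. It is an illustration only, not our router configuration or any vendor's implementation: the `ROA` tuple, the `validate` and `accept` functions, and the sample ROAs (built from documentation prefixes and documentation ASNs) are all hypothetical names and data chosen for the example.

```python
import ipaddress
from typing import NamedTuple, Sequence


class ROA(NamedTuple):
    """Hypothetical, simplified Route Origin Authorization record."""
    prefix: str       # authorized covering prefix
    max_length: int   # longest prefix length the origin may announce
    origin_asn: int   # AS authorized to originate the prefix


def validate(prefix: str, origin_asn: int, roas: Sequence[ROA]) -> str:
    """Return 'valid', 'invalid', or 'not-found' for one announcement."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # subnet_of() raises TypeError across IP versions, so check first.
        if route.version != roa_net.version:
            continue
        if not route.subnet_of(roa_net):
            continue
        covered = True  # at least one ROA covers this route
        if route.prefixlen <= roa.max_length and roa.origin_asn == origin_asn:
            return "valid"
    # Covered by a ROA but no match on origin/length means invalid;
    # no covering ROA at all means not-found.
    return "invalid" if covered else "not-found"


def accept(state: str) -> bool:
    """Policy described above: drop only 'invalid', keep everything else."""
    return state != "invalid"


# Sample data: documentation prefixes and ASNs, not real ROAs.
SAMPLE_ROAS = [
    ROA("192.0.2.0/24", 24, 64500),
    ROA("2001:db8::/32", 48, 64500),
]
```

With the sample data, `validate("192.0.2.0/25", 64500, SAMPLE_ROAS)` is “invalid” because the announcement is longer than the ROA's maximum length, while a prefix with no covering ROA is “not-found” and is still accepted under this policy.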
The Account Control Center will be unavailable for about 7 minutes today (May 26, 2019) between 10:50 and 11:00 Pacific time for network maintenance.
UPDATE: Completed at 11:00:16.
With the activation of the Hurricane Electric POP in Reno, NV the time has finally come to turn down our transit connection to AS20115 Charter/Spectrum. We’ve given our required 30-day termination notice to Charter/Spectrum effective today, April 9, for an end of service date of May 10, 2019.
In the meantime, our routing policy for AS20115 will change to that of a local peering type connection for data collection purposes. We’re curious how much utilization we will see if we restrict it for its last month. Incoming announcements from AS20115 will be filtered with an as-path-access-list of “permit ^20115$” and outgoing announcements will be tagged with community 20115:666 (Do not advertise outside of Charter AS). We will also move the physical connection away from the border router where our policy is one provider per router – a role now assigned to Hurricane Electric on that router – and over to our core peering router. With these filters we only expect to see about 2700 IPv4 prefixes. Charter’s IPv6 BGP session is broken again, but it’s not worth the fight to fix it so this exercise will be IPv4 only.
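To illustrate what the “permit ^20115$” filter lets through, here is a rough sketch that applies the same pattern to AS paths written as space-separated strings. This is an approximation using Python's `re` module rather than the router's own regex engine, and the `permitted` helper and the sample paths (which use documentation ASN 64500) are hypothetical.

```python
import re

# Same pattern as the as-path access-list entry: the whole AS path
# must be exactly "20115", i.e. a route originated by AS20115 itself.
AS_PATH_FILTER = re.compile(r"^20115$")


def permitted(as_path: str) -> bool:
    """Return True if the AS path string matches the permit entry."""
    return AS_PATH_FILTER.search(as_path) is not None


# Sample AS paths as they would appear in a BGP table (illustrative only).
examples = {
    "20115": permitted("20115"),              # originated by AS20115
    "20115 64500": permitted("20115 64500"),  # a downstream of AS20115
    "64500 20115": permitted("64500 20115"),  # AS20115 reached via another AS
}
```

Only the first path matches, which is why the filter reduces the session to routes originated by AS20115 itself rather than a full table.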
While we would like to maintain a regional peering connection with Charter/Spectrum, our previous account reps were not able to understand our needs (and our customers’ needs) well enough to successfully negotiate a renewal for interconnection and peering over simply “buying internet”, the latter of which is no longer interesting to us as a colocation datacenter operator.
UPDATE: Effective 4/10/2019, AS20115 has been moved to our core peering router where it will remain until it’s shut down for good.
UPDATE 2: As soon as our Any2 peering port is ready we will remove our connection to Charter/Spectrum. (5/7/2019)
UPDATE 3: Shut down BGP to AS20115. (5/8/2019)
UPDATE 4: Our port to AS20115 Charter/Spectrum is now unplugged and cross connects removed: disconnect complete. (5/8/2019)
Beginning on October 24th at 02:36:56 PDT during a scheduled maintenance window, our circuit to Charter (Spectrum) AS20115 entered loss of service state and was not restored until 57 hours later. Roller Network is disappointed with the delayed response from Charter (Spectrum) to an issue that was created by their own maintenance activities, and the failure of the maintenance group to ensure circuits they removed from service are restored following such activities.
- At 10:02 we called to notify Charter (Spectrum) that our circuit never recovered following maintenance, in which Charter (Spectrum) intentionally placed the circuit into a loss of service state. We were informed that Charter (Spectrum) will not individually troubleshoot our issue because there was a possible related outage and referenced ticket 50233491.
- At 14:45 Oct. 24 we again called asking for an ETA on handling our outage. No ETA was given, however we insisted that a ticket be opened and linked to ticket 50233491 (ticket 50234369). We were assured that someone would follow up with us (this did not happen).
- The following day at 09:26 on Oct. 25 we called to inquire 1) why our circuit was still down and 2) why nobody had followed up with us. We were informed that no followup was made because their notes said the circuit was restored the previous night around 9pm. However, it was not actually restored, and we suggested that ideally someone should have contacted us since we had an open ticket and asked us if it was indeed restored.
- At 12:16 Oct. 25 we called again to inquire on the status. We were informed that according to the notes nobody had looked at it yet since our last call at 09:26. No information as to why not. At this point the circuit has been down for 33 hours.
- At 12:48 Oct. 25, a full 34 hours after first loss of service, we finally received a callback asking us to verify site power (which of course we have power) before they can send a tech out.
- At 13:34 Oct. 25 we received a call from a tech indicating they were en route.
- At around 15:10 Oct. 25 the tech concluded that the reason for our ongoing outage was that our circuit had been migrated to a new core router, but all related configuration was discarded in the migration, specifically all of our BGP configuration for both IPv4 and IPv6. Without BGP the circuit is useless.
- At 15:21 Oct. 25 we placed our BGP neighbors into “shutdown” state. If Charter (Spectrum) maintenance, or whoever is responsible for such work, deleted our configurations, they would have to recreate them, and we would require an audit of the new configuration before restoring BGP in a controlled manner since the circuit can no longer be trusted.
- At 21:20 Oct. 25 we stopped actively requesting updates while waiting for Charter (Spectrum) to pass the request to whatever group handles new configurations, since Charter (Spectrum) maintenance failed to include migration of existing configurations in their process. However, Charter (Spectrum) decided overnight to un-migrate our circuit back to its original router and original configuration rather than attempt to recreate our configurations on the new router, the migration to which had caused the loss of service condition.
- At 11:51 on Oct. 26 we were finally able to obtain written confirmation from Charter (Spectrum) that our circuit had been un-migrated and that the original configurations, which we had last audited with Charter (Spectrum) on August 3rd, were back in place, and we returned our BGP neighbors to active state. The total outage duration was 2 days, 9 hours (57 hours) from first loss of service to final confirmation that the circuit was restored to its pre-migration condition.
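The 57-hour total can be checked from the timeline's own timestamps. The sketch below measures from loss of service at 02:36:56 on Oct. 24 to the written confirmation at 11:51 two days later. It uses offsets from midnight on Oct. 24 instead of calendar dates, so no year needs to be assumed. The variable names are ours, for illustration only.

```python
from datetime import timedelta

# Offsets from midnight on Oct. 24 (day 0 of the outage).
loss_of_service = timedelta(hours=2, minutes=36, seconds=56)  # Oct. 24, 02:36:56
confirmed_restored = timedelta(days=2, hours=11, minutes=51)  # two days later, 11:51

outage = confirmed_restored - loss_of_service

whole_days = outage.days                              # full days elapsed
remaining_hours = outage.seconds // 3600              # hours beyond the full days
total_hours = int(outage.total_seconds() // 3600)     # whole hours overall

print(f"{whole_days} days, {remaining_hours} hours ({total_hours} hours)")
```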
Service was ultimately restored after 57 hours, however internally Charter (Spectrum) does not recognize this since it overlapped two maintenance windows. Since the circuit was physically restored to a location that it was intentionally moved away from, we fully expect Charter (Spectrum) to make a second attempt at a maintenance window for another migration. Whether or not Charter (Spectrum) will be able to perform this task correctly remains to be seen.
Roller Network disagrees with Charter (Spectrum)’s position that “maintenance” is not responsible for failing to return a circuit to service, and we further assert that whether or not an outage is planned – in this case clearly poorly planned – performing maintenance is still an outage. The sole difference is contractual as to what refunds may be owed or whether or not such could be considered as default of contract. Our circuit went into loss of service state directly due to “maintenance” and was not returned to service, thus “maintenance” is the root cause. From a customer service perspective the ethical course of action would be to cancel any future maintenance and revert all changes performed for failing to complete such within its designated window, rare or not (it was argued that doing so is unnecessary because maintenance failing to successfully complete a task is a “rare” occurrence). Roller Network does not believe it is a customer’s responsibility to make sure “maintenance” performs their job(s) correctly.
Editorial Note: This incident highlights why working with a small business like Roller Network is better than a large company. At no time did our account manager (who was CC’d on all correspondence) offer to step in to help or escalate our case, nor did they follow up to see if our issue was being handled properly. Charter (Spectrum)’s maintenance group, the group one would expect to know exactly what they did to break our circuit, disregarded our issue as a problem for another group since it ceases to be their problem past 6AM even if they fail to restore it to working condition by that time. At Roller Network, we do not pass blame between departments, and we always strive to make sure our customers are in working order – it’s literally our job. Our business with Charter (Spectrum) was treated as unimportant and ultimately irrelevant to them. Charter (Spectrum) is only interested in securing new business for short term gains, disregarding the long term interests of their customers. And that’s the biggest point we can make in our favor: as a small business, when you work with Roller Network you are important to us as an individual on an ongoing, long term basis.
Updates have been applied to mail.rollernet.us which include bringing SpamAssassin up to version 3.4.0. No issues were observed with mail2.