Twitter Outages

Twitter is suffering from outages today (October 21, 2016) and may be unavailable.

As a reminder, our third-party hosted status page is at:

SMTP Delivery Trouble to Proofpoint

Yesterday a customer’s compromised outbound accounts caused our SMTP AUTH server to be blacklisted at Proofpoint, even though the compromise was brief. Outbound mail authorization was revoked for that customer in accordance with our policies; however, the Proofpoint block lingers.

If you are having trouble contacting someone behind Proofpoint, you should encourage them to contact their mail host and/or Proofpoint for resolution. Although we are attempting to reach out ourselves, companies like Proofpoint are more likely to listen to their own customers’ complaints about losing legitimate mail than to us.

As far as we are aware this issue is limited to Proofpoint.

UPDATE: This issue has been resolved as of August 23, 2016.

ACC Update; Primary DNS, PayPal eChecks

An update to the account control center (ACC) was made live today (Sunday, July 10, 2016) that contains major changes to the Primary DNS section, along with minor fixes to other sections. We’ve run through every change and it tested OK for us, but if you observe any problems please contact support so we can debug and fix them. The large number of changes to Primary DNS had been holding us back from updating other parts of the ACC, but that should be out of the way now.

The other major change relates to PayPal payments: the system will now note pending eCheck payments on invoices and automatically place the invoice on hold until PayPal sends a cleared or failed message. This addresses the problem of eCheck payments made too close to the shutoff date to clear in time.
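As a rough illustration of the hold behavior described above, here is a minimal sketch in Python. It is not the ACC's actual code: the `Invoice` class, the `InvoiceState` names, and the `apply_paypal_message` function are all hypothetical, and the status strings ("Pending" with reason "echeck", "Completed", "Failed") are assumptions modeled on the kind of values PayPal reports in payment notifications.

```python
from dataclasses import dataclass
from enum import Enum


class InvoiceState(Enum):
    OPEN = "open"        # unpaid; normal shutoff rules apply
    ON_HOLD = "on_hold"  # eCheck pending; shutoff is deferred
    PAID = "paid"


@dataclass
class Invoice:
    number: str  # hypothetical invoice identifier
    state: InvoiceState = InvoiceState.OPEN


def apply_paypal_message(invoice, payment_status, pending_reason=""):
    """Update an invoice from one payment notification (assumed fields)."""
    status = payment_status.lower()
    if status == "pending" and pending_reason.lower() == "echeck":
        # The eCheck has not cleared yet: hold the invoice instead of
        # treating it as unpaid when the shutoff date arrives.
        invoice.state = InvoiceState.ON_HOLD
    elif status == "completed":
        # The cleared message releases the hold and marks the invoice paid.
        invoice.state = InvoiceState.PAID
    elif status == "failed" and invoice.state is InvoiceState.ON_HOLD:
        # The failed message releases the hold; the invoice is due again.
        invoice.state = InvoiceState.OPEN
    return invoice.state
```

The point of the hold state is that an invoice with a pending eCheck is neither paid nor delinquent, so nothing is shut off while the payment is still in transit.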

Office Voice (Phone) Provider Change Maintenance Notice

Within the next several weeks we’re going to be changing voice service (phone) providers at our office. When it comes time to initiate the number port and reconfigure our equipment, there will be a short period when calls will not complete or could be dropped. Additional information will be posted as updates to this post and on our Twitter account as it becomes available.

This change is mainly for cost savings; our current provider is raising their prices and we don’t really use the phone enough to justify the increase. However, we still prefer a separate circuit instead of internet-based VOIP because of the nature of our business: an internet-related problem on our side is exactly when we are most likely to need the phones. Since we want to keep voice and internet out of the same basket as much as possible, we continue to use separate voice circuits from a provider that we aren’t also using for transit multihoming.

UPDATE 1: New circuit has been delivered to the MMR. (11/17/2015)

UPDATE 2: Currently scheduled date for the cutover is the afternoon of Monday, November 30th.

UPDATE 3: The migration has been successfully completed.

Note that our alternate number has changed to 775-221-8807 (the old one could not be ported).

Prefix hijacking by Charter AS20115

At approximately 11:51 local time we were alerted to degraded performance on paths preferring transit through Charter AS20115. We collected data to open a ticket and attempted to apply a BGP community to lower localpref and move traffic away from AS20115. Oddly, the alerts continued and no change was observed.

When the community had no effect, we decided to simply shut down the BGP neighbor completely at 11:59. However, we were horrified to discover that even after shutting down the BGP neighbor (effectively withdrawing all routes) Charter continued to announce our prefixes and our customers’ prefixes from AS20115.

The original problem we wanted to work around turned out to be a malfunctioning attenuator in a link bundle somewhere upstream, but this behavior of continuing to announce prefixes after we have withdrawn them or shut down the BGP neighbor is a catastrophic loss of control over the network announcements from our autonomous system. We did employ what we like to call “stupid routing tricks” like deaggregation in a last-ditch effort to drive traffic away from AS20115. However, this could not help customer prefixes that were already at the minimum accepted size.
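For readers unfamiliar with deaggregation: routers prefer the longest matching prefix, so announcing two more-specific halves of a block through other providers pulls traffic away from a stale, less-specific announcement. The sketch below, using Python's standard `ipaddress` module, shows the split and why it fails for a prefix that is already at the minimum size. The /24 limit is an assumption (a common IPv4 filtering convention, not a figure from this post), and the prefixes are reserved example ranges, not our actual announcements.

```python
import ipaddress

# Assumption: most networks filter IPv4 announcements longer than /24.
MIN_ACCEPTED_PREFIXLEN = 24


def deaggregate(prefix, min_len=MIN_ACCEPTED_PREFIXLEN):
    """Split a prefix into its two more-specific halves, if allowed.

    Returns the original prefix unchanged when it is already at the
    minimum accepted size, because anything smaller would be filtered.
    """
    net = ipaddress.ip_network(prefix)
    if net.prefixlen >= min_len:
        return [net]
    return list(net.subnets(prefixlen_diff=1))


# A /22 can be split into two /23s that win over the hijacked /22.
print([str(n) for n in deaggregate("198.18.0.0/22")])
# → ['198.18.0.0/23', '198.18.2.0/23']

# A /24 cannot be split any further, so the trick does not help.
print([str(n) for n in deaggregate("192.0.2.0/24")])
# → ['192.0.2.0/24']
```

This is why the customer prefixes already at the minimum size stayed affected: there was no more-specific announcement left to make.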

At this time there is no resolution. We’re simply at a loss in stopping Charter’s prefix hijacking other than to wait for them to address it.

UPDATE: The prefixes appear to have finally withdrawn this morning. We will post a complete update later; it’s been a long night.

UPDATE 2: Charter had a second emergency maintenance last night on the same equipment. We haven’t reestablished BGP with AS20115 yet.

UPDATE 3: We’re told that an IOS upgrade was performed on the device that hijacked the prefixes. On the morning of the 29th the affected device was rebooted at approximately 02:30 local time. We were told this solved our problem and our ticket was closed. However, we delayed reestablishing BGP until we could confirm a fix, as a reboot would only clear the immediate problem, not fix the underlying issue. A second emergency maintenance occurred the next morning, on the 30th, with two observed reboots at 05:23 and 06:01. We’re told (through two independent sources) that these were due to an IOS upgrade that should provide a fix for the bug. We did not reestablish BGP with AS20115 until October 1 at 17:45 local time. The time between our withdrawal of prefixes and Charter’s propagation of that withdrawal was approximately 14.5 hours. As far as we are aware no traffic was completely lost, but it was affected by ~25% packet loss, which is what prompted us to withdraw routes in the first place.

This information is provided in an effort to maintain transparency in network operations at Roller Network.