Within the next several weeks we’re going to be changing voice service (phone) providers at our office. When it comes time to initiate the number port and reconfigure our phone equipment, there will be a short period when calls may not complete or may be dropped. Additional information will be posted as updates to this post and on our Twitter account as it becomes available.
This change is mainly about cost savings: our current provider is raising its prices and we don’t use the phone enough to justify the increase. However, we still prefer a separate circuit over internet-based VoIP because of the nature of our business: if there is an internet-related problem on our side, that’s when we are most likely to need the phones. Since we want to keep voice and internet out of the same basket as much as possible, we continue to use separate voice circuits from a provider that we aren’t also using for transit multihoming.
UPDATE 1: New circuit has been delivered to the MMR. (11/17/2015)
UPDATE 2: Currently scheduled date for the cutover is the afternoon of Monday, November 30th.
UPDATE 3: The migration has been successfully completed.
Note that our alternate number has changed to 775-221-8807 (the old one could not be ported).
At approximately 11:51 local time we were alerted to degraded performance on paths preferring transit through Charter AS20115. We collected data to open a ticket and attempted to apply a BGP community to lower localpref and move traffic away from AS20115. Oddly, the alerts continued and we observed no change.
After attempting to tag a BGP community to lower localpref on announcements to AS20115, we decided to simply shut down the BGP neighbor completely at 11:59. However, we were horrified to discover that even after shutting down the BGP neighbor (effectively withdrawing all routes), Charter continued to announce our prefixes and our customers’ prefixes from AS20115.
The original problem we wanted to work around turned out to be a malfunctioning attenuator in a link bundle somewhere upstream, but continuing to announce prefixes after we have withdrawn them or shut down the BGP neighbor is a catastrophic loss of control over the network announcements from our autonomous system. We did employ what we like to call “stupid routing tricks” like deaggregation in a last-ditch effort to drive traffic away from AS20115. However, this could not help customer prefixes that were already at the minimum accepted size.
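For readers unfamiliar with the deaggregation trick: routers prefer the longest matching prefix, so announcing the two more-specific halves of a prefix through other transits pulls traffic away from a stale covering announcement. The sketch below illustrates this with Python's standard `ipaddress` module. It assumes IPv4 and a /24 cutoff as the "minimum accepted size" (a common filtering convention, not a figure stated in this post), and the prefixes are reserved example ranges, not ours or our customers'.

```python
import ipaddress

# Assumption: many networks filter IPv4 announcements longer than /24.
# The post does not state the exact limit that applied here.
LONGEST_ACCEPTED_PREFIXLEN = 24


def deaggregate(prefix, longest_accepted=LONGEST_ACCEPTED_PREFIXLEN):
    """Return the two more-specific halves of a prefix.

    If the prefix is already at the longest accepted length it cannot be
    split any further, so it is returned unchanged.
    """
    net = ipaddress.ip_network(prefix)
    if net.prefixlen >= longest_accepted:
        return [net]
    return list(net.subnets(prefixlen_diff=1))


# A /15 can be split into two /16s that win on longest-prefix match.
print(deaggregate("198.18.0.0/15"))

# A /24 is already at the cutoff, so the trick is not available.
print(deaggregate("192.0.2.0/24"))
```

The second call shows why customer prefixes already at the cutoff could not be helped: there is no more-specific announcement left to make that other networks would accept.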
At this time there is no resolution. We’re simply at a loss in stopping Charter’s prefix hijacking other than to wait for them to address it.
UPDATE: The prefixes appear to have finally withdrawn this morning. We will post a complete update later; it’s been a long night.
UPDATE 2: Charter had a second emergency maintenance last night on the same equipment. We haven’t reestablished BGP with AS20115 yet.
UPDATE 3: We’re told that an IOS upgrade was performed on the device that hijacked the prefixes. On the morning of the 29th the affected device was rebooted at approximately 02:30 local time. We were told this solved our problem and our ticket was closed. However, we delayed reestablishing BGP until we could confirm a fix, as a reboot would only clear the immediate problem, not fix the underlying issue. A second emergency maintenance occurred the next morning, on the 30th, with two observed reboots at 05:23 and 06:01. Two independent sources told us these were due to an IOS upgrade that should fix the bug. We did not reestablish BGP with AS20115 until October 1 at 17:45 local time. The time between our withdrawal of prefixes and Charter’s propagation of that withdrawal was approximately 14.5 hours. As far as we are aware, no traffic was completely lost, but it was still affected by ~25% packet loss, which is what prompted us to withdraw routes in the first place.
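The 14.5 hour figure can be checked with a little arithmetic. This sketch assumes the two endpoints are the neighbor shutdown at 11:59 and the reboot at approximately 02:30 the following morning; the post does not say exactly when the stale routes disappeared, so treat the result as approximate.

```python
from datetime import timedelta

# Times of day expressed as offsets from midnight on the day of the incident.
neighbor_shutdown = timedelta(hours=11, minutes=59)

# Assumption: the reboot at about 02:30 the next morning is when the
# stale announcements were finally withdrawn.
device_reboot = timedelta(days=1, hours=2, minutes=30)

elapsed = device_reboot - neighbor_shutdown
elapsed_hours = elapsed.total_seconds() / 3600

print(elapsed)                  # 14:31:00
print(round(elapsed_hours, 1))  # 14.5
```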
This information is provided in an effort to maintain transparency in network operations at Roller Network.
The AHBL DNSBL is closing down and emptying its DNS zones. As such, we will be removing all *.ahbl.org configurations from customer DNSBL settings.
See the announcement at: http://www.ahbl.org/content/changes-ahbl
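If you maintain your own DNSBL list elsewhere, the same cleanup can be scripted. The following is only a sketch: the function name and the sample zone list are hypothetical and do not reflect how our configuration system stores customer settings.

```python
def remove_retired_zones(zones, retired_domain="ahbl.org"):
    """Return the zones that are not under the retired domain.

    Matching is done on whole DNS labels, so "dnsbl.ahbl.org" is removed
    while an unrelated name such as "notahbl.org" is kept.
    """
    retired = retired_domain.lower().rstrip(".")
    suffix = "." + retired
    kept = []
    for zone in zones:
        name = zone.lower().rstrip(".")
        if name == retired or name.endswith(suffix):
            continue
        kept.append(zone)
    return kept


# Hypothetical example list, for illustration only.
configured = [
    "zen.spamhaus.org",
    "dnsbl.ahbl.org",
    "bl.spamcop.net",
    "rhsbl.ahbl.org",
]
print(remove_retired_zones(configured))
```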
The Heartbleed Bug is a major vulnerability in the OpenSSL library. OpenSSL is extremely popular and is used as the cryptography library behind the scenes for countless secure applications. By now you’ve probably heard about it and its widespread implications. We’re not going to rehash it here; see heartbleed.com for details.
Roller Network uses Debian Linux as the OS of choice for our servers. However, we do not generally stay on the “bleeding edge” of updates, and in this case that has served us well.
OpenSSL 0.9.8 is not, and has not been, vulnerable to “heartbleed”. Only the newer OpenSSL 1.0.1 through 1.0.1f is vulnerable.
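As a rough illustration of that version range, here is a sketch of a check against upstream OpenSSL version strings. It encodes only the range stated above (1.0.1 through 1.0.1f), and the function name is hypothetical. One important caveat: distributions often backport the fix without changing the upstream version letter, so a packaged version string cannot be judged this way; consult the distribution's security advisory instead.

```python
import re

# Matches upstream-style versions such as "0.9.8o", "1.0.1" or "1.0.1f".
_UPSTREAM_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)([a-z]?)")

# Per the range stated in this post: 1.0.1 (no letter) through 1.0.1f.
_VULNERABLE_LETTERS = {"", "a", "b", "c", "d", "e", "f"}


def is_heartbleed_vulnerable(upstream_version):
    """Return True if an upstream OpenSSL version falls in 1.0.1 to 1.0.1f.

    This looks only at the upstream version number. It does not know about
    distribution packages that carry a backported fix.
    """
    match = _UPSTREAM_VERSION.match(upstream_version.strip())
    if match is None:
        raise ValueError(f"unrecognized OpenSSL version: {upstream_version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    letter = match.group(4)
    if (major, minor, patch) != (1, 0, 1):
        return False
    return letter in _VULNERABLE_LETTERS


for version in ("0.9.8o", "1.0.1", "1.0.1f", "1.0.1g"):
    print(version, is_heartbleed_vulnerable(version))
```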
So where does that leave us? The good news is that we were still on Debian 6.0 “squeeze” at the time of this security fiasco because we don’t like to jump right into the latest release for the sake of updating. The Debian security team still provides security updates to the previous stable release (also known as “oldstable”) for a period of time, so we’re in no rush to upgrade. Specific software for which we do want newer versions is either obtained from Debian backports or compiled manually. We like to take a wait-and-see approach before upgrading Debian distributions.
Here’s a rundown of the major services:
- Incoming mail servers (MX servers): Debian 6.0; not vulnerable, no risk.
- Hosted mail services (POP3, IMAP, Sieve): Debian 6.0; not vulnerable, no risk.
- Outbound mail services (SMTP AUTH, smarthost): Debian 6.0; not vulnerable, no risk.
- Webmail clients (SquirrelMail and Roundcube, EV cert): Debian 6.0; not vulnerable, no risk.
- Primary and Secondary DNS Servers: Debian 6.0; not vulnerable, no risk.
- Account Control Center (acc.rollernet.us, EV cert): Debian 6.0; not vulnerable, no risk.
- LDAP, RADIUS, and SQL database servers: Debian 6.0; not vulnerable, no risk.
This is great news for our customers: at no time were any password-accepting Roller Network servers running a distribution that was affected by “heartbleed”. We did have an internal server in the office running Debian 7.0; it has been patched, its SSH keys regenerated, and its SSL cert (signed by our internal CA) reissued.
UPDATE 2014-03-16: UPS maintenance successfully completed!
We are working on scheduling an upcoming facility UPS maintenance and start-up with Eaton to take place on Sunday, March 16th. Our original request was for a window no earlier than 17:00 Pacific time Friday, March 14: we asked for March 15th or 16th, or the following weekend (21st after 17:00, or the 22nd or 23rd). Once we have a firm date and time we will publish a facility maintenance notification online and send it directly to customers by email. We are also planning to provide live updates during the procedure.
Earlier this year we purchased another Eaton UPS to bus-tie into the existing parallel/redundant tie panelboard. This will add another 30kVA of capacity to the system and allow us to finish selling the remaining colocation space in Phase I with the goal of reaching “sold out” status. However, tying in another unit requires a factory technician to commission the new UPS on site and to place the existing system into bypass for a short time. There is also a risk of start-up failure on the newly installed unit, as with any untested piece of equipment. At this time the new unit is installed and fully wired with input/output breakers open, waiting for start-up.
UPDATE: This is scheduled for all day Sunday, March 16th.
UPDATE: We will update this post as needed during the event and possibly live-tweet it. You can follow @rollernetnv on Twitter or watch the feed on rollernetstatus.com