Incident Reports

4/5/2024

Dallas Data Center Migration

Tavas Lattimer

Director of Systems Administration

As many of you probably know, we had a data center move scheduled for April 1st (no, this wasn’t meant to be an April Fools’ joke). Unfortunately, the move didn’t go as we would have liked.

Why we moved.

To get you up to speed (if you don’t know this already, which most people don’t), here’s a little bit about how we ended up with the locations we have. When Xentain opened, we started with VPS hosting in Vancouver. After seeing high demand for Vancouver and for other locations, we decided to start offering virtual private servers in Fremont, California. Once Fremont had sold out after several restocks, we decided it was time to open a US Central location in Dallas, Texas. We ended up getting a cabinet with Flexential in their RCH01 location. One of the main reasons we chose RCH01 was that our remote hands technician was close by.

Flexential’s RCH01 was a nice data center and was working well for us until we had to upgrade to a 10 Gbps connection (we outgrew our 1 Gbps connection quickly). For some reason Cogent, our current carrier in Dallas, wouldn’t properly bill us for a 1 Gbps line (Xentain is a Canadian company in American data centers, and apparently it’s too hard to bill us in CAD instead of USD). To replace Cogent, we planned to use Hurricane Electric, as we have been happy with their services in the past (and we really like Cat5 the ColoCat!). This is where the issues start. After looking at the carriers in Flexential’s RCH01 data center, we realized that Hurricane Electric wasn’t listed. HE is in most data centers, but apparently not in RCH01. The only way to get a connection to HE while staying in RCH01 was a cross connect from RCH01 to Flexential’s DAL02 (DAL01? They both have the same suite and address on PeeringDB), Flexential’s suite in Equinix’s Infomart in downtown Dallas.

We ended up deciding it would be easier to move data centers entirely, since DAL02 would give us more options for other carriers if Hurricane Electric didn’t work out for us in the long run. We scheduled our data center move for April 1st, 2024, a few weeks after getting our new cabinet in DAL02.

The day of the move.

April 1st came faster than we anticipated. We had scheduled a maintenance window from 6am on April 1st to 12am on April 2nd. Unfortunately, we ended up having to extend the window to 12am on April 3rd. A timeline of the move is provided below.

Timeline

All times below are in Eastern Time and are approximate.

April 1st:

  • 0600: Maintenance window begins
  • 1000: Original move start time
  • 1600: Actual move start time – Technician arrives at RCH01 to start de-racking
  • 1605: Rack documentation begins
  • 1700: De-racking process begins
  • 2100: Technician starts loading his vehicle with equipment to be moved to the new data center
  • 2130: Migration pauses as an updated rack map is made
  • 2200: Technician arrives at the new data center
  • 2305: Technician starts to unload equipment into the new data center
  • 2330: Technician arrives at the rack with all equipment and starts re-racking

April 2nd:

  • 0500: Re-racking and cable management completed
  • 0505: Router re-configuration started
  • 0510: Router re-configuration complete
  • 0515: Network debugging starts
  • 0730: The issue is traced to one of Hurricane Electric’s optical transceivers. HE is notified of the issue
  • 0800: Technician leaves the data center. Waiting on a reply from HE
  • 0900: Call was placed to the HE NOC. HE started running tests
  • 0930: Testing still in progress
  • 1000: HE gets Flexential involved
  • 1030: HE’s optic is fixed
  • 1200: Internal management network is fixed; debugging continues
  • 1400: Multiple incorrectly configured switch ports found on our side, along with a number of other issues. HE’s link was not set to auto-negotiate
  • 1600: Network restored

April 4th:

  • 2100: Router powered off to install 10 Gbps network interface card
  • 2130: Network interface card installed and working, but not at 10 Gbps speeds. Our optic ended up not being able to handle more than 1 Gbps, despite it being rated for 10 Gbps. A new optic was ordered.
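
For anyone who wants to tally the timeline above, here is a rough Python sketch that works out the main durations. It assumes the approximate times listed, treats "12am" as midnight at the start of the given day, and uses plain (timezone-naive) datetimes since every entry is in Eastern Time. The helper names (`at`, `hours`, `summary`) are made up for this illustration.

```python
from datetime import datetime, timedelta


def at(day: int, hhmm: str) -> datetime:
    """Build a naive datetime in April 2024 from a day number and an HHMM string."""
    return datetime(2024, 4, day, int(hhmm[:2]), int(hhmm[2:]))


def hours(delta: timedelta) -> float:
    """Convert a timedelta to hours."""
    return delta.total_seconds() / 3600


# Approximate times taken from the timeline (Eastern Time).
window_start = at(1, "0600")
scheduled_end = at(2, "0000")      # "12am on April 2nd", assumed to mean midnight
extended_end = at(3, "0000")       # "12am on April 3rd", same assumption
planned_move_start = at(1, "1000")
actual_move_start = at(1, "1600")
deracking_start = at(1, "1700")
network_restored = at(2, "1600")

summary = {
    "scheduled_window_h": hours(scheduled_end - window_start),
    "extended_window_h": hours(extended_end - window_start),
    "move_start_delay_h": hours(actual_move_start - planned_move_start),
    "deracking_to_restore_h": hours(network_restored - deracking_start),
    "overrun_past_scheduled_end_h": hours(network_restored - scheduled_end),
}

for name, value in summary.items():
    print(f"{name}: {value:g}")
```

Since the times are approximations, these figures are too. The de-racking-to-restore number is only a rough proxy for how long equipment was offline, not a measured outage duration.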

To summarize: the data center move started and ended later than we had hoped, Hurricane Electric’s optic had issues, and several things on our end (including switch ports) weren’t configured properly. A few days later, a technician went to the data center to install a 10 Gbps network interface card in our router so we could move to 10 Gbps, but our optic couldn’t handle more than 1 Gbps, so a new one was ordered.

Thank you

I owe a big thank you to everyone at Flexential and Hurricane Electric who worked with us as we tried to troubleshoot these issues. I also owe a very big thank you to our remote hands technician, Joseph, for being patient with us while we tried to restore everything.