To the extent that things at BitPusher ever settle down, they have settled down since the move to Seattle.
We’re already seeing benefits from the move. One big networking snafu notwithstanding, the combination of better, beefier shared infrastructure with the opportunity to rebuild our customer servers has noticeably reduced the frequency of operational incidents. In practical terms, this has meant fewer pages for our operations team and fewer process restarts: things that most people don’t notice, but that take up our time and can cause brief downtime or lost end-user sessions.
But perhaps the biggest difference is in the little ways we’ve made the infrastructure simpler and tidier. It may sound silly to dictate that the server in slot 13 of cabinet 2 will be known as sea-c02-s13 and must be plugged into port 13 on both the remote-control power system and each of the three network switches it connects to (two redundant data connections plus one for lights-out management), but in the end it saves time, avoids confusion, and largely eliminates the need to maintain a cable map.
Rebuilding customer servers, as we did during the move, has meant that everything (well, almost everything) is now set up according to our latest best practices. While we have always kept the servers up to date with patches and other fixes that affect security and stability, many details of how we configure networking, choose file paths, etc. have changed in the three years since we moved into our previous data center, and some of those are much harder to update than a patch. Rebuilding gave us a chance to update them. (We’ve also learned a lot about what to do and what not to do when rebuilding customer environments, so in the future it will be easier to do limited rebuilds that keep things up to the latest spec.)
Improvements in the physical data center space help, too. In addition to better power and cooling and plenty of room to expand, it helps a great deal to have better cable management, a simpler server-labeling scheme, and more storage space, which makes spare parts easier to manage.
When trying to keep a complicated environment running smoothly, the little things make a big difference.