the little things
June 24th, 2007 by Daniel Lieberman
To the extent that things at BitPusher ever settle down, things have settled down after the move to Seattle.
We’re already seeing benefits from the move. One big networking snafu notwithstanding, the combination of better and beefier shared infrastructure with the opportunity to rebuild our customer servers has noticeably reduced the frequency of operational incidents. In practical terms, this has meant fewer pages for our operations team and fewer process restarts: things that most people don’t notice, but that take up our time and can cause brief downtime or lost end-user sessions.
But perhaps the biggest difference is in the little ways we’ve made the infrastructure simpler and tidier. It may sound silly to dictate that the server in slot 13 in cabinet 2 will be known as sea-c02-s13 and must be plugged in to port 13 on both the remote-control power system and each of the three network switches to which it is connected (two redundant data connections plus one for the lights-out management), but in the end it saves time and avoids confusion, as well as eliminating the need to maintain much of a cable map.
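To make the convention concrete, here is a minimal Python sketch of how a hostname and port number could be derived from a rack position, and recovered again from the hostname. This is an illustration only, not BitPusher's actual tooling: the names RackPosition and parse_hostname are hypothetical, and the two-digit zero padding for slots is an assumption inferred from the single example sea-c02-s13.

```python
from dataclasses import dataclass

# Site prefix taken from the example hostname in the post.
SITE = "sea"


@dataclass(frozen=True)
class RackPosition:
    """A physical location: cabinet number and slot within that cabinet."""
    cabinet: int
    slot: int

    @property
    def hostname(self) -> str:
        # Assumption: both cabinet and slot are zero-padded to two digits.
        return f"{SITE}-c{self.cabinet:02d}-s{self.slot:02d}"

    @property
    def port(self) -> int:
        # The same port number is used on the power controller and on
        # each of the three switches, so the slot alone determines it.
        return self.slot


def parse_hostname(name: str) -> RackPosition:
    """Recover the rack position from a hostname such as 'sea-c02-s13'."""
    parts = name.split("-")
    if len(parts) != 3:
        raise ValueError(f"unexpected hostname format: {name!r}")
    site, cab, slot = parts
    if site != SITE or not cab.startswith("c") or not slot.startswith("s"):
        raise ValueError(f"unexpected hostname format: {name!r}")
    return RackPosition(cabinet=int(cab[1:]), slot=int(slot[1:]))


position = RackPosition(cabinet=2, slot=13)
print(position.hostname, position.port)
```

Because both the name and the port follow from the physical location, anyone holding a hostname can work out where the machine sits and which ports it uses without consulting a separate cable map.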
Rebuilding customer servers, as we did during the move, has meant that everything (well, almost everything) is now set up according to our latest best practices. While we always kept the servers up to date with the patches that affect security and stability, many of the details of how we configure networking, choose file paths, etc. have changed in the three years since we moved into our previous data center, and some of those are harder to update. Rebuilding gave us a chance to update them. (We’ve also learned a lot about what to do and what not to do when rebuilding customer environments, so it will be easier to do limited rebuilds to keep things up to the latest spec in the future.)
Improvements in the physical data center space make a big difference, too. In addition to better power and cooling and having plenty of room to expand, it helps a great deal to have better cable management, a simplified and easier-to-manage scheme for labeling servers, and more storage space for easier management of spare parts.
When trying to keep a complicated environment running smoothly, the little things make a big difference.

July 26th, 2007 at 2:07 pm
Very sweet — congrats on moving to Seattle, which is a great place and near other great places (Vancouver & Portland in particular)!
Dictating things doesn’t sound silly to me at all, it sounds very, very sane. Any chance you guys might share your best practices docs with, say, LOPSA? (I’m sure there are issues with security if you’re too specific, but I suspect that having a genericized version of the doc would be a useful thing for folks who don’t have best-practices docs in place to see and use as a basis for writing their own.) If you were really feeling generous, and the doc doesn’t already spell things out, it might be interesting to have the background/justification for why you do various things the way you do.
Have fun in Seattle!
August 27th, 2007 at 9:13 pm
I am obscenely jealous of your cabling algorithm.