Microsoft has blamed an overheating data centre for a 16-hour shutdown of its Outlook and Hotmail systems. Microsoft said there was a “rapid and substantial temperature spike” and that the spike “was significant enough before it was mitigated that it caused our safeguards to come in to place for a large number of servers in this part of the datacenter”.
They blogged about it here, but didn’t give away much detail beyond “a firmware update screwed it all up”.
At the end of the blog post, they link to a page that reports on known incidents affecting Microsoft services, which is pretty neat.
We recently had a temperature spike in our own office server room: during a power cut the UPS kicked in, and whilst our servers are on the UPS, the air conditioning is not. The temperature monitor on our racks read 34°C, far higher than the room should ever be. Crucially, however, getting the warning email at this temperature was pointless. Sending us an email when the room hits 34°C is like saying “Hi, this room is about to melt, hope you’ve got some backups”, and it had the sys admin running around trying to get desk fans into the room. After this incident we reduced the alert threshold to 25°C. Fact is, the room should be at 19°C, so if it hits 25°C we know there is a problem, and the earlier warning gives us more time to fix it, or at least to switch off the non-critical servers to help the room cool. Getting the air con on the UPS would really help us out in situations like this!
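To make the threshold idea concrete, here is a minimal Python sketch of two-level alerting using the figures from our incident (19 degrees target, 25 warning, 34 critical). This is not the software our rack monitor actually runs; the names `Thresholds`, `classify` and `alerts` are made up for illustration, and the readings are assumed to arrive as plain Celsius numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical alert levels in degrees Celsius, taken from the incident."""
    target_c: float = 19.0    # where the room should normally sit
    warn_c: float = 25.0      # early warning: still time to react
    critical_c: float = 34.0  # the old, far-too-late threshold


def classify(temp_c, t=Thresholds()):
    """Map a single reading to 'ok', 'warning' or 'critical'."""
    if temp_c >= t.critical_c:
        return "critical"
    if temp_c >= t.warn_c:
        return "warning"
    return "ok"


def alerts(readings, t=Thresholds()):
    """Yield (reading, level) only when the level changes.

    Reporting transitions instead of every reading avoids sending
    a flood of identical emails while the room stays hot.
    """
    previous = "ok"
    for temp in readings:
        level = classify(temp, t)
        if level != previous:
            yield temp, level
        previous = level


if __name__ == "__main__":
    # Made-up readings: the room warms up during a power cut, then recovers.
    for temp, level in alerts([19, 22, 25, 27, 34, 24]):
        print(f"{temp} C -> {level}")
```

The point of the warning level is simply lead time: the first email goes out at 25 degrees, while there is still a chance to shut down non-critical servers, rather than at 34 when it is already too late.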