Rack Cooling

Now consider heat removal from the rack.

The rack consumes 194 kW at maximum load, and all 194 kW of that heat has to be removed by hot water at 44C. Why so hot? That is best explained with a picture:

[Figure: cooling system]

First, why does the system have two cooling loops, internal and external? The answer is that we can't use water (very expensive Class A purity) at sub-zero outdoor temperatures, so we have to use a glycol mix for the outside loop. If so, why can't we use a glycol mix in the inner cooling loop as well? Because a glycol mix has 30% less thermal capacity, which is a vital parameter for cooling CPUs/GPUs. One more, less obvious, reason is electrochemical corrosion: to avoid it, we can't mix copper and aluminum (see page 46 of the IBM "Robust Cooling System" presentation for details). We can control the materials used everywhere in the inner cooling loop because it is our own design, but the outer loop is an off-the-shelf solution whose internals we don't know in detail.
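To put the 30% figure in perspective, here is a rough sketch based on the standard heat balance Q = m * cp * ΔT. The specific heat of water (about 4186 J/(kg*K)) and the 6 K temperature rise (44C inlet to 50C outlet at full load) are assumptions made for illustration; the function name is hypothetical.

```python
# Rough estimate: coolant mass flow needed to carry away the rack heat.
# Heat balance: Q = m_dot * cp * delta_T  =>  m_dot = Q / (cp * delta_T)

CP_WATER = 4186.0               # J/(kg*K), assumed typical value for water
CP_GLYCOL_MIX = 0.7 * CP_WATER  # "30% less thermal capacity", per the text

RACK_HEAT_W = 194_000.0         # 194 kW at maximum load, per the text
DELTA_T_K = 6.0                 # assumed rise: 44C inlet to 50C outlet


def mass_flow_kg_s(heat_w: float, cp: float, delta_t: float) -> float:
    """Mass flow (kg/s) needed to absorb heat_w with a rise of delta_t."""
    return heat_w / (cp * delta_t)


water_flow = mass_flow_kg_s(RACK_HEAT_W, CP_WATER, DELTA_T_K)
glycol_flow = mass_flow_kg_s(RACK_HEAT_W, CP_GLYCOL_MIX, DELTA_T_K)

print(f"water:      {water_flow:.2f} kg/s")
print(f"glycol mix: {glycol_flow:.2f} kg/s")
print(f"ratio:      {glycol_flow / water_flow:.2f}")
```

Under these assumptions, the glycol mix needs roughly 1.4 times the mass flow of water to remove the same heat with the same temperature rise.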

As the picture shows, we have a 44C inlet and a 45-50C outlet. Why 45-50? Because it depends on the computational load. At full load the rack consumes more power, so the outlet temperature is higher, 50C. With a small computational load, or none at all, the outlet is at 45C. An obvious question: if there is no computational load, wouldn't it be better to switch the power off altogether, as with a notebook or a washing machine? The answer is NO! Datacenters lose from 5% to 10% of compute nodes and 15% of disks on every power cycle. Therefore, it's cheaper to spend energy on idle nodes than to repair 10% of them after each power-on. Simple economics, nothing more.
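The link between load and outlet temperature can be sketched with the same heat balance, Q = m * cp * ΔT. This is only an illustration under assumptions that are not stated in the text: the coolant flow is constant, all rack power ends up in the water, and water has a specific heat of about 4186 J/(kg*K). The function name is hypothetical.

```python
# Sketch: outlet temperature as a function of rack load at constant flow.

CP_WATER = 4186.0        # J/(kg*K), assumed typical value for water
INLET_C = 44.0           # inlet temperature, per the text
FULL_LOAD_W = 194_000.0  # rack power at maximum load, per the text
FULL_LOAD_OUTLET_C = 50.0

# Flow sized so that full load produces exactly the 50C outlet (assumption).
FLOW_KG_S = FULL_LOAD_W / (CP_WATER * (FULL_LOAD_OUTLET_C - INLET_C))


def outlet_temp_c(load_w: float) -> float:
    """Outlet temperature (C) for a given heat load at the fixed flow."""
    return INLET_C + load_w / (FLOW_KG_S * CP_WATER)


for load_kw in (0, 50, 100, 150, 194):
    print(f"{load_kw:>3} kW -> {outlet_temp_c(load_kw * 1000.0):.1f} C")
```

With these assumptions the outlet temperature rises linearly with load, from 44C with no heat at all to 50C at 194 kW. A 45C outlet would correspond to about one sixth of the full load, which suggests that an idle rack still dissipates a noticeable amount of power.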

Back to the picture. Why is a 44C inlet used? It is a trade-off between two restrictions. The first restriction is the ambient temperature. Water in the second (outer) loop must be at least 45C, provided that the ambient temperature is below 35C. Under that condition, a 42C outlet from the outer loop is guaranteed. That is the lower temperature limit. The upper limit is that the inner loop needs a 44C inlet (the 2C difference is lost in the heat exchanger between the inner and outer loops) to keep enough ΔT between the cooling water and the components. Otherwise there won't be enough heat exchange, and the electronic components will overheat.

Just remember a simple formula:

q = -k*ΔT

Here k is the thermal conductivity constant (which is made up of many components) and ΔT is the temperature difference between the cooled component and the coldplate. With a 20 Celsius difference we can remove twice as much heat as with 10 in the same time, all other parameters being equal. Or, in other words, we can pump half as much cooling water, which directly affects PUE (Power Usage Effectiveness), one of the key parameters for supercomputers and datacenters.
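A tiny numeric illustration of the linear relationship above. The value of k is made up purely for the example (in a real system it depends on the coldplate, thermal interface, flow regime and so on), and the function works with magnitudes, so the minus sign that indicates the direction of heat flow is dropped.

```python
# Sketch: heat removed scales linearly with the temperature difference.

K_W_PER_K = 5.0  # hypothetical conductance, W/K (illustrative value only)


def heat_removed_w(k: float, delta_t: float) -> float:
    """Magnitude of heat flow (W) for conductance k and difference delta_t."""
    return k * delta_t


q10 = heat_removed_w(K_W_PER_K, 10.0)
q20 = heat_removed_w(K_W_PER_K, 20.0)

print(f"dT = 10: {q10:.0f} W")
print(f"dT = 20: {q20:.0f} W")
print(f"ratio:   {q20 / q10:.1f}")
```

Whatever value k takes, doubling ΔT doubles the heat removed, as long as k itself stays the same.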

©2019 Nikolay Bodunov. All Rights Reserved.