We’ve been struggling with cooling issues for the last week. The two rigs with the GTX 1070 cards run warm, close to their thermal caps, but are stable and happy working that way. The GTX 1080 Ti cards, however, were throttling and even shutting themselves off because they were simply running too hot out there. I tried adding case fans in every location the case allows, using the Tripp Lite SRCOOL7KRM air conditioner and a swamp cooler to cool the room, running without lids, and various other things. Now I’m returning the $750 air conditioner and the $600 swamp cooler, since neither made any difference.
The cards are quite crowded and prone to overheating.
I was having so much trouble getting temperatures under control that I stood up a Nagios instance in my VMware environment and wrote custom NRPE checks, with PNP4Nagios graphs to visualize temperatures as I made changes. Eventually I found that sitting 120 mm, 200 CFM Delta case fans directly on top of the cards, blowing air down between them, works effectively, at $20 per fan and two fans per machine:
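The checks themselves don’t need to be fancy. Here’s a minimal sketch of the threshold logic for an NRPE-style GPU temperature check — the function name, thresholds, and output wording are illustrative, not the exact check I ran; the essentials are a Nagios status line, perfdata that PNP4Nagios can graph, and the matching exit code:

```shell
#!/bin/sh
# check_gpu_temp -- threshold logic for an NRPE GPU-temperature check.
# Prints a Nagios-style status line with perfdata; returns the Nagios exit code.
# Usage: check_gpu_temp WARN CRIT TEMP
check_gpu_temp() {
  warn=$1; crit=$2; temp=$3

  # Bail out as UNKNOWN if the reading is empty or non-numeric.
  case "$temp" in
    ''|*[!0-9]*) echo "UNKNOWN - could not read GPU temperature"; return 3 ;;
  esac

  perf="temp=${temp};${warn};${crit}"
  if [ "$temp" -ge "$crit" ]; then
    echo "CRITICAL - GPU temp ${temp}C | $perf"; return 2
  elif [ "$temp" -ge "$warn" ]; then
    echo "WARNING - GPU temp ${temp}C | $perf"; return 1
  fi
  echo "OK - GPU temp ${temp}C | $perf"; return 0
}

# On a rig, feed it the hottest card (requires nvidia-smi):
#   check_gpu_temp 80 87 "$(nvidia-smi --query-gpu=temperature.gpu \
#     --format=csv,noheader | sort -n | tail -1)"
```

Checking only the hottest card keeps the service definition to one check per rig instead of one per GPU.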
No Delta fans over the GPUs, Delta fans added around 13:30, a reboot around 14:10, and back up with a 200 W power limit at 14:20. The more power the cards draw, the faster they run, but there’s a performance-per-watt sweet spot you look for, and I’m currently running the cards in miner3 at a 200 W limit. They adjust dynamically based on load, using that figure as an upper bound.
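Setting that limit is a couple of nvidia-smi calls per rig (200 W is just where my cards’ efficiency curve landed; yours may differ):

```shell
# Cap every GPU in the box at 200 W. Persistence mode keeps the driver loaded
# so the setting sticks between polls; the limit does not survive a reboot,
# so re-apply it at boot (rc.local, a systemd unit, etc.).
sudo nvidia-smi -pm 1       # enable persistence mode
sudo nvidia-smi -pl 200     # set the power limit to 200 W on all GPUs

# Verify actual draw against the enforced limit:
nvidia-smi --query-gpu=index,power.draw,power.limit,temperature.gpu --format=csv
```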
The fans won’t fit inside the case, and there was no good way to mount them, so I took out a section of each case lid with an angle grinder and a cutting disc.
The Delta fans are 38 mm thick, which is a tiny bit too tall for the case lid to close. So each 4U server needs about 6U for effective cooling.
The rest of the rack has now found its place on the back wall of the garage, near the entrance door to our house.
The spacing on the racking is odd because of the AC unit I initially installed. The Chenbro rails are a huge pain to move, but I’ll probably space them more evenly at some point. There were also a few 1U servers in there I’ve since pulled.
I’m using blanks to keep the exhaust air from being pulled back through the servers’ intakes. It does make a small (1 °C) difference on the graphs.
The HP DL180 G6 runs ESXi, and hosts my OpenBSD router/firewall, domain controller, file server, a backup server for a remote location, my development environment, and all the framework used to manage the miners.
I bought this 24-port SMC PoE switch off eBay, new in box, for about $70 shipped. The model isn’t listed anywhere and has no firmware updates available, but it works well for home-lab purposes. It appears to use the generic Broadcom firmware, with the same management interface as Dell PowerConnect and some Netgear switches. I’m actually fairly confident it’s internally identical to the PowerConnect 5424, and that the aforementioned firmware would work on it.
Cable management isn’t the best, but Velcro ties pull everything out of the way nicely. The managed PDU lets me monitor voltage and current remotely, and Nagios alerts me when power draw falls below or exceeds set thresholds.
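The power check is the same pattern as the temperature one, except it alerts in both directions — a reading below the floor usually means a miner has dropped off. Here’s a sketch of the band check; the 400/2800 W bounds are made-up examples, and the SNMP OID in the usage comment is a placeholder that would come from your PDU vendor’s MIB:

```shell
#!/bin/sh
# check_pdu_power -- alert when total PDU draw leaves the expected band.
# Prints a Nagios-style status line with perfdata; returns the Nagios exit code.
# Usage: check_pdu_power LOW HIGH WATTS
check_pdu_power() {
  low=$1; high=$2; watts=$3

  # Bail out as UNKNOWN if the reading is empty or non-numeric.
  case "$watts" in
    ''|*[!0-9]*) echo "UNKNOWN - could not read PDU power"; return 3 ;;
  esac

  perf="power=${watts};;${low}:${high}"
  if [ "$watts" -lt "$low" ] || [ "$watts" -gt "$high" ]; then
    echo "CRITICAL - PDU draw ${watts}W outside ${low}-${high}W | $perf"
    return 2
  fi
  echo "OK - PDU draw ${watts}W | $perf"; return 0
}

# On the Nagios host, feed it the PDU's SNMP reading, e.g.:
#   check_pdu_power 400 2800 "$(snmpget -v2c -c public -Oqv PDU_IP <power-oid>)"
```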
The 240V/30A drop to the rack comes in through the ceiling, as does one of the Ethernet connections back to the house (this one leading to a TP-Link EAP-330 wireless AP). To comply with fire code, I apparently need to fill the rest of the hole with expanding foam.
Anyway, now that the machines are running cool, they’ve been put to work mining and we’re receiving ETH, ZEC, and LBRY deposits daily.
Edit – ReRacked
I reracked everything with 4U of spacing between nodes and the switch on the back of the rack. That leaves space for one more miner, which is probably about the safe limit for cooling and electricity here, assuming I upgrade to all 1080 Tis (which I’ll do before adding a fifth node).
Ignore the askew thermometer – I noticed it right after taking these and fixed it.
Cabling is a lot easier to manage when it all just needs to go up.
Even 4U spacing leaves plenty of room to get in and work on these things without removing them from the rack or using the shoddy but overpriced rails.