Data Center Water Cooling Problems - RESOLVED 7/21/2025
UPDATE: 7/21/2025 - IPF installed a temporary chiller to resume water cooling for our AMD24 cluster. The cluster is back online and fully functional. We will need a short shutdown to switch back to the main chiller once it is fixed, but that could be months down the road. We will send out announcements when that happens.
UPDATE: 7/15/2025 - IPF is in the process of obtaining a rental water chiller and organizing the water and power feeds. They are optimistic about having a working solution by the end of this week.
UPDATE: 7/11/2025 - The water cooling system is getting too warm and nfh nodes are being shut down. IPF and the vendor have identified a rental chiller to restore service; the installation timeline is still TBD. Everyone is working hard to get this done as soon as possible.
Starting at 12:30am on 7/9/2025, multiple components of our water cooling system failed. We are waiting on IPF for solutions or alternatives, and we are told a fix could take multiple weeks. We are shutting down the AMD24 CPU nodes in the hope that the AMD24 GPU nodes will remain within a safe operating temperature. We will provide updates as we learn more.