We'll post announcements about ICER's system downtimes, updates, new features, and other news for the ICER user community here.
May 4, 2026
IPF has identified a leak in our primary cooling system and has received the parts to repair it this week. To complete this work quickly and reduce the total expected downtime, amd24 will need to be taken offline one day earlier than planned, at 7 AM this Wednesday, May 6th, and will remain offline through our regularly scheduled maintenance on 5/7. Jobs that would overlap this window will not start; jobs running on amd24 on the 6th may or may not be canceled, depending on the duration of the repair.
Apr 20, 2026
We are currently performing maintenance on the scratch filesystem with assistance from our storage vendor. Work is ongoing and no downtime is required — scratch remains mounted and available on all cluster nodes throughout.
While this work is underway, users may notice intermittent slowdowns when reading from or writing to scratch. This can show up as jobs appearing to stall briefly, longer-than-usual file operations from the command line, or pauses when listing directory contents. These delays are expected and should resolve on their own as the maintenance progresses.
What you can do:
- No action is required. Running jobs will continue to run.
- If a job is unusually sensitive to I/O latency, you may want to hold off on submitting it until we post an all-clear.
- Avoid large bulk operations against scratch (mass `rm -rf`, `tar` of huge directories, `rsync` of full trees) during this window if you can defer them.
We will post a follow-up once the maintenance is complete. If you encounter anything beyond general slowness — jobs failing with I/O errors, files that won’t open, or scratch appearing unmounted on a node — please open a ticket so we can take a look.
Thanks for your patience.
Apr 1, 2026
Over the next several weeks, ICER will be performing minor operating system updates across all nodes in the HPCC. During this time, users may notice longer queue times and some nodes unavailable in Slurm while they are being updated. Once the updates are complete, nodes will be returned to service and jobs will continue to schedule and run. Because these are minor updates, the software and module system are unaffected. No issues have been identified in initial testing, but please open a ticket if you experience any problems.
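If you'd like to see which nodes are out of service at any point during the rolling updates, Slurm's standard `sinfo` command can list them (a sketch; these commands only work from a node on the HPCC):

```
# List nodes that are down or drained, with the reason recorded by admins.
sinfo -R

# Or show a per-partition summary of node states.
sinfo -s
```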
Apr 9, 2026
The HPCC will be unavailable on Thursday, May 7 starting at 5 AM for regularly scheduled maintenance. No jobs will run during this time.
Jobs that cannot complete before this date will not begin until after maintenance is complete. For example, if you submit a four-day job three days before the maintenance outage, your job will be postponed and will not begin to run until maintenance is finished.
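To check whether a pending job can fit, compare your requested walltime against the hours remaining before the outage. A minimal sketch, assuming GNU `date` and an illustrative submission time of 5 AM on May 4:

```shell
# Hours between an example submission time and the 5 AM May 7 outage.
# (The submission time below is illustrative; substitute "now" on the cluster.)
MAINT_EPOCH=$(date -d "2026-05-07 05:00" +%s)
SUBMIT_EPOCH=$(date -d "2026-05-04 05:00" +%s)
HOURS_LEFT=$(( (MAINT_EPOCH - SUBMIT_EPOCH) / 3600 ))
echo "${HOURS_LEFT} hours until maintenance"   # 72 hours in this example
```

A four-day job needs 96 hours, more than the 72 remaining in this example, so Slurm would hold it until after maintenance; a job whose `--time` request fits inside the remaining window can still start.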
If you have any questions, please contact us.
Feb 25, 2026
What is happening?
On Monday, May 4, 2026, the current version of the Miniforge3 module (which provides access to the `conda` command), 24.3.0-0, will be removed. All users should update to the new version, 25.11.0-1.
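On the HPCC, switching to the new module might look like the following (a sketch using standard Lmod `module` commands; these only work on cluster nodes):

```
# Drop the old version if it is loaded, then load the new one explicitly.
module unload Miniforge3
module load Miniforge3/25.11.0-1
conda --version   # confirm the newer conda is now on your PATH
```

Loading the version explicitly, rather than a bare `module load Miniforge3`, also protects your scripts from silently picking up a different default after the change.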
