Base Command Manager / Bright Cluster Manager Release Notes
Release notes for Bright 9.1-19
== General ==
- Improvements
* Added mlnx-ofed24.01 packages
* Added mlnx-ofed24.04 packages
* Added support for Slurm 24.05
* Added support for AGE 2023.1.1 (8.8.1)
* Include the gsp firmware with the cuda-driver* packages
* Updated cm-nvhpc to 24.5
* Updated cuda-driver-535 to 535.183.01
* Updated cuda-driver-legacy-470 to 470.239.06
* Updated cuda-driver to 550.90.07
* Updated enroot to 3.5.0
* Updated mlnx-ofed23.10 to 23.10-3.2.2.0
* Updated mlnx-ofed58 to 5.8-5.1.1.2
* Updated PBS Professional 2022 to 2022.1.6
* Updated PBS Professional 2024 to 2024.1.1
* Updated Slurm 23.02 to 23.02.8
* Updated Slurm 23.11 to 23.11.9
- Fixed Issues
* An issue with an infinite loop in request-remote-assistance with nohup
== CMDaemon ==
- Improvements
* Added an option to periodically check whether the head node IP has changed on external DHCP renewal
* Added an option to change the behavior of the monitoring drain action so that no drain reason is set when draining the node(s)
* Added the ability to append, or skip adding, the Slurm drain reason when a healthcheck fails with the drain action enabled
- Fixed Issues
* An issue with the interfaces healthcheck when interface speeds are defined with a unit
* An issue with setting the user/group ownership of static configuration files managed by a generic role
* An issue with parsing of the requested CPUs setting for UGE jobs
* An issue with determining the number of requested CPUs for multi-node Slurm jobs when storing the job information in CMDaemon
* An issue with hard-coded references to /sbin/arping which in some cases can prevent CMDaemon from using arping in the event of a failover
* An issue with moving the software image revisions directories when updating the path of the parent software image
* An issue with the temporary file naming convention in some healthchecks
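
The /sbin/arping item above concerns a hard-coded binary path. As an illustration only, and not CMDaemon's actual implementation, the general technique of resolving a tool at run time instead of hard-coding its location can be sketched in Python as follows. The function name resolve_tool and the FALLBACK_DIRS list are hypothetical and chosen for this example.

```python
import os
import shutil
from typing import Optional, Sequence

# Hypothetical fallback directories for this sketch; not taken from CMDaemon.
FALLBACK_DIRS = ("/usr/sbin", "/sbin", "/usr/bin", "/bin")


def resolve_tool(name: str, extra_dirs: Sequence[str] = FALLBACK_DIRS) -> Optional[str]:
    """Return the path of an executable, or None if it cannot be found."""
    # Search the caller's PATH first, then the fallback directories,
    # rather than assuming a single fixed location such as /sbin/arping.
    path_env = os.environ.get("PATH", "")
    parts = ([path_env] if path_env else []) + list(extra_dirs)
    return shutil.which(name, path=os.pathsep.join(parts))


if __name__ == "__main__":
    arping = resolve_tool("arping")
    print(arping if arping else "arping not found on this system")
```

On distributions that install arping under /usr/sbin or /usr/bin, a lookup of this kind finds the binary where a fixed /sbin/arping reference would not.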
== Cluster Tools ==
- Fixed Issues
* An issue where cm-mysql-sanitize.py (which is required by cm-diagnose) is not part of the cluster-tools package
== Machine Learning ==
- Fixed Issues
* An issue where WLM kernels may be unexpectedly restarted if one of the kernels fails to start
== cm-scale ==
- Fixed Issues
* An issue, in some cases, with saving the state of drained nodes when the head node is restarted, which can prevent the Auto Scaler from considering the nodes as available
* An issue where the shutdown state from files may be used incorrectly
== slurm ==
- New Features
* Set the ENROOT_MOUNT_HOME configuration option to "no" by default for new setups