Base Command Manager / Bright Cluster Manager Release Notes
Release notes for Bright 9.0-21
== General ==
- Improvements
* Added CUDA 12.1 packages
* Added CUDA 12.2 packages
* Added CUDA 12.3 packages
* Added the cuda-driver-legacy-470 package, which provides the 470 branch of the NVIDIA driver to support older datacenter/Tesla GPUs
* Added mlnx-ofed23.04
* Added mlnx-ofed23.07
* Added mlnx-ofed23.10
* The installation scripts of the mlnx-ofed packages now pin the kernel packages on Ubuntu when deploying MOFED
* Updated cm-nvhpc to 23.11
* Updated cm-openssl to 1.1.1u
* Updated cuda-driver-legacy-470 to 470.223.02
* Updated cuda-driver to 545.23.08
* Updated mlnx-ofed49 to 4.9-7.1.0.0
* Updated mlnx-ofed54 to 5.4-3.7.5.0
* Updated mlnx-ofed58 to 5.8-4.1.5.0
* Updated openssl to 1.1.1w
- Fixed Issues
* Changed the Lmod package from architecture-independent (noarch/all) to architecture-dependent, which resolves the "module 'bit32' not found" issue on Ubuntu
* An issue where the 90-cm-sysctl.conf file is not marked as a configuration file on Ubuntu-based distributions
== CMDaemon ==
- Improvements
* Added the rules, alerts, and alertmanagers Prometheus endpoints
* CMDaemon now disables its monitoring engine if it detects missing or truncated monitoring database files, to prevent a possible CMDaemon crash
* The cpuspeedGovernor node and category property is no longer supported and has been removed from the CMDaemon configuration
* Malformed strings in the GPU information no longer corrupt the JSON serialization in CMDaemon
* Added the option to use the storcli utility with the CMDaemon megaraid health check
- Fixed Issues
* An issue with the Prometheus exporter for monitoring data when measurable names contain spaces
* In some cases, CMDaemon may fail to trigger a provisioning request for a modified file when the names of two images begin with the same substring
* An issue with printing informational messages in the mounts health check implementation
* An issue where a restart of CMDaemon on the head node can cause CMDaemon on the compute nodes to perform a generally harmless restart of the Slurmd service even when there are no configuration changes
* An issue where the cumulative flag passed to CMDaemon by a JSON monitoring sampler script is not interpreted during initialization
* An issue where the interfaces health check can report failure on compute nodes that use a ConnectX IB card in UEFI mode as the BOOTIF interface
* An issue where, in some cases, cm-diagnose may not collect the required information from the primary (passive) head node when the secondary head node is the active head node
* An issue with consecutive executions of "open --failbeforedown" in cmsh to open devices when the value of the failbeforedown counter is unchanged
== Node Installer ==
- Improvements
* The node-installer no longer halts with the message "Unable to determine accelerators" due to temporary issues with listing the devices with lspci
- Fixed Issues
* An issue with provisioning compute nodes with a separate /usr filesystem
* An issue that prevented cloning head nodes with btrfs filesystems
* An issue with the node-installer disk scripts being unable to assemble MD RAID arrays
* An issue with the bootif_detect and getclientid scripts on compute nodes that PXE boot from ConnectX-3 cards and use the GRUB bootloader
* An issue where the RDMA settings are not added to the corresponding entries in the /etc/fstab file when using NFS over RDMA
* An issue where, in some cases, the bootif_detect script is unable to detect the correct InfiniBand (IB) device when there are multiple IB interfaces
== Head Node Installer ==
- Improvements
* Slurm 23.02 is now installed by default for new cluster installations
== Machine Learning ==
- New Features
* Introduced ML packages cm-cudnn8.9-cuda12.1 and cm-cudnn8.9-cuda12.0
* Introduced ML package cm-cudnn8.5-cuda11.8
== cm-cluster-extension ==
- Fixed Issues
* Fixed various issues related to Azure caused by changes in the Azure API
== cm-diagnose ==
- Improvements
* cm-diagnose now sanitizes all mysqldump output
== cm-jupyter-setup ==
- Fixed Issues
* An issue that prevented cm-jupyter-setup from running in multi-distro environments
== cm-wlm-setup ==
- Fixed Issues
* An issue where cm-wlm-setup on Ubuntu is unable to complete the Slurm setup if the Slurm packages were previously removed with the package manager's purge option
== jupyter ==
- Improvements
* Updated the JupyterLab and JupyterHub dependencies to the most recent versions
== openpbs23.06 ==
- Improvements
* Added OpenPBS 23.06 packages
== pythoncm ==
- Fixed Issues
* An issue in the collapse Bracket code that, in some cases, can produce the error "Solver::find, cleared zero bits" when handling a selection of hostnames
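Bracket collapsing presumably refers to compressing a list of hostnames into a compact range expression such as node[001-003,005], the same notation Slurm uses for host lists. The following is a minimal Python sketch of that idea under this assumption; collapse_hostnames is a hypothetical helper, not part of pythoncm, and it does not reproduce the pythoncm solver that the fix above concerns.

```python
import re
from collections import defaultdict

# Splits a hostname into a prefix and a trailing number, e.g. "node007" -> ("node", "007").
_HOST_RE = re.compile(r"^(.*?)(\d+)$")


def collapse_hostnames(hosts):
    """Collapse hostnames into bracket notation (hypothetical helper, not pythoncm)."""
    # Group numbers by (prefix, digit width) so zero padding is preserved;
    # names whose numbers have different widths stay in separate groups.
    groups = defaultdict(set)
    plain = set()
    for host in hosts:
        m = _HOST_RE.match(host)
        if m is None:
            plain.add(host)  # no numeric suffix: emit the name unchanged
            continue
        prefix, digits = m.groups()
        groups[(prefix, len(digits))].add(int(digits))

    result = sorted(plain)
    for (prefix, width), numbers in sorted(groups.items()):
        ordered = sorted(numbers)
        # Merge consecutive numbers into (start, end) ranges.
        ranges = []
        start = prev = ordered[0]
        for n in ordered[1:]:
            if n == prev + 1:
                prev = n
                continue
            ranges.append((start, prev))
            start = prev = n
        ranges.append((start, prev))

        parts = [
            f"{a:0{width}d}" if a == b else f"{a:0{width}d}-{b:0{width}d}"
            for a, b in ranges
        ]
        if len(ordered) == 1:
            result.append(f"{prefix}{parts[0]}")  # single host: no brackets needed
        else:
            result.append(f"{prefix}[{','.join(parts)}]")
    return ",".join(result)


print(collapse_hostnames(["node001", "node002", "node003", "node005"]))
# node[001-003,005]
```

Grouping by digit width is a deliberate simplification: it keeps node001 and node01 distinct, at the cost of not merging unpadded names such as node9 and node10 into one range.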