Base Command Manager / Bright Cluster Manager Release Notes

Release notes for Bright 9.1-15

== General ==

- Improvements

* Add CUDA 11.8 packages
* Add mlnx-ofed57 packages for the Mellanox 5.7 OFED stack
* Update Ubuntu 20.04 to 20.04.5
* Upgrade pyxis to 0.14.0
* Update mlnx-ofed54 to version 5.4-3.5.8.0

- Fixed Issues

* Automatically start slurmdbd when the Slurm configuration is frozen in cmd.conf
* An issue where the power script execution environment does not include the CMD_NODE_INSTALLER_PATH variable, which prevents custom power scripts (such as ilo_power.pl) from performing power operations

- Deprecated features

* OpenShift integration

== CMDaemon ==

- Improvements

* Exclude all /snap/.* mount points from the "procmounts" sampler, which otherwise creates unnecessary metrics in CMDaemon
* Copy the file cluster.csr.new to all head nodes during install-license
* Increase the default for Kubernetes kubelet's --max-pods from 50 to 110 for new installations
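
The /snap/.* exclusion in the first item above can be illustrated with a minimal sketch. It assumes the pattern is applied to the mount point field of /proc/mounts-style lines. The function name and the sample data are hypothetical, and the actual "procmounts" sampler may match differently.

```python
import re

# Assumption: the pattern is taken from the release note and applied to the
# mount point (second field) of each /proc/mounts-style line.
SNAP_MOUNT = re.compile(r"^/snap/.*")


def sampled_mount_points(proc_mounts_text):
    """Return mount points from /proc/mounts-style text, skipping /snap/ mounts."""
    mount_points = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        mount_point = fields[1]
        if SNAP_MOUNT.match(mount_point):
            continue  # excluded: would only add per-snap metrics
        mount_points.append(mount_point)
    return mount_points


# Hypothetical sample input, not taken from a real cluster.
EXAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/loop0 /snap/core20/1623 squashfs ro,nodev,relatime 0 0
/dev/loop1 /snap/lxd/22923 squashfs ro,nodev,relatime 0 0
/dev/sdb1 /data xfs rw,relatime 0 0
"""

print(sampled_mount_points(EXAMPLE))
```

With the sample input, only the / and /data mount points remain, so no metrics would be created for the two snap mounts.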

- Fixed Issues

* An issue with removing job queues when using the JobQueue remove pythoncm call
* An issue with updating the Slurm configuration when the secondary head node is the active head node
* An issue with the JSON whoami API call returning a username instead of a profile
* An issue with removing OSDs from a Ceph cluster if the corresponding OSD nodes are down
* An issue where the version config file timestamps (versionconfigfiles=yes) are always set to the Unix epoch (1970)
* An issue where a cloud director power off may hang for up to a minute if the node is already off
* An issue with merging CMDaemon monitoring execution multiplexers into one, which results in only the last multiplexer being taken into account
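
For the version config file timestamp issue above, a file affected by it would carry a modification time of 0 seconds since the Unix epoch. The following is a minimal sketch of how such a file could be detected. The helper name is hypothetical, and the sketch uses a temporary file rather than a real CMDaemon-managed file.

```python
import os
import tempfile
from datetime import datetime, timezone


def has_epoch_mtime(path):
    """Return True if the file's modification time is the Unix epoch."""
    return int(os.stat(path).st_mtime) == 0


# Create a throwaway file and force its timestamps to the epoch to mimic
# the symptom described in the release note.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    sample_path = handle.name
os.utime(sample_path, (0, 0))  # (atime, mtime) in seconds since the epoch

epoch_detected = has_epoch_mtime(sample_path)
epoch_as_text = datetime.fromtimestamp(0, tz=timezone.utc).isoformat()
os.remove(sample_path)

print(epoch_detected)
print(epoch_as_text)
```

The second printed value shows how timestamp 0 renders as a date in 1970, which is the value the release note refers to.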

== Bright View ==

- Fixed Issues

* An issue where the main menu is not shown for logged-in users with a read-only profile

== Head Node Installer ==

- Fixed Issues

* An issue with head node installations with Lmod where the DefaultModules.lua module file is not created by default, resulting in messages about an empty LMOD_SYSTEM_DEFAULT_MODULES environment variable

== Machine Learning ==

- New Features

* Updated cm-cub-* packages to v1.17.2
* Deprecated ML package cm-chainer-py39-cuda11.2-gcc9
* Introduced ML package cm-cutensor-cuda11.7
* Introduced ML package cm-ml-distdeps-cuda11.7
* Introduced ML package cm-nccl2-cuda11.7-gcc9
* Introduced ML package cm-cudnn8.5-cuda11.7

- Improvements

* Update cm-openmpi4-*-cuda-* packages to v4.1.4

== cm-clone-install ==

- New Features

* Do not include loop device mounts (if present) when cm-clone-install generates the disk setup XML for cloning the head node

== cm-scale ==

- Fixed Issues

* An issue with starting nodes for multi-node jobs requesting more memory per node than the available memory divided by the number of requested nodes
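
The condition in the item above can be made concrete with a small worked example. This is only one reading of the note: it does not say how cm-scale determines the available memory, so the function name, parameters, and numbers below are hypothetical.

```python
def exceeds_even_share(mem_per_node_gib, available_mem_gib, requested_nodes):
    """Return True when the per-node memory request is larger than the
    available memory divided by the number of requested nodes."""
    if requested_nodes < 1:
        raise ValueError("requested_nodes must be at least 1")
    return mem_per_node_gib > available_mem_gib / requested_nodes


# Hypothetical job: 4 nodes, 32 GiB per node, against 64 GiB available.
# 64 / 4 = 16 GiB, and 32 > 16, so the job matches the described condition.
print(exceeds_even_share(32, 64, 4))

# Hypothetical job: 4 nodes, 8 GiB per node. 8 <= 16, so it does not match.
print(exceeds_even_share(8, 64, 4))
```

Under this reading, the first job belongs to the class of multi-node jobs that the fix addresses, while the second does not.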

== cmsh ==

- Improvements

* Add --update-containers support to the cmsh device foreach command

== pbspro2022 ==

- Improvements

* Add support for PBS Pro 2022

== slurm21.08 ==

- Fixed Issues

* An issue with an incorrect path to the failedprejob and allprejob directories, which causes the prolog-prejob script to fail

== slurm22.05 ==

- Improvements

* Upgrade Slurm to 22.05.6