Base Command Manager / Bright Cluster Manager Release Notes

Release notes for Bright 8.2-28

== General ==

- New Features

* Added settings to allow administrators to configure a script to be run after nodes come up

- Improvements

* Deployment of IBM Spectrum LSF Suite is no longer supported. Going forward, the only way to use LSF on a Bright cluster is to use LSF Standard Edition (which will remain a supported option)
* Added a metric that reports nodes' WLM drain status
* Added Mellanox 5.6 OFED stack (mlnx-ofed56 packages)
* Added Mellanox 5.5 OFED stack (mlnx-ofed55 packages)
* Updated mlnx-ofed54 to version 5.4-3.1.0.0
* Updated mlnx-ofed49 to version 4.9-4.1.7.0
* Updated cuda-dcgm to version 2.3.5-1
* Updated cuda-driver to version 510.47.03
* Added CUDA 11.6 packages
* Added CUDA 11.5 packages

- Fixed Issues

* mlnx-ofed56, mlnx-ofed55, mlnx-ofed54, mlnx-ofed49: Added Mellanox OFED KMOD/KMP package build functionality for RPM based distributions
* cuda-driver: Load the nvidia_drm kernel module from the cuda-driver script. Without this, EGL devices can be missing from /dev/dri
* An issue in the KubernetesComponentsStatus healthcheck that caused false failure reports

== cmdaemon ==

- Known Issues

* On SLES, updating cmdaemon and cm-boost results in cmdaemon being unable to start during the package update. Starting cmdaemon after the update has completed resolves the issue

- New Features

* Allow setting a custom MTU per network interface, or disabling the setting of the MTU in the network interface configuration file

- Improvements

* Rewrite the mysql health check, so that it does not require the mysql password to be supplied on the command line
* Modifying a network in cmdaemon that is used by Kubernetes will now request the relevant Kubernetes services to update their configuration and restart
* Add category labels to the devices in PromQL

- Fixed Issues

* An issue where password crypt can generate duplicate edge site secret hashes
* An issue where some older base distribution versions of openssl are unable to create FIPS compliant DH parameters during add-on installation
* An issue where Slurm's partition extra options may not be preserved in the configuration if scontrol temporarily does not work
* An issue where, in the case of head node HA, cmd.conf global configuration settings could get stored and sent to nodes multiple times
* An issue that prevented gpusettings from being overridden at the node level from their category values
* An issue with executing monitoring actions on incorrectly filtered-out data
* An issue with removing image-update provisioning requests when the node-installer is running, which can cause an unexpected image update right after cmdaemon has started while the mount scripts may still be busy
* In some cases, the cmsh monitoringdataproducer command can crash cmdaemon if the command is executed in the category mode

== Bright View ==

- Fixed Issues

* An issue where some setup wizards were not available in Bright View when using Rocky as base distribution

== cm-kubernetes-setup ==

- Improvements

* Make sure helm is only initialized if Kubernetes version <= 1.18 is deployed
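
The version condition above can be illustrated with a minimal sketch. This is not the cm-kubernetes-setup implementation; the function name needs_helm_init is hypothetical, and the sketch assumes a plain "major.minor[.patch]" version string with an optional leading "v". It compares versions as integer tuples, since a plain string comparison would wrongly rank "1.9" above "1.18".

```python
def needs_helm_init(k8s_version: str) -> bool:
    """Return True if the Kubernetes version is 1.18 or older.

    Hypothetical helper for illustration only. Assumes a version string
    such as "1.18.20" or "v1.21.4"; the patch level is ignored.
    """
    parts = k8s_version.strip().lstrip("v").split(".")
    major, minor = int(parts[0]), int(parts[1])
    # Tuple comparison orders (1, 9) before (1, 18), unlike string comparison.
    return (major, minor) <= (1, 18)


if __name__ == "__main__":
    for version in ("1.16.0", "v1.18.20", "1.21.4"):
        print(version, needs_helm_init(version))
```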

== cm-scale ==

- Fixed Issues

* An issue with starting nodes for PBS jobs with memory requirements not defined in bytes
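
As an illustration of the kind of normalization involved, the following is a minimal sketch of converting a PBS-style memory request such as "4gb" into bytes. It is not the cm-scale implementation; the function name pbs_mem_to_bytes and the unit table are assumptions made for this example. It assumes binary multiples (1kb = 1024 bytes), treats a bare number as bytes, and does not handle word-based suffixes such as "kw".

```python
import re

# Assumed unit table for this sketch: binary multiples, byte-based suffixes only.
_UNITS = {
    "b": 1,
    "kb": 1024,
    "mb": 1024 ** 2,
    "gb": 1024 ** 3,
    "tb": 1024 ** 4,
    "pb": 1024 ** 5,
}


def pbs_mem_to_bytes(value: str) -> int:
    """Convert a memory request such as '4gb' or '512' to a byte count.

    Hypothetical helper for illustration only.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([kmgtp]?b)?\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognized memory value: {value!r}")
    number, unit = match.groups()
    # A missing suffix is treated as bytes.
    return int(number) * _UNITS[unit or "b"]


if __name__ == "__main__":
    for request in ("512", "2MB", "4gb"):
        print(request, pbs_mem_to_bytes(request))
```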

== cuda-dcgm ==

- Fixed Issues

* An issue with DCGM python bindings linking with DCGM libraries

== ml ==

- New Features

* Updated cm-tensorflow-* packages to v2.7.0
* Introduced support for Machine Learning packages on py39/gcc9
* Updated cm-tensorflow2-* packages to v2.5.2
* Updated cm-xgboost-* packages to v1.5.0

- Improvements

* Stopped upgrading ML packages cm-dynet-*
* Stopped upgrading the machine learning packages for sles12
* Stopped upgrading the machine learning packages for cuda10.2