Base Command Manager / Bright Cluster Manager Release Notes
Release notes for Bright 8.2-28
== General ==
- New Features
* Added settings to allow administrators to configure a script to be run after nodes come up
- Improvements
* Deployment of IBM Spectrum LSF Suite is no longer supported. Going forward, the only way to use LSF on a Bright cluster is to use LSF Standard Edition (which will remain a supported option)
* Added a metric that reports nodes' WLM drain status
* Added Mellanox 5.6 OFED stack (mlnx-ofed56 packages)
* Added Mellanox 5.5 OFED stack (mlnx-ofed55 packages)
* Updated mlnx-ofed54 to version 5.4-3.1.0.0
* Updated mlnx-ofed49 to version 4.9-4.1.7.0
* Updated cuda-dcgm to version 2.3.5-1
* Updated cuda-driver to version 510.47.03
* Added CUDA 11.6 packages
* Added CUDA 11.5 packages
- Fixed Issues
* mlnx-ofed56, mlnx-ofed55, mlnx-ofed54, mlnx-ofed49: Added Mellanox OFED KMOD/KMP package build functionality for RPM-based distributions
* cuda-driver: Load the nvidia_drm kernel module from the cuda-driver script. If the module is not loaded, EGL devices can be missing from /dev/dri
* An issue in the KubernetesComponentsStatus healthcheck that caused false failure reports
== cmdaemon ==
- Known Issues
* On SLES, updating cmdaemon and cm-boost results in cmdaemon being unable to start during the package update. Starting cmdaemon after the update has completed resolves the issue
- New Features
* Allow setting a custom per network interface MTU or disabling setting the MTU in the network interface configuration file
- Improvements
* Rewrite the mysql health check so that it does not require the mysql password to be supplied on the command line
* Modifying a network in cmdaemon that is used by Kubernetes will now request the relevant Kubernetes services to update their configuration and restart
* Add category labels to the devices in PromQL
- Fixed Issues
* An issue where password crypt can generate duplicate edge site secret hashes
* An issue where some older base distribution versions of openssl are unable to create FIPS compliant DH parameters during add-on installation
* An issue where Slurm's partition extra options may not be preserved in the configuration if scontrol temporarily does not work
* An issue where, in the case of head node HA, cmd.conf global configuration settings could get stored and sent to nodes multiple times
* An issue that prevented gpusettings values set at the category level from being overridden at the node level
* An issue with executing monitoring actions on incorrectly filtered-out data
* An issue with removing image-update provisioning requests when the node-installer is running, which can cause an unexpected image update right after cmdaemon has started while the mount scripts may still be busy
* In some cases, the cmsh monitoringdataproducer command can crash cmdaemon if the command is executed in the category mode
== Bright View ==
- Fixed Issues
* An issue where some setup wizards were not available in Bright View when using Rocky as base distribution
== cm-kubernetes-setup ==
- Improvements
* Make sure helm is only initialized if Kubernetes version <= 1.18 is deployed
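
The version condition above can be sketched as a small check. This is only an illustration of the "<= 1.18" gate, not the code used by cm-kubernetes-setup; the function name needs_helm_init and the accepted version string formats are assumptions made for the example.

```python
def needs_helm_init(kubernetes_version):
    """Return True if the given Kubernetes version is 1.18 or older.

    Hypothetical helper: accepts strings such as "1.18.20" or "v1.19.0".
    Only the major and minor components are compared.
    """
    parts = kubernetes_version.strip().lstrip("v").split(".")
    major, minor = int(parts[0]), int(parts[1])
    # Compare as integer tuples so that 1.9 sorts before 1.18,
    # which a plain string comparison would get wrong.
    return (major, minor) <= (1, 18)
```

Comparing integer tuples rather than strings matters here: as strings, "1.9" would sort after "1.18", and the check would give the wrong answer for older releases.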
== cm-scale ==
- Fixed Issues
* An issue with starting nodes for PBS jobs with memory requirements not defined in bytes
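
As an illustration of the kind of normalization involved, the following sketch converts a PBS-style memory value such as "4gb" to bytes. It is not the cm-scale implementation. The function name pbs_mem_to_bytes, the use of binary multipliers (1 kb = 1024 bytes), and the default word size of 8 bytes for the "w" suffixes are assumptions made for the example.

```python
import re

# Assumption: binary multipliers, so 1 kb is 1024 bytes.
_MULTIPLIERS = {
    "": 1,
    "k": 1024,
    "m": 1024 ** 2,
    "g": 1024 ** 3,
    "t": 1024 ** 4,
    "p": 1024 ** 5,
}

# An integer, an optional magnitude prefix, and an optional unit:
# "b" for bytes or "w" for words. A bare integer is treated as bytes.
_SIZE_RE = re.compile(r"^(\d+)([kmgtp]?)([bw]?)$", re.IGNORECASE)


def pbs_mem_to_bytes(value, word_size=8):
    """Convert a PBS-style size string (for example "512mb") to bytes.

    word_size is the assumed number of bytes per word for values given
    in words (for example "2kw"); the real value is system dependent.
    """
    match = _SIZE_RE.match(value.strip())
    if match is None:
        raise ValueError("unrecognized memory value: %r" % value)
    number, prefix, unit = match.groups()
    unit_size = word_size if unit.lower() == "w" else 1
    return int(number) * _MULTIPLIERS[prefix.lower()] * unit_size
```

For example, pbs_mem_to_bytes("4gb") gives 4294967296 under these assumptions, and a value without a suffix, such as "1024", is returned unchanged.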
== cuda-dcgm ==
- Fixed Issues
* An issue with DCGM python bindings linking with DCGM libraries
== ml ==
- New Features
* Updated cm-tensorflow-* packages to v2.7.0
* Introduced support for Machine Learning packages on py39/gcc9
* Updated cm-tensorflow2-* packages to v2.5.2
* Updated cm-xgboost-* packages to v1.5.0
- Improvements
* Stopped upgrading ML packages cm-dynet-*
* Stopped upgrading the machine learning packages for sles12
* Stopped upgrading the machine learning packages for cuda10.2