Base Command Manager / Bright Cluster Manager Release Notes
Release notes for Bright 9.2-2
== General ==
- New Features
* Support for SUSE Linux Enterprise Server 15 SP3
* New package cm-apptainer as a replacement for the existing cm-singularity package, following the upstream project's rebranding
- Improvements
* Update cm-docker to v20.10.14
* Update cm-docker-compose to v2.4.1
* Update nvhpc to version 22.3
* Update cuda-dcgm to version 2.3.5-1
* Update openssl to 3.0.2 / 1.1.1n for CVE-2022-0778
* Update cuda-driver to version 510.47.03
* Update cuda11.6 toolkit packages to 11.6 update 2
* Use kernel version 5.13.0 by default when installing Bright with Ubuntu 20.04 base distribution
== cmdaemon ==
- New Features
* Added an option to get the latest monitoring counters using REST
- Improvements
* cmdaemon will now show a warning when a prejob data producer is defined but prejob is not enabled for the workload manager
* cmdaemon no longer restarts the named and ntp services on HA takeover by default
* Group the Interconnect system information (sysinfo) to reduce the memory footprint of cmdaemon
* Sort the list of disk partitions in system information
* Add category labels to the devices in PromQL
* New Program Runner tracing levels make it less verbose by default, which can reduce the number of lines logged in the cmdaemon log file
* cm-scale template cloud nodes can no longer be booted or selected in the node-installer
* Add gpu-nvlink-bandwidth-total metric
* Allow /cm/shared to be provisioned to the passive edge director for cases without shared storage in the edge
* Added the wlm filter cmsh command to cm-diagnose so that it can collect some workload manager job information
* New REST endpoints for license, version, device, and sysinfo
- Fixed Issues
* An issue where the monitoring data for completed jobs is not always removed, which can lead to cmdaemon allocating too much memory
* In some cases, HA takeover may fail due to write history update RPC for report file changes flooding the head node (and its log files)
* Do not retry CMProc::rexecCommand when the ptracker is no longer defined, which otherwise can result in error messages in the cmdaemon log file
* An issue with sampling the user counts metric for a head node
* An issue where, when MIG is enabled and there is no data, the GPU utilization metric incorrectly shows a large number
* An issue with dumping the data for all entities and measurables when using the REST API
* An issue with the pythoncm programrunnerstatus kill method not working in some cases
== cluster-tools ==
- Fixed Issues
* Improve the log messages from the DAS shared storage mount and umount scripts to include the hostname of the head node
== cm-kubernetes-setup ==
- Improvements
* The 'enabled' fields under the 'calico:' and 'flannel:' blocks in the cm-kubernetes-setup configuration files are no longer used and have been removed
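Existing configuration files may still carry the removed fields. The following is a minimal sketch, not part of the product, of how they could be stripped once a configuration file has been loaded into nested Python dictionaries and lists. The function name and the surrounding key names in the usage note (`modules`, `kubernetes`, `mtu`) are hypothetical; only the 'calico' and 'flannel' block names and the 'enabled' field come from the note above.

```python
import copy

# Block names taken from the release note; everything else is illustrative.
OBSOLETE_BLOCKS = ("calico", "flannel")


def strip_obsolete_enabled(config):
    """Return a deep copy of config without 'enabled' under calico/flannel blocks."""
    cleaned = copy.deepcopy(config)

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                # Only the 'enabled' field directly under a calico/flannel
                # block is dropped; other 'enabled' fields are left alone.
                if key in OBSOLETE_BLOCKS and isinstance(value, dict):
                    value.pop("enabled", None)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(cleaned)
    return cleaned
```

For example, `strip_obsolete_enabled({"modules": {"kubernetes": {"calico": {"enabled": True, "mtu": 1440}}}})` returns the same structure with only `mtu` left under `calico`, and the input dictionary is not modified.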
== cm-scale ==
- Fixed Issues
* An issue with detecting failures to create cloud node instances in some cases
== cmsh ==
- Improvements
* An issue where the list of WLM jobs in cmsh may not be correctly sorted when using the filter command
* Display the full labeled entity index when the --verbose flag is used in cmsh
- Fixed Issues
* Tab completion for zones and policies in the cmsh roles mode
* cmsh permissions on Ubuntu are 700 instead of 755
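To confirm the permissions fix on an installed system, the mode of the cmsh executable can be checked. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the path to cmsh is not given in these notes, so it must be supplied by the caller.

```python
import os
import stat


def permission_bits(path):
    """Return only the permission bits of path, for example 0o755."""
    return stat.S_IMODE(os.stat(path).st_mode)


def has_expected_mode(path, expected=0o755):
    """Return True if the permission bits of path equal expected."""
    return permission_bits(path) == expected
```

Calling `has_expected_mode(path_to_cmsh)` returns False for a file left at mode 700 and True once it is 755.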
== cod ==
- Improvements
* Also apply the tags specified during the creation of an AWS COD cluster to the AWS nodes and EBS volumes that the head node creates after the cluster is set up