Base Command Manager / Bright Cluster Manager Release Notes

Release notes for Bright 9.2-10

== General ==

- New Features

* Added CUDA toolkit packages for ARM64

- Improvements

* Updated python2-pyasn1 to 0.4.2-3.2.1
* Updated cm-nvhpc to 23.1

- Fixed Issues

* Decreased the size of the rescue environment ramdisk, to allow some Dell hardware to boot in UEFI mode (e.g., Dell PowerEdge R650)
* An issue where cm-chroot-sw-img is unable to execute a shell in the software image when the user's $SHELL environment variable points to a shell that is not present in the software image
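
The $SHELL problem in the last item above can be illustrated with a minimal Python sketch. This is not the cm-chroot-sw-img implementation: the function name pick_shell and the fallback list are assumptions made for illustration only. The idea is to check whether the requested shell exists inside the image before using it, and to fall back to a common shell otherwise.

```python
import os

def pick_shell(image_root, env_shell, fallbacks=("/bin/bash", "/bin/sh")):
    """Return the first shell path that exists and is executable inside the image.

    image_root is the directory of the software image, env_shell is the value
    of $SHELL (may be None or empty). The fallback list is an assumption.
    """
    candidates = ([env_shell] if env_shell else []) + list(fallbacks)
    for shell in candidates:
        # Resolve the path relative to the image root, not the host root.
        inside = os.path.join(image_root, shell.lstrip("/"))
        if os.path.isfile(inside) and os.access(inside, os.X_OK):
            return shell
    raise FileNotFoundError(f"no usable shell found in {image_root}")
```

Note that this sketch does not handle absolute symlinks inside the image, which would be resolved against the host file system.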

== CMDaemon ==

- Improvements

* On the Ubuntu base distribution, inet=manual is now added to the configuration files of network interfaces without an IP address, so that they can be brought up by the operating system
* Added the option to specify inet=manual for network interfaces on the Ubuntu base distribution (set revision inet=manual)
* The custom /cm/conf/{category,node} files are now also processed after an imageupdate of a running node, so that the customizations are preserved
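
For readers unfamiliar with the inet=manual setting in the first two items above: in the ifupdown interfaces(5) format, the "manual" method brings an interface up without assigning it an address. The Python sketch below renders such a stanza. It assumes the ifupdown format; the exact file and layout that CMDaemon writes are not stated in these notes, and the function name and interface name are hypothetical.

```python
def render_manual_stanza(ifname):
    """Render an ifupdown stanza that uses the 'manual' method.

    Assumption: the target is an interfaces(5)-style file. This is an
    illustration, not the output produced by CMDaemon.
    """
    return f"auto {ifname}\niface {ifname} inet manual\n"

# Example with a hypothetical interface name.
print(render_manual_stanza("eth1"), end="")
```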

- Fixed Issues

* A CMDaemon crash on the Rocky 9.1 / RHEL 9.1 base distributions when a regular user logs in to the user portal or Bright View
* An issue where overriding a kernel module parameter at the node level does not take effect and the kernel module parameters are instead inherited from the software image
* An issue where automatic FSExports are not added for nodes in a type3 network setup
* An issue with monitoring data plots consisting of consolidated and raw data sources
* Delay the start of the slurmd service until after the MIG configuration is updated when the compute node is booted
* An issue where the Etcd health check script may fail for nodes with tagged VLAN interfaces
* An issue where MIG operations may not be able to complete, which can prevent future MIG operations
* An issue where MIG apply may time out when CMDaemon is starting because the timeout is too short
* An issue with the automatic switch of the monitoring node when the passive head node goes down for a prolonged period of time
* A CMDaemon crash in the sysinfo implementation that can occur in some cases when the CMDaemon service is stopping
* A CMDaemon memory leak when the Slurm placeholders maxnodes value is less than the number of nodes in the queue
* An issue where the bond primary=name directive is not written for the underlying physical network interface on Ubuntu

== Bright View ==

- Fixed Issues

* An issue with the reinstall and sync Bright View actions for software images
* An issue with assigning the slurmclient role directly to a compute node in order to override the slurmclient role at the overlay or the category level
* An issue where Bright View may show "No response from the server yet because client quit" upon logout

== Cluster Tools ==

- Improvements

* Automatically detect environmental proxies in cm-diagnose
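
Environment proxies are conventionally set through variables such as http_proxy, https_proxy and no_proxy. The Python sketch below shows one way such settings can be detected. It is an illustration only and does not reflect how cm-diagnose performs the check; the function name detect_env_proxies and the list of variable names are assumptions.

```python
import os

# Commonly used proxy variables (an assumed, non-exhaustive list).
PROXY_VARS = ("http_proxy", "https_proxy", "ftp_proxy", "all_proxy", "no_proxy")

def detect_env_proxies(environ=None):
    """Return the proxy settings found in the environment, keyed by lowercase name."""
    environ = os.environ if environ is None else environ
    found = {}
    for name in PROXY_VARS:
        # Both lowercase and uppercase spellings are in common use.
        value = environ.get(name) or environ.get(name.upper())
        if value:
            found[name] = value
    return found
```

The standard library function urllib.request.getproxies() provides similar detection for the scheme-specific variables.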

- Fixed Issues

* An issue where the cm-restore-db-password script may not reset the database user password in mysql for Slurm

== Machine Learning ==

- Fixed Issues

* Updated cm-tensorflow2-* to 2.11.0
* An issue where importing the tensorflow Python module with tensorflow2-py39-cuda11.2-gcc9 yields a RequestsDependencyWarning message

== cm-kubernetes-setup ==

- New Features

* Added Network Operator to the cm-kubernetes-setup script

- Improvements

* Wait explicitly for the Ingress Controller to be up and running, to avoid potential issues later when deploying other operators

- Fixed Issues

* Clean up the nginxreverseproxy and nginx.conf configurations when Kubernetes is uninstalled

== cm-scale ==

- Fixed Issues

* An issue where cm-scale tries to match the labels of Kubernetes pods or jobs to the labels of the node. This matching is now disabled by default
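
Label matching of the kind mentioned above can be sketched in Python as follows. This is a hypothetical illustration, not cm-scale code: the exact matching semantics used by cm-scale are not described in these notes, and the function name and example labels are assumptions. Here a workload matches a node when every key/value pair it asks for is present in the node's labels.

```python
def labels_match(wanted, node_labels):
    """Return True if every wanted key/value pair is present in node_labels.

    An empty 'wanted' mapping matches any node.
    """
    return all(node_labels.get(key) == value for key, value in wanted.items())

# Example with hypothetical labels.
node = {"kubernetes.io/arch": "amd64", "gpu": "a100"}
print(labels_match({"gpu": "a100"}, node))
print(labels_match({"gpu": "h100"}, node))
```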

== cmburn ==

- Improvements

* Added cm-gpu-burn package for Rocky9/RHEL9 base distributions

== licensing ==

- Fixed Issues

* Remove the license expiration warning on the secondary head node after installing a new license

== openpbs20 ==

- Fixed Issues

* An issue with updating the pbspro/openpbs hooks when a new pbspro/openpbs package is installed

== openpbs22.05 ==

- Fixed Issues

* An issue with updating the pbspro/openpbs hooks when a new pbspro/openpbs package is installed

== pbspro2021 ==

- Fixed Issues

* An issue with updating the pbspro/openpbs hooks when a new pbspro/openpbs package is installed

== pbspro2022 ==

- Fixed Issues

* An issue with updating the pbspro/openpbs hooks when a new pbspro/openpbs package is installed

== slurm ==

- New Features

* Rebuilt Slurm with CUDA 11.8

== slurm22.05 ==

- Improvements

* Updated Slurm to 22.05.8