Base Command Manager / Bright Cluster Manager Release Notes

Release notes for Bright 9.1-14

== General ==

- New Features

* Support for SLES15sp4

- Improvements

* Update lua to 5.4.4 (CVE-2022-28805)
* Update openssl to version 1.1.1q
* Update cuda-dcgm to version 2.4.6.1
* Update mlnx-ofed49 to version 4.9-5.1.0.0
* Update mlnx-ofed56 to version 5.6-2.0.9.0
* Add cuda11.7 packages

== CMDaemon ==

- New Features

* Added CMDaemon advanced configuration options for customizing global nginx.conf values

- Improvements

* Remove the user home directories asynchronously, which resolves an issue with the delete process appearing hung when the user's home directory contains many files
* Performance improvements in CMDaemon to decrease the start-up time of the head node CMDaemon on clusters with many pending WLM jobs
* Reduce verbosity of the 'result for obsolete tracker' messages, so that they are no longer included by default in the CMDaemon log file
* CMDaemon will now produce a warning when a new image is added without an underlying directory structure
* Improved logic when invalidating the nscd hosts cache on the compute nodes, to avoid cases where an outdated cache interferes with hostname lookups
* CMDaemon certificates are now generated with a start date of 1 calendar day before the issue date, instead of the Unix epoch
* An issue where cm-manipulate-advanced-config.py is missing a python import, resulting in a crash when executed
* Ensure that the Kubernetes NetworkPolicy feature works when the kube-proxy masqueradeAll flag is disabled
* An issue where monitoring data for labeled entities is not preserved after the entity has been dropped and automatically re-added
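
The certificate start-date change listed above can be illustrated with a small sketch. This is not CMDaemon code: the function name not_before is hypothetical, and the sketch assumes that "1 calendar day before" means exactly 24 hours before the issue time, which the notes do not specify.

```python
from datetime import datetime, timedelta, timezone


def not_before(issue_time: datetime) -> datetime:
    """Return a validity start one day before the issue time.

    Assumption: the offset is exactly 24 hours. A backdated start
    tolerates moderate clock skew between nodes, while avoiding a
    start date at the Unix epoch (1970-01-01).
    """
    return issue_time - timedelta(days=1)


# Hypothetical issue time, used for illustration only.
issued = datetime(2022, 8, 1, 12, 0, tzinfo=timezone.utc)
print(not_before(issued).isoformat())  # 2022-07-31T12:00:00+00:00
```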

- Fixed Issues

* In some cases, an issue with re-establishing the edge node sessions to the director after a restart of CMDaemon
* An issue with the delivery of CMDaemon events to edge nodes, which can result in outdated information about committed entities
* Fix an issue with setting up Kubernetes where if the passive head node is the active leader according to Etcd, then Kubernetes will not always be able to initialize properly
* On rare occasions, CMDaemon can hang while stopping due to a blocking SSL read operation
* An issue where PBS queue options set in CMDaemon may not be set in the PBS server configuration
* Send the rsyslog log to both head nodes for on-prem compute nodes
* Fix an issue with generating a valid Kubernetes kubeconfig for users with special characters in their login name
* Performance improvements in the user manager
* Fix rare crash in CMDaemon while cloning an image
* An issue where CMDaemon may not (re)generate the Slurm logrotate files in some cases, such as when the files are modified or deleted outside of CMDaemon
* Fix possible crash in the provisioning status code
* Fix a potential buffer overflow
* An issue where CMDaemon can crash if the Bright View monitoring tree call does not pass a context
* Add full support for multi-value http request parameters, to resolve an issue where the "CMDaemon ready" service is not able to handle a list of services by name
* In some cases, terminating spot instances with CMDaemon may fail if the spot request has been cancelled outside of CMDaemon
* In some cases, an issue with upgrading from Bright 8.x to 9.1 due to an invalid SQL statement
* An issue where CMDaemon may occasionally hang on SSL_read while stopping
* An issue where the default gateway may not be set on a cluster with an aliased external network interface
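
The multi-value HTTP request parameter fix listed above concerns requests in which the same parameter name appears more than once. The following sketch only illustrates that general idea using the Python standard library. The parameter name "service" and the query string are hypothetical and are not taken from the CMDaemon API.

```python
from urllib.parse import parse_qs


def requested_services(query: str) -> list[str]:
    """Collect every value given for the (hypothetical) 'service' parameter.

    parse_qs keeps all values for a repeated key in a list, so a
    request that names several services is not reduced to one value.
    """
    return parse_qs(query).get("service", [])


# Hypothetical query string with a repeated parameter.
print(requested_services("service=slurmctld&service=nginx&timeout=30"))
# ['slurmctld', 'nginx']
```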

== Bright View ==

- Fixed Issues

* An issue where SLURM/GRES file templates are not read-only in Bright View and can be modified

== Node Installer ==

- Improvements

* Allow the node-installer to continue with configuring IPMI after encountering a failure to set username and password when the user already exists

- Fixed Issues

* An issue in the ilo_power.pl script, which can break the remote power management for nodes that use an ilo0 interface for the power control

== User Portal ==

- Fixed Issues

* An issue where the user portal can show only 1 core for a compute node, regardless of the actual number of cores

== cm-create-image ==

- Fixed Issues

* An issue where images created with cm-create-image do not preserve the xattrs of the base tar image
* An issue where node-installer images created using the cm-create-image tool do not have an updated rsyslog.conf file

== cm-kubernetes ==

- Improvements

* Make the PersistentVolumeClaims privilege part of the default list of privileges for the users when Kubernetes is set up

== cm-kubernetes-setup ==

- Improvements

* Enable the selection of newer Kubernetes versions by default in the cm-kubernetes-setup screens
* An issue with Kubernetes on Edge deployments, where the stage "waiting for Root Service Account" is performed too early and may not complete successfully in some cases
* Fix an issue where Kubernetes version >= 1.21 is not deployed with masqueradeAll=false for kube-proxy, preventing NetworkPolicies from working
* In the Kubernetes module files, remove the MANPATH definitions which are no longer used
* In some cases, an issue with setting the Kubernetes labels for control-plane, master, and worker

- Fixed Issues

* Ensure that the cm-kubernetes-setup --default-cni-bin-dir flag updates all relevant roles

== cm-setup ==

- Fixed Issues

* Make cm-*-setup configuration file permissions more restrictive

== cm-uge ==

- Improvements

* Update the default settings in cm-uge to allow running OpenMPI jobs without involving ssh

== cm-wlm-setup ==

- Improvements

* Automatically remove the WLM settings from the Auto Scaler configuration when the WLM is disabled

- Fixed Issues

* An issue with making the pbs.service file available on the compute nodes with an offloaded PBSPro server role, which prevents the PBSPro server from starting during the setup

== cmsh ==

- New Features

* Allow the --start and --end arguments in the rangequery command to be specified as date/time stamps

- Improvements

* Warn that using rshell into an image as a non-root user might not work (as there is no LDAP inside the image)
* Added --filter option to the environment command, to allow for showing only the entities that match the specified regex
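
The regex-based filtering described for the --filter option above can be sketched as follows. This is a minimal illustration and not the cmsh implementation: the function filter_entities and the entity names are hypothetical, and whether cmsh matches anywhere in the name or requires a full match is an assumption (the sketch matches anywhere).

```python
import re


def filter_entities(names: list[str], pattern: str) -> list[str]:
    """Keep only the names in which the regex matches somewhere.

    Assumption: a partial match is enough. Anchor the pattern with
    ^ and $ to require a match against the whole name.
    """
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]


# Hypothetical entity names, used for illustration only.
entities = ["node001", "node002", "gpu01", "headnode"]
print(filter_entities(entities, r"^node\d+$"))  # ['node001', 'node002']
```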

- Fixed Issues

* An issue where the --user option of the cmsh "rshell" command does not take effect
* An issue where the monitoringdrop command does not drop the measurables when multiple measurables are specified on the command line
* An issue where the cmsh device switchport update may not save on commit
* An issue where the XSD validation is not always loaded in cmsh when configuring a disk setup for the compute nodes

== hwloc ==

- Improvements

* Update cm-hwloc2 to 2.7.1

== ml ==

- New Features

* Introduced packages cm-cudnn8.2-cuda11.4
* Introduced packages cm-cudnn8.4-cuda11.4

== openpbs22.05 ==

- Improvements

* Add OpenPBS 22.05 integration

== pythoncm ==

- Improvements

* pythoncm now includes periodic checks during the provisioning wait, to ensure that tools such as cm-wlm-setup do not time out while the nodes are being provisioned

== slurm ==

- Improvements

* Add slurmrestd service file to the Slurm packages

- Fixed Issues

* Rebuild the Ubuntu Slurm packages with cm-pmix3

== slurm22.05 ==

- Improvements

* Update to 22.05.3
* Add Slurm 22.05 support