Base Command Manager / Bright Cluster Manager Release Notes

Release notes for Bright 9.1-8

== General ==

- Improvements

* Added CUDA 11.4 packages
* Added Mellanox OFED 5.4 packages
* Added mlnx-ofed53 package
* Resolved an issue with Grafana, which since v8.0 defaults to POST instead of GET for requests
* Support for RHEL 8.4 and CentOS 8.4 head node installer

== cmdaemon ==

- Improvements

* In some cases, an issue with the directory permissions for k8s certificates
* Ability to process all jobs in the job-info-management.py maintenance script
* Prometheus exporter timestamps can be too old or in the future
* Allow appending a drain reason to an already drained node
* Improved Bright head node HA failover handling with respect to Kubernetes

- Fixed Issues

* Parallel versions of RPC missing in the API docs
* An issue with showing metrics owned by a category in the monitoring tree
* An issue with ssh access to non-master job nodes when usernodelogin=onlywhenjob for PBS, UGE, and LSF
* An issue with pythoncm latest monitoring counter returning averaged data
* Some device status metric enum values not added after an upgrade of Bright 8.2 to 9.1
* An issue with taking into account the DCGM exclude healthcheck GlobalConfig parameter
* sshd banner can cause failure of the ssh2node health check
* In some cases, an issue with sorting the WLM server nodes by their rank
* An issue with the cmsh samplenow --debug flag not being passed to the scripts
* Custom timezone set for a node can be overwritten by an image update
* A partial HTTP GET can result in 100% CPU load
* In some cases, failure to collect partition metrics after cmd has been running for a long time
* A BMC interface can be marked as an external interface if it is the only interface on the external network in an edge setup
* Ensure Bright Kubernetes labels are set correctly for category and rack names that contain spaces
* Improved power operation result handling when a cloud node AWS instance has been terminated outside of cmdaemon
* Allow spaces in Slurm queue {Allow,Deny}{Groups,Accounts} parameters
* An issue with caching labeled entity data on the passive head node, which can result in missing monitoring data
* In some cases, a bad nullptr check can cause the configuration dumper to crash
* In some cases, cloning an image with revisions in pythoncm duplicates too much data, which can result in an image that cannot be cleanly removed
* Added nullptr checks for image-remove, to prevent possible crashes on incorrectly configured revision information
* Negative intervals passed to PromQL can cause cmdaemon to crash
* An issue with hwloc integration on Ubuntu
* An issue with edge HA directors not provisioning from the head node
* An issue with renaming PBS nodes

== node-installer ==

- Fixed Issues

* Update the standalone script to work with RHEL/CentOS 8.4
* Exclude dracut network modules from being added to the original initrd image for RHEL
* In some cases, an issue with detecting InfiniBand boot device names

== Bright View ==

- Improvements

* Do not hide the grid columns when dragged out of the grid
* Deleting a dashboard now requires confirmation

- Fixed Issues

* An issue with the redirect to the active head node when connected to the passive head node
* An issue with chargeback getting the correct number of cores and job slots used by jobs

== bright-installer ==

- Fixed Issues

* An issue with using local repositories for add-on installations

== cm-kubernetes-setup ==

- New Features

* Added Kubernetes state metrics addon for additional metrics

== cm-openssl ==

- New Features

* Upgrade cm-openssl packages to 1.1.1l (CVE-2021-3711)

== cm-scale ==

- Fixed Issues

* The Auto Scaler now honors the Slurm job start time, when a job is submitted with a begin time in the future

== cm-setup ==

- Improvements

* An issue with adding shorewall rules to both head nodes in the case of head node HA

== cm-wlm-setup ==

- Fixed Issues

* An issue with the mail program path in the smail utility on Ubuntu

== cmsh ==

- Fixed Issues

* cmsh 'set property 1' can set the property to false instead of true
* Escape '>' in cmsh tab completion
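
The first cmsh fix above concerns how a value such as 1 is interpreted for a boolean property. As a rough illustration only, and not cmsh's actual implementation, a strict parser along the following lines maps '1' to true and rejects unknown input instead of silently falling back to false. The function name parse_bool and the accepted spellings are assumptions made for this sketch.

```python
# Hypothetical helper; the accepted spellings are an assumption, not cmsh's list.
TRUE_VALUES = {"1", "yes", "true", "on"}
FALSE_VALUES = {"0", "no", "false", "off"}


def parse_bool(text):
    """Convert a user-supplied string to a bool, raising on unknown input."""
    value = text.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    # Refuse to guess: an unrecognised value must not quietly become False.
    raise ValueError(f"not a boolean value: {text!r}")
```

Raising on unrecognised input, rather than defaulting to false, is the design choice that avoids this class of bug.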

== cuda-dcgm ==

- Fixed Issues

* An issue with taking into account the DCGM exclude healthcheck GlobalConfig parameter

== head node installer client ==

- Improvements

* An issue with updating the values in the IP addresses fields when pressing the Tab key (and not the Enter key)

== ml ==

- New Features

* Updated cm-opencv4-* packages to v4.5.3
* Updated cm-fastai2-* packages to v2.4.1
* Updated cm-pytorch-* packages to v1.9.0
* Updated cm-horovod-* packages to v0.22.1
* Updated cm-gpytorch-* packages to v1.5.0
* Updated cm-xgboost-* packages to v1.4.2
* Extended ML package support to CUDA 11.2
* Extended ML package support to Ubuntu 20.04

- Improvements

* Deprecate cm-openmpi-geib-* packages
* Extended support of the B4DS addon to NVIDIA DGX systems

== pbspro2020 ==

- New Features

* Upgrade PBS Pro 2020 to 2020.1.4

== pbspro2021 ==

- New Features

* Upgrade PBS Pro 2021 to 2021.1.1

== pythoncm ==

- Improvements

* Added extra parameters for the pythoncm node dump monitoring function

- Fixed Issues

* An issue with the pythoncm parallel.version_information RPC call

== slurm ==

- New Features

* slurm20 and slurm20.11 are now built with JWT

== slurm20.11 ==

- Improvements

* Update slurm20.11 packages to 20.11.8

== user portal ==

- Fixed Issues

* An issue where the user portal may use timestamps in milliseconds instead of seconds for API calls
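
Mixing milliseconds and seconds is a common source of timestamp errors in API clients. The following is a minimal sketch, not the user portal's code, of normalising an epoch timestamp to seconds before it is sent. The helper name to_epoch_seconds and the threshold are assumptions for illustration; the heuristic misclassifies millisecond timestamps earlier than March 1973.

```python
# Hypothetical helper; the threshold is an assumption chosen for this sketch.
# 10**11 seconds lies beyond the year 5000, so any value at or above it is
# treated as milliseconds.
MS_THRESHOLD = 10**11


def to_epoch_seconds(timestamp):
    """Return an integer epoch timestamp in seconds, converting from ms if needed."""
    if timestamp >= MS_THRESHOLD:
        return timestamp // 1000
    return timestamp
```

For example, to_epoch_seconds(1630000000000) and to_epoch_seconds(1630000000) both yield 1630000000.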