Base Command Manager / Bright Cluster Manager Release Notes

Release notes for Bright 9.1-2

== cmdaemon ==

- New Features

* Calculate GPUs UP/total metrics for cluster overview
* Support for OpenPBS

- Improvements

* In some cases, litenodes can be reported as DOWN when they are UP
* Improved burn status overview
* An issue with taking into account the burning status in the device status monitoring history
* Added extra Prometheus series queries for: hostname, category, wlm, job_id
* Added Prometheus /api/v1/series support for hostname
* In some cases, an upgrade of 8.2 to 9.x can leave duplicate aggregate node PDU, preventing commit of data samplers
* Drain reason specification for PBS*, UGE and LSF
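
The Prometheus series items above refer to the standard Prometheus HTTP API endpoint /api/v1/series, which accepts one or more match[] selectors. A minimal sketch of building such a request URL follows. The base URL and the label values are placeholders chosen for illustration, and any authentication the endpoint may require is not covered here.

```python
from urllib.parse import urlencode


def series_url(base_url, selectors):
    """Build a Prometheus /api/v1/series URL from a list of series selectors."""
    # match[] may be repeated; a list of pairs keeps every selector in the query
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base_url.rstrip('/')}/api/v1/series?{query}"


# Hypothetical endpoint and label values, for illustration only
url = series_url(
    "https://prometheus.example",
    ['{hostname="node001"}', '{job_id="1234"}'],
)
print(url)
```

Each selector is URL-encoded separately, so selectors on hostname, category, wlm or job_id can be combined in one request.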

- Fixed Issues

* Added /var/lib/shorewall to the exclude list to prevent shorewall from breaking after image update
* An issue with LSF configuration of Linux Cgroup Accounting and Process Tracking
* An issue with power operations being executed more than once when the device is passed multiple times
* An issue with curl reporting error 65 stream rewind in cmsh
* An issue with long host or network names, which can cause the head node cmdaemon certificate to go over the SSL CSR limits and crash cmd
* An issue with setting Burning status
* Improved /tftpboot rsync after images are cloned, to prevent missing symlinks when the images are used too soon
* An issue with the node-to-queue mapping that is shown in cmsh
* In some cases, monitoring data from multiple sources may be displayed in the wrong order
* In some cases, HeadNode IP fallback doesn't take active/passive into account
* An issue with the monitoring data points interpolation when a node is powered off
* In some cases, the passive director is not able to confirm the active is powered off when the active has gone down
* An issue with cluster.pem becoming out of date in the head node images if an HA takeover takes place shortly after requesting a new license
* In some cases, other node device status on edge nodes can become out of date
* An issue with enum value cache in the head node DB
* An issue with default-environment module output confusing the ssh2node healthcheck
* Support EBS volume tags
* Support for using pre-created public IPs with cluster extension in Azure
* An issue with offloaded PBSPro drain operations
* An issue with failing health checks when Etcd is running on the head node

== Bright View ==

- Improvements

* Ability to save the Rack View settings

- Fixed Issues

* An issue with the dropdowns in the node identification screen
* Hide unused fields totalNodes and TotalCPUs in Jobqueue

== cm-docker-setup ==

- New Features

* Multiple docker registry mirrors can be configured during setup
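
For context, the Docker daemon reads registry mirrors from the "registry-mirrors" list in its daemon.json configuration. The sketch below only shows what a configuration with several mirrors looks like; the mirror URLs are hypothetical, and how cm-docker-setup stores or applies the setting is not described in these notes.

```python
import json


def docker_daemon_config(mirrors):
    """Return daemon.json content that lists several registry mirrors."""
    # "registry-mirrors" is the dockerd configuration key for registry mirrors
    return json.dumps({"registry-mirrors": list(mirrors)}, indent=2)


# Hypothetical mirror URLs, for illustration only
config = docker_daemon_config(
    ["https://mirror1.example", "https://mirror2.example"]
)
print(config)
```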

== cm-kubernetes-setup ==

- New Features

* Multiple docker registry mirrors can be configured during setup

== cm-scale ==

- New Features

* The tracker parameter workloadsPerNode is now counted

- Fixed Issues

* An issue with power operations being executed more than once when the device is passed multiple times
* An issue with the default GPU and memory specification for non-existent cloud nodes (that will be cloned from a template)
* Improved Auto Scaler which now, when possible, allocates nodes that already have the right software image and are already in the right configuration overlay

== cm-wlm-setup ==

- New Features

* The Slurm accounting database can be installed on a configurable node

== cmburn ==

- Improvements

* An issue with cmburn on rhel8 and centos8

== cmsh ==

- Fixed Issues

* An issue with "list -a" not showing deleted items in cmsh sub modes
* An issue with instant and range PromQL --delimiter in cmsh

== ml ==

- New Features

* Updated cm-cudnn8.0-* packages to v8.0.5.39
* Updated cm-cmake-* packages to v3.18.4
* Updated cm-opencv3-* packages to v3.4.12
* Updated cm-gpytorch-* packages to v1.2.1

== pythoncm ==

- New Features

* Allow pythoncm to be used with a user/pass cookie

- Improvements

* When using pythoncm, the dhcp IP of a node is now cleared on cloning of the node (which is more consistent with cmsh)
* Added raw Prometheus query support in pythoncm
* Allow cluster.get_by_type to be called with string types

- Fixed Issues

* Perform a name-based uniqueness check for generic roles in pythoncm
* An issue with pythoncm power history
* An issue with the python module's bracket expansion for a single-digit range
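
To illustrate what bracket expansion means in the last item, here is a minimal sketch that expands one bracketed numeric range in a node name. It assumes a notation such as node[1-3]; the function expand_brackets is a hypothetical helper written for this example and is not the pythoncm implementation.

```python
import re


def expand_brackets(pattern):
    """Expand one bracketed numeric range, e.g. 'node[1-3]' -> node1..node3."""
    m = re.fullmatch(r"(.*?)\[(\d+)(?:-(\d+))?\](.*)", pattern)
    if m is None:
        return [pattern]  # no brackets, nothing to expand
    prefix, start, end, suffix = m.groups()
    end = end if end is not None else start  # 'node[5]' is a range of one
    width = len(start)  # keep zero padding such as 001
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]


print(expand_brackets("node[1-3]"))
print(expand_brackets("node[001-003]"))
```

Single-digit bounds and zero-padded bounds go through the same code path here, with the padding width taken from the start value.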

== slurm20 ==

- Improvements

* Added AMD GPU plugin to slurm19 and slurm20 packages