Base Command Manager / Bright Cluster Manager Release Notes
Release notes for Bright 9.1-8
== General ==
- Improvements
* Added CUDA 11.4 packages
* Added Mellanox OFED 5.4 packages
* Added mlnx-ofed53 package
* An issue with Grafana, where since v8.0 it defaults to POST instead of GET for requests
* Support for RHEL 8.4 and CentOS 8.4 in the head node installer
== cmdaemon ==
- Improvements
* In some cases, an issue with the directory permissions for k8s certificates
* Ability to process all jobs in the job-info-management.py maintenance script
* Prometheus exporter timestamps can be too old or in the future
* Allow appending a drain reason to an already drained node
* Improved Bright head node HA failover handling with respect to Kubernetes
- Fixed Issues
* Parallel versions of RPC missing in the API docs
* An issue with showing metrics owned by a category in the monitoring tree
* An issue with ssh access to non-master job nodes when usernodelogin=onlywhenjob for PBS, UGE, and LSF
* An issue with pythoncm latest monitoring counter returning averaged data
* Some device status metric enum values not added after an upgrade from Bright 8.2 to 9.1
* An issue with taking into account the DCGM exclude healthcheck GlobalConfig parameter
* sshd banner can cause failure of the ssh2node health check
* In some cases, an issue with sorting the WLM server nodes by their rank
* An issue with cmsh samplenow --debug flag not passed to the scripts
* Custom timezone set for a node can be overwritten by an image update
* A partial HTTP GET can result in 100% CPU load
* In some cases, failure to collect partition metrics after cmd has been running for a long time
* A BMC interface can be marked as external interface if it is the only interface on the external network in an edge setup
* Ensure Bright Kubernetes labels are set correctly for category and rack names that contain spaces
* Improved power operation result handling when a cloud node AWS instance has been terminated outside of cmdaemon
* Allow spaces in Slurm queue {Allow,Deny}{Groups,Accounts} parameters
* An issue with caching labeled entity data on the passive head node, which can result in missing monitoring data
* In some cases, a bad nullptr check can cause the configuration dumper to crash
* In some cases, cloning an image with revisions in pythoncm duplicates too much data, which can result in an image that cannot be cleanly removed
* Added nullptr checks for image-remove, to prevent possible crashes on incorrectly configured revision information
* Negative intervals passed to PromQL can cause cmdaemon to crash
* An issue with hwloc integration on Ubuntu
* An issue with edge HA directors not provisioning from the head node
* An issue with renaming PBS nodes
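The label fix above concerns category and rack names that contain spaces. Kubernetes label values may only contain alphanumerics, '-', '_' and '.', must begin and end with an alphanumeric character, and are limited to 63 characters. The following is a minimal sketch of how such a name could be mapped to a valid label value. The function name to_label_value and the choice of '-' as the replacement character are assumptions made for illustration; they are not taken from cmdaemon.

```python
import re

MAX_LABEL_LEN = 63  # Kubernetes limit for a label value


def to_label_value(name: str) -> str:
    """Map an arbitrary name to a syntactically valid label value (hypothetical helper)."""
    # Replace each run of disallowed characters (spaces included) with one '-'.
    cleaned = re.sub(r"[^A-Za-z0-9_.-]+", "-", name)
    # Truncate before stripping, so the result still starts and ends
    # with an alphanumeric character after the cut.
    cleaned = cleaned[:MAX_LABEL_LEN]
    return cleaned.strip("-_.")


if __name__ == "__main__":
    print(to_label_value("gpu nodes"))
    print(to_label_value(" rack 12 "))
```

Different names can collapse to the same value under this scheme (for example "rack 1" and "rack-1"), so a real implementation would need to decide how to handle such collisions.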
== node-installer ==
- Fixed Issues
* Update the standalone script to work with RHEL/CentOS 8.4
* Exclude dracut network modules from being added to the original initrd image for RHEL
* In some cases, an issue with detecting InfiniBand boot device names
== Bright View ==
- Improvements
* Do not hide the grid columns when dragged out of the grid
* Deleting a dashboard now requires confirmation
- Fixed Issues
* An issue with the redirect to the active head node when connected to the passive head node
* An issue with chargeback getting the correct number of cores and job slots used by jobs
== bright-installer ==
- Fixed Issues
* An issue with using local repositories for add-on installations
== cm-kubernetes-setup ==
- New Features
* Added Kubernetes state metrics addon for additional metrics
== cm-openssl ==
- New Features
* Upgrade cm-openssl packages to 1.1.1l (CVE-2021-3711)
== cm-scale ==
- Fixed Issues
* The Auto Scaler now honors the Slurm job start time, when a job is submitted with a begin time in the future
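With Slurm, a deferred start is requested at submission time, for example with sbatch --begin. The behavior described above can be pictured as a filter over pending jobs: a job whose begin time is still in the future should not yet count as demand for nodes. The following is a minimal sketch of that idea only. The PendingJob class and the jobs_needing_nodes function are hypothetical names introduced for illustration and do not reflect the Auto Scaler's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class PendingJob:
    """Hypothetical, simplified view of a pending job."""
    job_id: str
    begin_time: Optional[datetime] = None  # None means no deferred start


def jobs_needing_nodes(jobs: List[PendingJob], now: datetime) -> List[PendingJob]:
    """Return only the jobs that are eligible to start at 'now'."""
    return [j for j in jobs if j.begin_time is None or j.begin_time <= now]


if __name__ == "__main__":
    now = datetime(2021, 9, 1, 12, 0, tzinfo=timezone.utc)
    jobs = [
        PendingJob("101"),
        PendingJob("102", begin_time=now + timedelta(hours=1)),
        PendingJob("103", begin_time=now - timedelta(minutes=5)),
    ]
    print([j.job_id for j in jobs_needing_nodes(jobs, now)])
```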
== cm-setup ==
- Improvements
* An issue with adding shorewall rules to both head nodes in the case of head node HA
== cm-wlm-setup ==
- Fixed Issues
* An issue with the mail program path in the smail utility on Ubuntu
== cmsh ==
- Fixed Issues
* cmsh 'set property 1' can set the property to false instead of true
* Escape '>' in cmsh tab completion
== cuda-dcgm ==
- Fixed Issues
* An issue with taking into account the DCGM exclude healthcheck GlobalConfig parameter
== head node installer client ==
- Improvements
* An issue with updating the values in the IP addresses fields when pressing the Tab key (and not the Enter key)
== ml ==
- New Features
* Updated cm-opencv4-* packages to v4.5.3
* Updated cm-fastai2-* packages to v2.4.1
* Updated cm-pytorch-* packages to v1.9.0
* Updated cm-horovod-* packages to v0.22.1
* Updated cm-gpytorch-* packages to v1.5.0
* Updated cm-xgboost-* packages to v1.4.2
* Extended support for ML package for CUDA 11.2
* Extended support for ML package for Ubuntu 20.04
- Improvements
* Deprecate cm-openmpi-geib-* packages
* Extended support of B4DS addon to NVIDIA DGXs
== pbspro2020 ==
- New Features
* Upgrade PBS Pro 2020 to 2020.1.4
== pbspro2021 ==
- New Features
* Upgrade PBS Pro 2021 to 2021.1.1
== pythoncm ==
- Improvements
* Added extra parameters for the pythoncm node dump monitoring function
- Fixed Issues
* An issue with the pythoncm parallel.version_information RPC call
== slurm ==
- New Features
* slurm20 and slurm20.11 are now built with JWT
== slurm20.11 ==
- Improvements
* Update slurm20.11 packages to 20.11.8
== user portal ==
- Fixed Issues
* An issue where the user portal may use timestamps in milliseconds instead of seconds for API calls
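Mixing millisecond and second timestamps is a common source of this kind of bug, because both are plain integers. The following is a minimal sketch of a defensive normalization step, assuming Unix epoch timestamps. The function name to_epoch_seconds and the magnitude threshold are assumptions made for illustration; they do not describe how the user portal was actually fixed.

```python
# Values at or above this are treated as milliseconds. As seconds, 1e11
# lies thousands of years in the future; as milliseconds it is in 1973.
MS_THRESHOLD = 100_000_000_000


def to_epoch_seconds(ts: int) -> int:
    """Return a Unix timestamp in seconds, accepting seconds or milliseconds."""
    if ts >= MS_THRESHOLD:
        return ts // 1000  # looks like milliseconds: drop the sub-second part
    return ts


if __name__ == "__main__":
    print(to_epoch_seconds(1630000000000))
    print(to_epoch_seconds(1630000000))
```

A heuristic like this is only a safety net. Converting explicitly at the point where the value is produced is the more reliable approach.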