Base Command Manager / Bright Cluster Manager Release Notes
Release notes for Bright 9.0-16
== General ==
- Improvements
* Added CUDA 11.4 packages
* Added Mellanox OFED 5.4 packages
* Added mlnx-ofed53 package
* Addressed an issue with Grafana, which since v8.0 defaults to POST instead of GET for requests
* Updated mlnx-ofed52 to version 5.2-2.2.3.0
* Updated mlnx-ofed49 to version 4.9-3.1.5.0
== cmdaemon ==
- Improvements
* Addressed an issue that in some cases affected the permissions of the Kubernetes certificate directories
* Added the ability to process all jobs in the job-info-management.py maintenance script
* Improved Bright head node HA failover handling with respect to Kubernetes
- Fixed Issues
* Parallel versions of RPCs were missing from the API documentation
* An issue with showing metrics owned by a category in the monitoring tree
* An issue with ssh access to non-master job nodes when usernodelogin=onlywhenjob for PBS, UGE, and LSF
* An sshd banner can cause the ssh2node health check to fail
* In some cases, an issue with sorting the WLM server nodes by their rank
* In some cases, failure to collect partition metrics after cmdaemon has been running for a long time
* A BMC interface can be marked as the external interface if it is the only interface on the external network in an edge setup
* Ensure Bright Kubernetes labels are set correctly for category and rack names that contain spaces
* Improved power operation result handling when a cloud node AWS instance has been terminated outside of cmdaemon
* Allow spaces in Slurm queue {Allow,Deny}{Groups,Accounts} parameters
* In some cases, a bad nullptr check can cause the configuration dumper to crash
* In some cases, cloning an image with revisions in pythoncm duplicates too much data, which can result in an image that cannot be cleanly removed
* Added nullptr checks for image-remove, to prevent possible crashes caused by incorrectly configured revision information
* Negative intervals passed to PromQL can cause cmdaemon to crash
* An issue with hwloc integration on Ubuntu
* An issue with time conversion of the additional node information active timeout
* Call 'nscd -i hosts' after the hostname of a device is changed, to invalidate the hosts cache
* An issue with cmdaemon keeping track of canceled job arrays that have never been scheduled with Slurm
* In some cases, the ssh configuration for the edge nodes is not generated until the edge site is committed again
* Improved detection of RAID arrays and waiting for them to assemble
== node-installer ==
- Fixed Issues
* Updated the standalone script to work with RHEL/CentOS 8.4
* Exclude dracut network modules from being added to the original initrd image for RHEL
* In some cases, an issue with detecting InfiniBand boot device names
* Support for boot-over-IB with grub
== Bright View ==
- Improvements
* Grid columns are no longer hidden when they are dragged out of the grid
- Fixed Issues
* An issue with the redirect to the active head node when connected to the passive head node
== cm-kubernetes-setup ==
- New Features
* Added Kubernetes state metrics addon for additional metrics
== cm-openssl ==
- New Features
* Upgraded cm-openssl packages to 1.1.1l (CVE-2021-3711)
== cm-scale ==
- Fixed Issues
* The Auto Scaler now honors the Slurm job start time when a job is submitted with a begin time in the future
== cm-setup ==
- Improvements
* Addressed an issue with adding Shorewall rules to both head nodes in a head node HA setup
== cm-wlm-setup ==
- Fixed Issues
* An issue with the mail program path in the smail utility on Ubuntu
* An issue with configuring Slurm slots for GPU nodes
== cmsh ==
- Fixed Issues
* Escape '>' in cmsh tab completion
* An issue with parsing selection/pattern options for the grabimage command
== ml ==
- New Features
* Updated cm-opencv4-* packages to v4.5.3
* Updated cm-fastai2-* packages to v2.4.1
* Updated cm-pytorch-* packages to v1.9.0
* Updated cm-horovod-* packages to v0.22.1
* Updated cm-gpytorch-* packages to v1.5.0
* Updated cm-xgboost-* packages to v1.4.2
* Extended ML package support to CUDA 11.2
- Improvements
* Deprecated cm-openmpi-geib-* packages
* Extended support for the B4DS addon to NVIDIA DGX systems
== pbspro2020 ==
- New Features
* Upgraded PBS Pro 2020 to 2020.1.4
== pbspro2021 ==
- New Features
* Introduced pbspro2021 packages
== pythoncm ==
- Improvements
* New pythoncm example to calculate the percentage of time a node is UP
- Fixed Issues
* An issue with the pythoncm parallel.version_information RPC call
== slurm ==
- New Features
* slurm20 and slurm20.11 are now built with JWT support
== slurm20.11 ==
- Improvements
* Updated slurm20.11 packages to 20.11.8
== user portal ==
- Fixed Issues
* An issue where the user portal may use timestamps in milliseconds instead of seconds for API calls