Base Command Manager / Bright Cluster Manager Release Notes

Release notes for BCM 10.23.10

== General ==

New Features

* Added mlnx-ofed23.07 package
* Added cm-pmix4 package

Improvements

* Added drainstatus to cm-diagnose
* Updated cuda-driver package to 535.104.12
* Updated cm-libprometheus package to 0.47.0
* Updated cm-openssl package to 3.1.3

== CMDaemon ==

New Features

* Added advanced config flag DisableRemoteShell to disable all remote shell RPC
* Added events for Cumulus service management operations

Improvements

* Added cmsh clone device option to increment IP addresses by values other than 1
* Allow lite node IP to be set during cmsh device add
* Display an error when setting an invalid software image in cmsh
* Update /etc/resolv.conf via netconfig on SLES15 instead of writing file
* Created the ability to add model/serial number information to new switches (ZTP)
* Kill active ramdisk create process when software image is removed
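
The clone option for incrementing IP addresses by values other than 1 can be pictured with a small sketch. This is only an illustration of the address arithmetic using Python's standard `ipaddress` module; the function name `cloned_ips` and its parameters are hypothetical and are not part of cmsh or pythoncm.

```python
import ipaddress


def cloned_ips(base_ip, count, step=1):
    """Return `count` addresses that follow base_ip, spaced `step` apart.

    Hypothetical helper: it only demonstrates the arithmetic and does not
    call any BCM API.
    """
    base = ipaddress.ip_address(base_ip)
    # Adding an integer to an address object yields the address that many
    # positions further on, carrying over octet boundaries as needed.
    return [str(base + step * i) for i in range(1, count + 1)]


print(cloned_ips("10.141.0.1", 3, step=4))
# ['10.141.0.5', '10.141.0.9', '10.141.0.13']
```

A step larger than 1 leaves gaps between the clones' addresses, which can be useful when each node needs several consecutive addresses.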

Fixed Issues

* Fixed provisioning trigger when an image name starts with the name of another image
* Allow cm-cmd-ports --get to work without an active cmd
* Prevent "Reboot required: Interfaces have been modified" event from being shown for a node if the node has a VLAN interface on a Bridge interface that includes a bond interface
* Fixed cm-burn unsuccessful completion in the absence of both a pre and post section
* Image updates on provisioning nodes now wait for provisioning operations on other nodes to complete before proceeding
* Allow appending or skipping adding a Slurm drain reason when healthcheck fails with drain action enabled
* Fixed crash of pythoncm parallel node termination function
* Fixed an edge case that causes hostlist generation failures when there are 3 numeric fields in the hostname
* Fixed service management for cm-lite-daemon
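
The provisioning trigger fix concerns image names where one name is a prefix of another. The sketch below shows the general class of pitfall, assuming the match is done on paths; it is not CMDaemon's actual code, and the function names and example paths are hypothetical.

```python
import posixpath


def naive_match(changed_path, image_path):
    # Plain string prefix test: also matches sibling images whose
    # names merely begin with the same characters.
    return changed_path.startswith(image_path)


def component_match(changed_path, image_path):
    # Compare whole path components, so "default-image" does not
    # match "default-image-gpu".
    image = posixpath.normpath(image_path)
    changed = posixpath.normpath(changed_path)
    return changed == image or changed.startswith(image + "/")


image = "/cm/images/default-image"
other = "/cm/images/default-image-gpu/etc/hosts"

print(naive_match(other, image))      # True (false positive)
print(component_match(other, image))  # False
```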

== cm-scale ==

Fixed Issues

* Allow starting terminated cloud nodes whose state is one of the node installer states
* Terminate useless AWS spot instance requests
* Fixed the termination of cloud nodes when multiple clone operations are issued in parallel
* Fixed the startup of nodes by cm-scale if Slurm job predicted start time is set by Slurm in the future
* Fixed handling of job arrays whose range runs from 1 to a number with more than one digit
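
For the job array fix, the following is a minimal sketch of expanding a Slurm-style array specification into task IDs. It assumes the usual `--array` syntax (comma-separated values, `first-last` ranges, an optional `:step`, and an optional `%limit` suffix). The function `expand_array_spec` is hypothetical and does not reflect how cm-scale parses job arrays internally.

```python
def expand_array_spec(spec):
    """Expand a Slurm-style array spec such as "1-10" into a list of task IDs.

    Hypothetical helper for illustration only.
    """
    # Drop the optional "%N" suffix that limits concurrently running tasks.
    spec = spec.split("%", 1)[0]
    ids = []
    for part in spec.split(","):
        rng, _, step = part.partition(":")
        first, _, last = rng.partition("-")
        start = int(first)
        # Convert the whole upper bound with int(), so multi-digit
        # bounds such as "10" are read as one number.
        stop = int(last) if last else start
        ids.extend(range(start, stop + 1, int(step) if step else 1))
    return ids


print(expand_array_spec("1-10"))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(expand_array_spec("0-15:4"))
# [0, 4, 8, 12]
```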

== Cloud ==

New Features

* Added support for AWS FSx on Ubuntu for cmjob

Improvements

* Improved error message when starting a cloud node with incorrect VPC/subnet configuration

Fixed Issues

* Fixed issue with cm-cloud-storage-setup when using us-east-1 region
* Prevent cloud instances that are terminated while the cloud director is down from being listed as UP+terminated
* Fixed starting spot instances after a no-capacity in availability zone scenario occurs
* Unfulfilled spot instance requests stay in PENDING state until fulfilled or terminated
* Store availability zones for networks created by COD or manually, which enables AutoScaler to distribute loads between availability zones in COD deployments

== Kubernetes ==

New Features

* Added support for NGC token authentication in cm-kubernetes-setup

Improvements

* Made the wizard fail earlier than it previously did (incorrect return code checks caused the installer to confusingly fail at later stages)
* Kubernetes wizard errors will now show more context information where possible
* Increased timeouts for kubeadm init and clusterctl init operations to effectively handle slow connections
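
The return code issue mentioned in the improvements above follows a common pattern: a setup step exits non-zero, the exit code is not checked, and the failure only surfaces several steps later. A minimal sketch of failing early, assuming each step is an external command, is shown here. The helper `run_step` is hypothetical and is not taken from cm-kubernetes-setup.

```python
import subprocess
import sys


def run_step(argv):
    """Run one setup step and stop immediately if it exits non-zero.

    Hypothetical helper for illustration only.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        # Report the failing command together with its stderr output,
        # so the error carries context about what went wrong.
        raise RuntimeError(
            f"step {argv!r} failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout


# A step that succeeds returns its standard output.
print(run_step([sys.executable, "-c", "print('ok')"]).strip())
```

Raising at the failing step keeps the reported error close to its cause instead of letting a later, unrelated step fail.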

Fixed Issues

* Add user wizard will use BCM user name and not commonName

== Workload Management ==

New Features

* Added enroot and enroot+caps packages

Fixed Issues

* Update AWS spot instances state in Slurm when they are terminated outside of BCM

== Container Engines ==

Improvements

* Improved internal IP detection logic for etcd (similar to the internal IP detection for Kubernetes Calico and Flannel)

== Monitoring ==

New Features

* Added Prometheus /rules, /alert, and /alertmanagers endpoints
* Added operstate metrics (operational state, i.e., UP/DOWN) via cm-lite-daemon for Cumulus switches

Improvements

* Display K/M/G in cmsh for consolidated averages when no unit is set for a metric
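
The K/M/G display for unitless metrics can be sketched as follows. This assumes decimal (1000-based) scaling and one decimal place, neither of which is stated in these notes; the function `format_scaled` is hypothetical and is not cmsh code.

```python
def format_scaled(value, precision=1):
    """Format a unitless number with a K/M/G suffix.

    Assumption: decimal scaling (K = 1000). cmsh may use different
    factors or rounding.
    """
    for factor, suffix in ((1e9, "G"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= factor:
            return f"{value / factor:.{precision}f}{suffix}"
    # Values below 1000 are shown without a suffix.
    return f"{value:.{precision}f}"


print(format_scaled(1536000))     # 1.5M
print(format_scaled(2500000000))  # 2.5G
print(format_scaled(950))         # 950.0
```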

Fixed Issues

* Added support to run healthcheck with storcli software next to megacli software

== Cluster on Demand ==

Improvements

* Improved the display of the EULA when running from docker image
* Allow CMDaemon to work with cluster-on-demand cluster spanning multiple regions (requires manual setup)

== Base View ==

Improvements

* Provide notifications in Base View if BCM package updates are available
* Visualize used and available licensed GPUs in Base View