Base Command Manager / Bright Cluster Manager Release Notes
Release notes for BCM 10.23.10
== General ==
New Features
* Added mlnx-ofed23.07 package
* Added cm-pmix4 package
Improvements
* Added drainstatus to cm-diagnose
* Updated cuda-driver package to 535.104.12
* Updated cm-libprometheus package to 0.47.0
* Updated cm-openssl package to 3.1.3
== CMDaemon ==
New Features
* Added advanced config flag DisableRemoteShell to disable all remote shell RPC calls
* Added events for Cumulus service management operations
Improvements
* Added cmsh clone device option to increment IP addresses by values other than 1
* Allow lite node IP to be set during cmsh device add
* Display an error when setting an invalid software image in cmsh
* Update /etc/resolv.conf via netconfig on SLES15 instead of writing the file directly
* Added the ability to add model/serial number information to new switches (ZTP)
* Kill active ramdisk create process when software image is removed
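To illustrate the clone option above that increments IP addresses by values other than 1, here is a minimal Python sketch of the arithmetic involved. It is not the cmsh implementation; the function name `cloned_ips` and its parameters are hypothetical, chosen only for this example.

```python
import ipaddress

def cloned_ips(base_ip, count, step=1):
    """Return `count` addresses after base_ip, spaced `step` apart.

    Hypothetical helper: mirrors the idea of cloning a device several
    times while stepping the IP address by a configurable increment.
    """
    base = ipaddress.ip_address(base_ip)
    # Address objects support integer addition, so no manual octet math is needed.
    return [str(base + step * i) for i in range(1, count + 1)]

print(cloned_ips("10.141.0.1", 3, step=4))
```

With a step of 4, three clones of a device at 10.141.0.1 would receive 10.141.0.5, 10.141.0.9, and 10.141.0.13 in this sketch.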
Fixed Issues
* Fixed provisioning trigger when an image name starts with the name of another image
* Allow cm-cmd-ports --get to work without an active cmd
* Prevent "Reboot required: Interfaces have been modified" event from being shown for a node if the node has a VLAN interface on a Bridge interface that includes a bond interface
* Fixed cm-burn failing to complete successfully when both the pre and post sections are absent
* Image updates on provisioning nodes now wait for provisioning operations on other nodes to complete before proceeding
* Allow appending or skipping the Slurm drain reason when a healthcheck fails with the drain action enabled
* Fixed crash of pythoncm parallel node termination function
* Fixed an edge case that causes hostlist generation failures when there are 3 numeric fields in the hostname
* Fixed service management for cm-lite-daemon
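The image-name fix listed above concerns one image name starting with the name of another. The notes do not say how the fault arose; the sketch below only assumes a common cause of such collisions, namely prefix comparison where an exact match is intended. The function and image names are hypothetical and do not come from CMDaemon.

```python
def images_matching_prefix(changed, images):
    # Prefix comparison: "default-image" also matches "default-image-gpu".
    return [name for name in images if name.startswith(changed)]

def images_matching_exact(changed, images):
    # Exact comparison: only the image that actually changed is selected.
    return [name for name in images if name == changed]

images = ["default-image", "default-image-gpu", "login-image"]
print(images_matching_prefix("default-image", images))
print(images_matching_exact("default-image", images))
```

In this sketch the prefix version selects both `default-image` and `default-image-gpu`, while the exact version selects only `default-image`.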
== cm-scale ==
Fixed Issues
* Allow starting terminated cloud nodes whose state is one of the node installer states
* Terminate useless AWS spot instance requests
* Fixed the termination of cloud nodes when multiple clone operations are issued in parallel
* Fixed the startup of nodes by cm-scale when Slurm sets a job's predicted start time in the future
* Fixed handling of job arrays whose range runs from 1 to a number with more than one digit
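For the job array fix above, a short Python sketch shows how a Slurm-style array specification can be expanded into task IDs, including ranges whose upper bound has more than one digit. This is not the cm-scale parser; `expand_array_spec` is a hypothetical name, and the sketch assumes the usual `--array` forms (comma-separated values, `first-last` ranges, an optional `:step`, and an optional `%limit` suffix).

```python
def expand_array_spec(spec):
    """Expand a Slurm-style array spec such as "1-12" or "0-15:4%2" into task IDs."""
    # Drop the optional "%N" limit on simultaneously running tasks.
    spec = spec.split("%", 1)[0]
    ids = []
    for part in spec.split(","):
        rng, _, step = part.partition(":")
        first, _, last = rng.partition("-")
        start = int(first)
        # A single value such as "6" is a range of length one.
        stop = int(last) if last else start
        ids.extend(range(start, stop + 1, int(step) if step else 1))
    return ids

print(expand_array_spec("1-12"))
print(expand_array_spec("0,6,16-18"))
```

Converting the bounds to integers before building the range avoids treating "12" as a string, which is one way multi-digit upper bounds can go wrong.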
== Cloud ==
New Features
* Added support for AWS FSx on Ubuntu for cmjob
Improvements
* Improved error message when starting a cloud node with incorrect VPC/subnet configuration
Fixed Issues
* Fixed issue with cm-cloud-storage-setup when using us-east-1 region
* Prevent cloud instances terminated while the cloud director is down from being listed as UP+terminated
* Fixed starting spot instances after a no-capacity condition occurs in an availability zone
* Unfulfilled spot instance requests stay in PENDING state until fulfilled or terminated
* Store availability zones for networks created by COD or manually, which enables AutoScaler to distribute loads between availability zones in COD deployments
== Kubernetes ==
New Features
* Added support for NGC token authentication in cm-kubernetes-setup
Improvements
* The wizard now fails earlier when it should (incorrect return code checks previously caused the installer to fail confusingly at later stages)
* Kubernetes wizard errors now show more context information where possible
* Increased timeouts for kubeadm init and clusterctl init operations to handle slow connections
Fixed Issues
* The add-user wizard now uses the BCM user name instead of commonName
== Workload Management ==
New Features
* Added enroot and enroot+caps packages
Fixed Issues
* Update AWS spot instances state in Slurm when they are terminated outside of BCM
== Container Engines ==
Improvements
* Improved internal IP detection logic for etcd (similar to the internal IP detection for Kubernetes Calico and Flannel)
== Monitoring ==
New Features
* Added Prometheus /rules, /alert, and /alertmanagers endpoints
* Added operstate metrics (operational state, i.e., UP/DOWN) via cm-lite-daemon for Cumulus switches
Improvements
* Display K/M/G in cmsh for consolidated averages when no unit is set for a metric
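The K/M/G display mentioned above can be pictured with a small Python sketch. It is only an illustration, not the cmsh formatter: `humanize` is a hypothetical name, and the sketch assumes decimal factors (1000, 1e6, 1e9) and two decimal places, neither of which the notes specify.

```python
def humanize(value):
    """Format a unitless number with a K/M/G suffix (decimal factors assumed)."""
    for factor, suffix in ((1e9, "G"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= factor:
            return f"{value / factor:.2f}{suffix}"
    # Values below 1000 are shown without a suffix.
    return f"{value:.2f}"

print(humanize(1536000))
print(humanize(2500))
print(humanize(42))
```

Under these assumptions, 1536000 is shown as 1.54M, 2500 as 2.50K, and 42 as 42.00.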
Fixed Issues
* Added support for running the healthcheck with storcli alongside megacli
== Cluster on Demand ==
Improvements
* Improved the display of the EULA when running from a Docker image
* Allow CMDaemon to work with a Cluster on Demand cluster spanning multiple regions (requires manual setup)
== Base View ==
Improvements
* Provide notifications in Base View if BCM package updates are available
* Visualize used and available licensed GPUs in Base View