Base Command Manager / Bright Cluster Manager Release Notes

############################################################
Release notes for NVIDIA Base Command™ Manager (BCM) 11.33.0
############################################################

*Released: 29 May 2026*

General
=======

New Features
------------

* Added support for BaseOS 7.5.0
* Added support for running Harbor and Kubernetes on the same node
* Added support for deploying Harbor v2.14 in cm-container-registry-setup
* Updated cm-tftboot GRUB images to 2.14 to support booting over a VLAN
* Updated Network Operator to 25.10
* Updated Pyxis and Enroot to 0.23.0 and 4.1.2
* Updated Slurm 25.05 to 25.05.8
* Updated Slurm 25.11 to 25.11.6
* Updated Ubuntu 22.04 to 22.04.5
* Updated Ubuntu 24.04 to 24.04.4
* Updated cm-nvfwupd to 2.1.1
* Updated cm-nvidia-container-toolkit to 1.19
* Updated cm-nvhpc to 26.3
* Updated containerd to 2.2.1
* Updated CUDA 13.1 to 13.1.1
* Updated golang-go-latest to 1.25.10
* Updated lib-prometheus to 1.26.1
* Reduced the rescue rootfs size to 320 MB in cm-clone-install to allow booting on certain Dell servers
* Reduced the head node installer rootfs size to less than 300 MB to allow it to load on certain Dell servers
* Excluded /var/lib/munge/munged.seed from software image synchronization

Fixed Issues
------------

* Fixed copying of missing subscription files when creating RHEL 9 images with cm-create-image

Known Issues
------------

* When updating a BCM Ubuntu 22.04 cluster with Slurm from BCM 11.32.0 or earlier to BCM 11.33.0, Slurm services may fail to start because the updated Slurm packages do not automatically install the required dependency package cm-hwloc2-local. As a workaround, install cm-hwloc2-local on the head node and in the software images (or compute nodes) used by Slurm.

CMDaemon
========

New Features
------------

* Added the ability to start and stop the nvdebug process per device from cmsh via RPC
* dhcpd can now be prevented from running on head nodes by adding the ``dhcpd_override_needs_to_be_running`` extra-value with value ``false``

Fixed Issues
------------

* Fixed a possible Topograph failure on Topograph certificate regeneration
* Fixed automatic nginx.conf propagation when toggling the ingressproxyenable flag in KubeCluster configuration
* Fixed a Kubernetes metric collection script vulnerability that permitted arbitrary code execution as root by regular users with kubectl create permissions
* Fixed removal of module files on Kubernetes uninstallation
* Fixed an issue where slurm.takeover.sh returned an error when debug was not enabled
* Fixed empty Slurm topology generation on the first Slurm start after cm-wlm-setup
* Fixed an issue where workload manager prolog and epilog scripts were not running in ABS order
* Fixed the possibility of providing a malicious comment string to a Slurm job that is parsed by the gpu-workload-power-profiles prolog when profile validation is disabled
* Fixed an issue where prolog and epilog scripts were not run when a non-Slurm workload manager is used
* Fixed slurmctld starting after Slurm redeployment when the accounting node is in a different location

COD
===

New Features
------------

* AWS: Streamlined dataflow from the COD tool to the head node; the latest COD tool version is required
* COD-AWS: Print VPC and subnet IDs during cluster creation; logging has also been improved
* COD-GCP: Show image source blob dates instead of the GCP image resource date
* COD-GCP: Improved BCM node identification to mitigate orphaning instances in the cloud when instance creation times out but eventually completes; as soon as an instance starts and enters netboot, it is identified by BCM, which also speeds up bulk instance creation
* COD-AWS: Added support for RoCEv2 secondary networks (``--create-secondary-network``/``--create-secondary-subnet`` and ``--use-existing-secondary-network-id``/``--use-existing-secondary-subnet-id``); BCM creates or reuses secondary networks and subnets and configures them accordingly
* COD-Azure: When deploying in a pre-existing resource group, added validations to ensure the resource group is specified for dependent pre-existing resources, moved resource group validation earlier so cluster create fails before the summary screen, and deprecated the ``--existing-rg`` flag; ``--resource-group `` now indicates deployment in an existing resource group, consistent with other CSP logic
* Added image size listing for all COD flavors by specifying the ``size`` field in the ``--columns`` list

Fixed Issues
------------

* COD-GCP: After ``cm-cod-gcp cluster stop``, ``cm-cod-gcp cluster list`` now shows instance status as "stopped" instead of "terminated"
* Fixed ``--store-head-node-ip`` so it only stores the IP address to file (it previously also stored the cluster name)
* Fixed ``--version`` always showing trunk
* COD-GCP: Retry HTTP 503 (Service Unavailable) errors
* Fixed a runtime error when creating two or more clusters simultaneously when head node images are created from blobs
* COD-AWS: Fixed --on-error-undo not working after SSH/CMDaemon wait timeout
* COD-Azure: Fixed --on-error-undo not working after SSH/CMDaemon wait timeout
* COD-AWS: Fixed CMDaemon initialization failure due to inability to fetch availability zones in the me-south-1 region
* Fixed an issue that caused the serial console during node install on cloud nodes to be unreadable; node install on cloud nodes should now reliably output plaintext to the serial console
* COD-GCP HA: Fixed compute nodes failing to boot if the original head node is unavailable
* COD/CX-Azure: Fixed excessive memory usage in a Python script that caused OOM errors on smaller head nodes
* Fixed an OCI disk size validation bug where the check incorrectly always used the first size configured after CMDaemon startup

cm-kubernetes-setup
===================

New Features
------------

* Run:ai must now be deployed via the new cm-runai-setup wizard; the Run:ai installation option in cm-kubernetes-setup is now disabled
* Added Kubernetes Gateway API support in cm-kubernetes-setup, replacing the legacy Ingress API and Ingress NGINX Controller with Kgateway and MetalLB (L2 or BGP mode, now enabled by default for Gateway IP allocation)
* Replaced Ingress proxy with Gateway proxy on the head node: external traffic on the selected head node port (443 by default) is forwarded to the Gateways of the target Kubernetes cluster via TLS SNI; NodePorts 30080 and 30443 are no longer used
* Added a Gateway configuration menu in cm-kubernetes-setup to configure Gateway TLS certificates, reconfigure the Gateway proxy, install Gateway components and replacements for legacy BCM ingresses, and remove the legacy Ingress NGINX controller and BCM Ingress resources
* Existing clusters deployed with Ingress NGINX Controller can migrate to Gateway API without recreating the Kubernetes cluster; see the documented migration path in the BCM manual

Deprecated or Removed Features
------------------------------

* Ingress NGINX Controller-based Ingress resources are replaced by Gateway API resources

Fixed Issues
------------

* Fixed the GPU operator option "latest" in cm-kubernetes-setup
* Fixed NGINX failures on head nodes when Kubernetes is uninstalled with multiple Kubernetes clusters running

cm-wlm-setup
============

New Features
------------

* Added the ability to configure Nebius as a Topograph provider
* Improved Slurm setup performance for clusters with tens of thousands of nodes
* cm-wlm-setup now preserves Pyxis configuration with the --reinstall-pyxis option

Fixed Issues
------------

* Fixed OpenPBS setup with cm-wlm-setup
* Fixed Slurm setup with a compute node as storage when cm-mariadb is preinstalled in that node's software image

cm-setup
========

New Features
------------

* Applied the NVIDIA design palette to TUI wizard screens in cm-setup based tools

Base View
=========

New Features
------------

* Added support for deploying Kgateway from the Kubernetes wizard in Base View for Kubernetes Gateway API-based ingress routing
* Added support for deploying MetalLB from the Kubernetes wizard in Base View as the load balancer for externally exposed Kgateway services
* Updated the Kubernetes wizard to use "Gateway proxy" terminology in place of "Ingress Proxy" to align with the new Gateway API-based architecture
* Added support for uninstalling Heimdall from the Mission Control wizard in Base View
* Enabled the AHR and AJR wizards in the license information page (previously shown as "Coming Soon")

Deprecated or Removed Features
------------------------------

* Removed support for installing the Ingress NGINX Controller from the Kubernetes wizard in Base View; use Kgateway with MetalLB instead

Fixed Issues
------------

* Fixed JupyterLab vulnerabilities allowing privilege escalation via argument injection in the PyPI Extension Manager and account takeover via stored cross-site scripting
* Fixed cross-site scripting vulnerabilities by updating dompurify to 3.4.1