Base Command Manager / Bright Cluster Manager Release Notes
Release notes for Bright 9.0-13
== General ==
- Improvements
* Added mlnx-ofed52 package
* Updated cuda-driver to version 460.32.03
* Updated Docker to v19.03.15
* Updated cm-nvhpc to version 21.2
* Added smbk5pwd to openldap-servers package
* Slurm can now be upgraded to a new major version without disabling the old version (see the documentation)
- Fixed Issues
* Ensure cm-python37 contains the dmidecode module
== cmdaemon ==
- Improvements
* Validation for the partition node basename and digits
* By default, perform a daily malloc trim to reduce memory usage, controlled by the advanced configuration option MallocTrimInterval
* Disable all monitoring recorders during upgrade
* Added advanced configuration option ForkUpdateOOM, as a way to prevent dmesg from filling up with cmd OOM messages
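
As an illustration of the basename and digits validation listed above, here is a minimal Python sketch. It assumes that node names are formed as the basename followed by a zero-padded index (for example node001). The exact rules that cmdaemon enforces are not stated in these notes, so the regular expression, the function names validate_node_naming and node_name, and the limits below are hypothetical.

```python
import re

# Hypothetical rule: lowercase hostname-safe label that starts with a letter
# and does not end with a hyphen. The real cmdaemon rule may differ.
_BASENAME_RE = re.compile(r"^[a-z]([a-z0-9-]*[a-z0-9])?$")


def validate_node_naming(basename: str, digits: int) -> None:
    """Raise ValueError if basename/digits cannot produce sensible hostnames."""
    if not _BASENAME_RE.match(basename):
        raise ValueError(f"invalid node basename: {basename!r}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(digits, bool) or not isinstance(digits, int) or digits < 1:
        raise ValueError(f"node digits must be a positive integer: {digits!r}")


def node_name(basename: str, digits: int, index: int) -> str:
    """Build a node name such as 'node001' from basename, digits and index."""
    validate_node_naming(basename, digits)
    if index < 0 or index >= 10 ** digits:
        raise ValueError(f"index {index} does not fit in {digits} digits")
    return f"{basename}{index:0{digits}d}"
```

With these assumptions, node_name("node", 3, 1) yields node001, while a basename such as 9node or a digits value of 0 is rejected before any name is generated.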
- Fixed Issues
* AdvancedConfig = { "SSLServerMethod=TLS 1.2" } no longer implies TLS 1.2 + 1.3; it is recommended to use AdvancedConfig = { "SSLServerMethod=TLS 1.3" } instead
* To achieve the old behavior of TLS 1.2 + 1.3, set AdvancedConfig to { "TLS 1.1=0", "TLS 1.2=1", "TLS 1.3=1" }
* An issue with networking interface bonding options on Ubuntu
* An issue with triggering sync to the passive headnode after cloning an image
* An issue with detecting the nodes of UGE job array tasks in CMSH/BrightView
* Possible memory corruption when using file customization
* Ensure slurmd is restarted along with slurmctld on specific configuration changes
* Ensure SchedulerType from SlurmServerRole is added to slurm.conf
* An issue that can cause excessive Jobs Ended messages, especially when using Job Arrays
* Introduced batching for cases where cmdaemon calls sacct with many job IDs as arguments, to prevent possible crashes
* An issue with converting information timeouts from milliseconds to seconds, which can cause long RPC delays
* Possible deadlock in cmd stop when power manager is in heavy use at the time
* An issue with starting nightly provisioning updates
* Slurm gres Cores setting was being added to gres.conf even when empty
* An issue with removing a software image with all its revisions, which can sometimes crash cmd
* In some cases, prejob healthcheck tries to run on the node before the monitoring controller on the node is started
* An issue with createramdisk on fully updated CentOS 7 headnodes with FIPS enabled
* Do not convert Slurm gres count if it is not divisible by 1024
* An issue with Etcd healthcheck status in the case of multiple interfaces
* Prometheus data with timestamp sometimes gets added out of order
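
The sacct batching fix listed above can be pictured with the following minimal Python sketch. It is not cmdaemon's implementation: the batch size of 100 and the helper names batched and sacct_commands are hypothetical, and the only sacct feature relied on is that the -j option accepts a comma-separated list of job IDs. The sketch only builds the argument lists and does not run sacct.

```python
from typing import Iterator, List, Sequence

# Hypothetical batch size; the value cmdaemon uses is not stated in these notes.
BATCH_SIZE = 100


def batched(job_ids: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of job_ids, each holding at most size entries."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(job_ids), size):
        yield list(job_ids[start:start + size])


def sacct_commands(job_ids: Sequence[str], size: int = BATCH_SIZE) -> List[List[str]]:
    """Build one sacct argument list per batch instead of one very long call."""
    return [["sacct", "-j", ",".join(batch)] for batch in batched(job_ids, size)]
```

Each returned list could be handed to subprocess.run, so that no single sacct invocation receives an unbounded number of job IDs on its command line.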
== node-installer ==
- Fixed Issues
* Allow mkinitrd_cm and install-ipxe to recognize sle-hpc as a valid SLES distribution
== cm-kubernetes-setup ==
- Improvements
* Fixed a potential crash in the wizard regarding auto-detection of available network CIDR ranges
== cm-scale ==
- Improvements
* Allow for keeping (not terminating) specified cloud nodes when they are unused
- Fixed Issues
* Don't undrain a node that was drained by WLM due to prejob healthcheck (prolog) failure
== cm-setup ==
- New Features
* cm-edge-setup: Allow for different categories to be specified for edge director and edge nodes
== cm-wlm-setup ==
- Improvements
* Support for Slurm 20.11
== cmsh ==
- Improvements
* Added timeout to cmsh power command
* Fixed an issue with cmsh packages installed size for Ubuntu
== ml ==
- New Features
* Introduced package cm-cudnn8.0 for CUDA 11.2
* Introduced package cm-cutensor for CUDA 10.2 and CUDA 11.2
* Introduced package cm-ml-pythondeps for CUDA 11.2
* Introduced package cm-ml-distdeps for CUDA 11.2
* Introduced package cm-cudnn8.1 for CUDA 10.2
* Introduced package cm-cudnn8.1 for CUDA 11.2
* Introduced package cm-tensorflow2-extra-* for CUDA 10.2
* Updated cm-tensorflow2-* packages to v2.4.1
* Updated cm-gpytorch-* packages to v1.3.1
* Updated cm-horovod-* packages to v0.21.3
* Updated cm-opencv3-* packages to v3.4.13
* Updated cm-tensorflow-* packages to v1.15.5 (end-of-life reached)
* Updated cm-xgboost-* packages to v1.3.3
* Updated cm-tensorrt-* packages to v7.2.2
* Updated cm-pytorch-* packages to v1.7.1
* Updated cm-onnx-* packages to v1.8.1
* Updated cm-dynet-* packages to v2.1.2
* Introduced package cm-chainer for CUDA 11.2
* Introduced package cm-xgboost for CUDA 11.2
* Introduced package cm-nccl2 for CUDA 11.2
* Introduced package cm-openmpi-geib for CUDA 11.2
* Introduced cm-opencv4-* packages
- Improvements
* Deprecated cm-bazel package
* Renamed cm-tensorrt-cuda10.2-gcc package to cm-tensorrt-cuda10.2
* Deprecated cm-fastai-* packages
* Deprecated cm-tensorflow-*, cm-horovod-tensorflow-*, cm-onnx-tensorflow-* and cm-keras-* packages
* Switched GCC support for several ML packages from GCC 5 (-gcc) to GCC 8 (-gcc8)
* Introduced package cm-fastai2 for CUDA 10.2