Base Command Manager / Bright Cluster Manager Release Notes
Release notes for Bright 9.0-3
== General ==
- Improvements
* mlnx-ofed47: Updated to version 4.7.3.2.9.0
- Fixed Issues
* Improved support in the head node installer for older graphics cards
* intel-mpi-2018: Fixed LD_LIBRARY_PATH in the module file
* mlnx-ofed47: Install upstream-libs packages instead of mlnx-libs packages, to prevent dependency issues
== cmdaemon ==
- New Features
* New RPC to get BMC logs produced by a script
- Improvements
* Limit the maximum HTTP request size to 64 MB
* Default gateway for the edge compute nodes
* Delay ramdisk creation until node-installer is created during multi arch/OS setup
* Switch to long lines by default for SNMP trap events
* Added more examples for custom switch controller scripts
* Removed the old batchspawners implementation for JupyterHub
* New option to audit to a JSON file or HTTP server
* An issue with extending the cluster to multiple cloud providers having cloud types with the same names
* Added commit warning for bonding + BOOTIF interfaces
- Fixed Issues
* In some cases, an issue with /etc/localtime symlink target in /cm/node-installer
* In some cases, an issue with determining the head node IP if it has multiple interfaces on the same network
* Crash when edge director has bond interfaces
* Added ability to ignore prolog scripts to prevent reaping cgroups too early
* In some cases, an issue with using the prolog setting in the UGE server role
* An issue with writing the reverse DNS zone for edge external networks on the head node
* An issue with propagating the image kernel parameters to the GRUB configuration
* An issue with building the Kubernetes overview for cmsh / Bright View
* Improved check for overly large HTTP requests
* An issue with the user portal not showing jobs for users with the (default) portal profile
* Provisioning scheduler can schedule on nodes with 0 slots set
* Fallback mechanism for Kubernetes user certs + configuration files when CMDaemon does not have write access to the user home directories
* In some cases, an issue with cmdaemon handling certificate requests for a user when the requests fail
* Uplink ports for switch control scripts may be ignored
* An issue with mounting /cm/shared on multi arch systems
* Rare deadlock in device update
* In some cases, an issue with adding UGE complex attributes definitions (e.g. gpu)
* In some cases, an issue with Slurm sacct information after setup
* An issue with health check scripts for Kubernetes and etcd
* An issue with removing a cloud provider, which can break OpenVPN for other providers
* Speed up shutdown / reboot commands issued from cmsh / Bright View / pythoncm
== cm-kubernetes-setup ==
- Improvements
* Taint is now executed at the very end of the Kubernetes deployment setup
- Fixed Issues
* Disable nginx service by default on Ubuntu to avoid issues with Harbor
== cm-scale ==
- Fixed Issues
* A possible issue with pin queue feature in Auto Scaler
== cmsh ==
- Fixed Issues
* Show All head nodes column for configuration overlays in cmsh
* An issue with the hostname in cmsh installer interactions
== ml ==
- New Features
* Updated cm-tensorflow-* packages to v1.15.2 to address vulnerability issues
* Updated cm-pytorch-* packages to v1.4.0
* Updated cm-horovod-* packages to v0.19.0
- Improvements
* Introduced cm-gpytorch-* packages to support efficient Gaussian processes
* Removed the old batchspawners implementation for JupyterHub
* Split openmpi-geib-cuda-64 package into cm-openmpi-geib-cuda10.1-gcc and cm-openmpi-geib-cuda10.2-gcc
- Fixed Issues
* Fixed cm-tensorflow2-*cuda* packages to use SSE4.1 SSE4.2 AVX AVX2 FMA instructions
* Fixed cm-tensorflow-*cuda* packages to use SSE4.1 SSE4.2 AVX AVX2 FMA instructions
== monitoring ==
- Improvements
* Added a check to prevent a monitoring producer from adding too many (random) measurables
* Exclude az* proc/net/dev network interfaces from monitoring
* Automatically set resource / type monitoring filter / multiplexer if the name matches
* Include extra information in cmsh dumpmonitoringdata
- Fixed Issues
* Program runner manager failures can lead to action scripts no longer running
== user portal ==
- Fixed Issues
* An issue with the user portal not showing jobs for users with the (default) portal profile