Base Command Manager / Bright Cluster Manager Release Notes

Release notes for Bright 9.0-3

== General ==

- Improvements

* mlnx-ofed47: Updated to version 4.7.3.2.9.0

- Fixed Issues

* Improved support in the head node installer for older graphics cards
* intel-mpi-2018: Fixed LD_LIBRARY_PATH in module file
* mlnx-ofed47: Install upstream-libs packages instead of mlnx-libs packages, to prevent dependency issues

== cmdaemon ==

- New Features

* New RPC to get BMC logs produced by a script

- Improvements

* Limit the maximum HTTP request size to 64 MB
* Default gateway for the edge compute nodes
* Delay ramdisk creation until node-installer is created during multi arch/OS setup
* Switch to long lines by default for SNMP trap events
* Added more examples for custom switch controller scripts
* Removed the old batchspawners implementation for JupyterHub
* New option to audit to a JSON file or HTTP server
* An issue with extending the cluster to multiple cloud providers having cloud types with the same names
* Added commit warning for bonding + BOOTIF interfaces

- Fixed Issues

* In some cases, an issue with /etc/localtime symlink target in /cm/node-installer
* In some cases, an issue with determining the head node IP if it has multiple interfaces on the same network
* Crash when edge director has bond interfaces
* Added ability to ignore prolog scripts to prevent reaping cgroups too early
* In some cases, an issue with using the prolog setting in the UGE server role
* An issue with writing the reverse DNS zone for edge external networks on the head node
* An issue with propagating the image kernel parameters to the GRUB config
* An issue with building the Kubernetes overview for cmsh / Bright View
* Improved check for HTTP requests that are too large
* An issue with userportal not showing jobs for users with the (default) portal profile
* Provisioning scheduler can schedule on nodes with 0 slots set
* Fallback mechanism for Kubernetes user certs + configuration files when CMDaemon does not have write access to the user home directories
* In some cases, an issue with cmdaemon handling certificate requests for a user when the requests fail
* Uplink ports for switch control scripts may be ignored
* An issue with mounting /cm/shared on multi arch systems
* Rare deadlock in device update
* In some cases, an issue with adding UGE complex attribute definitions (e.g. gpu)
* In some cases, an issue with Slurm sacct information after setup
* An issue with health check scripts for Kubernetes and etcd
* An issue with removing a cloud provider, which can break OpenVPN for other providers
* Speed up shutdown / reboot commands issued from cmsh / Bright View / pythoncm

== cm-kubernetes-setup ==

- Improvements

* Taint is now executed at the very end of the Kubernetes deployment setup

- Fixed Issues

* Disable nginx service by default on Ubuntu to avoid issues with Harbor

== cm-scale ==

- Fixed Issues

* A possible issue with the pin queue feature in Auto Scaler

== cmsh ==

- Fixed Issues

* Show All head nodes column for configuration overlays in cmsh
* An issue with the hostname in cmsh installer interactions

== ml ==

- New Features

* Updated cm-tensorflow-* packages to v1.15.2 to address vulnerability issues
* Updated cm-pytorch-* packages to v1.4.0
* Updated cm-horovod-* packages to v0.19.0

- Improvements

* Introduced cm-gpytorch-* packages to support efficient Gaussian processes
* Removed the old batchspawners implementation for JupyterHub
* Split openmpi-geib-cuda-64 package into cm-openmpi-geib-cuda10.1-gcc and cm-openmpi-geib-cuda10.2-gcc

- Fixed Issues

* Fixed cm-tensorflow2-*cuda* packages to use SSE4.1 SSE4.2 AVX AVX2 FMA instructions
* Fixed cm-tensorflow-*cuda* packages to use SSE4.1 SSE4.2 AVX AVX2 FMA instructions

== monitoring ==

- Improvements

* Added a check to prevent a monitoring producer from adding too many (random) measurables
* Exclude az* proc/net/dev network interfaces from monitoring
* Automatically set resource / type monitoring filter / multiplexer if the name matches
* Include extra information in cmsh dumpmonitoringdata

- Fixed Issues

* Program runner manager failures can lead to action scripts no longer running

== user portal ==

- Fixed Issues

* An issue with userportal not showing jobs for users with the (default) portal profile