Base Command Manager / Bright Cluster Manager Release Notes
Release notes for Bright 9.0
== MultiArch/MultiOS Support ==
* X86 and ARM64 multiarch support
* Support for mixed base distributions
* New tool cm-image for creating software images, node-installer images and /cm/shared trees
* node-installer-nfsroot package has been replaced by a new /cm/node-installer image
* node-installer and cmdaemon-node-installer packages are now installed in /cm/node-installer
* Updating the Bright packages now also requires updating the packages in the /cm/node-installer image
== cmjob ==
* cmsub has been renamed to cmjob
** previous cmsub functionality is now a sub-command of cmjob
** other sub-commands provide the user with more control over how, and when, the job data is moved to, from or within the cloud
* Support for Amazon FSx for Lustre and Azure NetApp Files (ANF), in addition to the default storage-node based approach. FSx/ANF offer:
** Typically faster cloud job startup time
** Less complexity and no single point of failure
** Better I/O performance and larger file system size
** Ability to reuse a single instance of FSx/ANF for multiple jobs. In the case of follow-up jobs, this eliminates the need to transfer data to/from the object store
** Support for staging data in an FSx/ANF instance
* Advanced data management (labels) support:
** In the object store the data can now be stored under a user-specified label
** Labels are uniquely assigned to a job or to a set of related jobs
** Support for staging input data under a label before submitting a job
** Support for storing output data only in the cloud object store, without also downloading it locally. Useful for large output datasets which are to be used by follow-up cloud jobs
** Selective download of the stored data under a label at any time
** Support for adding job dependencies based on data labels
** Improved command line interface for managing (e.g. listing, removing) data under a label
== HPC workload managers ==
* Support for multiple instances of the same workload manager
* Improved cm-wlm-setup with ncurses interface
* New wlm mode in cmsh
== Cluster Extension Setup ==
* Improved performance by offloading all status ICMP pings to the cloud director
* Dropped CMDaemon built-in Static IP support which was used for managing AWS Floating IPs
== Edge Setup ==
* Improved performance by offloading all status ICMP pings to the edge director
== Machine Learning ==
* Updated build tool packages (e.g. Bazel 0.24.1, cuDNN 7.6.5, NCCL 2.4.8)
* Enforced package consistency by using a homogeneous set of build tools (e.g. GCC5 5.5.0, Open MPI 3.1.3, CUDA 10.1)
* New Python 3 packages (e.g. Chainer, DyNet, fast.ai, Horovod, Theano)
* Updated all packages to the latest upstream versions
* Dropped obsolete packages Caffe and Digits
* New naming scheme for packages to support multiple Python versions, accelerators and compilers
* Support for Ubuntu 18.04, SLES 15, RHEL 8, CentOS 8
* Support for AVX-512 Vector Neural Network Instructions (VNNI)
== Accounting & Reporting ==
* Parametrized Prometheus queries
* Prometheus-based drilldown queries
* Timestamp support for Prometheus exporters
* Allow Prometheus metrics to be marked as public, private or individual
== Monitoring ==
* Allow devices to be closed for running actions, but continue to sample data
* Allow data pick-up interval for nodes to be temporarily increased
* New merge delay parameter for monitoring actions, which delays an action so that multiple instances of the same action can be merged into one
* Out-of-band sampling for a higher initial sampling rate of jobs
* New debug flag for samplenow for easy script debugging
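The merge delay idea listed above can be illustrated with a small sketch. This is not CMDaemon code and does not reflect how Bright implements the parameter. It is a minimal model that assumes an action triggered at time t is held back until t + merge_delay, and that any further triggers of the same action arriving in that window are folded into the same run. The function name `merge_actions`, the event tuples and the node names are all hypothetical.

```python
def merge_actions(events, merge_delay):
    """Group triggers of the same action that arrive within merge_delay.

    events: iterable of (timestamp, action_name, node) tuples (hypothetical format).
    Returns a list of (fire_time, action_name, [nodes]) sorted by fire time.
    """
    pending = {}  # action_name -> (deadline, [nodes waiting for this run])
    fired = []
    for ts, action, node in sorted(events):
        # Fire every pending batch whose deadline passed before this trigger.
        expired = [name for name, (deadline, _) in pending.items() if deadline < ts]
        for name in expired:
            deadline, nodes = pending.pop(name)
            fired.append((deadline, name, nodes))
        if action in pending:
            # Same action is already waiting: merge into the existing batch.
            pending[action][1].append(node)
        else:
            # First trigger of this action: start a new delay window.
            pending[action] = (ts + merge_delay, [node])
    # Fire whatever is still waiting once the input is exhausted.
    for name, (deadline, nodes) in pending.items():
        fired.append((deadline, name, nodes))
    fired.sort()
    return fired


if __name__ == "__main__":
    triggers = [
        (0, "drain", "node001"),
        (2, "drain", "node002"),
        (3, "poweroff", "node003"),
        (20, "drain", "node004"),
    ]
    for fire_time, action, nodes in merge_actions(triggers, merge_delay=5):
        print(fire_time, action, nodes)
```

In this model the two early drain triggers result in a single run covering both nodes, while the trigger at time 20 falls outside the window and starts a new run.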
== Head Node Installer ==
* New head node installer GUI
== cmsh ==
* Switched to a submode for SNMP settings
* New network ips command to list all IPs on a network
* New packages RPC/command to get a quick overview of the installed packages on the nodes
== Bright View ==
* Switch to Angular 8
* Rack-view 2D mode (https://www.youtube.com/embed/SE2DDc2Qm-E)
* Support for rack orientation in rack-view (https://www.youtube.com/embed/jeAbamYwzak)
* Customizable entity overviews (https://www.youtube.com/embed/shxh5meIiVg)
* Grouping and filtering for entities settings inputs (https://www.youtube.com/embed/g3nFHG23ZlE)
== Kubernetes ==
* Upgrade Kubernetes to v1.16
* Upgrade Kubernetes addons and related:
** CoreDNS to v1.6.2
** Calico to v3.10
** CNI plugins to v0.8
** Kubernetes Dashboard to v2.0
** Metrics Server to v1.0.0
** Nginx Ingress Controller to v0.26
** NVIDIA Device Plugin to v1.11
** Helm to v3
** Dropped Tiller, which is no longer needed in Helm v3
** Dropped Heapster
== Container engines/runtimes ==
* Upgrade Docker to 19.03.4
* NVIDIA Container Toolkit replaces the NVIDIA Container Runtime for Docker. NVIDIA GPUs are now natively supported as devices in the Docker runtime
* Upgrade Singularity to 3.4.2
== Container registries ==
* Upgrade Docker Registry to 2.7
* Upgrade Harbor to 1.8
* Upgrade Docker Compose to 1.24
== OpenStack ==
* Support for OpenStack Stein
== Ceph ==
* Upgrade Ceph to the Ceph Nautilus release
* Ceph Nautilus adds automatic scaling of placement groups, which can be configured in Bright within the Ceph OSD Pool mode. See https://docs.ceph.com/docs/nautilus/rados/operations/placement-groups for more information on this feature.
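For orientation, the sketch below shows the upstream Ceph Nautilus commands that control placement group autoscaling. It does not show the Bright Ceph OSD Pool mode itself, whose settings are not described in these notes. The helper `autoscale_commands` and the pool name `mypool` are hypothetical, and the script only prints the commands rather than running them.

```python
import shlex

# Modes accepted by pg_autoscale_mode in upstream Ceph Nautilus.
VALID_MODES = ("on", "off", "warn")


def autoscale_commands(pool, mode="on"):
    """Return the ceph CLI invocations for setting PG autoscaling on a pool.

    The commands are returned as argv lists and are not executed here.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return [
        # Enable the manager module that provides autoscaling.
        ["ceph", "mgr", "module", "enable", "pg_autoscaler"],
        # Set the autoscale mode for the given pool.
        ["ceph", "osd", "pool", "set", pool, "pg_autoscale_mode", mode],
        # Show the autoscaler's view of all pools.
        ["ceph", "osd", "pool", "autoscale-status"],
    ]


if __name__ == "__main__":
    for argv in autoscale_commands("mypool", "warn"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

With the `warn` mode, Ceph only raises a health warning when it considers the placement group count unsuitable, which can be a cautious first step before switching to `on`.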
== pythoncm ==
* Switch to Python 3.7
== lite-daemon ==
* Switch to Python 3
== Other ==
* SNMP v3 support
* SNMP trap manager role
* New BOOTING node state for nodes with defined bootif
* New mute command for nodes, so that no monitoring actions are performed on them
* Consistent/predictable network device naming
* Added flag to Ethernet switches to control big/small switch port mapping
* Power history
* Information on last rsync
* Option for node-installer to detect switch ports via LLDP or CDP
* New statusinfo command to show information related to the status subsystem
* Dropped cgred service management and cgroupsupervisor role
* Power architecture is no longer supported
* RHEL 6/CentOS 6 are no longer supported