Base Command Manager / Bright Cluster Manager Release Notes

Release notes for Bright 8.0-11

== General ==

- New Features

* New cmsh device superpower command
* Support for batched power operations

- Improvements

* Report correct error on insufficient space during head node installation
* Support for custom sorting of jobs and parent jobs in the user portal, specified using advanced configuration flags

- Fixed Issues

* Portal profile missing plot privileges, which can cause user portal charts to not work
* Ubuntu OS flavor is reported incorrectly in sysinfo
* Improved token and service update/management for profiles
* Added support for HP iLO on Ubuntu nodes
* An issue with shorewall6 not starting due to incorrect entries in /etc/shorewall6/netmap in cluster-extension setup
* An issue with the permissions for /usr/local/share/swig

== cmdaemon ==

- New Features

* Configurable behavior of prejob healthchecks

- Improvements

* Speed up generation of new admin / portal / ... certificate during request-license
* Allow just-booted nodes to retrieve entity configuration from their provisioning node: GlobalConfig = { "InitialEntityFromProvisioningNode=1" }

- Fixed Issues

* In some cases, cmdaemon could hang in EnumMetricValueManager during stop
* An issue which could result in cmdaemon crash while removing a user
* Power environment not set in super power scripts
* An issue with determining SGE (OGS) job submission time
* An issue with changing user home directories across devices
* Rare cm-nfs-checker core dump
* In some cases, the node LDAP certificate could have the wrong ownership
* Torque client role gpus were not set properly
* Increased the cm-nfs-checker buffer size for very long paths
* An issue with the drain status of nodes with long names for UGE/OGS
* An issue with cm-repair-cmdaemon-db
* Possible cmdaemon crash when changing the image for many nodes several times in rapid succession
* CMD_SCRIPTTIMEOUT environment variable not set for the power scripts
* Frozen Slurm configuration is still updated by cmdaemon
* An issue with removing old ramdisk files in /tftpboot on the passive head node
* Creating a lot of open and idle https connections could cause cmdaemon to stop accepting new connections
* New advanced configuration options to control (un)drain during burn
* Queues taking a long time to appear in monitoring tree
* In some cases, drain action executed after the job prolog was completed (thus jobs could still run)
* cmdaemon crash when using numeric usernames in Kubernetes certificates
* In some cases, failed certificate creation could lead to cmdaemon crash
* An issue with pam_bright module not working on Ubuntu
* sysinfo showing GPUs as not supported
* cmdaemon crash when issuing "device consolelog" cmsh command for AWS nodes
* An issue with DNS configuration generation in the case of AWS + DirectConnect + two subnets in a VPC
* In disk setup, allow for specifying a blockdev which is a symlink (like many /dev/disk/*), which can be useful to prevent random disk swapping issues
* Trigger udev again in the node-installer so that any rules in /cm/node-installer/etc/udev/rules.d get applied
* In some cases, cmdaemon could crash if the cloud directory IP is updated before the cloud node has come UP properly
* An issue with clearing prolog/epilog settings of LSF queues
* Optionally configure backup controller in slurm.conf

== node-installer ==

- Fixed Issues

* Skip DNS reverse lookup for rsync in node-installer
* The sles11 cm-busybox package now contains a busybox binary compiled on sles12, to prevent segmentation faults on sles11 compute nodes with the latest kernel
* An issue with using network interface priorities when configuring static routes
* Added support to node-installer scripts to be able to boot a node over Omni-Path

== cluster-tools ==

- Fixed Issues

* An issue with wlm-setup images list command line option

== Bright View ==

- New Features

* Added average GPU utilization metric

== cm-scale ==

- Improvements

* Support slurm logical job feature requests

- Fixed Issues

* In some cases, cm-scale could stop the extra node before the compute node was stopped

== cmsh ==

- New Features

* New cmsh "device status" option --overview, to create an overview of the devices

- Improvements

* Improved handling of limits when retrieving a large number of entities with cmsh

- Fixed Issues

* cmsh clone user does not set user name, but user ID
* An issue with use / remove of a static route
* An issue with --raw option to latestmetricdata
* cmsh device list --rack crashes if racks are not configured
* An issue with selecting the user for the cmsh ssh command when using custom profiles
* Name range collapse format .. doesn't take number of digits into account
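
The name range fix in the last item above can be illustrated with a small sketch. This is not cmsh's implementation: the function name collapse_names and the rule used here (only merge names that share a prefix and the same number of digits, and keep the zero padding in the output) are assumptions made for illustration.

```python
import re


def collapse_names(names):
    """Collapse consecutive numbered names into 'first..last' ranges.

    Hypothetical helper: names are merged only when they share the same
    prefix AND the same digit width, so node9 and node10 stay separate
    while node009 and node010 collapse into one range.
    """
    numbered = []
    plain = []
    for name in names:
        match = re.fullmatch(r"(.*?)(\d+)", name)
        if match:
            prefix, digits = match.groups()
            numbered.append((prefix, len(digits), int(digits)))
        else:
            plain.append(name)  # no numeric suffix, cannot be ranged
    numbered.sort()

    result = sorted(plain)
    run = []

    def flush():
        # Emit the current run as a single name or as a padded range.
        if not run:
            return
        prefix, width, first = run[0]
        last = run[-1][2]
        start = f"{prefix}{first:0{width}d}"
        if first == last:
            result.append(start)
        else:
            result.append(f"{start}..{prefix}{last:0{width}d}")
        run.clear()

    for item in numbered:
        same_family = run and item[:2] == run[-1][:2]
        if same_family and item[2] == run[-1][2] + 1:
            run.append(item)
        else:
            flush()
            run.append(item)
    flush()
    return result


print(collapse_names(["node001", "node002", "node003", "node010"]))
```

Tracking the digit width alongside the prefix is what prevents a collapsed range from expanding back into names with different padding than the originals.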

== cod ==

- Improvements

* cod-aws: changed the default VM type from m3.medium to t2.medium. The former is no longer reported as available by the AWS APIs.

- Fixed Issues

* COD-OS cluster select with both names/patterns and version/distro will do AND instead of OR.
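
Reading the note above as meaning that all given criteria must now match together, the selection logic can be sketched as follows. The function select_clusters and the dictionary keys name, version and distro are hypothetical and chosen only for illustration; they are not taken from the cod tool.

```python
from fnmatch import fnmatchcase


def select_clusters(clusters, patterns=(), version=None, distro=None):
    """Return clusters matching ALL supplied criteria (AND semantics).

    Hypothetical data model: each cluster is a dict with the keys
    'name', 'version' and 'distro'. A criterion that is not supplied
    does not restrict the selection.
    """
    selected = []
    for cluster in clusters:
        name_ok = not patterns or any(
            fnmatchcase(cluster["name"], pattern) for pattern in patterns
        )
        version_ok = version is None or cluster["version"] == version
        distro_ok = distro is None or cluster["distro"] == distro
        if name_ok and version_ok and distro_ok:
            selected.append(cluster)
    return selected


clusters = [
    {"name": "dev-a", "version": "8.0", "distro": "centos7"},
    {"name": "dev-b", "version": "7.3", "distro": "centos7"},
    {"name": "prod-a", "version": "8.0", "distro": "ubuntu1604"},
]
print([c["name"] for c in select_clusters(clusters, ["dev-*"], version="8.0")])
```

With OR semantics the same call would also have returned dev-b and prod-a, since each satisfies one of the two criteria.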

== monitoring ==

- New Features

* Monitoring sampler for lnstat
* Added API call to return all known monitoring resources / types
* Added small tool to calculate monitoring storage size: cm-bright-monitoring-usage.py

- Improvements

* Added extra flags to sample_ipmi for more fine grained control
* OpenStack interfaces are now added to the default exclude list for the proc dev net sampler
* Allow the trigger entity matcher to also match by type and resource
* Job metric initialization optimisations
* Reduce the amount of monitoring storage logging on compute nodes
* Improved autocompletion for dumpmonitoringdata command in jobs mode of cmsh

- Fixed Issues

* Metric parameter not shown in send email action
* Allow monitoring storage defaults to be changed with AdvancedConfig
* Improved check if a mount point is docker related, to prevent it from being monitored
* getMonitoringMeasurablesForEntity for invalid entity returns error 400, instead of []
* Switch uptime metric not marked as cumulative
* In some cases, a crash in sample now
* Enum value cache needs to be case insensitive for measurable / parameter and key values
* Trigger expressions need to be case insensitive
* An issue with sample now of cumulative metrics returning the raw data instead of the derivative
* Usercount monitoring data producer counts all users
* Reduced overhead of the smart monitoring sampler
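
One of the fixes above concerns sample now returning raw data for cumulative metrics instead of the derivative. As a rough sketch of that distinction, assuming a cumulative metric is a monotonically increasing counter and the derivative is its rate of change per second between two samples (the function name cumulative_rate and the sample layout are illustrative assumptions, not part of the product):

```python
def cumulative_rate(previous, current):
    """Return the per-second rate between two (timestamp, value) samples.

    Hypothetical helper: 'previous' and 'current' are tuples of
    (timestamp in seconds, raw counter value).
    """
    t0, v0 = previous
    t1, v1 = current
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


# Raw counter went from 5000 to 5600 in 10 seconds.
print(cumulative_rate((100.0, 5000), (110.0, 5600)))  # 60.0
```

The raw value (5600) only says how much has accumulated since the counter started, while the derivative (60.0 per second) describes current activity, which is what a reader of an on-demand sample usually wants.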