Base Command Manager / Bright Cluster Manager Release Notes

Release notes for Bright 7.1

== New Features ==

* Puppet integration
- The open-source Puppet 3.7.5 configuration management system is included in Bright.
- Puppet modules can be installed from the Puppet Forge
- Scan for installed Puppet classes / resources
- Puppet classes / resources assignable as roles to nodes / categories
- Resource and class declaration (Puppet code) through cmsh or cmgui
- Puppet is applied on the nodes (controlled by CMDaemon) to ensure that the machines converge to the declared state
- Applies can be triggered manually, on a schedule, and/or on node-up (see the sketch below)
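
As an illustration, assigning a scanned Puppet class to a category as a role might look roughly as follows in cmsh. This is a sketch only: the class name (ntp) and the naming of class-derived roles are assumptions for illustration, and the # annotations are comments, not cmsh input.

    [bright71]% category use default
    [bright71->category[default]]% roles
    [bright71->category[default]->roles]% assign ntp        # hypothetical role derived from a scanned Puppet class
    [bright71->category[default]->roles*[ntp*]]% commit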

* IEEL (Intel Enterprise Edition for Lustre) integration
- A setup tool that makes it easy to deploy IEEL on top of Bright
- IEEL metrics and healthchecks
- Comprehensive view of Lustre filesystems, servers, and clients as resources in CMGUI

* Ceph
- Automatic reweight (up and down) of OSDs, allowing new OSDs to be added to a cluster without causing expensive data rebalancing
- Management of OSD pools

* Introduction of ConfigurationOverlays, which make it easy to assign roles to a set of nodes
- Hadoop integration now uses ConfigurationOverlays (see the sketch below)
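
A rough cmsh sketch of creating an overlay and assigning a role to it is shown below; the configurationoverlay mode and property names are assumptions based on how overlays are exposed in later Bright versions:

    [bright71]% configurationoverlay
    [bright71->configurationoverlay]% add datanodes-fast
    [bright71->configurationoverlay*[datanodes-fast*]]% set nodes node001..node016
    [bright71->configurationoverlay*[datanodes-fast*]]% roles
    [bright71->configurationoverlay*[datanodes-fast*]->roles]% assign hadoopdatanode
    [bright71->configurationoverlay*[datanodes-fast*]->roles*[hadoopdatanode*]]% commit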

* Support for fine-grained kernel settings
- Per category: kernelVersion, kernelParameters, kernelModules
- Per node: kernelVersion, kernelParameters, kernelModules
- All three fields are independently overridden, with precedence Node > Category > SoftwareImage
- A custom initrd is generated for a Node/Category if kernelVersion and/or kernelModules is set (see the sketch below)
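
For example, overriding the kernel for one category might look as follows in cmsh. The field names are the ones listed above; the version string and boot parameters are illustrative only, and the # annotation is a comment, not cmsh input.

    [bright71]% category use default
    [bright71->category[default]]% set kernelversion 2.6.32-504.el6.x86_64
    [bright71->category*[default*]]% set kernelparameters "console=ttyS0,115200"
    [bright71->category*[default*]]% commit        # a custom initrd is generated, since kernelVersion is now set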

* Dell BIOS integration support for models R730, FC630, FC430, FC830, and M830
- Ability to configure BIOS parameters from CMGUI/cmsh
- Switch between predefined BIOS modes/profiles using CMGUI/cmsh
- Allow custom BIOS settings for individual nodes or a category of nodes
- Firmware update of multiple nodes using CMGUI/cmsh
- Flexibility to add support for more models (provided XML schema and iDRAC features are consistent across models)

* Workload Management
- UGE: dedicated management: new roles, cgroups management, queues, exclude lists, etc.
- SGE/UGE: parallel environments are now managed from BCM.
- Slurm: new compute node and partition (queue) parameters are managed from the Slurm role and queue objects (see the sketch after this list).
- Slurm placeholders in the slurmserver role: ability to specify the number of nodes that will always be "visible" to users in a particular queue.
- Jobs can be simulated within CMDaemon, which is useful for developing third-party applications that work with jobs via the CMDaemon API.
- Optimized cloud resource usage when cm-scale-cluster is used: instances are stopped at the end of the full instance-hour.
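
As a sketch, the new Slurm compute-node parameters could be reached through the slurmclient role roughly as follows; the role and parameter names here are assumptions for illustration, and the # annotation is a comment, not cmsh input:

    [bright71]% category use default
    [bright71->category[default]]% roles
    [bright71->category[default]->roles]% use slurmclient
    [bright71->category[default]->roles[slurmclient]]% show        # lists the newly managed compute-node parameters
    [bright71->category[default]->roles[slurmclient]]% set queues defq
    [bright71->category[default]->roles*[slurmclient*]]% commit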

* Switch from gSOAP to internal binary / JSON (de)serialization for RPC
- Move to JSON (de)serialization for cmgui and the web portal
- Move to binary (de)serialization for cmsh, pythoncm, and internal CMDaemon communication

* New cmsh commands (illustrated in the sketch below)
- time: measure the time a command takes
- watch: repeat a command at a fixed interval and display the result on a cleared screen
- home: return to the top level
- path: print the current working cmsh directory in a form that can be copy-pasted
- bookmark: bookmark the current path under a named reference
- goto: return to one of the bookmarked paths
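
A short session combining the new navigation commands might look like this; the output lines are illustrative, and the exact formatting may differ:

    [bright71]% device use node001
    [bright71->device[node001]]% path
    home;device;use node001;
    [bright71->device[node001]]% bookmark n1
    [bright71->device[node001]]% home
    [bright71]% goto n1
    [bright71->device[node001]]% time status
    node001 .................. [   UP   ]
    time: 0.02s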

* Background task monitoring
- Display and wait for CMDaemon tasks to complete
- Power operation tasks
- Dell BIOS operations
- Ramdisk creation

* The User Portal was completely redone using a more modern and extensible framework (AngularJS)

== OpenStack ==

* Upgraded OpenStack to Juno
* Improved scalability by allowing HAProxy to be deployed through a Bright role. This allows for greater flexibility in where HAProxy runs and how many instances run in a cluster. It also allows more fine-grained control over which interfaces HAProxy binds to, which is useful for public clouds that require the API endpoints to be available to the outside world.
* More efficient communication between CMDaemon and OpenStack. CMDaemon now communicates directly with OpenStack rather than going through a layer of external scripts.
* Dramatically enhanced OpenStack configuration and management interface in CMGUI. A unique feature is that it is very easy to jump between entities that are related to each other. In OpenStack Horizon this is much more difficult, because the aspect to view already has to be chosen at the top level. No more embedded Horizon tabs.
* Several new OpenStack-related roles, which allow more fine-grained control over which services run on which node. For example, the OpenStack Compute role (Nova) was split into OpenStack Compute API, OpenStack Compute Conductor, and OpenStack Compute Hypervisor.
* Several instances of Neutron can be used simultaneously, for redundancy and load-balancing.
* Network nodes allow automatic rescheduling of virtual routers.
* Support for multiple different Cinder volume back-ends (e.g. fast and slow storage).
* Easier customization of OpenStack related configuration files.
* User portal OpenStack tab.
* Complete OpenStack management in cmsh.
* Ability to spin up virtualized Bright sub-clusters inside of Bright OpenStack. The sub-clusters could in turn also be configured as OpenStack private clouds.
* Added Memcached integration via dedicated Memcached Role.
* Basic Swift integration. Swift is installed, but the RADOS Gateway (Ceph) will remain the default/preferred way of doing Object Storage.
* Advanced Quota support
* Advanced monitoring of OpenStack API endpoint performance. CMDaemon keeps track of how long individual API calls (list users, list projects, list servers, etc.) to individual OpenStack services take. This makes it very easy to track overall cluster performance over time, as well as to spot bottlenecks.
* Monitoring of individual VMs running in OpenStack: disk/network I/O, CPU usage, etc. are tracked per VM. This makes it very easy to spot which tenants/users consume the most resources, as well as problems in the cluster.
* Monitoring metrics which allow administrators to get an overview of misused resources (RAM/storage used for created but paused VMs, etc.).

== Big Data ==

* Updated support for latest releases from Apache, Cloudera, Hortonworks, Pivotal.
* Overhaul of Hadoop management in CMGUI. Improved flexibility for configuring Hadoop roles via "Hadoop configuration groups". These groups leverage ConfigurationOverlays and make it easy to manage sets of Hadoop nodes. E.g., it is possible to define sets of DataNodes, each set using different settings. CMGUI now shows more information about the status of Hadoop components and tools.
* Common maintenance operations are available, conveniently grouped by Hadoop component.
* Several Hadoop roles have been reviewed, paying attention to performance settings.
* Added support for a Key Management Server via a dedicated role. This allows users to leverage Transparent Encryption in HDFS, i.e. to define Encryption Zones in HDFS.
* Added Apache Spark integration via dedicated roles. Supported Spark deployments: YARN mode and standalone mode.
* Added Apache Hive integration via a dedicated role.
* Added Apache Sqoop and Sqoop2 integration via dedicated roles.
* User portal Hadoop tab.

== Changes in Behavior of Existing Features ==

* All CMDaemon metrics/healthchecks can be sampled at a specific offset within the sampling interval, e.g. every hour on the half hour (see the sketch after this list).
* Renamed qlgc-ofed package to intel-truescale-ofed
* Added more Xeon Phi parameters managed by Bright.
* Re-implemented cm-libpam: previously a user could ssh into a node only while processes of that user were running there (old behavior); now access is granted when a workload manager has allocated the node to the user (new behavior).
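
A rough cmsh sketch of configuring such a sampling offset is shown below. The monitoring mode layout follows Bright 7.x conventions, but the metric name (loadone) and the offset parameter name are assumptions for illustration:

    [bright71]% monitoring setup
    [bright71->monitoring->setup]% use default
    [bright71->monitoring->setup[default]]% metricconf
    [bright71->monitoring->setup[default]->metricconf]% use loadone
    [bright71->monitoring->setup[default]->metricconf[loadone]]% set samplinginterval 1h
    [bright71->monitoring->setup[default]->metricconf[loadone]]% set offset 30m
    [bright71->monitoring->setup[default]->metricconf[loadone]*]% commit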

== Miscellaneous Notes ==

* All packages were upgraded to their latest versions