Base Command Manager / Bright Cluster Manager Release Notes
Release notes for Bright 6.1-1 (released 2013-06-01)
New Features
- Full support for Intel Xeon Phi (i.e. MIC)
- Complete JSON API
- CMGUI now communicates with the cluster using JSON. As a result, CMGUI no longer contains compiled components, so it can run anywhere (Linux, Mac OS X, Windows) without having to (re)compile anything.
- Username/password authentication for CMGUI
- Integrated user management for management users and regular users (the 'Management profile' attribute can be set on a user to grant that user management capabilities)
- Mac OS X package installer for CMGUI (requires Mac OS X 10.6 Snow Leopard or later and Firefox 16 or later)
- SELinux support for RHEL6/CentOS6/Scientific Linux 6
- Cloud-related (Cloud Bursting):
  - Added support for hierarchical provisioning nodes in the cloud
  - Cloud support for High Availability setups
  - Added the 'cloud-check' utility, which validates the current cloud bursting configuration. Useful when troubleshooting cloud-related issues.
  - Support for Amazon VPC (Virtual Private Cloud): cloud bursting can now be performed into Amazon Virtual Private Clouds.
    - Managing VPCs
    - Managing VPC subnets
    - Text-based step-by-step wizard, 'cloud-setup-private-cloud', for automating the creation of VPCs
  - Support for Amazon Elastic IPs (in both EC2-Classic and EC2-VPC environments). Cloud directors can now be permanently assigned a persistent public IP address.
- Support for pdsh-style ranges in CMSH. For example: r[0-4,9]n[1-48]
- Added the 'ib' health check, which checks that an InfiniBand HCA is working properly
- Added the 'dmesg' health check, which searches 'dmesg' output for regular expressions
- Upgraded CMDaemon logging facilities. The file /cm/local/apps/cmd/etc/logging.cmd.conf allows fine-grained control over which CMDaemon subsystems log messages, and at which verbosity level.
- Improved GPU support:
  - Reset GPUs
  - Set various properties (dependent on GPU type)
  - Quick and extensive GPU health checks
- Improved displaying of chassis in CMGUI rackview
- Support for two-factor and one-time password authentication via CMGUI
- Support for CPU scaling governors
- Introduced the pctl utility, which can control power to nodes independently of CMDaemon in emergency situations
- It is now possible to add a device state to the list format in cmsh's device mode. The default output format has been changed to include the state, and it no longer includes the power distributions property.
- Added the ability to specify that a software image provided by a provisioning node is stored on shared storage. In previous Bright versions, the provisioning role had a property called images; this has been replaced by the localimages and sharedimages properties. The allimages property can now be set to "localdisk", "sharedstorage", or "notallimages". Images that are on shared storage for both the head node and the provisioning node are no longer affected by the 'updateprovisioners' CMSH command.
- Automatic updates of provisioning nodes are now configured or disabled through the partition property provisioningnodeautoupdatetimeout (a value of zero disables them), instead of through CMDaemon configuration file directives.
- Changed the behavior of automatic provisioning node updates. Previously, whenever a provisioning request came in, CMDaemon checked how long ago the last automatic update had taken place and, if that was too long ago, triggered an update. This caused many interruptions when booting a large number of nodes. Now, whenever a provisioning node is considered for handling a request, CMDaemon checks how long ago that provisioning node last successfully provided the same image to another node, and triggers an update only if that was too long ago. As a result, booting nodes after a long period without any boots still triggers an update, but booting a large number of nodes is no longer interrupted.
- On cloud nodes, the node installer previously downloaded fresh kernel and initrd images from the head node, which was not optimal. As of Bright 6.1, cloud nodes download these images from the cloud director.
- Slurm power save settings can now be configured from the slurmserver and slurmclient roles.
- Added the ability to specify an administrator email address during head node installation; this address is used to report critical errors.
- The 'cmha-setup' utility was made cloud-aware
- Added a --state option to select devices by state in several cmsh commands (e.g. foreach --state)
- Added an option to set the provisioning node auto-update timeout, to prevent provisioning node updates during large-scale installations
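cmsh accepts pdsh-style ranges such as r[0-4,9]n[1-48]. The sketch below illustrates how such a pattern expands into node names. It is not cmsh's implementation. The function name expand_ranges is hypothetical, and the sketch makes two assumptions: brackets contain only comma-separated values and lo-hi integer ranges, and zero padding follows the width of the lower bound.

```python
import itertools
import re
from typing import List

# Matches one bracketed range group such as "[0-4,9]" and captures its contents.
_GROUP = re.compile(r"\[([^\]]+)\]")


def _expand_group(spec: str) -> List[str]:
    """Expand "0-4,9" into ["0", "1", "2", "3", "4", "9"]."""
    values: List[str] = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            width = len(lo)  # assumption: keep zero padding, e.g. "01-03" -> 01, 02, 03
            values.extend(str(i).zfill(width) for i in range(int(lo), int(hi) + 1))
        else:
            values.append(part)
    return values


def expand_ranges(pattern: str) -> List[str]:
    """Hypothetical helper: expand a pdsh-style pattern into individual host names."""
    # With one capture group, re.split alternates literal, group, literal, ...
    pieces = _GROUP.split(pattern)
    literals = pieces[0::2]
    groups = [_expand_group(spec) for spec in pieces[1::2]]
    names: List[str] = []
    # Cartesian product of all bracket groups; the last group varies fastest.
    for combo in itertools.product(*groups):
        name = literals[0]
        for value, literal in zip(combo, literals[1:]):
            name += value + literal
        names.append(name)
    return names
```

For example, expand_ranges("r[0-4,9]n[1-48]") yields 6 x 48 = 288 names, from r0n1 through r9n48.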
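The 'dmesg' health check works by searching kernel messages for regular expressions. The sketch below illustrates only that idea. The names scan_dmesg and health_status are hypothetical, the example patterns are illustrative, and reporting PASS/FAIL is an assumed convention rather than the documented interface of Bright's check.

```python
import re
from typing import Iterable, List


def scan_dmesg(text: str, patterns: Iterable[str]) -> List[str]:
    """Return the dmesg lines that match any of the given regular expressions."""
    compiled = [re.compile(p) for p in patterns]
    return [line for line in text.splitlines() if any(rx.search(line) for rx in compiled)]


def health_status(text: str, patterns: Iterable[str]) -> str:
    """PASS when no line matches, FAIL otherwise (assumed reporting convention)."""
    return "FAIL" if scan_dmesg(text, patterns) else "PASS"
```

In practice the text would be the captured output of the dmesg command. Keeping the scanner independent of how that output is obtained makes it easy to test.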
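The revised auto-update rule for provisioning nodes tracks the last successful delivery of each image by each provisioning node. The sketch below is a hypothetical model of that rule, not CMDaemon code, and the class and method names are invented. The timeout is assumed to be in seconds, and treating an image that was never served as stale is also an assumption. A timeout of 0 disables updates, as with the provisioningnodeautoupdatetimeout property.

```python
import time
from typing import Dict, Optional, Tuple


class ProvisioningUpdatePolicy:
    """Hypothetical model of the Bright 6.1 provisioning node auto-update decision."""

    def __init__(self, timeout_s: float) -> None:
        # Mirrors provisioningnodeautoupdatetimeout (unit assumed: seconds); 0 disables.
        self.timeout_s = timeout_s
        # (provisioning node, image) -> time it last successfully served that image
        self._last_served: Dict[Tuple[str, str], float] = {}

    def record_success(self, provisioner: str, image: str, now: Optional[float] = None) -> None:
        """Remember that `provisioner` successfully provided `image` to a node."""
        self._last_served[(provisioner, image)] = time.time() if now is None else now

    def needs_update(self, provisioner: str, image: str, now: Optional[float] = None) -> bool:
        """Decide whether to refresh `image` on `provisioner` before it serves a request."""
        if self.timeout_s == 0:
            return False  # automatic updates disabled
        now = time.time() if now is None else now
        last = self._last_served.get((provisioner, image))
        if last is None:
            return True  # assumption: an image never served counts as stale
        return now - last > self.timeout_s
```

During a large boot, every successful provisioning refreshes the timestamp, so needs_update stays False and the boot is not interrupted. Only a request that follows a long idle period triggers an update.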
Changes in Behavior of Existing Features
- The pythoncm Python interface has improved session support, so it now properly handles CMDaemon instances that are restarting (i.e. it disconnects and reconnects).
- Added support for making a 'BOOTIF' interface part of a bridge interface.
- The fftw2 and fftw3 packages have been renamed to include the MPI name; the module paths have changed accordingly:
  - package: fftw2-<compiler>-64 -> fftw2-openmpi-<compiler>-64
  - module: fftw2/<compiler>/64/... -> fftw2/openmpi/<compiler>/64/...
  - package: fftw3-<compiler>-64 -> fftw3-openmpi-<compiler>-64
  - module: fftw3/<compiler>/64/... -> fftw3/openmpi/<compiler>/64/...
- The globalarrays packages have been renamed to follow the standard naming scheme; the module paths have changed accordingly:
  - package: globalarrays-<compiler>-openmpi-64 -> globalarrays-openmpi-<compiler>-64
  - module: globalarrays/<compiler>/openmpi/64/... -> globalarrays/openmpi/<compiler>/64/...
- The conman package has been renamed to cm-conman
  - Files in standard locations are renamed as well; for example, /etc/init.d/conman is now /etc/init.d/cm-conman
- iozone package is renamed to cm-iozone
- freeipmi package is renamed to cm-freeipmi
- ipmitool package is renamed to cm-ipmitool
- hwloc package is renamed to cm-hwloc
- iperf package is renamed to cm-iperf
- Changed syslog daemon for SLES11 from syslogd to syslog-ng
- CUDA 4.0 and 4.1 packages are no longer available
- mpich2 packages are no longer provided; instead, the mpich package has been upgraded to the 3.x series
- mpich-mx and mx10g packages are no longer provided
- The prolog script calls all prologs in <WLMTOP>/var/prologs/
- CMDaemon uses short hostnames when creating a new node in PBS Pro qmgr
- CMDaemon no longer sets acl_hosts in PBS Pro qmgr according to the queues defined in CMDaemon
- pbspro-slave package is renamed to pbspro-client
- cmsub has been moved to a separate package
- Simplified the use of external workload managers
- Services stopped via CMDaemon are no longer restarted automatically after a CMDaemon restart
- Switched from 1024-bit to 2048-bit keys for all certificates
- Improved support for bonded interfaces.
- Allow bridge interfaces to be created with no assigned IP
- Add advanced properties to workload manager roles in CMGUI
- Allow selection of nodes in CMGUI rackview
- Warn if a statically assigned IP conflicts with the network's dynamic range.
- Changed cmsh/device/pping to list nodes as DOWN rather than DEAD. The latter could falsely suggest broken hardware.
- When the node installer is running in debug mode, debug messages are no longer printed to the screen, but only to the log file.
- All environment modules now use prepend-path instead of append-path when setting variables like PATH, MANPATH, etc.
- Additional Dell PE-C utilities will be installed on the head node when 'Dell' is chosen as the hardware vendor during head node installation.
- Default home directories of system users (cmhealth, slurm, pbsdata, sgeadmin) created by Bright are no longer in /home.
- Workload manager system users are also created in the software images by default, to facilitate smooth offloading of workload managers.
- The default dynamic range of the management network has been reduced to 2048 addresses.
- Added new job properties, which are also available via the API
- Workload manager paths used by CMDaemon can now be configured within the workload manager server and client roles.
- Add Boards and SocketsPerBoard node properties to slurmclient role.
- TORQUE is built with the MIC support option.
- VLAN IDs should be either 0 or start with a non-zero digit (i.e. leading zeros are not allowed).
- GotoBLAS was replaced with OpenBLAS
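The VLAN ID format rule (either 0, or a number without a leading zero) can be expressed as a single regular expression. This sketch checks only the format rule stated in these notes. The name is_valid_vlan_id is hypothetical, and the sketch does not check the numeric range of the ID.

```python
import re

# Either exactly "0", or a run of digits whose first digit is 1-9.
_VLAN_ID = re.compile(r"0|[1-9][0-9]*")


def is_valid_vlan_id(text: str) -> bool:
    """Hypothetical check of the VLAN ID format rule; the whole string must match."""
    return _VLAN_ID.fullmatch(text) is not None
```

Using fullmatch rather than match ensures that inputs such as "0100" are rejected instead of matching on their leading "0".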
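The warning for a statically assigned IP inside a network's dynamic range amounts to an inclusive range comparison. The size of a range, such as the new 2048-address default, can be computed in the same way. The function names and the example range 10.141.248.0-10.141.255.255 are illustrative assumptions and are not necessarily Bright's actual defaults or code.

```python
import ipaddress


def dynamic_range_size(range_start: str, range_end: str) -> int:
    """Number of addresses in the inclusive range [range_start, range_end]."""
    return int(ipaddress.ip_address(range_end)) - int(ipaddress.ip_address(range_start)) + 1


def conflicts_with_dynamic_range(static_ip: str, range_start: str, range_end: str) -> bool:
    """True if static_ip lies inside the inclusive dynamic range."""
    ip = ipaddress.ip_address(static_ip)
    return ipaddress.ip_address(range_start) <= ip <= ipaddress.ip_address(range_end)


if __name__ == "__main__":
    # Example values only; not necessarily Bright's default dynamic range.
    start, end = "10.141.248.0", "10.141.255.255"
    print(dynamic_range_size(start, end))  # 2048 (8 x 256 addresses)
    for ip in ("10.141.0.1", "10.141.250.7"):
        if conflicts_with_dynamic_range(ip, start, end):
            print(f"warning: {ip} is inside the dynamic range {start}-{end}")
```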
Miscellaneous Notes
- All third-party packages were upgraded to their latest versions
- New cm-dist-limit-* packages have been introduced, which can be used to block major OS upgrades
- Decreased the time needed to provision a compute node by several seconds.