Blue Gene/Q


System Overview

Total Processing Power: 1,024 Compute Nodes, 16,384 CPU Cores, 209 TFLOPS

The Blue Gene/Q system at the University of Rochester, called BlueStreak, consists of one rack of the 209 TFLOPS IBM Blue Gene/Q massively parallel processing (MPP) supercomputer, one IBM System p (POWER7) front-end node, one IBM System p (POWER7) service node, and four IBM System x I/O nodes connected to a 400 TB IBM GPFS System Storage solution. In total, the system provides 1,024 compute nodes, 16,384 CPU cores, 16 TB of RAM, and 400 TB of storage. Each node contains a 16-core A2 processor with 32 MB of cache and access to 16 GB of RAM. The figure below shows the hardware configuration of BlueStreak.
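These totals follow directly from the per-node figures: 1,024 nodes × 16 cores per node = 16,384 cores, and 1,024 nodes × 16 GB per node = 16 TB of RAM.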

Image:BlueGeneQHardware.png

Instructions

Username/Password

  • User accounts and access from IP addresses or ranges are requested using the web-based account management tool.
  • Your username is your NETID. Your password is your NETID password.

Access

  • Secure Shell (SSH) is the only supported mechanism for accessing BlueStreak.
  • Connect via ssh to bluestreak.circ.rochester.edu.
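
For example, to log in with your NetID (jdoe below is only a placeholder), connect with:

ssh jdoe@bluestreak.circ.rochester.edu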

Software

Software Porting and Development

IBM XL Compilers

bgxlc your_C_program.c
bgxlf your_Fortran_program.f
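
As a sketch, an output file name and an optimization level can be added with the usual compiler flags (-o and -O3 are standard XL compiler options; the file names are placeholders):

bgxlc -O3 -o your_C_program your_C_program.c
bgxlf -O3 -o your_Fortran_program your_Fortran_program.f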

MPI wrappers for IBM XL compilers

module load mpi-xl
mpixlc yourprogram.c
mpixlf77 yourprogram.f

GNU compilers

module load gnu-compilers
gcc yourprogram.c
gfortran yourprogram.f

MPI wrappers for GNU compilers

module load mpi-gnu
mpicc yourprogram.c
mpif77 yourprogram.f

Available software and modules

  • See Blue Gene/Q Software
  • To load a particular module, type 'module load <module>'
  • To see modules you have loaded, type 'module list'
  • To remove a particular module, type 'module unload <module>'
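
For example, to load the MPI wrappers for the IBM XL compilers described above, list what is loaded, and then remove the module again:

module load mpi-xl
module list
module unload mpi-xl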

Submitting Jobs to BlueStreak

BlueStreak uses the Simple Linux Utility for Resource Management (SLURM) as its job queueing and scheduling system. Users must specify a walltime limit when submitting jobs; otherwise, a default of 30 seconds is used.
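
For example, a walltime limit can also be supplied on the sbatch command line at submission time (the one-hour limit below is only illustrative):

sbatch --time 0-01:00:00 yourscript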

Partitions

BlueStreak users can submit their jobs to one of four partitions (queues) with different time and node limits.


Partition Name   Time Limit   Minimum # nodes   Maximum # nodes   Usage
debug            60 minutes   1                 32                Testing and debugging programs
standard         24 hours     1                 512               Normal execution queue
reserved         48 hours     4                 1024              For jobs using a reservation on the system
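
The partition is chosen either in the batch script (see the sample script under Job Submission) or directly on the command line, for example:

sbatch --partition debug yourscript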

Queue Information

To view all jobs running or queued on BlueStreak, type:

squeue -l

To view the jobs for a specific user:

squeue -l -u <NetID>
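
To view only the jobs in a particular partition (the debug partition, in this example):

squeue -l -p debug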

Type 'man squeue' for more information.

Job Submission

Jobs are submitted to the queue using an sbatch script, such as the sample below:

#!/bin/sh
#SBATCH --job-name MyJob                 # Job name
#SBATCH --partition debug                # Partition [debug, preempt, standard, reserved]
#SBATCH --nodes 4                        # Number of nodes [must be power of 2 within queue limitations]
#SBATCH --ntasks-per-node 16             # Number of tasks per node (There are 16 cores per node)
#SBATCH --cpus-per-task 1                # Number of cores per task (if not 1, use OpenMP or a compiler that can auto-parallelize)
#SBATCH --time 0-00:30:00                # wall time (30 minutes)
#SBATCH --mail-type ALL                  # Send e-mail when... [BEGIN, END, FAIL, REQUEUE, or ALL]
##SBATCH --mail-user username@domain.com  # e-mail address (remove one '#' to enable e-mail notifications)

srun yourprogram
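
As written, this sample script requests 4 nodes × 16 tasks per node = 64 MPI tasks, each using a single core, running for at most 30 minutes in the debug partition.
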
  • sbatch submits a job to the queue:
sbatch yourscript

Type 'man sbatch' for more information.


  • scancel cancels a queued or running job:
scancel <jobid>

Cancelled jobs are removed from the queue and will no longer appear in the squeue listing.
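
A typical submit-and-cancel sequence looks like this (the job ID 12345 is only a placeholder; sbatch prints the actual ID when the job is submitted):

sbatch yourscript     # prints "Submitted batch job 12345"
scancel 12345         # cancel the job using the reported ID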

Debugging

There is a tool called coreprocessor that parses core files for debugging. It is located at

 /bgsys/drivers/ppcfloor/coreprocessor/bin/coreprocessor.pl

You can also use the addr2line utility to map the addresses recorded in a core file back to source file names and line numbers.
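
As a sketch, assuming the program was compiled with debugging symbols (-g) and the core file reports a hexadecimal instruction address such as 0x01a2b3c4 (a placeholder), the address can be translated back to a source line with:

addr2line -e yourprogram 0x01a2b3c4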

Data Storage

BlueStreak is connected to a 400 TB General Parallel File System (GPFS), a portion of which is allocated to /scratch. Please limit the data in /home to configuration files and application software. Please see the data storage policies on the Policies page for more information.
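
To see how much space is in use, standard tools such as df and du can be used (a minimal sketch; /scratch is the scratch file system described above):

df -h /scratch        # free space on the scratch file system
du -sh ~              # total size of your home directory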

Additional Documentation

SLURM queuing system

IBM Redbook: BlueGene/Q Application Development

IBM Redpaper: BlueGene/Q Code Development and Tools Interface

BlueGene/Q Summit at Argonne National Labs

Introduction to Mira (BlueGene/Q at Argonne National Labs) - by Kalyan Kumaran

Optimizing Single-Node Performance on BlueGene - by Lee Killough, Argonne National Labs
