DOE's new data-management tool aims to tame LHC data

Dubbed Big PanDA, Brookhaven National Laboratory's workload management system seeks to maximize the use of Department of Energy supercomputer resources.

Brookhaven National Laboratory's workload management system is being used to tap the Titan supercomputer at Oak Ridge National Laboratory. (Image: TITAN supercomputer / BNL)

Department of Energy researchers have developed a data-management tool for the department's supercomputers to help handle the tidal wave of data generated by the recently restarted Large Hadron Collider (LHC) across the Atlantic.

The LHC in Switzerland now operates at nearly twice its former collision energy, according to Brookhaven National Laboratory. That increase has left physicists sifting through a monumental pile of data. To help, a pilot project under the Department of Energy’s Office of Science is using a big-data management tool developed by physicists at Brookhaven and the University of Texas at Arlington.

According to Brookhaven, the workload management system, dubbed PanDA (for Production and Distributed Analysis), was originally designed by high-energy physicists to handle data analysis jobs for the LHC’s ATLAS collaboration. ATLAS is one of the LHC's experiments; its detector records particles created by proton collisions.

During the LHC’s first run, from 2010 to 2013, PanDA made ATLAS data available for analysis by 3,000 scientists around the world using the LHC’s worldwide grid of networked computing resources. The latest version of the workload management system, called Big PanDA, aims to meet big-data challenges in many areas of science by maximizing the use of limited supercomputing resources.

Big PanDA schedules jobs opportunistically on the DOE's Titan supercomputer in Oak Ridge, Tenn. According to Brookhaven, Big PanDA's opportunistic "fill in the gap" operation on Titan doesn’t conflict with the supercomputer's ability to schedule traditional, very large computing jobs.

The integration of Big PanDA on Titan, according to Brookhaven, is the first large-scale use of leadership-class supercomputing facilities coupled with a workload management system to assist in the analysis of experimental high-energy physics data. The program will have immediate benefits for ATLAS, lab officials predicted.

As the volume of data increases with the LHC collision energy, so does the need for running simulations that help scientists interpret experimental results. Those simulations, a Brookhaven announcement said, are best handled by supercomputers.

The prototype Big PanDA software has been significantly modified from its original design, lab officials said. It “backfills” simulations of the collisions taking place at the LHC into the gaps between typically large supercomputing jobs, putting to work valuable computer time that would otherwise sit idle.
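
For readers curious what backfilling looks like in practice, the short Python sketch below illustrates the general idea. It is not Big PanDA's code, and nothing in it comes from Brookhaven. The names Gap, Job and backfill, along with the node counts and run times, are hypothetical and chosen only for illustration. The sketch assumes each idle gap is described by a number of free nodes and a time window, and it places each small simulation job into the first gap that can hold it.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    """An idle window on the machine (hypothetical model): free nodes for a limited time."""
    nodes: int
    minutes: int


@dataclass
class Job:
    """A small simulation job waiting for spare capacity (hypothetical model)."""
    name: str
    nodes: int
    minutes: int


def backfill(gaps, jobs):
    """Greedily place each job into the first gap that can hold it.

    A job fits if the gap still has enough free nodes and the job
    finishes within the gap's time window. Jobs that share a gap run
    side by side, so each placement reduces that gap's free nodes.
    Returns (placements, leftover): a dict of job name -> gap index,
    and a list of job names that did not fit anywhere.
    """
    free = [gap.nodes for gap in gaps]
    placements = {}
    leftover = []
    for job in jobs:
        for i, gap in enumerate(gaps):
            if job.nodes <= free[i] and job.minutes <= gap.minutes:
                free[i] -= job.nodes
                placements[job.name] = i
                break
        else:
            # No gap was large enough or long enough for this job.
            leftover.append(job.name)
    return placements, leftover


# Illustrative numbers only: two idle gaps and four waiting simulations.
gaps = [Gap(nodes=100, minutes=30), Gap(nodes=40, minutes=120)]
jobs = [
    Job("sim-a", nodes=60, minutes=20),
    Job("sim-b", nodes=50, minutes=25),
    Job("sim-c", nodes=30, minutes=90),
    Job("sim-d", nodes=40, minutes=10),
]
placements, leftover = backfill(gaps, jobs)
print(placements)  # {'sim-a': 0, 'sim-c': 1, 'sim-d': 0}
print(leftover)    # ['sim-b']
```

In this toy run, three of the four simulations slot into capacity that would otherwise go unused, while the one that fits no gap simply waits. The large jobs that define the gaps are never delayed, which matches the "fill in the gap" behavior Brookhaven describes. A production scheduler would have to account for far more, such as job priorities and changing queue conditions.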