2013: "The Power of Many: Running Many Simulations on Many Supercomputers", Dr. Shantenu Jha, Rutgers University
From Iain Bethune on December 15th, 2016
A promising way to overcome this common limitation is the use of a Pilot-Job, which can be defined as a container or placeholder job that provides multi-level scheduling through an application-level scheduling overlay on top of the system scheduler. We discuss both the theory and practice of Pilot-Jobs. Specifically, we introduce the P* Model of Pilot-Jobs and present "BigJob" as a SAGA-based, extensible, interoperable and scalable implementation of the P* Model. We then discuss several science problems that have used or are using BigJob to execute multiple simulations at unprecedented scales on a range of supercomputers and distributed supercomputing infrastructure such as XSEDE.
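To make the Pilot-Job idea concrete, the following is a minimal, self-contained sketch of the pattern in plain Python. It is not BigJob or SAGA code and does not reflect their APIs; the names Task, Pilot and run_on_pilot are hypothetical and chosen only for illustration. It assumes the pilot has already been granted a fixed number of cores by the system scheduler, and it shows only the second scheduling level: an application-level scheduler packing many small simulations into that single allocation.

```python
from collections import deque


class Task:
    """One simulation to run inside the pilot (hypothetical illustration)."""

    def __init__(self, name, cores):
        self.name = name
        self.cores = cores


class Pilot:
    """Placeholder job: holds cores obtained once from the system scheduler."""

    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.free_cores = total_cores
        self.running = []

    def try_start(self, task):
        # Start the task only if enough cores in the allocation are idle.
        if task.cores <= self.free_cores:
            self.free_cores -= task.cores
            self.running.append(task)
            return True
        return False

    def finish_all(self):
        # Simplification: every running task completes at the same time.
        done = self.running
        self.running = []
        self.free_cores = self.total_cores
        return done


def run_on_pilot(pilot, tasks):
    """Application-level scheduler: pack tasks into the pilot in waves."""
    queue = deque(tasks)
    waves = []
    while queue:
        deferred = deque()
        while queue:
            task = queue.popleft()
            if not pilot.try_start(task):
                deferred.append(task)  # retry in the next wave
        wave = pilot.finish_all()
        if not wave:
            raise ValueError("a task needs more cores than the pilot holds")
        waves.append([t.name for t in wave])
        queue = deferred
    return waves


if __name__ == "__main__":
    pilot = Pilot(total_cores=8)
    sims = [Task("sim-0", 4), Task("sim-1", 4), Task("sim-2", 2), Task("sim-3", 8)]
    for i, wave in enumerate(run_on_pilot(pilot, sims)):
        print("wave", i, wave)
```

In this sketch the system scheduler is consulted once, for the pilot itself, and the four simulations never enter the system queue individually. A real Pilot-Job system would also handle asynchronous task completion, data staging and multiple pilots across machines, none of which is modelled here.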