GPI-2 is the second generation of GPI, a product of the CC-HPC at the Fraunhofer ITWM, which will continue to be supported and licensed.
GPI-2 implements the GASPI specification (www.gaspi.de), an API specification that originates from the ideas and concepts of GPI. GPI-2 is an API for asynchronous communication. It provides a flexible, scalable and fault-tolerant interface for parallel applications.
You can have a look at the API or have a look at some examples. You can also browse through the source code on Github. To get an overview of features and how to compile and run a parallel program, see the GASPI tutorial.
GPI-2 is the next generation of GPI with more features. GPI has been evolving since 2005: it was initially known as FVM (Fraunhofer Virtual Machine) and in 2009 settled on the name GPI (Global Address Space Programming Interface). GPI has completely replaced MPI at the Fraunhofer ITWM, where all products and research are based on GPI. In 2011, the Fraunhofer ITWM and its partners, such as Fraunhofer SCAI, TUD, T-Systems SfR, DLR, KIT, FZJ, DWD and Scapos, initiated and launched the GASPI project to define a novel API specification (GASPI, based on GPI) and to make this specification a reliable, scalable and universal tool for the HPC community. GPI-2 is its first open source implementation.
- High performance
- Flexible API
- Failure tolerance
- Memory segments to support heterogeneous systems (e.g. Intel Xeon Phi)
- Threaded model and thread-safe interface
Who uses GPI-2?
Sharp Reflections GmbH
(spin-off of Fraunhofer ITWM)
Seismic analysis software that combines pre-stack visualisation, processing and interpretation in one powerful platform
(spin-off of Fraunhofer ITWM, currently preparing for start-up phase)
Linux based 3D visualisation software for seismic post-stack data
(internally also used for research as vehicles to test new computer architectures and for performance tests):
Various seismic products
Industrial product (workflows still need to be adapted for industrial domains in a quickly evolving field, so research is still required):
Distributed runtime engine that automatically parallelises stream and batch-data processing workflows
Machine Learning Algorithm based on GASPI
Linear Algebra Library based on GASPI
RTM (Reverse Time Migration)
GRT (Generalized Radon Transform)
HPC Research on fault tolerance
Shahzad, F. et al., Building a Fault Tolerant Application Using the GASPI Communication Layer.
In: Proceedings of FTS 2015, in conjunction with IEEE Cluster 2015. IEEE, 2015, pp. 580-587.
T-Systems / DLR
TAU solver: General purpose tool for a broad range of aerodynamic and aero-thermodynamic problems
(National BMBF Project GASPI, http://www.gaspi.de/;
EXA2CT Project, http://www.exa2ct.eu/;
INTERTWINE Project, http://www.intertwine-project.eu/)
CFD proxy code to increase parallel efficiency of current production code for large number of processes
2 scientific applications early adopters for GASPI/GPI
(EPiGRAM Project, http://www.epigram-project.eu/)
iPIC3D (Particle-in-Cell code for the simulation of space and fusion plasmas): early adopter of GASPI/GPI
NEK5000 (simulation of incompressible flows in complex geometries): early adopter of GASPI/GPI
Portable Programming Interface to map grid-based applications on distributed computers (EXA2CT Project, http://www.exa2ct.eu/)
Recommender System (EXA2CT Project, http://www.exa2ct.eu/)
SHARK using GASPI/GPI as one underlying communication mechanism
BPMF (Bayesian Probabilistic Matrix Factorisation)
(EXA2CT Project, http://www.exa2ct.eu/)
DEFMESH and AETHER
IFS – Integrated Forecasting System (NWP)
(EPiGRAM Project, http://www.epigram-project.eu/)
IFS (numerical weather prediction)
User of GPI-2 based applications
Known GPI-2 installations at SuperComputing Centres
Beskow (KTH, Stockholm)
Archer (EPCC, Edinburgh)
Anselm (IT4I, Ostrava)
SuperMUC (LRZ, Munich)
Elwetritsch (RHRK, Kaiserslautern)
Hazel Hen (HLRS, Stuttgart)
The available GPI-2 version has the following requirements:
- Infiniband or RoCE devices, or Ethernet (TCP)
- OFED software stack installation (in particular libibverbs) when using Infiniband or RoCE
- ssh server running on the compute nodes (password-less access)
GPI-2 (and GASPI) provides interesting and distinguishing concepts.
Modern hardware typically involves a hierarchy of memory with respect to the bandwidth and latency of read and write accesses. Within that hierarchy are non-uniform memory access (NUMA) partitions, solid state devices (SSDs), graphical processing unit (GPU) memory or many integrated cores (MIC) memory. The memory segments are supposed to map this variety of hardware layers to the software layer. In the spirit of the PGAS approach, these GASPI segments may be globally accessible from every thread of every GASPI process. GASPI segments can also be used to leverage different memory models within a single application or to even run different applications.
A group is a subset of all ranks. The members of a group share common collective operations; a collective operation on a group is restricted to the ranks forming that group. There is an initial group (GASPI_GROUP_ALL) of which all ranks are members.
Forming a group involves three steps: creation, addition of ranks, and a commit. These operations must be performed by all ranks forming the group. The creation is performed using gaspi_group_create. If this operation is successful, ranks can be added to the created group using gaspi_group_add.
To be able to use the created group, all ranks added to it must commit to the group. This is performed using gaspi_group_commit, a collective operation between the ranks in the group.
One-sided asynchronous communication is the basic communication mechanism provided by GPI-2. It comes in two flavors: read and write operations (single or as a list) from and into allocated segments. Moreover, the write operations can be extended with notifications, enabling remote completion events to which a remote rank can react.
One-sided operations are non-blocking and asynchronous, allowing the program to continue its execution along with the data transfer.
The mechanisms for communication in GPI-2 are the following:
Communication requests can be submitted to one of several queues. Queues improve scalability and can serve as channels for different types of requests: similar requests are queued together and synchronised jointly, but independently of the other queues (separation of concerns).
GPI-2 provides operations with which variables can be manipulated atomically. There are two basic atomic operations: fetch_and_add and compare_and_swap. The atomic values can be used as global shared variables, for example to synchronise processes or events.
Failure-tolerant parallel programs require non-blocking communication calls. GPI-2 therefore provides a timeout mechanism for all potentially blocking procedures; timeouts are specified in milliseconds. For instance, GASPI_BLOCK is a predefined timeout value that blocks the procedure call until completion. GASPI_TEST is another predefined timeout value that blocks the procedure for the shortest time possible, i.e. the time in which the procedure call processes an atomic portion of its work.