X-Stack ComPort
ComPort Project Summary
As part of the recently announced DOE investment in Research on Adapting Scientific Software to Run on Next-Generation Supercomputers, the ComPort project develops Rigorous Testing Methods to Safeguard Software Porting.
Modern research depends critically on high-performance computing (HPC) working in tandem with machine learning (ML). As hardware (CPUs, GPUs, accelerators) and software (various parallelism models) grow more heterogeneous, the numerical integrity of code can be affected by a variety of causes, not all of which are fully understood or have even been encountered on existing HPC systems. Researchers already face problems such as innocuous-looking changes to compiler optimization flags that lead to aberrant climate predictions. Similar variations can cause ML systems to misclassify data or program states, leading to incorrect software behavior in combined HPC/ML systems already in use, such as those used to characterize the SARS-CoV-2 virus.
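One well-known source of such variation is that floating-point addition is not associative, so any change that reorders arithmetic (a different optimization level, a different parallel reduction order) can change a result. The following is a minimal illustrative sketch in Python, not ComPort code; the values and variable names are chosen only for this example.

```python
import math

# The same three numbers, summed in two different orders.
values_a = [1e16, 1.0, -1e16]
values_b = [1e16, -1e16, 1.0]

# Left-to-right summation in double precision.
# In values_a the 1.0 is absorbed when added to 1e16 and is lost.
naive_a = sum(values_a)
naive_b = sum(values_b)

# math.fsum tracks the lost low-order bits and returns the
# correctly rounded sum regardless of order.
exact_a = math.fsum(values_a)
exact_b = math.fsum(values_b)

print(naive_a, naive_b)   # the two orderings disagree
print(exact_a, exact_b)   # the compensated sums agree
```

Here the two naive sums differ by 1.0 even though the inputs are identical, which is the kind of discrepancy that a port or recompilation can silently introduce.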
The ComPort project develops rigorous methods to verify, after each software upgrade or port, whether computational results agree with expected answers (for example, those delivered by prior versions that have stood the test of time). It lets the user define how to rigorously test the numerical behavior of hardware and software, and specify which results to accept. The ComPort software tool suite will support this with a high degree of automation and high-level user feedback for diagnosing and repairing software applications, so that numerical correctness can be maintained as hardware and compilers change.
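As a rough illustration of the idea of checking ported results against a trusted baseline under a user-chosen acceptance criterion, consider the sketch below. It is not the ComPort tool suite or its interface; the function name `compare_to_baseline`, the tolerance parameters, and the sample data are all hypothetical and assumed only for this example.

```python
import math

def compare_to_baseline(baseline, candidate, rel_tol=1e-9, abs_tol=0.0):
    """Return (index, expected, actual) for every value outside tolerance.

    Hypothetical helper: the tolerances stand in for a user-defined
    acceptance criterion.
    """
    if len(baseline) != len(candidate):
        raise ValueError("baseline and candidate differ in length")
    mismatches = []
    for i, (expected, actual) in enumerate(zip(baseline, candidate)):
        if not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((i, expected, actual))
    return mismatches

# Made-up results from a trusted prior version and from a ported build.
baseline = [1.0, 2.5, 1e-3]
ported = [1.0 + 1e-12, 2.5, 1.1e-3]

report = compare_to_baseline(baseline, ported, rel_tol=1e-9)
for index, expected, actual in report:
    print(f"value {index}: expected {expected}, got {actual}")
```

With these sample values, the first entry differs only by a tiny rounding-level amount and is accepted, while the third entry differs by about ten percent and is reported. A real acceptance criterion would depend on the application and could be far more elaborate than a single relative tolerance.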