Welcome to the PAW-ATM Workshop.
- Proceedings
- Program
- Important dates
- Summary
- Scope and Aims
- Topics
- Submissions
- Organization
- Previous Instances of PAW-ATM
Program
- 9:00 - 9:02 Karla V. Morris Wright, Elliott Slaughter, Engin Kayraklioglu, Irene Moulitsas, Katherine Rasmussen, Daniele Lezzi, and Kenjiro Taura
PAW-ATM2025 Introduction
(Presentation)
- 9:02 - 10:00 Session 1 Session Chair: Katherine Rasmussen - Lawrence Berkeley National Laboratory
- 9:02 - 10:00 Distinguished Speaker: Pavan Balaji - Meta
- 10:00 - 10:30 Morning Break (30 min)
- 10:30 - 12:30 Session 2 Session Chair: Damian W. I. Rouson - Lawrence Berkeley National Laboratory
- 10:30 - 11:10 Invited Speaker: Laxmikant Kale - University of Illinois at Urbana-Champaign Alternative parallel programming models: past, present and is there a future?
- 11:10 - 11:30 Alexander Fell, Yuqing Wang, Tianshuo Su, Marziyeh Nourian, Wenyi Wang, Jose M. Monsalve-Diaz, Andronicus Rajasukumar, Jiya Su, Ruiqi Xu, Rajat Khandelwal, Tianchi Zhang, David F. Gleich, Yanjing Li, Hank Hoffmann, and Andrew A. Chien
KVMSR+UDWeave: Extreme-Scaling with Fine-grained Parallelism on the UpDown Graph Supercomputer
(Presentation)
- 11:30 - 11:50 Matt Drozt, Michael P. Ferguson, Ryan D. Friese, and Shreyas Khandekar
Comparing Distributed-Memory Programming Frameworks with Radix Sort
(Presentation)
- 11:50 - 12:10 Benjamin Brock and Renato Golin
Slicing Is All You Need: Towards A Universal One-Sided Algorithm for Distributed Matrix Multiplication
(Presentation)
- 12:10 - 12:30 Baodi Shan, Mauricio Araya-Polo, and Barbara Chapman
DiOMP-Offloading: Toward Portable Distributed Heterogeneous OpenMP
(Presentation)
- 12:30 - 2:00 Lunch Break (90 min)
- 2:00 - 3:00 Session 3 Session Chair: Jan Ciesko - Sandia National Laboratories
- 2:00 - 2:20 Invited Speaker: Francesc Lordan - Barcelona Supercomputing Center From Parallel Clusters to Hyper-distributed applications: Programming Swarms with COLMENA
- 2:20 - 2:40 Andrew Davis, Hans Johansen, Xinfeng Gao, and Stephen Guzik Weak Scaling of NVSHMEM Applied To Hashed Distributed Structured Data
- 2:40 - 3:00 Mahesh Doijade, Andrey Alekseenko, Ania Brown, Alan Gray, and Szilárd Páll Redesigning GROMACS Halo Exchange: Improving Strong Scaling with GPU-initiated NVSHMEM
- 3:00 - 3:30 Afternoon Break (30 min)
- 3:30 - 4:30 Session 4 Session Chair: Francesc Lordan - Barcelona Supercomputing Center
- 3:30 - 3:50 Davis Herring, Maxim Moraru, Scott Pakin, Julien Loiseau, Richard Berger, Philipp V. F. Edelmann, and Ben Bergen Enhancing HPX with FleCSI: Automatic Detection of Implicit Task Dependencies
- 3:50 - 4:10 Mia Reitz and Jonas Posner Stackless vs. Stackful Coroutines: A Comparative Study for RDMA-based Asynchronous Many-Task (AMT) Runtimes
- 4:10 - 4:30 David K Zhang, Rohan Yadav, Alex Aiken, Fredrik Kjolstad, and Sean Treichler KDRSolvers: Scalable, Flexible, Task-Oriented Krylov Solvers
- 4:30 - 5:30 Panel Discussion: The Role of Alternatives to MPI+X Technologies in AI/ML Panel Chair: Anshu Dubey - Argonne National Laboratory
Panelists:
- Ryan Coffee - SLAC National Accelerator Laboratory
- Zhihao Jia - Carnegie Mellon University
- Raye Kimmerer - Princeton University
- Peter Mendygral - Hewlett Packard Enterprise
- Jeremy Wilke - NVIDIA
Important dates
- Manuscript Submissions deadline: July 31, 2025 (extended from July 24, 2025)
- Artifact Description (AD) Stage 1 (mandatory) Submissions deadline: July 31, 2025
- Notification to authors: August 30, 2025
- Artifact Evaluation (AE) Stage 2 (optional) Submissions deadline: September 4, 2025
- AE and Reproducibility Badges review period: September 5–19, 2025
- Signing the copyright transfer form to ACM: September 12, 2025
- Final AD/AE/Badges decisions and notification to authors: September 20, 2025
- Updating paper metadata (title, author info) in Linklings: September 22, 2025
- Camera-ready papers with AD/AE appendix due from authors: September 22, 2025
- Final program: September 22, 2025
- PAW-ATM workshop date: November 16, 2025
- Novel application development using high-level parallel programming languages and frameworks
- Examples that demonstrate performance, compiler optimization, error checking, and reduced software complexity
- Applications from artificial intelligence, data analytics, bioinformatics, and other novel areas
- Performance evaluation of applications developed using alternatives to MPI+X and comparisons to standard programming models
- Novel algorithms enabled by high-level parallel abstractions
- Experience with the use of new compilers and runtime environments
- Libraries using or supporting alternatives to MPI+X
- Benefits of hardware abstraction and data locality on algorithm implementation
- Full-length papers presenting novel research results:
- User experience abstracts:
- Karla Vanessa Morris Wright - Sandia National Laboratories
- Engin Kayraklioglu - Hewlett Packard Enterprise
- Kenjiro Taura - University of Tokyo
- Daniele Lezzi - Barcelona Supercomputing Center
- Katherine Rasmussen - Lawrence Berkeley National Laboratory
- Marjan Asgari - Natural Resources Canada
- Dan Bonachea - Lawrence Berkeley National Laboratory
- Jan Ciesko - Sandia National Laboratories
- Irina Demeshko - NVIDIA
- Nils Deppe - Cornell University
- Nelson Dias - Federal University of Paraná
- Michael P. Ferguson - Hewlett Packard Enterprise
- Magne Haveraaen - University of Bergen
- Engin Kayraklioglu - Hewlett Packard Enterprise
- Daniele Lezzi - Barcelona Supercomputing Center
- Bill Long - Hewlett Packard Enterprise/Cray Retired
- Nouredine Melab - University of Lille
- Esteban Meneses Rojas - National High Technology Center
- Henry Monge Camacho - Oak Ridge National Laboratory
- Karla Vanessa Morris Wright - Sandia National Laboratories
- Irene Moulitsas - Cranfield University
- Katherine Rasmussen - Lawrence Berkeley National Laboratory
- Julian Samaroo - Massachusetts Institute of Technology
- Michael Schlottke-Lakemper - University of Augsburg
- Elliott Slaughter - SLAC National Accelerator Laboratory
- Gabriel Tanase - Amazon Web Services
- Kenjiro Taura - University of Tokyo
- Miwako Tsuji - RIKEN Advanced Institute for Computational Science
- Irene Moulitsas - Cranfield University
- Elliott Slaughter - SLAC National Accelerator Laboratory
- Oliver Alvarado Rodriguez - Hewlett Packard Enterprise
- Keylor Arley - Universidad de Costa Rica
- Yakup Budanaz - Technical University of Munich
- Fabio Durastante - University of Pisa
- Guillaume Helbecque - University of Lille
- Seema Mirchandaney - Stanford University
- Jonas Posner - University of Kassel
- Soren Rasmussen - National Center for Atmospheric Research
- Anjiang Wei - Stanford University
- Bradford L. Chamberlain - Hewlett Packard Enterprise
- Bill Long - Hewlett Packard Enterprise/Cray Retired
- Damian W. I. Rouson - Lawrence Berkeley National Laboratory
- PAW-ATM2025: Parallel Applications Workshop, Alternatives To MPI+X
- PAW-ATM2024: Parallel Applications Workshop, Alternatives To MPI+X
- PAW-ATM2023: Parallel Applications Workshop, Alternatives To MPI+X
- PAW-ATM2022: Parallel Applications Workshop, Alternatives To MPI+X
- PAW-ATM2021: Parallel Applications Workshop, Alternatives To MPI+X
- PAW-ATM2020: Parallel Applications Workshop, Alternatives To MPI+X
- PAW-ATM2019: Parallel Applications Workshop, Alternatives To MPI+X
- PAW-ATM2018: Parallel Applications Workshop, Alternatives To MPI
- PAW2017: PGAS Applications Workshop
- PAW2016: PGAS Applications Workshop
Summary
As supercomputers become more and more powerful, the number and diversity of applications that can be tackled with these machines grow. Unfortunately, the architectural complexity of these supercomputers grows as well, with heterogeneous processors, multiple levels of memory hierarchy, and many ways to move data and synchronize between processors. The MPI+X programming model, use of which is considered by many to be standard practice, demands that a programmer be expert in both the application domain and the low-level details of the architecture(s) on which that application will be deployed, and the availability of such superhuman programmers is a critical bottleneck. Things become more complicated when evolution and change in the underlying architecture translate into significant re-engineering of the MPI+X code to maintain performance.
Numerous alternatives to the MPI+X model exist, and by raising the level of abstraction on the application domain and/or the target architecture, they offer the ability for "mere mortal" programmers to take advantage of the supercomputing resources that are available to advance science and tackle urgent real-world problems. However, compared to the MPI+X approach, these alternatives generally lack two things. First, they aren't as well known as MPI+X and a domain scientist may simply not be aware of models that are a good fit to their domain. Second, they are less mature than MPI+X and likely have more functionality or performance "potholes" that need only be identified to be addressed.
PAW-ATM is a forum for discussing HPC applications written in alternatives to MPI+X. Its goal is to bring together application experts and proponents of high-level languages to present concrete example uses of such alternatives, describing their benefits and challenges.
Scope and Aims
The PAW-ATM workshop is designed to be a forum for discussion of supercomputing-scale parallel applications and their implementation in programming models outside of the dominant MPI+X paradigm. Papers and talks will explore the benefits (or perhaps drawbacks) of implementing specific applications with alternatives to MPI+X, whether those benefits are in performance, scalability, productivity, or some other metric important to that application domain. Presenters are encouraged to generalize the experience with their application to other domains in science and engineering and to bring up specific areas of improvement for the model(s) used in the implementation.
In doing so, our hope is to create a setting in which application authors, language designers, and architects can present and discuss the state of the art in alternative scalable programming models, while also wrestling with how to increase their effectiveness and adoption. Beyond well-established HPC scientific simulations, we also encourage submissions exploring artificial intelligence, big data analytics, machine learning, and other emerging application areas.
Topics
Topics of interest include, but are not limited to:
Papers that include descriptions of applications demonstrating the use of alternative programming models will be given higher priority.
Submissions
Submissions are solicited in two categories:
Full-length papers will be published in the workshop proceedings. Submitted papers must describe original work that has not appeared in, nor is under consideration for, another conference or journal. Papers shall be eight (8) pages minimum and not exceed ten (10) pages including text, figures, and non-AD/AE appendices, but excluding bibliography and acknowledgments.
PAW-ATM follows the reproducibility initiative of SC25. Submissions shall include an Artifact Description (AD) appendix and may optionally include an Artifact Evaluation (AE) appendix. Appendix pages related to the reproducibility initiative are not included in the page count.
Authors should include a draft of their AD/AE appendix with the initial manuscript PDF submission. They will have the opportunity to revise the appendix before its final submission on September 4, 2025.
Abstracts will be evaluated separately and will not be included in the published proceedings. Submissions in this track include a title and an abstract of 1-4 pages. The content may include any combination of novel and/or previously published work that is relevant to the workshop's scope. Content that highlights the experiences of users of alternatives to MPI+X, and their applications, will be prioritized within this submission category.
Abstracts may optionally include AD/AE appendices, not included in the abstract page count, but such appendices will not be evaluated and no badges will be awarded.
When deciding between manuscript submissions with similar merit, submissions whose focus relates more directly to the key themes of the workshop (application studies, computing at scale, high-level alternatives to MPI+X) will be given priority over those whose focus does not.
Manuscripts shall be submitted through Linklings. Please use the ACM proceedings template.