I am a Postdoctoral Appointee at the Mathematics and Computer Science Division at Argonne National Laboratory, USA, and a Postdoc-at-Large at the University of Chicago. My research focuses on scalable systems for training and checkpointing large language models (LLMs), with an emphasis on breaking the GPU memory wall through multi-level offloading across heterogeneous memory and storage hierarchies (GPU HBM, host DRAM, CXL, NVMe, parallel file systems). I am particularly interested in designing asynchronous, cache-aware data movement strategies that enable efficient hybrid CPU-GPU training of transformer models at scales exceeding hundreds of billions of parameters, as well as lazy, non-blocking checkpointing techniques that minimize I/O interference during long-running training jobs.

More broadly, my work spans I/O performance optimization for HPC and AI workloads, including GPU-accelerated multi-level checkpoint caching and prefetching for scientific simulations, distributed training parallelism strategies, and data compression pipelines. My research has been recognized with Best Paper awards at HPDC 2024 and HiPC 2022.

I received my Ph.D. from the Rochester Institute of Technology (RIT) in 2024, advised by Prof. M. Mustafa Rafique and Prof. Bogdan Nicolae (ANL). My thesis received the 2025 ACM SIGHPC Doctoral Dissertation Award Honorable Mention.

I maintain a somewhat up-to-date list of conference and workshop deadlines in the HPC and distributed systems space at this link. Please verify all dates against the official conference webpages; the list is meant only for quick reference.

Publications

Conferences

[Preprint'26] Ziyue Liu, Zhengyang Wang, Ruijie Zhang, Avinash Maurya, Hui Zhou, Paul Hovland, Sheng Di, Franck Cappello, Bogdan Nicolae, Zheng Zhang, “ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload”, Under Review [Paper].
[ISCA'26] Moiz Arif, Avinash Maurya, Bogdan Nicolae, Sudharshan Vazhkudai, “Understanding Inference Scaling for LLMs: Bottlenecks, Trade‑offs, and Performance Principles”, ISCA'26: The 53rd International Symposium on Computer Architecture, Industry Track (Raleigh, USA) [Paper].
[HPDC'26] Jie Ye, Avinash Maurya, Krishna Teja Chitty‑Venkata, Bogdan Nicolae, Anthony Kougkas, Xian‑He Sun, “PKAS: Predictive KVCache‑Aware Scheduling for Faster LLM and Transformer Inferences”, HPDC'26: The 35th International Symposium on High‑Performance Parallel and Distributed Computing, (Cleveland, OH, USA) [Paper].
[MLSys'26] Zhengyang Wang, Ziyue Liu, Ruijie Zhang, Avinash Maurya, Paul Hovland, Bogdan Nicolae, Franck Cappello, Zheng Zhang, "BOOST: BOttleneck-Optimized Scalable Training Framework for Low-Rank Large Language Models", MLSys'26: The 9th Annual Conference on Machine Learning and Systems (Bellevue, WA, USA). [Paper].
[SenSys'26] Ali Khalid, Jaiaid Mobin, Sumanth Rao Appala, Avinash Maurya, Stephany Berrio Perez, M Mustafa Rafique, Fawad Ahmad, "Been There, Scanned That: Nostalgia-Driven LiDAR Compression for Self-Driving Cars", SenSys'26: The first ACM/IEEE International Conference on Embedded Artificial Intelligence and Sensing Systems (Saint-Malo, France). [Paper][Code].
[SC'25] Avinash Maurya, M. Mustafa Rafique, Franck Cappello, and Bogdan Nicolae. "MLP-Offload: Multi-Level, Multi-Path Offloading for LLM Pre-training to Break the GPU Memory Wall", SC'25: The International Conference for High-Performance Computing, Networking, Storage, and Analysis (St. Louis, MO, USA, 2025) [Paper][Slides][Code].
[IPDPS'25] Jie Ye, Jaime Cernuda, Avinash Maurya, Anthony Kougkas, Xian-He Sun, and Bogdan Nicolae. "Characterizing the Behavior and Impact of KV Caching on Transformer Inferences under Concurrency", IPDPS'25: The 39th IEEE International Parallel & Distributed Processing Symposium (Milan, Italy). [Paper]
[Middleware'24] Avinash Maurya, Jie Ye, M. Mustafa Rafique, Franck Cappello, and Bogdan Nicolae. "Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading". Middleware'24: The 25th International Middleware Conference (Hong Kong, 2024). [Paper][Slides][Code]
[HPDC'24] Avinash Maurya, Robert Underwood, M. Mustafa Rafique, Franck Cappello, and Bogdan Nicolae. "DataStates-LLM: Lazy Asynchronous Checkpointing for Large Language Models". HPDC'24: The 33rd International Symposium on High-Performance Parallel and Distributed Computing (Pisa, Italy, 2024). AWARDED BEST PAPER! [Paper][Slides][Code]
[IPDPS'24] Moiz Arif, Avinash Maurya, M. Mustafa Rafique, Dimitrios S. Nikolopoulos, and Ali R. Butt. "Application-Attuned Memory Management for Containerized HPC Workflows". IPDPS'24: The 38th IEEE International Parallel & Distributed Processing Symposium (San Francisco, USA) [Paper]
[HiPC'23] Avinash Maurya, Bogdan Nicolae, M. Mustafa Rafique, and Franck Cappello. "Towards Efficient I/O Pipelines using Accumulated Compression". HiPC’23: The 30th IEEE International Conference on High-Performance Computing, Data, and Analytics (Goa, India, 2023) [Paper] [Slides]
[HPDC'23] Avinash Maurya, M. Mustafa Rafique, Thierry Tonellot, Hussain J. AlSalem, Franck Cappello, Bogdan Nicolae. "GPU-Enabled Asynchronous Multi-level Checkpoint Caching and Prefetching". HPDC'23: The 32nd International Symposium on High-Performance Parallel and Distributed Computing (Orlando, Florida, United States, 2023). [Paper][Slides]
[HiPC'22] Avinash Maurya, Bogdan Nicolae, M. Mustafa Rafique, Amr M. Elsayed, Thierry Tonellot, and Franck Cappello. "Towards Efficient Cache Allocation for High-Frequency Checkpointing". HiPC’22: The 29th IEEE International Conference on High-Performance Computing, Data, and Analytics (Bangalore, India, 2022) AWARDED BEST PAPER! [Paper] [Slides]
[MASCOTS'21] Avinash Maurya, Bogdan Nicolae, M. Mustafa Rafique, Thierry Tonellot, and Franck Cappello. "Towards Efficient I/O Scheduling for Collaborative Multi-Level Checkpointing". MASCOTS’21: The 29th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (Virtual, Portugal, 2021) [Paper] [Slides]
[DS-RT'20] Avinash Maurya, Bogdan Nicolae, Ishan Guliani, and M. Mustafa Rafique. "CoSim: A Simulator for Co-Scheduling of Batch and On-Demand Jobs in HPC Datacenters". DS-RT’20: The 24th IEEE/ACM International Symposium on Distributed Simulation and Real-Time Applications (Prague, Czech Republic, 2020) [Paper] [Slides] [Talk]

Journals

[TPDS'26] Avinash Maurya, M. Mustafa Rafique, Franck Cappello, and Bogdan Nicolae. "DataStates-LLM: Scalable Checkpointing for Transformer Models Using Composable State Providers". TPDS'26: IEEE Transactions on Parallel and Distributed Systems [Paper].
[IJHPC'25] Franck Cappello, Sandeep Madireddy, Robert Underwood, Neil Getty, Nicholas Lee-Ping Chia, Nesar Ramachandra, Josh Nguyen, Murat Keceli, Tanwi Mallick, Zilinghan Li, Marieme Ngom, Chenhui Zhang, Angel Yanguas-Gil, Evan Antoniuk, Bhavya Kailkhura, Minyang Tian, Yufeng Du, Yuan-Sen Ting, Azton Wells, Bogdan Nicolae, Avinash Maurya, M Mustafa Rafique, Eliu Huerta, Bo Li, Ian Foster, Rick Stevens. "EAIRA: A Methodology for Evaluating AI Models as Scientific Research Assistants", IJHPC'25: International Journal of High Performance Computing Applications, 2025 [Paper]
[IJAR'18] Sarthak Langde, Avinash Maurya, Tanvi Nakhawa, Anurag Sinha, Smita Patil, Kriti Karanam, and Harshali Mugutrao. "Automated Attendance System", IJAR'18: International Journal of Applied Research, vol. 4: 248‑257, October 2018 [Paper]

Workshops

[HPDC'26] Atkia Mahila, Avinash Maurya, M. Mustafa Rafique, Bogdan Nicolae. "Beyond Fixed Budgets: Characterizing the Inelasticity and Limitations of Tree-of-Thought Reasoning Strategies". FlexScience'26: The 16th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, colocated with HPDC'26, (Cleveland, OH, USA), 2026 [Paper].
[SCAsia'26] Mikaila Gossman, Avinash Maurya, Bogdan Nicolae, Jon C. Calhoun. "Understanding LLM Checkpoint/Restore I/O Strategies and Patterns". SCA/HPCAsia 2026 Workshops: Supercomputing Asia and International Conference on High Performance Computing in the Asia Pacific Region Workshops, (Osaka, Japan). [Paper]
[IPDPS'25] Moiz Arif, Avinash Maurya, Sudharshan Vazhkudai, and Bogdan Nicolae. "Evaluating Expansion Memory for Optimizer State Offloading for Large Transformer Models". HPAI4S'25 IPDPS-workshop: The first Workshop on HPC for AI Foundation Models and LLMs for Science, colocated with IPDPS'25 (Milan, Italy), 2025 BEST PAPER RUNNER-UP! [Paper][Slides].
[HPDC'24] Avinash Maurya, Jie Ye, M. Mustafa Rafique, Franck Cappello, and Bogdan Nicolae. "Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers". FlexScience'24 workshop: The 14th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, colocated with HPDC'24 (Pisa, Italy, 2024). [Paper][Slides]
[HPDC'23] Moiz Arif, Avinash Maurya, and M. Mustafa Rafique. "Accelerating Performance of GPU-based Workloads using CXL". FlexScience'23: The 13th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, co-located with the 32nd International Symposium on High-Performance Parallel and Distributed Computing (Orlando, Florida, United States, 2023). [Paper]
[HiPC'22] Avinash Maurya, Jaiaid Mobin, and M. Mustafa Rafique. "Towards Data Gravity and Compliance Aware Distributed Deep Learning on Hybrid Clouds". HiPCW'22: Workshop on Data Fabric for Hybrid Clouds (WDFHC) co-located with the 29th IEEE International Conference on High-Performance Computing, Data, and Analytics (Bangalore, India, 2022) [Paper] [Slides]

Posters and Talks

[SC'23] Avinash Maurya, Robert Underwood, Bogdan Nicolae, M. Mustafa Rafique, Franck Cappello. "VELOC-LLM: Towards Efficient Asynchronous Checkpointing for Large-Language Models". SuperCheck@SC'23: Fourth International Symposium on Checkpointing for Supercomputing, (Colorado, USA, 2023) [Slides]
[HPDC'23] Jaiaid Mobin, Avinash Maurya, M. Mustafa Rafique. "COLTI: Towards Concurrent and Co-located DNN Training and Inference". HPDC'23: The 32nd International Symposium on High-Performance Parallel and Distributed Computing (Orlando, Florida, United States, 2023). AWARDED BEST POSTER! [Poster + Extended abstract]
[HiPC'22] Will Merges, Avinash Maurya, M. Mustafa Rafique. "Exploiting Lightweight OS Kernels for Emerging Datacenter Workloads". SRS HiPC'22: Student Research Symposium (SRS) co-located with the 29th IEEE International Conference on High-Performance Computing, Data, and Analytics (Bangalore, India, 2022) [Poster + Extended abstract]
[SC'22] Avinash Maurya, M. Mustafa Rafique, Bogdan Nicolae. "Toward Efficient Checkpointing across Deep Tiers of Memory Hierarchy". Doctoral Showcase, SC'22: The International Conference for High Performance Computing, Networking, Storage, and Analysis (Dallas, USA, 2022) [Poster + Presentation] [Slides]

Professional Service

Technical Program Committee

[ICPP'26] The 55th International Conference on Parallel Processing, Track: Software.
[SC'26] The International Conference for High Performance Computing, Networking, Storage, and Analysis. Track: Applications.
[FlexScience'26] The 16th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures.
[ICS'26] The 40th ACM International Conference on Supercomputing, 2026. External Reviewer.
[CCGrid'26] The 26th IEEE International Symposium on Cluster, Cloud, and Internet Computing. Track: Systems for AI/ML and Distributed Intelligence, 2026.
[IPDPS'26] The 40th IEEE International Parallel and Distributed Processing Symposium, Track: System Software, 2026.
[HiPC'25] The 32nd IEEE International Conference on High Performance Computing, Data, and Analytics, Track: AI/ML for Systems, Systems for AI/ML, 2025.
[RexIO'25] The Fifth Workshop on Re-envisioning Extreme-Scale I/O for Emerging Hybrid HPC Workloads, co-located with IEEE Cluster, 2025.
[CLUSTER'25] IEEE International Conference on Cluster Computing (CLUSTER), Track: Data, Storage, and Visualization, 2025.
[CCGrid'25] The 25th IEEE International Symposium on Cluster, Cloud, and Internet Computing (CCGrid), Track: Machine Learning (ML) for Systems and Systems for ML, 2025.
[TC'25-] IEEE Transactions on Computers (TC), 2025-Present.
[TACO'25-] ACM Transactions on Architecture and Code Optimization (TACO), 2025-Present.
[TPDS'24-] IEEE Transactions on Parallel and Distributed Systems (TPDS), 2024-Present.
[SC'24-25] The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2024-2025, External Reviewer.

Organizing Committee

[HPAI4S'25-26] HPC for AI Foundation Models & LLMs for Science (HPAI4S) Workshop Co-located with IPDPS, 2025-2026.

Volunteering

[ATPESC'26] Argonne Training Program on Extreme-Scale Computing, 2026.
[SC'22-23] The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2022-2023, Student Volunteer.

Contact Details

Avinash Maurya
Postdoctoral Appointee
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Avenue, Lemont, IL 60439, USA

amaurya [AT] anl [DOT] gov