I am a postdoctoral appointee in the Mathematics and Computer Science Division at Argonne National Laboratory, USA. My research focuses on optimizing large-scale data management across heterogeneous memory tiers and accelerating I/O performance for both classical HPC simulations and AI applications. Leveraging advanced asynchronous data movement and memory management strategies, I develop solutions for efficient hybrid CPU-GPU computations, as well as optimized loading, storing, caching, prefetching, and lazy flushing of GPU-resident distributed data structures, ensuring high-frequency, concurrent access to large data volumes.
Previously, I was a graduate student at the High-Performance and Distributed Systems Lab (HPDSL) in the Department of Computer Science at Rochester Institute of Technology (RIT), advised by Prof. M. Mustafa Rafique and Prof. Bogdan Nicolae (ANL). Before joining RIT, I worked as a full-stack developer for two years, designing and developing applications that scaled across hundreds of nodes, serving 100M+ users. I earned my undergraduate degree in Computer Engineering from RAIT, University of Mumbai, India, in 2017.
Publications
Conferences
Avinash Maurya, Jie Ye, M. Mustafa Rafique, Franck Cappello, and Bogdan Nicolae. "Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading". Middleware'24: The 25th International Middleware Conference (Hong Kong, 2024). [Paper][Slides][Code]
Avinash Maurya, Robert Underwood, M. Mustafa Rafique, Franck Cappello, and Bogdan Nicolae. "DataStates-LLM: Lazy Asynchronous Checkpointing for Large Language Models". HPDC'24: The 33rd International Symposium on High-Performance Parallel and Distributed Computing (Pisa, Italy, 2024). AWARDED BEST PAPER! [Paper][Slides][Code]
Moiz Arif, Avinash Maurya, M. Mustafa Rafique, Dimitrios S. Nikolopoulos, and Ali R. Butt. "Application-Attuned Memory Management for Containerized HPC Workflows". IPDPS'24: The 38th IEEE International Parallel & Distributed Processing Symposium (San Francisco, USA, 2024). [Paper][Slides]
Avinash Maurya, Bogdan Nicolae, M. Mustafa Rafique, and Franck Cappello. "Towards Efficient I/O Pipelines using Accumulated Compression". HiPC'23: The 30th IEEE International Conference on High-Performance Computing, Data, and Analytics (Goa, India, 2023). [Paper][Slides]
Avinash Maurya, M. Mustafa Rafique, Thierry Tonellot, Hussain J. AlSalem, Franck Cappello, and Bogdan Nicolae. "GPU-Enabled Asynchronous Multi-level Checkpoint Caching and Prefetching". HPDC'23: The 32nd International Symposium on High-Performance Parallel and Distributed Computing (Orlando, Florida, United States, 2023). [Paper][Slides]
Avinash Maurya, Bogdan Nicolae, M. Mustafa Rafique, Amr M. Elsayed, Thierry Tonellot, and Franck Cappello. "Towards Efficient Cache Allocation for High-Frequency Checkpointing". HiPC'22: The 29th IEEE International Conference on High-Performance Computing, Data, and Analytics (Bangalore, India, 2022). AWARDED BEST PAPER! [Paper][Slides]
Avinash Maurya, Bogdan Nicolae, M. Mustafa Rafique, Thierry Tonellot, and Franck Cappello. "Towards Efficient I/O Scheduling for Collaborative Multi-Level Checkpointing". MASCOTS'21: The 29th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (Virtual, Portugal, 2021). [Paper][Slides]
Avinash Maurya, Bogdan Nicolae, Ishan Guliani, and M. Mustafa Rafique. "CoSim: A Simulator for Co-Scheduling of Batch and On-Demand Jobs in HPC Datacenters". DS-RT'20: The 24th IEEE/ACM International Symposium on Distributed Simulation and Real-Time Applications (Prague, Czech Republic, 2020). [Paper][Slides][Talk]
Workshops
Avinash Maurya, Jie Ye, M. Mustafa Rafique, Franck Cappello, and Bogdan Nicolae. "Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers". FlexScience'24 HPDC-workshop: The 14th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, co-located with HPDC'24 (Pisa, Italy, 2024). [Paper][Slides]
Moiz Arif, Avinash Maurya, and M. Mustafa Rafique. "Accelerating Performance of GPU-based Workloads using CXL". FlexScience'23: The 13th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, co-located with the 32nd International Symposium on High-Performance Parallel and Distributed Computing (Orlando, Florida, United States, 2023). [Paper][Slides]
Avinash Maurya, Jaiaid Mobin, and M. Mustafa Rafique. "Towards Data Gravity and Compliance Aware Distributed Deep Learning on Hybrid Clouds". HiPCW'22: Workshop on Data Fabric for Hybrid Clouds (WDFHC), co-located with the 29th IEEE International Conference on High-Performance Computing, Data, and Analytics (Bangalore, India, 2022). [Paper][Slides]
Posters and Talks
Avinash Maurya, Robert Underwood, Bogdan Nicolae, M. Mustafa Rafique, and Franck Cappello. "VELOC-LLM: Towards Efficient Asynchronous Checkpointing for Large-Language Models". SuperCheck@SC'23: Fourth International Symposium on Checkpointing for Supercomputing (Colorado, USA, 2023). [Slides]
Jaiaid Mobin, Avinash Maurya, and M. Mustafa Rafique. "COLTI: Towards Concurrent and Co-located DNN Training and Inference". HPDC'23: The 32nd International Symposium on High-Performance Parallel and Distributed Computing (Orlando, Florida, United States, 2023). AWARDED BEST POSTER! [Poster + Extended abstract]
Will Merges, Avinash Maurya, and M. Mustafa Rafique. "Exploiting Lightweight OS Kernels for Emerging Datacenter Workloads". SRS HiPC'22: Student Research Symposium (SRS), co-located with the 29th IEEE International Conference on High-Performance Computing, Data, and Analytics (Bangalore, India, 2022). [Poster + Extended abstract]
Avinash Maurya, M. Mustafa Rafique, and Bogdan Nicolae. "Toward Efficient Checkpointing across Deep Tiers of Memory Hierarchy". Doctoral Showcase, SC'22: The International Conference for High Performance Computing, Networking, Storage, and Analysis (Dallas, USA, 2022). [Poster + Presentation][Slides]