Mohsin Ahmed Shaikh

Thuwal

Summary

  • Computational Scientist with 16 years of experience in high-performance computing, distributed systems, and AI/ML workload deployment. Proven expertise in designing and implementing computing solutions on supercomputing infrastructure, with hands-on experience in Kubernetes, container orchestration, and performance optimization. Strong background in technical consulting, user training, and infrastructure evaluation for research environments.

Overview

21 years of professional experience
1 Certification

Work History

Computational Scientist

KAUST Supercomputing Lab, King Abdullah University of Science & Technology
08.2019 - Current
  • Developed and deployed Kubernetes and Kubeflow platform for multi-user AI/ML environment
  • Support distributed ML/DL workloads on heterogeneous cluster with over 3000 CPUs and 600 GPUs
  • Design flexible computing solutions to improve system utilization and researcher productivity
  • Consult with researchers on computational science problems and efficient use of supercomputing resources
  • Lead development and support of distributed workloads in data science and ML/DL
  • Work with HPE Cray EX system with 4609 AMD Genoa nodes for computational research
  • Participate in procurement team for next-generation architectures
  • Prepare data science application benchmarks for system evaluation
  • Contribute to acceptance and commissioning of computational solutions

Supercomputing Application Specialist

Pawsey Supercomputing Centre, Western Australia
09.2014 - 08.2019
  • Debugged, profiled, and tuned applications using Allinea Tools, CrayPAT, OpenSpeedShop, Score-P, and TAU
  • Supported users on a 1488-node Cray XC40 and a heterogeneous Cray XC30 with 472 CPU nodes and 64 GPU nodes
  • Constructed efficient and reproducible HPC workflows for computational research
  • Built software stack on Cray and SGI systems
  • Worked with operations team to design health check suites and generate periodic reports
  • Maintained user support documentation and automated monitoring systems

Postdoctoral Researcher

UC Bluefern, University of Canterbury, New Zealand
10.2011 - 12.2014
  • Conducted computational research using IBM Blue Gene/L and Blue Gene/P supercomputers
  • Developed scalable parallel programs for vascular physiology simulations
  • Assisted in supervision of postgraduate students

Supercomputing Services Consultant

UC Bluefern, University of Canterbury, New Zealand
01.2014 - 08.2014
  • Provided support and consulting on computational workflows and parallelization
  • Assisted users in optimizing their work for New Zealand eScience Infrastructure
  • Applied PhD experience in parallel computing to user support activities

Assistant Professor

Department of Biomedical Engineering, Mehran University of Engineering & Technology
08.2004 - 01.2007
  • Delivered undergraduate lectures in biomedical engineering subjects
  • Taught biological transport phenomena, biomaterials, and bioinstrumentation
  • Developed course materials for artificial organs and prostheses

Education

Ph.D. - Bioengineering

University of Canterbury
New Zealand
01.2012

M.Sc. - Bioengineering

University of Strathclyde
Glasgow, UK
01.2005

B.S. - Biomedical Engineering

Sir Syed University of Engineering & Technology
Pakistan
01.2003

Skills

  • Kubernetes
  • Kubeflow
  • Docker, Podman, Singularity
  • AWS Parallel Cluster
  • Software stack building & optimization
  • PyTorch
  • Horovod, MS DeepSpeed, Ray
  • High-performance data processing
  • Python, C/C++, Bash scripting
  • MPI, OpenMP, OpenACC

Certification

  • NVIDIA DLI Certificate, Instructor Materials - Fundamentals of Accelerated Computing with CUDA Python
  • DeepLearning.AI, Generative AI with Large Language Models (Coursera)

Accomplishments

  • Successfully deployed Kubernetes/Kubeflow platform at KAUST for AI research community
  • Contributed to procurement and evaluation of next-generation HPC systems
  • Delivered technical training workshops to researchers and students on container orchestration and AI/ML deployment
  • Optimized GPU-accelerated workloads achieving significant performance improvements
  • Maintained and optimized large-scale computing infrastructure supporting diverse research workloads

Training And Workshops Delivered

  • Container Orchestration & AI/ML: Kubeflow 101 for Data Scientists (platform deployment and usage); Containers on HPC Platforms (Docker, Singularity, and Kubernetes implementation); Distributed ML/DL on Supercomputers (scaling machine learning workloads); Scaling Hyperparameter Tuning (optimization techniques for large-scale AI)
  • Performance Optimization & DevOps: Profiling and Debugging (ARM Forge and Cray tools); SLURM for Complex Workflows (job scheduling and resource management); HPC with Python (advanced computational techniques); MPI and OpenMP (parallel programming fundamentals)

Relevant Experience For Cloud And AI Infrastructure

  • Container Orchestration: Hands-on experience with Kubernetes and Kubeflow deployment in multi-user research environment, directly applicable to enterprise AI infrastructure.
  • GPU-Accelerated AI Workloads: Extensive experience optimizing AI/ML workloads on GPU clusters, including distributed training and performance tuning.
  • Technical Consulting: Proven track record in consulting with users on computational problems and infrastructure optimization, transferable to enterprise client engagement.
  • Solution Architecture: Experience in evaluating and procuring computing systems, with understanding of architectural requirements for AI/ML workloads.
  • DevOps Practices: Applied containerization, automation, and monitoring in production HPC environments.

Publications And Technical Contributions

  • Towards an HPC Service Oriented Hybrid Cloud Architecture Designed for Interactive Workflows, SC20 UrgentHPC Workshop, 2020
  • Evaluation of next-generation high-order compressible fluid dynamic solver on cloud computing for complex industrial flows, Array, 17, 03, 2023
  • Kubeflow-as-a-Service on HPC clusters - First Experiences, CANOPIE-HPC Workshop, 2023
  • Parallelization of a distributed ecohydrological model, Environmental Modelling & Software, 101, 03, 2018
  • Macro-scale phenomena of arterial coupled cells: a massively parallel simulation, Journal of The Royal Society Interface, 2011
  • Multiple conference proceedings on parallel computing and distributed systems

References

Available upon request

Technical Skills

Kubernetes, Kubeflow, Docker, Podman, Singularity, AWS Parallel Cluster administration and management, SLURM for complex workflows and resource management, Software stack building, application optimization, PyTorch, Horovod, MS DeepSpeed, Ray, NVIDIA RAPIDS, CUDA programming, Profiling and debugging with ARM Forge, Cray tools, DASK, high-performance data processing, Cray systems (XC40, XC30), HPE Cray EX, IBM Blue Gene, Python, C/C++, Bash scripting, MPI, OpenMP, OpenACC, CUDA, High-performance interconnects, cluster networking

Key Areas Of Expertise

  • Container Orchestration (Kubernetes, Kubeflow) & AI/ML Workload Optimization
  • GPU-Accelerated Computing & Performance Optimization
  • Technical Consulting & Solution Architecture
  • HPC Infrastructure Design & Procurement
  • DevOps & Training Delivery

Languages

English
Proficient
C2
