HPC Resource Management & Security Analysis
Role: Systems Scientist | Date: 2026-03-16
Supported by: Department of Energy
Linux Kernel, CUDA, Slurm, Security
Abstract
In multi-tenant High-Performance Computing (HPC) environments, resource isolation between jobs is critical for security. This research investigates how data left behind in GPU memory, RAM, and on disk can persist from one job to the next, and builds system-level mitigations.
Methodology
- VRAM Auditing: Investigating NVIDIA VRAM leaks to mitigate cross-job data persistence risks.
- Slurm Hooks: Developing Slurm prolog and epilog hooks that automatically sanitize RAM, disk, and GPU resources between jobs.
- Container Profiling: Measuring Apptainer container resource usage with Linux kernel tracing tools to enforce stricter boundaries.
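
As a rough illustration of the VRAM auditing and epilog ideas above, the following Python sketch flags GPUs that still report memory in use after a job has finished. It is a minimal sketch under stated assumptions, not the project's actual hook: the function names, the 16 MiB idle threshold, and the choice to parse `nvidia-smi --query-gpu` output are all illustrative. It only detects memory that is still allocated; it cannot tell whether memory that was already freed has been zeroed.

```python
import subprocess

# Assumed threshold: an idle GPU may still report a few MiB used by the driver.
IDLE_THRESHOLD_MIB = 16

# Query per-GPU used memory as bare CSV rows such as "0, 3".
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, memory.used' rows into {gpu_index: used_mib}.

    Assumes both fields are plain integers, as printed with 'nounits'.
    """
    usage = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used = (field.strip() for field in line.split(","))
        usage[int(index)] = int(used)
    return usage


def gpus_still_in_use(usage, threshold_mib=IDLE_THRESHOLD_MIB):
    """Return the sorted GPU indices whose used memory exceeds the threshold."""
    return sorted(i for i, used in usage.items() if used > threshold_mib)


def audit_node():
    """Run the query on this node; return 1 if any GPU looks dirty, else 0."""
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    dirty = gpus_still_in_use(parse_gpu_memory(result.stdout))
    if dirty:
        print(f"GPUs with residual allocations: {dirty}")
        return 1
    return 0
```

A hypothetical epilog script could end with `sys.exit(audit_node())`. Slurm treats a non-zero epilog exit code as a failure and drains the node, so a site would need to decide whether that behavior is wanted before wiring a check like this in.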