Distributed Systems
Work description
Responsibilities under the grant:
- Design a system for managing the energy consumption of GPUs used in deep learning within distributed environments.
- Implement and optimize a prototype based on the initial design.
- Conduct experimental evaluations of the developed prototype, using a variety of deep learning models and hardware devices (e.g., various processing and storage devices).
The tasks described in this work plan require the application and development of concepts and techniques from Computer Engineering, which are typically addressed in the core curriculum of the Integrated Master's Degree in Computer Engineering or the Master's Degree in Computer Engineering.
Academic Qualifications
BSc Degree in Informatics Engineering Sciences.
Minimum profile required
- Knowledge of energy monitoring and energy control systems (e.g., Intel RAPL, PowerJoular, EnergAt, NVML, DVFS);
- Knowledge of deep learning frameworks, models, and datasets (e.g., PyTorch, ResNet18, AlexNet, CIFAR-10), as well as heterogeneous workloads (e.g., cloud-based workloads, supercomputing workloads);
- Solid knowledge of operating systems;
- Solid knowledge of distributed systems.
Preference factors
- Experience in the design and development of energy control systems for GPUs;
- Solid knowledge of the state of the art in energy control systems for deep learning;
- Experience with the C++ programming language.
Application Period
From 19 Dec 2024 to 03 Jan 2025
Centre
High-Assurance Software