Our consortium member, the IT Center, is organizing an HPC workshop from December 9 to 11, 2024.
It will cover "Machine Learning (ML) on NVIDIA GPUs" with reference to High Performance Computing.
Machine Learning on HPC clusters with NVIDIA Chips
The annual event aiXcelerate at RWTH Aachen (NHR4CES@RWTH) is a tuning workshop for HPC users. It consists of lectures, which are open to everybody, and practical parts, in which (invited) groups can apply the concepts they have learned to their own code. This year, aiXcelerate covers the topic "Machine Learning (ML) on NVIDIA GPUs" and focuses on using the GPUs of the RWTH HPC cluster "CLAIX" with frameworks such as PyTorch or TensorFlow. The workshop provides insights into the performance analysis and performance tuning of ML code.
It is not an introduction to Machine Learning!
The lectures take place in the morning sessions of the three days of aiXcelerate. The theme of the first day is "Analyzing the Performance of ML code"; it covers the automatically running RWTH Performance Monitoring system, NVIDIA's Nsight tool, and how to find bottlenecks. The second day focuses on "Scaling ML code across multiple GPUs/nodes". Approaches with PyTorch (Distributed) and TensorFlow + Horovod are presented for accelerating ML codes by using more hardware in parallel. The third day deals with "Handling datasets of ML code". We will present different options for storing and using ML data at runtime (on CLAIX). In addition, the use of checkpointing in ML codes is presented.
aiXcelerate is carried out with the support of NVIDIA. Catering is sponsored by NEC and NVIDIA.
When: December 9 – 11, 2024, 9:00 – 17:30
Where: Seminar room 003/004, IT Center, Kopernikusstraße 6, 52074 Aachen, Germany
Online: via Zoom/Webex
The event is free of charge!
Please register here.
For more information, have a look at the website of the IT Center.
Agenda
Day 1: “Analyzing the Performance of ML code”
Day 2: “Scaling ML code across Multiple GPUs/Nodes”
Day 3: “Handling Datasets of ML code”
Requirements:
- Beginner to advanced level in the use and development of ML
- Basic knowledge of GPU Hardware architectures
- Knowledge of the machine learning models that are relevant to you
- Knowledge of how these models can be used, for example with PyTorch or TensorFlow
- Basic knowledge of concurrency
- The event will be in English!