Loading…
6-7 August
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon India 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in India Standard Time (UTC+5:30)To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Thursday August 7, 2025 2:10pm - 2:40pm IST
Modern AI workloads rely on large GPU fleets whose efficient utilisation is crucial due to high costs. However, gathering telemetry from these workloads to optimise performance is challenging because it requires manual instrumentation and adds performance overheads. Further, it does not produce telemetry in a standardised format for commonly used visualisation tools like Prometheus.

This talk explores the potential of leveraging eBPF to capture CUDA calls made to GPUs, including kernel launches and memory allocations. Data from these probes can be used to export Prometheus metrics, facilitating detailed analysis of kernel launch patterns and associated memory usage. This approach offers significant benefits as eBPF imposes minimal overhead and requires no intrusive instrumentation. Our implementation is also open-source and available on GitHub.
Speakers
avatar for Marc Tudurí

Marc Tudurí

Staff Engineer, Grafana Labs
Marc Tuduri is Prometheus contributor, OpenTelemetry member and Software Engineer at Grafana.
Thursday August 7, 2025 2:10pm - 2:40pm IST
Hall 3
  AI + ML

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Share Modal

Share this link via

Or copy link