Senior Deep Learning Performance Engineer

NVIDIA • Full-time • US, US

We are now looking for a Senior Deep Learning Performance Engineer!

NVIDIA is seeking highly skilled and dedicated engineers who are passionate about optimizing AI workloads, particularly Generative AI and Large Language Model (LLM) training. This role requires working across all levels of the hardware and software stack, including GPU architecture, compilers, kernels, and Deep Learning frameworks such as JAX and PyTorch. You'll have the opportunity to work on cutting-edge technology that will accelerate training performance for deep learning users all over the world. Are you ready to take on challenging problems and collaborate with innovative teams to engineer systems for high-performance Deep Learning? This could be the role for you!

What You Will Be Doing:

Understand, analyze, profile, and optimize large language model training on state-of-the-art hardware and software platforms.
Understand the big picture of LLM training performance on GPUs, then prioritize and solve problems across state-of-the-art LLM variants, from research to industry.
Implement production-quality software in multiple layers of NVIDIA's deep learning platform stack, from drivers to DL frameworks.
Implement and simulate key LLM workload behaviors in NVIDIA's proprietary processor and system simulators to enable future architecture studies.
Build tools to automate workload analysis, workload optimization, and other critical workflows.

What We Need To See:

PhD (or equivalent experience) in CS, EE, or CSEE with 5+ years of relevant work experience, or MS with 8+ years of relevant work experience.
Strong background in deep learning and neural networks, particularly in training and large language models.
Deep understanding of computer architecture, and familiarity with the fundamentals of GPU architecture.
Expertise in analyzing and tuning application performance, preferably on GPUs.
Familiarity with common deep learning software packages like PyTorch and JAX.
Prior experience with processor and system-level performance modeling.
Programming skills in C++, Python, and CUDA.

Intelligent machines powered by AI computers that can learn, reason, and interact with people are no longer science fiction. Today, a self-driving car powered by artificial intelligence can meander along a country road at night and find its way. An AI-powered robot can learn motor skills through trial and error. This is truly an extraordinary time. The era of AI has begun, and we are powering it. NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. Are you passionate about performance? Are you interested in working on industry-leading Deep Learning products? Come join our Deep Learning Architecture team and help build real-time, cost-effective computing platforms driving our success in this exciting and rapidly growing field.

The base salary range is 176,000 USD - 333,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.