Company Description
At Western Digital, our vision is to power global innovation and push the boundaries of technology to make what you once thought was impossible, possible.
At our core, Western Digital is a company of problem solvers. People achieve extraordinary things given the right technology. For decades, we’ve been doing just that. Our technology helped people put a man on the moon.
We are a key partner to some of the largest and highest growth organizations in the world. From energizing the most competitive gaming platforms, to enabling systems to make cities safer and cars smarter and more connected, to powering the data centers behind many of the world’s biggest companies and public cloud, Western Digital is fueling a brighter, smarter future.
Binge-watch any shows, use social media or shop online lately? You’ll find Western Digital supporting the storage infrastructure behind many of these platforms. And, that flash memory card that captures and preserves your most precious moments? That’s us, too.
We offer an expansive portfolio of technologies, storage devices and platforms for businesses and consumers alike. Our data-centric solutions comprise the Western Digital®, G-Technology™, SanDisk® and WD® brands.
Today’s exceptional challenges require your unique skills. It’s You & Western Digital. Together, we’re the next BIG thing in data.
Job Description
Western Digital’s High-Performance Computing environments are key to bringing new storage solutions to market. As a Senior High-Performance Computing (HPC) engineer in the IT Infrastructure team, you will be at the heart of Western Digital’s engineering and product development process, delivering the IT HPC infrastructure and services that empower engineering teams to develop new storage technologies and deliver high-quality products to market quickly.
As a member of the HPC as a Service (HPCaaS) team, you will be responsible for establishing and executing strategic objectives focused on improving the effective utilization of compute resources while meeting or exceeding customer service level agreements for job prioritization, job concurrency, and job throughput in our EDA compute clusters. This includes leading architectural innovation and pathfinding efforts to create and implement Western Digital’s next-generation Grid computing environment. As a member of the team, you will be expected not only to deliver on technical requirements and solutions but also to present your solutions to senior management. Responsibilities include, but are not limited to, working as an individual contributor, a team member, and a technical team lead to explore, define, and pilot new solutions with little supervision, and developing solutions, scripts, and/or processes to automate the management of services and tools as required. In this role, you will collaborate closely with EDA and hardware design team stakeholders to define and deliver workload efficiency improvements in Western Digital’s EDA HPC infrastructure globally.
What you’ll be doing:
- Support multi-site, high-performance compute infrastructure and services for the global engineering product development organizations
- Design, create, deliver, and support the deployment of Ansible automation within HPC and Unix environments
- Identify and propose solutions and new services for the distributed ASIC and GPU computing clusters
- Perform troubleshooting and root cause analysis of HPC clusters and file system related issues
- Develop and maintain documentation for all aspects of the HPC infrastructure
- Improve root cause analysis and corrective action for problems large and small – identify patterns and propose how we can automate repetitive tasks
- Recommend and implement solutions to improve the performance of workloads
- Support a diverse Engineering Design Automation (EDA) environment
Qualifications
- Bachelor’s degree in computer science or equivalent experience
- 10+ years of Linux systems administration experience, specifically in managing or supporting Red Hat and/or CentOS Linux in production environments
- Experience with configuration management tools: Ansible, Puppet, Chef
- Experience with automation tools such as Terraform or other orchestration tools
- Ability to technically lead a project through the lifecycle
- Scripting skills: highly skilled in at least two typical scripting languages (shell/Bash, Python, Ruby)
- Excellent problem-solving, multitasking, troubleshooting skills, and attention to detail are required to work in this challenging and dynamic environment
- Very strong interpersonal, customer service, and team-building skills, with a results-oriented approach
Additional Information
Western Digital thrives on the power and potential of diversity. As a global company, we believe the most effective way to embrace the diversity of our customers and communities is to mirror it from within. We believe the fusion of various perspectives results in the best outcomes for our employees, our company, our customers, and the world around us. We are committed to an inclusive environment where every individual can thrive through a sense of belonging, respect and contribution.
Western Digital is committed to offering opportunities to applicants with disabilities and ensuring all candidates can successfully navigate our careers website and our hiring process. Please contact us at staffingsupport@wdc.com to advise us of your accommodation request. In your email, please include a description of the specific accommodation you are requesting as well as the job title and requisition number of the position for which you are applying.