At Weights & Biases, our mission is to build the best developer tools for machine learning. Weights & Biases is a Series C company with $250 million in funding and a rapidly growing user base. Our platform is an essential part of the daily work of machine learning engineers, from academic research institutions like FAIR and UC Berkeley to massive enterprise teams including iRobot, OpenAI, Toyota Research Institute, Samsung, NVIDIA, Salesforce, Blue Cross Blue Shield, Lyft, and more.
The Site Reliability teams at Weights & Biases are responsible for keeping our high-volume, low-latency environments performing around the clock. These teams collaborate closely with our product engineers so that Weights & Biases can handle millions of requests and our customers always have dependable, actionable data at their fingertips. You’ll be responsible for shaping the infrastructure of our data-intensive, real-time services as we continue to grow at petabyte scale.
Responsibilities
- Keep our services reliable, available, fast and cost-efficient
- Respond to, investigate and fix service issues, whether they are deep in the OS kernel or in the application code
- Build tools and production frameworks to make our engineering team’s lives easier
- Design, build and maintain the infrastructure we need to support orders of magnitude more customers
Requirements
- Experience managing, monitoring, and debugging large-scale distributed systems in production
- In-depth knowledge of at least one cloud provider (AWS, GCP, Azure, VMware, etc.)
- Strong grasp of at least one higher-level language and its ecosystem (Go, Python, TypeScript, etc.)
- Deep understanding of IaC concepts and tools (Terraform, SaltStack, Ansible, etc.)
- Experience with containerized deployment systems such as Kubernetes
- Familiarity with CI/CD tools (Jenkins, GitHub Actions, FluxCD, Argo, etc.)
- Experience with monitoring and scaling production SQL databases (MySQL / PostgreSQL preferred)
Our Benefits
- 🏝️ Flexible time off
- 🩺 Medical, dental, and vision coverage for employees and their families
- 🏠 Remote-first culture with in-office flexibility in San Francisco
- 💵 Home office budget with a new high-powered laptop
- 🥇 Truly competitive salary and equity
- 🚼 12 weeks of parental leave (U.S. specific)
- 📈 401(k) (U.S. specific)
- Supplemental benefits may be available depending on your location
- Explore benefits by country
We encourage you to apply even if your experience doesn't perfectly align with the job description as we seek out diverse and creative perspectives. Team members who love to learn and collaborate in an inclusive environment will flourish with us. We are an equal opportunity employer and do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. If you need additional accommodations to feel comfortable during your interview process, reach out at careers@wandb.com.