Job Title: Senior Site Reliability Engineer (SRE)
The Elevator Pitch
Evolv is looking for a top-notch Senior Site Reliability Engineer (SRE) to be a technical leader and contributor in our Cloud Architecture and Engineering team. In this role you will help increase productivity for developers, increase system uptime, and design the future framework for Evolv to scale and thrive.
As the Senior SRE, you will be a key stakeholder in the automation, delivery, and compliance of our software in our secure and regulated environments.
Success in the Role: What performance outcomes will you work toward over the first 6-12 months?
- In the first 30 days, you will:
- Participate in “EvolvED”, a week-long orientation program for all new employees
- Meet leadership and various teams, sit in on meetings, and learn the relevant architecture and processes in use
- Become comfortable and familiar with our internal systems, processes, and people
- Become a contributor to our DevSecOps vision, architecture, and infrastructure designs
- Within 3 months, you will:
- Lead and implement our resiliency and scale plans, aligned to the product roadmap
- Lead and implement security controls in regulated environments
- Implement new secure and regulated environments
- Identify areas for improvement, removing bottlenecks and reducing friction
- Work on cross-functional teams and projects, delivering best-in-class solutions
- By the end of the first year, you will:
- Enable and deliver scalable and automated strategies – Infrastructure as Code
- Work with engineering teams to provide automation and tooling to deploy and manage applications
- Ensure availability, performance, security, and scalability of our AWS environments
- Maintain and deploy systems for metrics, logging, and monitoring of AWS environments
- Automate compliance reporting for SOC 2 and FedRAMP environments
- Troubleshoot and resolve problems across various application domains and platforms
The Work: What type of work will you be doing? What assignments will you carry out, and what requirements and skills will you draw on regularly?
- Performance Monitoring and Troubleshooting:
- Manage monitoring and alerting systems for cloud infrastructure: monitor performance metrics, identify bottlenecks, and implement optimizations to ensure optimal service delivery
- Support optimization efforts to ensure cost-effectiveness, performance, and reliability of cloud infrastructure (AWS - 90% serverless IoT, Lambda, Docker/Kubernetes)
- Develop strategies and implement resiliency and infrastructure scale plans to accommodate growth and maintain optimal performance.
- Take charge of critical incidents, coordinating with cross-functional teams to resolve issues swiftly and minimize downtime.
- Lead the on-call rotation, providing 24/7 support for production systems, and report on incidents in a timely manner.
- Develop and maintain comprehensive disaster recovery plans to ensure business continuity in the event of system failures or disasters.
- DevSecOps and Automation:
- Knowledge of DevSecOps principles and practices.
- Experience in infrastructure automation and Infrastructure as Code (IaC) tools and practices (Terraform, Bitbucket, CircleCI)
- Ability to design and implement automation and tooling to deploy and manage applications efficiently.
- Work with the DevSecOps team to implement automation solutions for provisioning, configuration, and scaling of infrastructure components to enhance reliability and efficiency.
- Security and Compliance:
- Implement and enforce security measures to safeguard cloud infrastructure and data.
- Ensure compliance with industry regulations and internal security policies.
- Strong understanding of cloud security measures and best practices.
- Experience in implementing security controls in regulated environments.
- Familiarity with industry regulations such as PCI, GovCloud, or FedRAMP.
- Collaborate with security teams to implement best practices for securing infrastructure components and data, including vulnerability assessments and patch management.
- Leadership and Collaboration:
- Strong leadership skills with the ability to provide technical guidance and mentorship.
- Ability to collaborate effectively with cross-functional teams, including software developers, system engineers, and security experts.
- Drive initiatives to enhance reliability, scalability, and performance through process improvements, tooling enhancements, and architectural optimizations.
What is the leadership like for this role? What is the structure and culture of the team?
This role reports to our Director of Cloud Engineering in our R&D organization.
The team culture is based on building trust, collaboration, and ongoing development through radical candor, kindness, authenticity, courage, drive, and fun!
What is the salary range?
The base salary range for this full-time position is $119,000-$165,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process. Please note that the compensation details listed in this role posting reflect the base salary only, and do not include bonus, equity, or benefits.
Where is the role located?
This role is based out of the HQ in Waltham, Massachusetts with flexibility for remote work. Full remote work is possible for the right candidate.