Description
At Shutterfly, we make life’s experiences unforgettable. We believe there is extraordinary power in self-expression. That’s why our family of brands helps customers create products and capture moments that reflect who they uniquely are.
We are initiating a comprehensive re-platforming of our consumer website, and the Senior Manager of Site Reliability Engineering (SRE) will play a pivotal role in establishing the new shared infrastructure.
This Senior Manager SRE role is ultimately responsible for system reliability and developer productivity, for reducing downtime, and for supporting time to market by reducing the technical debt of the services the SRE team supports. As a key figure in the centralized infrastructure team, the Senior Manager of SRE enables engineering activities and efficiencies within the AWS cloud. This leadership role demands innovation in a sophisticated cloud ecosystem, adherence to sustained compliance, and support for a high-traffic, visible platform with substantial data volumes, third-party integrations, and intricate scalability and performance considerations.
In addition to infrastructure development, the role emphasizes Site Reliability Engineering, engaging with workload teams to develop Infrastructure as Code (IaC) and troubleshoot application issues. Monitoring and observability are strategic priorities, and the role contributes to enhancing these capabilities using tools like Terraform, Packer, Splunk, SignalFx, and other observability and IaC tools.
The Senior Manager of SRE leads a highly skilled engineering team, fostering collaboration and innovation. Managing and mentoring this team requires inspiring and guiding engineers in complex problem-solving, offering technical leadership and support. The ideal candidate will have a proven track record of successfully managing highly skilled engineering teams, demonstrating strong communication and leadership skills at both team and executive levels. Cultivating a culture of continuous learning, collaboration, and excellence is paramount in this role.
What You Will Do Here:
- Cross-Functional Engagement:
  - Embed with application/workload teams to manage and execute infrastructure-related priorities.
  - Partner closely with other infrastructure organizations to contribute to enterprise standards and best practices.
- Cloud Infrastructure Management:
  - Lead and manage the Site Reliability Engineering team with a focus on AWS cloud services.
  - Oversee the design, implementation, and optimization of scalable and reliable infrastructure solutions in the cloud.
- Observability and Monitoring:
  - Develop and implement robust monitoring, observability, and diagnostic strategies.
  - Leverage observability tools such as Splunk, CloudWatch, SignalFx, etc., to enhance system reliability and performance.
- Team Management:
  - Build and lead a high-performing, distributed SRE team, fostering a culture of collaboration, innovation, and continuous improvement.
  - Mentor and develop team members, providing guidance on technical skills and career growth.
- Continuous Improvement:
  - Stay abreast of industry trends and emerging technologies, applying relevant advancements to enhance our systems and processes.
  - Drive a culture of continuous improvement and learning within the SRE team.
The Skills You Will Bring:
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- Proven experience in managing and leading Site Reliability Engineering or DevOps teams.
- Extensive hands-on experience with AWS cloud services and infrastructure.
- Strong background in implementing and optimizing monitoring and observability solutions using tools such as Splunk, Datadog, SignalFx, etc.
- Excellent communication and interpersonal skills.
- Highly motivated, curious, and committed to driving excellence within the team.
- Extensive experience managing and interacting with personnel across all time zones.
- Natural curiosity about modern technology and a willingness to continually retool to adapt to innovation.
Supporting a diverse and inclusive workforce is important to Shutterfly not only because it directly reflects our value of Embracing our Differences, but also because it’s the right thing to do for our business and for our people. Learn more about our commitment to Diversity, Equity and Inclusion at Shutterfly DE&I.