Site Reliability Engineer BugSnag

SmartBear • Full-time • UK

At SmartBear, we deliver the complete visibility developers need to make each release better than the last. Our award-winning, industry-favorite tools (TestComplete, Swagger, Cucumber, ReadyAPI, and Zephyr) are trusted by over 16 million developers, testers, and software engineers at 32,000+ organizations, including world-renowned innovators like Adobe, JetBlue, FedEx, and Microsoft.

Site Reliability Engineer - BugSnag

You will build and maintain key infrastructure that is observable, stable, and performant.
You will have the opportunity to work with the latest industry-leading technologies, architectures, languages, data storage, and messaging frameworks, and you will be empowered to explore and contribute your own ideas.
You will work with a modern, highly scalable microservices architecture that delivers application error monitoring, Real User Monitoring (RUM), and observability solutions.

Product Intro

BugSnag is the trusted software stability "command center" for over 5,000 engineering teams worldwide, including Airbnb, Slack, Pinterest, Lyft, Docker, and Pandora. We process over 1 billion crash reports daily from 85,000 applications, and we empower our customers to make data-driven decisions about when to build new features and when to fix bugs.

Go to our product page if you want to know more about BugSnag.

You can even have a free trial to check it out 😊

About the role

We're looking for a Site Reliability Engineer to join the BugSnag Infrastructure team. You will be working alongside a small, talented team based predominantly in Bath, UK. Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. We use our engineering skills and knowledge to build tools and processes to keep the BugSnag product online 24/7, making our systems self-heal wherever possible to limit the impact of being on-call.

Ensure that all our microservices, databases and other key infrastructure are observable, introducing monitoring in the right places to give us visibility into what is happening in our production systems.
Influence the design of new and existing microservices to ensure they will be observable, stable and performant.
Maintain our databases and keep them performant by tuning their configuration and deployment, and by investigating how our systems use them.
Automate tasks to keep the BugSnag systems resilient with minimal manual intervention.
Validate our infrastructure to ensure it is highly available and recoverable.
Work with our security team to implement best-practice security measures to protect our infrastructure and customer data.
Make sure that our On-Premise product is as reliable as our SaaS product.
Review the work of others in the team to a high standard to reduce the risk of production issues.
Regularly be on-call to deal with production issues and keep BugSnag running 24/7.

We are looking for you if you have:

Substantial engineering experience.
Professional experience developing software.
Experience working with and maintaining a Linux or other *nix flavour system.
Hands-on experience with one of the major cloud platforms: GCP, AWS or Azure.
Professional experience querying, deploying, and maintaining databases/datastores (we use MongoDB, Redis, Elasticsearch, ClickHouse and Gluster).

You are:

Quick to learn new skills, and can readily apply existing skills and knowledge to solve new, complex problems.
Able to take ownership of all stages of a project from architecture/design through implementation to delivery.
Able to effectively balance idealism with pragmatism when assessing the direction a project should take.
Willing to go the extra mile to make other developers more efficient.

You may also have:

Experience with Docker/Kubernetes.
Experience with Terraform/Packer/Vagrant.
Experience with Chef/Puppet/Ansible.
Experience with MongoDB/Redis/Elasticsearch/ClickHouse.
Experience with Gluster.
Experience with RabbitMQ/Kafka.
Experience with microservices in production.

Why you should join the SmartBear crew:

You can grow your career at every level.
We invest in your success as well as the spaces where our teams come together to work, collaborate, and have fun.
We love celebrating our SmartBears; we even encourage our crew to take their birthdays off.
We are guided by a People and Culture organization - an important distinction for us. We think about our team holistically – the whole person.
We celebrate our differences in experiences, viewpoints, and identities because we know it leads to better outcomes.

Did you know:

Our main goal at SmartBear is to make our technology-driven world a better place.
SmartBear is committed to ethical corporate practices and social responsibility, promoting good in all the communities we serve.
SmartBear is headquartered in Somerville, MA, with offices across the world, including Galway, Ireland; Bath, UK; Wroclaw, Poland; and Bangalore, India.
We’ve won major industry awards (product and company), including best places to work.

SmartBear is an equal employment opportunity employer and encourages success based on our individual merits and abilities without regard to race, color, religion, gender, national origin, ancestry, mental or physical disability, marital status, military or veteran status, citizenship status, age, sexual orientation, gender identity or expression, genetic information, medical condition, sex, sex stereotyping, pregnancy (which includes pregnancy, childbirth, and medical conditions related to pregnancy, childbirth, or breastfeeding), or any other legally protected status.
