Senior Site Reliability Engineer

Visa • Full-time • Bengaluru, India

Company Description

Visa is a world leader in digital payments, facilitating more than 215 billion payments transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable and secure payments network, enabling individuals, businesses and economies to thrive.

When you join Visa, you join a culture of purpose and belonging – where your growth is priority, your identity is embraced, and the work you do matters. We believe that economies that include everyone everywhere, uplift everyone everywhere. Your work will have a direct impact on billions of people around the world – helping unlock financial access to enable the future of money movement.

Join Visa: A Network Working for Everyone.

Job Description

Product Reliability Engineering (PRE) is part of Visa's technology organization. The division is responsible for maintaining and supporting Visa's data assets, and it supports value-added products and services that drive innovation for our partners and clients, within Visa and globally. The Product Reliability Engineering Big Data Platform team is part of PRE and supports the open-source Big Data stack and Big Data services at Visa.

In this Site Reliability Engineering role, you will be responsible for monitoring, troubleshooting, automating, and continuously developing software products and tools to improve the availability and resiliency of open-source platforms at Visa.

Key Responsibilities:

Perform administration and engineering activities on open-source Hadoop, open-source Spark, Airflow, and the machine learning platform running on open-source Kubernetes clusters.

Apply strong troubleshooting and debugging skills.

Collaborate across teams: build and maintain relationships with customer teams, the user community, architects, and engineering teams, and jointly work on key deliverables to ensure production scalability and stability.

Plan and perform capacity expansions and upgrades in a timely manner, avoiding scaling issues and bugs.

Automate repetitive tasks to reduce manual effort and avoid human error.

Tune alerting and set up observability to proactively identify issues and performance problems.

Knowledge of infrastructure operations and production support for container technologies and orchestration platforms is a plus.

Implement automation and self-healing as required.

Participate in root cause analysis of Kubernetes application service failures and support escalations.

Ensure the Kubernetes platform services can effectively meet performance and SLA requirements.

Knowledge of Docker/Kubernetes deployment, configuration, scaling, and management of containerized applications is a plus.

Harden and secure the Kubernetes cluster, supported by monitoring and auditing dashboards.

This is a hybrid position. Hybrid employees can alternate time between both remote and office. Employees in hybrid roles are expected to work from the office 2-3 set days a week (determined by leadership/site), with a general guidepost of being in the office 50% or more of the time based on business needs.

Qualifications

Basic Qualifications:
2+ years of relevant work experience and a Bachelor's degree, OR 5+ years of relevant work experience

Preferred Qualifications:
3 or more years of work experience with a Bachelor’s Degree or more than 2 years of work experience with an Advanced Degree (e.g. Masters, MBA, JD, MD)
At least 1 year of hands-on experience with on-prem container infrastructure; OpenShift or open-source Kubernetes preferred.
Experience in managing and tuning performance of Hadoop platforms.
Extensive knowledge of the Hadoop ecosystem, such as HDFS, YARN, Hive, and Spark.
Excellent Shell and Python programming skills for automating repetitive DevOps tasks.
Understanding of security tools like Kerberos and Ranger.
Strong knowledge of and experience in Unix/Linux systems administration is a must.
Experience with configuration management tools like Chef or Ansible is a plus.
Working knowledge of monitoring and logging tools such as Prometheus and Grafana is a plus.
Excellent verbal and written communication and presentation skills, along with analytical and problem-solving skills.
Self-driven, with the ability to work independently.

Additional Information

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.