Position Summary
Summary:
As a member of the Global Technology Platforms (GTP) CCC team, you will work with other CCC, SRE, DevOps and Engineering practitioners to proactively maintain the mission-critical infrastructure, cloud platforms, micro-services, tools, and processes that ensure the highest levels of availability and reliability across Walmart’s technology stack.
You're right for the job if you are comfortable leading major incident response with a technical team of engineers laser-focused on restoring service across complex distributed systems. To succeed, you will draw on your knowledge of the tech stack and tools to surface key data. You'll excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization and organization. You will work directly with our SRE, Engineering and DevOps teams to support our next-generation “always up” cloud-based technology platforms.
You will take command and control of Major Incidents, focusing on restoration by identifying and coordinating with the appropriate resources through all phases of triage, restoration and validation. You will understand the technology stack and use this knowledge to ensure systems continue to meet production-ready standards; Operational Excellence is key! Good judgement is crucial, as you will own and deploy critical switches for command/control mitigation tooling. Your ability to continuously challenge yourself and to build a strong cross-functional network with peers and stakeholders will help you excel in this role. Our goal is to protect the customer, merchant and associate experience and to deliver outstanding levels of availability across Walmart Global Technology.
What You’ll Do:
- Control incident management processes and procedures.
- Remain calm under pressure when controlling major incident response.
- Excellent end-to-end technical understanding of core infrastructure, cloud services, platforms and micro-services.
- Ability to understand traffic flows and key dependencies between services.
- Ability to understand and capture key data from various sources, systems and people.
- Ability to triage effectively: detect issues and distinguish symptom from cause.
- Act as a technical leader and coach within the CCC. Analyze trends to proactively prevent incidents.
- Focus on leading immediate restoration rather than root cause investigation.
- Identify and drive continuous improvement efforts to reduce waste (eliminate, automate or streamline).
- Develop monitoring and alerting frameworks.
- Absorb knowledge of complex distributed systems and share it with your peer group and beyond.
- Strong focus on collecting and inferring metrics.
- Influence the design of system architecture and tactical solutions.
- Provide data for, and actively participate in, root cause analysis, partnering with the Problem Management function.
- Design and implement technical solutions and process improvements that speed up detection and resolution and prevent repeat issues.
- Familiar with log-centric tooling, ideally Splunk. Produce time-series data and reusable dashboards for use both during and after an event.
- Define the standards and requirements for new service onboarding, ensuring services are fit for production.
- Drive standardization and service focused instrumentation.
- Share knowledge globally between CCC teams.
- Strive for continuous improvement and make recommendations based on CCC process.
What You’ll Bring:
- Experience in an infrastructure, systems, engineering or development environment delivering operational excellence to highly complex distributed systems.
- Bachelor's Degree in Computer Science or a related field, or relevant work experience.
- Strong and demonstrable incident management skills with relevant experience in an enterprise organization.
- Experience and exposure working in a 24/7 operations support environment.
- Methodical and systematic problem-solving approach, combined with a solid awareness of ownership, initiative and drive.
- Experience investigating, analyzing and troubleshooting large-scale enterprise systems.
- Knowledge of networking concepts, such as protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
- Experience administering Unix/Linux in a production environment.
- Experience working with and developing enterprise monitoring/tooling/logging solutions like Grafana, Kibana, Splunk, Graphite, Nagios, New Relic, Dynatrace and Prometheus.
- Working knowledge of one or more cloud technologies such as Azure, GCP, OpenStack.
- Experience with distributed version control such as Git or similar.
- Familiarity with continuous integration/deployment processes and tools such as Jenkins, Maven, Nexus, etc.
- Programming experience in one or more of the following languages: Go, Java, Python, Shell, etc.
- Experience in data science/machine learning would be advantageous.
About Walmart Global Tech
Imagine working in an environment where one line of code can make life easier for hundreds of millions of people. That’s what we do at Walmart Global Tech. We’re a team of software engineers, data scientists, cybersecurity experts and service professionals within the world’s leading retailer who make an epic impact and are at the forefront of the next retail disruption. People are why we innovate, and people power our innovations. We are people-led and tech-empowered. We train our team in the skillsets of the future and bring in experts like you to help us grow. We have roles for those chasing their first opportunity as well as those looking for the opportunity that will define their career. Here, you can kickstart a great career in tech, gain new skills and experience for virtually every industry, or leverage your expertise to innovate at scale, impact millions and reimagine the future of retail.
Ways of Working
Flexible, hybrid work
We use a hybrid way of working, with a primarily in-office presence coupled with an optimal mix of virtual presence. We use our campuses to collaborate and be together in person, as business needs require and for development and networking opportunities. This approach helps us make quicker decisions, remove location barriers across our global team, and be more flexible in our personal lives.
Benefits
Beyond our great compensation package, you can receive incentive awards for your performance. Other great perks include a host of best-in-class benefits such as maternity and parental leave, PTO, health benefits, and much more.
Walmart, Inc. is an Equal Opportunity Employer – By Choice. We believe we are best equipped to help our associates, customers and the communities we serve live better when we really know them. That means understanding, respecting and valuing diversity (unique styles, experiences, identities, ideas and opinions) while being inclusive of all people.
At Walmart, we offer competitive pay as well as performance-based incentive awards and other great benefits for a happier mind, body, and wallet. Health benefits include medical, vision and dental coverage. Financial benefits include 401(k), stock purchase and company-paid life insurance. Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting. Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.
You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes. The amount you receive depends on your job classification and length of employment. It will meet or exceed the requirements of paid sick leave laws, where applicable. For information about PTO, see https://one.walmart.com/notices.
Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities. Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates. Tuition, books, and fees are completely paid for by Walmart.
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific plan or program terms. For information about benefits and eligibility, see One.Walmart at https://bit.ly/3iOOb1J.
The annual salary range for this position is $143,000.00-$286,000.00.
Additional compensation includes annual or quarterly performance incentives.
Additional compensation for certain positions may also include:
- Regional Pay Zone (RPZ) (based on location)
- Stock equity incentives
Minimum Qualifications
Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.
Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 4 years’ experience in software engineering or related area.
Option 2: 6 years’ experience in software engineering or related area.
Preferred Qualifications
Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.
Master’s degree in Computer Science or related field and 3 years' experience in software engineering
Primary Location
640 W California Avenue, Sunnyvale, CA 94086-4828, United States of America