What You’ll Do
Engage in and improve the whole lifecycle of our products, from ideation and design through development, launch, operation, and iteration.
Build upon and strengthen our AWS-based cloud infrastructure.
Partner with product engineering teams throughout the product development lifecycle (PDLC) on design, development, capacity planning, and ramp plans to ensure Venmo continues to scale and maximize availability.
Ensure logging, monitoring, and alerting strategies adequately cover availability, latency, and overall system health.
Scale systems sustainably through automation, and evolve systems by pushing for changes that improve reliability and velocity.
Host incident reviews and blameless post-mortems.
Continuously improve incident management policies, procedures, tools, and implementation.
What We’re Looking For
BS degree in Computer Science or related technical field involving systems engineering (e.g., physics or mathematics), or equivalent practical experience.
Software development background, with the ability to analyze and improve an existing codebase.
Cloud-based architecture experience in AWS, with the ability to architect large-scale solutions.
Experience with containers and Kubernetes.
Established ability to diagnose technical problems, debug code, and automate routine tasks.
Ability to support an always-available, production-grade service running 24/7/365.
Experience in one or more of the following: Java, Python, Golang, or shell scripting.
Experience with Unix/Linux operating systems internals and administration.
Ability to debug and optimize code and automate repetitive tasks.
Strong analytical and problem-solving skills.
Familiarity with configuration management and infrastructure-as-code tools (Ansible, Puppet, Chef, Terraform, etc.).
Established experience with monitoring/logging tools and best practices.
Preferred Qualifications
Proficiency in managing large-scale, cloud-based infrastructure.
Expertise in designing and troubleshooting large-scale distributed systems.
Strong communicator, both written and spoken.
Kubernetes and container experience.
3+ years of SRE experience.
7+ years of experience in cloud infrastructure.
5+ years of experience with AWS.