Tech Site Reliability Engineer
JPMorgan Chase & Co.
There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.
As a Site Reliability Engineer III at JPMorgan Chase within Asset & Wealth Management, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.
Develops and refines Service Level Objectives (including metrics like accuracy, fairness, latency, drift targets, TTFT (Time To First Token), and TPOT (Time Per Output Token)) for large language model serving and training systems, balancing availability/latency with development velocity
Designs, implements, and continuously improves monitoring systems covering availability, latency, and other salient metrics
Collaborates in the design and implementation of high-availability language model serving infrastructure capable of handling the needs of high-traffic internal workloads
Develops and manages automated failover and recovery systems for model serving deployments across multiple regions and cloud providers
Develops AI Incident Response playbooks for AI-specific failures like sudden drift or bias spikes, including automated rollbacks and AI circuit breakers.
Leads incident response for critical AI services, ensuring rapid recovery and systematic improvements from each incident
Builds and maintains cost optimization systems for large-scale AI infrastructure, ensuring efficient resource utilization without compromising performance.
Engineers for Scale and Security, leveraging techniques like load balancing, caching, optimized GPU scheduling, and AI Gateways for managing traffic and security.
Collaborates with ML engineers to ensure seamless integration and operation of AI infrastructure, bridging the gap between development and operations.
Implements Continuous Evaluation, including pre-deployment, pre-release, and continuous post-deployment monitoring for drift and degradation.
Formal training or certification on software engineering concepts and 3+ years applied experience
Demonstrated proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices
Proficient knowledge and experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
Experience with container and container orchestration technologies (ECS, Kubernetes, Docker)
Experience with troubleshooting common networking technologies and issues
Understand the unique challenges of operating AI infrastructure, including model serving, batch inference, and training pipelines
Have proven experience implementing and maintaining SLO/SLA frameworks for business-critical services
Are comfortable working with both traditional metrics (latency, availability) and AI-specific metrics (model performance, training convergence)
Can effectively bridge the gap between ML engineers and infrastructure teams
Experience with AI-specific observability tools and platforms, such as OpenTelemetry and OpenInference.
Familiarity with AI incident response strategies, including automated rollbacks and AI circuit breakers.
Knowledge of AI-centric SLOs/SLAs, including metrics like accuracy, fairness, drift targets, TTFT (Time To First Token), and TPOT (Time Per Output Token).
Expertise in engineering for scale and security, including load balancing, caching, optimized GPU scheduling, and AI Gateways.
Experience with continuous evaluation processes, including pre-deployment, pre-release, and post-deployment monitoring for drift and degradation.
