Availability Engineer
DigiCert
Who we are
DigiCert is a global leader in intelligent trust. We protect the digital world by ensuring the security, privacy, and authenticity of every interaction. Our AI-powered DigiCert ONE platform unifies PKI, DNS, and certificate lifecycle management to secure infrastructure, software, devices, messages, AI content, and agents. Learn why more than 100,000 organizations, including 90% of the Fortune 500, choose DigiCert to stop today’s threats and prepare for a quantum-safe future at
Job summary
We are seeking a highly skilled Observability & Incident Response Site Reliability Engineer (SRE) to own incident management practices across all production systems. In this role, you will be the subject matter expert for monitoring, alerting, tracing, and logging, and you will lead incident response efforts. You will work at the intersection of product engineering, platform, and security teams to ensure our systems are observable, resilient, and compliant with SLA/SLO commitments.
What you will do
- Excellent knowledge of Kubernetes clusters and container workloads for production reliability.
- Administer and optimize CI/CD pipelines (Harness, GitHub Actions, etc.) to support safe, fast, and frequent deployments and to automate repetitive manual tasks.
- Act as the primary Incident Manager for high-priority production incidents, coordinating swift resolution across engineering, infrastructure, and business teams.
- Own and continuously improve incident response runbooks, escalation matrices, and on-call schedules.
- Drive root cause analysis for all major incidents, with action item tracking and long-term resolution.
- Reduce Mean Time to Detect (MTTD) and Mean Time to Recover (MTTR) through proactive alerting and automated remediation.
- Establish and enforce SLA/SLO/SLI frameworks across all production services.
- Build automated runbooks and self-healing mechanisms to reduce manual intervention during incidents.
- Implement synthetic monitoring to proactively detect customer-facing issues.
- Hands-on experience with Splunk queries to investigate incidents, build dashboards, and drive observability across production systems.
- Exceptional communication skills — able to lead high-pressure incident bridges calmly and clearly.
- Detail-oriented with a strong sense of ownership and accountability.
- Ability to manage multiple concurrent incidents and priorities without losing composure.
What you will have
- 4+ years of experience in SRE, DevOps, Platform Engineering, or Observability Engineering roles.
- Hands-on experience leading incident response for high-severity production incidents.
- Strong background in Linux systems administration and distributed systems troubleshooting.
- Experience defining and managing SLOs, SLIs, and error budgets in production.
Nice to have
- Monitoring & alerting: New Relic, Nagios, or equivalent.
- Log management: Splunk.
- Incident management: PagerDuty, OpsGenie, VictorOps, or equivalent.
- Container orchestration: Kubernetes, Helm, Docker — with deep observability integration experience.
- Scripting & automation: Python, Bash or similar for building tooling and automations.
- Infrastructure as Code: Terraform or Salt.
- CI/CD pipelines: GitHub Actions, Harness.
Benefits
- Generous time off policies.
- Top-shelf benefits.
- Education, wellness and lifestyle support.
To protect candidate information and maintain a secure hiring process, all applications must be submitted through our careers portal. Resumes or CVs sent directly via email will not be reviewed or considered.
