Site Reliability Developer

Site Reliability Developer Job Description Template

Our company is looking for a Site Reliability Developer to join our team.

Responsibilities:

  • Work closely with the development team to maintain the operational health of core compute services, ensuring API availability and low latency;
  • Manage and triage tickets, driving prioritization and execution of work based on impact;
  • Drive the creation of new runbooks to reduce mean incident triage time, and prioritize and automate high-hit-count runbooks;
  • Practice sustainable incident response and drive root cause analysis.

Requirements:

  • Strong understanding of Linux/Unix commands;
  • Deep understanding of service metrics and alarms, gained through developing dashboards, service KPIs, and alarming systems;
  • Understanding of Linux operating systems and Linux system administration;
  • Systematic problem-solving approach, strong communication skills, a sense of ownership and drive;
  • BS degree in Computer Science or a related technical field involving coding, or equivalent practical experience;
  • Experience automating tasks with scripting languages such as Python, Bash, and JavaScript;
  • Experience working in an operational environment with mission-critical, tier-one services and associated pager duty.