Site Reliability Developer Job Description Template
Our company is looking for a Site Reliability Developer to join our team.
Responsibilities:
- Work closely with the development team to maintain the operational health of core compute services, ensuring API availability and low latency;
- Manage and triage tickets, driving prioritization and execution of work based on impact;
- Drive the creation of new runbooks to help reduce the mean triage time of incidents. Prioritize and automate high-hit-count runbooks;
- Practice sustainable incident response and drive root cause analysis.
Requirements:
- Strong understanding of Linux/Unix commands;
- Deep understanding of service metrics and alarms, gained through the development of dashboards, service KPIs, and alarming systems;
- Understanding of Linux operating systems and Linux system administration;
- Systematic problem-solving approach, strong communication skills, a sense of ownership and drive;
- BS degree in Computer Science or a related technical field involving coding, or equivalent practical experience;
- Experience automating tasks with scripting languages such as Python, Bash, and JavaScript;
- Experience working in an operational environment with mission-critical, tier-one services and the associated pager duty.