Cloud Site Reliability Engineer

Cloud Site Reliability Engineer Job Description Template

Our company is looking for a Cloud Site Reliability Engineer to join our team.

Responsibilities:

  • Manage our cloud applications using common DevOps and Agile practices to maintain uptime and reliable delivery;
  • Spend about 50% of your time automating site systems so they can self-manage and self-heal.

Requirements:

  • Experience managing Windows and Linux servers;
  • At least 5 years of experience;
  • Experience with public clouds (preferably Azure and AWS);
  • Deep knowledge of modern monitoring, alerting, and security tools such as the ELK Stack, Nagios, Prometheus, Qualys, Dome9, etc.;
  • Excellent communication and documentation skills;
  • Bachelor’s degree in Computer Science, Computer Engineering, or a related field, or equivalent experience;
  • Experience with configuration management software such as Puppet, Salt, Chef, etc.;
  • Working knowledge of scripting and programming languages such as Python, C#, PowerShell, Bash, etc.;
  • Experience with some standard DevOps tools such as Docker, Terraform, Jenkins, Git, Consul, Vault, Cloudera, Hadoop, etc.;
  • Strong problem-solving skills.