RichardsonRecruiter Since 2001
the smart solution for Richardson jobs

Sr. Site Reliability Engineer

Company: Yoh
Location: Richardson
Posted on: January 12, 2022

Job Description:

Sr. Site Reliability Engineer (AWS), Direct Hire, 100% Remote

Responsibilities:

- Be an integral part of the team responsible for all infrastructure, security, and deployment operations in Amazon Web Services (AWS) for production and non-production environments
- Contribute to infrastructure architecture and design for building secure, highly performant, resilient, scalable, extensible, maintainable, and highly available software solutions, with ever-increasing automation
- Contribute to the strategy to ensure the confidentiality, integrity, and availability of cloud-based and Internet-accessible systems and services that support the core business functions
- Contribute to the application uptime commitment, including defining and deploying systems for metrics, logging, monitoring, and alerting
- Develop, manage, use, and champion effective documentation, tooling, and alerts to both identify and address reliability risks
- Use extensive metrics to identify issues before they impact end users, including establishing and reducing MTTD and MTTR
- Establish a complete monitoring and alerting strategy for all critical aspects of the flagship application to ensure SLAs are met, and configure meaningful proactive notifications of possible issues for all systems
- Design platforms for reliability metrics and ensure that our production SLAs are measured, monitored, and met
- Be responsible for driving processes for continuous improvement, including metrics
- Identify underlying root causes and provide recommendations or solutions for long-term, permanent fixes to critical production issues
- Participate in patching, scaling, backup/recovery functions (including Disaster Recovery), and troubleshooting to ensure the platform is kept up to date, is available, and meets the needs of the business
- Collaborate closely with Application Development to identify opportunities to continually mature our deployment processes, both for infrastructure and code
- Own change management procedures and controls for our Production environments
- Contribute to Infrastructure as Code (IaC) utilizing Terraform and Azure DevOps
- Participate in on-call rotation and troubleshooting for incident response
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health, including incident response
- Drive operational excellence through better documentation, refined processes, and measuring results
- Contribute to configuration and management of our software-defined networking capabilities, including VPCs, firewalls, and routing

Skills & Abilities:

- Bachelor's degree in Engineering, Computer Science, or related field desired
- 5+ years' experience in Site Reliability Engineering, DevOps, CloudOps, or related field
- Microsoft Tools: Azure DevOps, PowerShell, SQL Server, Windows Domain Administration, IIS
- AWS: EC2, Lambda, API Gateway, DynamoDB, Elasticsearch, Kinesis, CloudWatch
- Code Build and Deployment: CI/CD principles, AWS CodeDeploy, Azure Pipelines, PowerShell
- Infrastructure as Code: Terraform and/or CloudFormation
- Containers: Docker, Kubernetes, AWS Fargate
- Working knowledge of software development lifecycle practices, Agile and Kanban methodologies
- Excellent written and verbal communication skills

Keywords: Yoh, Richardson, Sr. Site Reliability Engineer, Engineering, Richardson, Texas
