McGraw Hill

Sr Site Reliability Engineer

🇺🇸 Remote - US

🕑 Full-Time

💰 $124K - $155K

💻 Software Engineering

🗓️ September 18th, 2024

Ansible Golang Python

Overview

Impact the Moment

At McGraw Hill we create best-in-class, next-generation learning platforms that are used by millions of students and educators worldwide every day. We design intuitive and effective tools and experiences that maximize teachers’ time and students’ learning. And we do all of this in a supportive and collaborative environment where we work alongside brilliant colleagues, touch lives around the world, see the difference our hard work makes, and continue our paths of lifelong learning.

Your impact on the team

As a Sr Site Reliability Engineer at McGraw Hill, you will play a crucial role in designing and maintaining high-capacity systems that ensure the reliability, performance, and security of our customer platforms. You will collaborate with product teams within a DevOps framework to implement automation tools and processes that enhance predictability, accelerate time-to-market, and optimize costs. Your efforts will directly contribute to operational excellence and help advance our mission to deliver exceptional, reliable services.

This is a remote position open to applicants authorized to work for any employer within the United States.

What You’ll Do

Cloud Engineering

  • Design, deploy, and manage automation tools in a DevOps model to enhance predictability, accelerate time-to-market, and ensure repeatability, traceability, and transparency of infrastructure automation (infrastructure-as-code, monitoring-as-code).
  • Collaborate with product development teams to optimize systems for reliability and performance, while managing AWS costs and using optimization tools to maximize ROI and meet Service Level Objectives.
  • Continuously learn and stay updated on the AWS ecosystem through participation in game day scenarios, professional conferences, and other development opportunities.

Observability Engineering

  • Own the reliability, uptime, system security, cost, capacity, resiliency, and performance of applications and platforms, while leading data-driven initiatives to enhance stability and improve service levels.
  • Ensure that the architecture and deployment models are adequately designed to meet SLA commitments.
  • Act as the primary contact during major incidents, resolving issues and managing on-call alarms.
  • Maintain and enhance telemetry systems to improve visibility into application performance and business metrics, ensuring operational workloads are effectively managed.

DevSecOps

  • Support healthy software development practices, including complying with agile software development methodology and building standards for code reviews, work packaging, and continuous delivery.
  • Partner with CyberSecurity to develop plans and automation that respond to new risks and vulnerabilities.

Resiliency Engineering

  • Collaborate with development teams to identify system failure points and blast radius, validate monitoring and observability configurations, coordinate failure injection testing, and document steady-state production levels and growth patterns.
  • Plan and forecast for seasonal growth, communicate trends with leadership, and enhance infrastructure scaling plans to handle 2x the anticipated load, while coordinating improvements to software and infrastructure to meet resiliency goals.
  • Mentor and nurture engineers across varying levels of experience; foster growth by setting high-reaching goals and providing support to achieve them.

About You

  • Minimum of 5 years of applicable Site Reliability Engineering (SRE) experience.
  • Hands-on experience with the following technologies is required:
    • Cloud and Infrastructure as Code: AWS (CloudFront, S3, EC2, ECS, SES, SQS, SNS, Load Balancing, VPC, Config, Systems Manager, Lambda, API Gateway, DB services) and Terraform
    • Programming and Containerization: Python, Golang, Bash, Ansible, and AWS ECS
    • Security and web platforms: Rapid7, WAF, Apache httpd, Apache Tomcat, Angular
    • Config Management and provisioning: Ansible, Packer
    • Telemetry: New Relic, CloudWatch, Datadog
    • DevSecOps: Artifactory, Jenkins, CircleCI, SonarQube, JFrog Xray, Control Tower, GitHub
  • Experience with automation tools and software development is a bonus

Why McGraw Hill?

There has never been a better time to join McGraw Hill. In our culture of curiosity and innovation, you will be able to own your growth and develop as we do.
 
The pay range for this position is $124,350 to $155,000 annually; however, base pay offered may vary depending on job-related knowledge, skills, experience, and location. An annual bonus plan may be provided as part of the compensation package, in addition to a full range of medical and/or other benefits, depending on the position offered.
 
McGraw Hill recruiters always contact candidates from a “@mheducation.com” email address and/or through our Applicant Tracking System, iCIMS. Any variation of this email domain should be considered suspicious. Additionally, McGraw Hill recruiters and authorized representatives will never request sensitive information by email.

McGraw Hill uses an automated employment decision tool (AEDT) to assist in the screening process by recommending candidates with “like skills” based on resume and job data. To request an alternative screening process, please select “Opt-Out” when asked to “Consent to use of Automated Employment Decision Tools” during the application.

McGraw Hill is committed to celebrating and supporting the differences that make us each unique and will not discriminate based on a person's gender, gender identity or expression, nationality, color, race, ethnicity, religion, sexual orientation, disability, appearance or veteran status. We are proud to be an equal opportunity and affirmative action employer, and we will also provide reasonable accommodation to qualified individuals with disabilities.