Head of Site Reliability Engineering
OpenGov
This job is no longer accepting applications
Administration, Software Engineering
Boston, MA, USA
Posted on Nov 9, 2023
OpenGov is home to an exceptional team that is passionate about our mission to power more effective and accountable government. By bringing the OpenGov Cloud to our nation's state and local governments, we’re transforming communities so they can thrive!
Imagine yourself being able to help small business owners open their doors faster, ensuring our tax dollars are accounted for, creating safer infrastructure, modernizing the permitting process, and assisting with disaster recovery. The work you do here every day has a meaningful impact on people's lives!
🌟 OpenGov is a 2023 Top Workplaces USA award winner and a Forbes America's Best Startup Employer. OpenGov is a high-growth, late-stage, venture-backed startup with investors including Andreessen Horowitz, Formation 8, Cox Enterprises, and Emerson Collective. Our Board of Directors includes iconic Silicon Valley executives John Chambers (former Cisco Chairman and CEO) and Marc Andreessen (named to Time Magazine’s list of the 100 most influential people in the world). 🌟
Join our smart, fun, and humble team to experience the most rewarding career of your life!
The Head of Site Reliability Engineering, reporting to the SVP of Platform Engineering, is ultimately responsible for leading the SRE team and efforts to support the operational environments of OpenGov Cloud Products.
The mission of the SRE organization is to ensure system availability, reliability, and performance while promoting efficient development processes. The team balances the trade-off between rapid innovation and system stability. This includes adopting practices such as error budgets, service level objectives (SLOs), and toil automation.
The primary responsibility of the SRE organization is to maintain the availability, stability and security of the customer product experience. This responsibility overrides all other functions of the SRE organization, and this section will provide an overview of the components of this responsibility.
Site Reliability Engineering is an engineering discipline, so this team is expected to accomplish its primary responsibility through constant, iterative development and integration of both in-house tools and third-party services. These tools and services can be viewed as the “product” that the SRE team delivers to OpenGov. This product, when combined with OpenGov’s product lines, makes up our cloud services.
With this in mind, the SRE work product contains the following core functions:
• 24/7 monitoring and management of customer facing services
• Service availability and resiliency oversight. Ensure all teams have clear roadmaps to increase availability and resiliency. Ensure that we have a proper Disaster Recovery posture and that we regularly exercise and prove our ability to recover and restore service availability.
• Ownership of the monthly Ops Review and follow-up items - Lead the monthly Ops Review where each team surfaces operational issues and tracks operational metrics; make sure teams and leaders are tackling the top items surfaced and identified for improvement; provide guidance to teams in advance of the Ops Review on areas that they should be tackling and to which they should be paying attention.
• Data-center cost optimization - Provide oversight and ensure that all teams are tracking their spend and meeting their product lines’ data-center cost goals; initiate and track cross-cutting cost optimization initiatives; forecast our data-center spend and re-cast the forecast based on any product or technology changes.
• Operational security and regulatory compliance of the service. Security posture “central command” within Engineering - Be the central point of contact for the Global Security Team in planning our R&D and product security posture; ensuring all R&D teams are continually evolving their posture; and in executing internal and external security audits.
• Cost of service and costs associated with the SRE work product
• Change management and CI/CD
• Third party product and service integration and vendor relationships with those services
Requirements
The SRE Leader themselves should have considerable technical expertise, with significant experience in building and launching Cloud Services. While this is not an exhaustive profile, the following considerations are critical:
• 8+ years of experience and expertise leading delivery and operation of cloud-based services.
• Experience managing a team of at least 25.
• Experience managing through managers.
• Experience with delivering services to regulated industries and the associated regulatory hurdles.
• Experienced in the practical financial aspects of operating Cloud Services, including managing vendor relationships, budget management and building operational cost models.
• Experience managing global 24/7 organizations and growing organizations from small teams into larger organizations.
• Significant technical background in Cloud Services, and comfort with deep architectural aspects of running such services.
• A proven track record in leading organizational transformations by facilitating the development and implementation of new processes, structures, and cultural shifts to enhance efficiency and productivity.
• Strong influencing skills to effect change and drive strategic initiatives across all levels of the organization, and relationship-building skills to collaborate with teams and stakeholders, fostering an environment of cooperation towards common goals.
• Resilience in the face of resistance, navigating organizational dynamics effectively to get buy-in for new ideas and strategies.
• Strong people leadership capabilities: setting, measuring, and reporting on organizational goals, and aligning teams to company objectives.
What makes OpenGov unique
» Leadership: CEO Zac Bookman (MPA from Harvard and JD from Yale) is truly a mission-driven CEO. He was named one of the 100 most Intriguing Entrepreneurs by Goldman Sachs, a Tech Pioneer by the World Economic Forum, and SF and Silicon Valley Business Times' 40 under 40 class of 2018!
» Funding: Over $250 million raised as a late-stage, venture-backed startup from top-tier investors including Andreessen Horowitz, 8VC, Cox Enterprises, and Emerson Collective.
» Board of Directors: Includes iconic executives John Chambers (former Cisco Chairman and CEO), Marc Andreessen (named to Time Magazine’s list of the 100 most influential people in the world), and Amy Pressman (co-founder, former president, and a current board member of Medallia).
» Growth: Record-breaking growth with 1,800+ governments (and counting) using our products.
» Culture: Winner of 2023 Top Workplaces USA award, Awarded Top 25 Cloud Companies to Work For, Winner of Forbes 2022 Best Startup Employers, 50 Best Workplaces award.
» Perks: 90% paid Medical/Dental/Vision premium for employees, fully paid Life and Short/Long term disability insurance, Unlimited PTO, Parental Leave policy, annual wellness stipend, anniversary awards, and more!
» Product: Named to the GovTech 100 (seven consecutive years), we are the leader in cloud software for our nation's cities, counties, and state agencies.
» Mission Driven: We are a technology company with a passion for the mission. We're powering more effective and accountable government.
Come join us and make a positive social impact!
OpenGov is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.