Staff Site Reliability Engineer
OpenGov
This job is no longer accepting applications
Software Engineering
Pune, Maharashtra, India
Posted on Jul 28, 2023
OpenGov is home to an exceptional team that is passionate about our mission to power more effective and accountable government. By bringing the OpenGov Cloud to our nation's state and local governments, we're transforming communities so they can thrive!
Imagine yourself being able to help small business owners open their doors faster, ensuring our tax dollars are accounted for, creating safer infrastructure, modernizing the permitting process, and assisting with disaster recovery. The work you do here every day has a meaningful impact on people's lives!
🌟 OpenGov is a 2023 Top Workplaces USA award winner and a Forbes America's Best Startup Employer 🌟
Join our smart, fun, and humble team to experience the most rewarding career of your life!
As a Staff SRE at OpenGov, you'll build and own best-in-class SaaS solutions that enable efficiency, transparency, and accountability within government agencies. You'll lead our SRE team, delivering new infrastructure and CI/CD improvements in a fast-paced, agile environment using a mix of traditional and modern technologies, including Windows OS-level patching, SSM, IIS administration, Windows Server installation and bundling, and Microsoft packaging, plus GitHub Actions, Terraform, Kubernetes, AWS, ArgoCD, and Grafana. You will be required to balance multiple tracks: the move from single tenancy to multi-tenancy, feature deployments and CI/CD, and customer reliability. You will focus on combining software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems for managing local governments' assets and infrastructure. This role will work to ensure that our FIN services (both our internally critical and our externally visible systems) have reliability and uptime appropriate to customers' needs, along with a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems' capacity and performance.
Responsibilities
Set up and lead SRE-driven automation and strategic initiatives
Apply awareness of and experience with single-tenant and multi-tenant application suites on AWS
Suggest and lead cost-optimization initiatives through AWS instance management
Run the production environment by monitoring availability and taking a holistic view of system health
Own OS-level (Windows) security vulnerability patching and IIS administration tasks
Debug infrastructure-related issues in customers' single-tenant applications
Improve reliability, quality, and time-to-market of our suite of software solutions
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
Provide primary operational support and engineering for multiple large-scale distributed software applications
Set up observability and analyze operating system and application metrics in Grafana to assist in performance tuning and fault finding
Partner with development teams to improve services through rigorous testing and release procedures
Participate in system design consulting, platform management, and capacity planning
Create sustainable systems and services through automation and uplifts
Oversee the deployment, scaling, and management of Kubernetes clusters
Monitor utilization and analyze performance trends to identify and resolve bottlenecks and remain aligned with the team's Service Level Objectives
Participate in incident response activities, ensuring prompt resolution of any system or application issues
Lead deployment aspects related to schema migrations, security (e.g. secrets management), reliability, and scalability
Provide incident response support and troubleshoot production systems, including optimizing their performance, scale, utilization, and costs
Qualifications
8+ years professional experience using Git or another popular source control system
8+ years of experience with AWS, SSM, OS patching, and third-party library upgrades and cleanup
8+ years of Windows Server OS-level and IIS administration
7+ years of experience building successful software products
10+ years of experience working cross-functionally, collaborating with designers, and coordinating with both product and program managers
Strong verbal and written communication skills in English
Deep understanding of functional programming fundamentals
10+ years of SRE experience within a product organization (a must)
10+ years of experience with AWS, Azure, security operations, and cloud platform setup
Ability to program (structured and OOP) in one or more high-level languages, such as Python, Java, C/C++, or Ruby, and well versed in scripting, including shell scripting
Good to have
PHP 7+ knowledge, DB2 or related database administration experience
Experience with distributed storage technologies such as Amazon S3, as well as dynamic resource management frameworks (Kubernetes)
Proactive approach to identifying problems, performance bottlenecks, and areas for improvement
What makes OpenGov unique
» Leadership: CEO Zac Bookman (MPA from Harvard and JD from Yale) is truly a mission-driven CEO. He was named one of the 100 Most Intriguing Entrepreneurs by Goldman Sachs, a Tech Pioneer by the World Economic Forum, and a member of the SF and Silicon Valley Business Times' 40 Under 40 class of 2018!
» Growth: Record-breaking growth, with 1,800+ governments (and counting) using our products.
» Culture: Winner of the 2023 Top Workplaces USA award, named one of the Top 25 Cloud Companies to Work For, winner of Forbes 2022 Best Startup Employers, and recipient of a 50 Best Workplaces award.
» Product: Named to the GovTech 100 (seven consecutive years), we are the leader in cloud software for our nation's cities, counties, and state agencies.
» Mission Driven: We are a technology company with a passion for the mission. We're powering more effective and accountable government.
Come join us and make a positive social impact!
OpenGov is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.