Contract-to-hire Site Reliability Engineer (SRE) Opportunity in Plano, TX

September 1, 2022/in Contract-to-hire Site Reliability Engineer, Contract-to-hire SRE, Jobs, SRE/by admin

This position can be hybrid (preferred; a mix of remote and in-office work) or fully remote.

PRIMARY PURPOSE OF THIS POSITION:

The Lead Site Reliability Engineer (SRE) is primarily responsible for leading a Site Reliability Engineering program for the purpose of building and running large-scale, distributed, fault-tolerant systems with reliability and uptime appropriate to users’ needs. The Lead SRE is also responsible for the performance, training, discipline and development of assigned personnel; and provides input and assistance with budgeting, financial management, and technical system design and selection.

ESSENTIAL FUNCTIONS: (other duties may be assigned)

Coordinate and manage the activities of a Site Reliability Engineering (SRE) team responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning for their assigned system(s).
Work with appropriate stakeholders to identify Service Level Objectives (SLOs) for critical systems. Identify the Service Level Indicators (SLIs) required to effectively measure SLOs and work with the appropriate technical stakeholders to implement and/or instrument the required systems and processes to measure and monitor the SLIs.
Build, administer, and participate in a program to minimize change disruptions by identifying, developing, and implementing automation that performs progressive rollouts, quickly and accurately detects problems, and rolls back changes safely when problems arise.
Identify, pursue, and implement systems and tools to eliminate toil, that is, work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.
Administer and participate in, as needed, a postmortem program for all significant incidents. This program should include an investigation to establish what happened in detail, find all root causes of the event, and assign actions (via User Stories) to correct the problem or improve how it is addressed next time.
Work with the appropriate service owners/stakeholders to develop playbooks containing best practices for troubleshooting steps and tips for the most likely and impactful failure modes. Exercise the playbooks through group tabletop exercises.
Collaborate with the Systems Monitoring Lead to ensure that monitoring systems, including the corresponding people, processes, and tools, are appropriately defined and implemented.
Identify and facilitate individual development plans (IDPs), to include both formal and informal development opportunities, for direct reports.
Assist in the development of annual budgets for assigned area of responsibility and monitor spend and performance to optimize organizational profitability.
Provide evening and weekend support to the team as needed.

REQUIREMENTS: (Equivalent combinations of education, licenses, certifications and/or experience may be considered)

Education

A four-year degree in Computer Science, Management Information Systems, or Computer Engineering; or a four-year degree in another field of study that includes courses in computer programming, systems analysis, system development, or systems engineering; or relevant work experience is required.

Experience

6 years of applicable experience in a technology environment, preferably with time spent in an engineering capacity, is required.
2 years of multi-person team management experience, including task assignment, performance coaching and reviews, hiring and firing, and conflict management is required.
2+ years working in a cloud environment, with AWS preferred.
4+ years working with performance monitoring tools such as Datadog or Dynatrace.
Coding experience beyond simple scripts is required.
Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C/C++, or JavaScript, is preferred.
Experience with distributed storage technologies such as NFS, HDFS, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN), is preferred.

Licenses/Certifications

SRE Practitioner, SRECP or equivalent is preferred.

Tools & Equipment

General Office Equipment

If you have this experience, feel you are a fit for this position, and are interested, please answer the questions below:

Your Email (required)

Position You Are Applying For?

What is your availability to start?

Are you open to a contract-to-hire position?

What is your current salary or pay rate?

Are you currently eligible to work for any employer in the US?

When is the best time to contact you and what # can you be reached at for this opportunity?

Upload Your Resume

Mike Hanes
ProVisionTech

Contract AWS FedRAMP DevOps Engineer/SRE Opportunity

December 16, 2021/in AWS, Dallas DevOps, DevOps, fedramp, Jobs, SRE, TX DevOps/by admin

This is a contract position that is responsible for deploying software and infrastructure to a secure FedRAMP environment. You will be joining an existing team. You bring your experience and knowledge of how to build, deploy, and maintain a FedRAMP environment.

Responsibilities

Design and develop CI/CD pipelines using industry-standard CI/CD technologies
Ability to develop shared libraries using GitLab or AWS CodeCommit to enable build and deploy
Able to build and deploy applications to secure government cloud (FedRAMP) infrastructure on AWS
Able to work with cross-functional teams to deliver infrastructure-as-code solutions to build infrastructure configuration management solutions
Troubleshoot build and deployment related issues on cloud platforms
Collaborate with application teams, infrastructure teams, tools teams, and operations teams to develop integrated solutions that enable CI/CD pipelines (both on premises and in the public cloud)
Act as Scrum Leader or Lead when needed

Required Skills

10+ years of overall deployment/build/support/architecture experience, with a minimum of 2 years in DevOps or cloud technologies
General knowledge of UNIX/Linux & Windows
Experience working with AWS in a solution and deployment role
Experience with scripting (Shell, Python, etc.)
Experience with build and deployment processes for applications built on different technologies, such as Java, .NET, PHP, Node.js, Angular, and Python
Experience with at least one CI/CD build and deployment system (Jenkins, GitLab, GitHub, Azure DevOps, or Maven)
Experience on a Scrum team in an SRE or senior DevOps role from start to finish (build/deploy/support)
Strong understanding of cloud-native and container-based distributed systems such as Kubernetes
Ability to collaborate with Engineering, Architecture, Infrastructure, and Operations teams to develop, design, and deliver solutions that drive infrastructure provisioning on AWS using Terraform, Helm, and Ansible
Ability to work with AppDev/Engineering to develop release mechanisms that improve the product release cycle (enable hooks for APM, develop DR and automated redeployment strategies)

Desired Skills

Experience deploying and configuring APM and Application Observability tools
Exposure to open-source technologies
Experience with Docker, Docker Compose, and Dockerfiles
Experience with container orchestration technologies on cloud platforms (AKS/EKS/PKS/GKE instances)
Infra automation using Ansible/Terraform/Helm
Ability to provision infrastructure using Terraform both in a pipeline and outside a pipeline
Working experience of provisioning and managing container clusters
Experience managing container-based applications
Good understanding of Docker networking, volumes, and registries

If you have this experience, feel you are a fit for this position, and are interested, please email the answers to the questions below:

Regards,

Mike Hanes

ProVisionTech

mhanes@provisiontechgroup.com