Contract-to-hire Site Reliability Engineer (SRE) Opportunity in Plano, TX

This position can be hybrid (preferred; a mix of remote and in-office work) or fully remote.

PRIMARY PURPOSE OF THIS POSITION:

The Lead Site Reliability Engineer (SRE) is primarily responsible for leading a Site Reliability Engineering program for the purpose of building and running large-scale, distributed, fault-tolerant systems with reliability and uptime appropriate to users’ needs. The Lead SRE is also responsible for the performance, training, discipline and development of assigned personnel; and provides input and assistance with budgeting, financial management, and technical system design and selection.

ESSENTIAL FUNCTIONS: (other duties may be assigned)

  • Coordinate and manage the activities of a Site Reliability Engineering (SRE) team responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning for their assigned system(s).
  • Work with appropriate stakeholders to identify Service Level Objectives (SLOs) for critical systems. Identify the Service Level Indicators (SLIs) required to effectively measure SLOs and work with the appropriate technical stakeholders to implement and/or instrument the required systems and processes to measure and monitor the SLIs.
  • Build, administer and participate in a program to minimize change disruptions by identifying, developing, and implementing automation that performs progressive rollouts, detects problems quickly and accurately, and rolls back changes safely when problems arise.
  • Identify, pursue, and implement systems and tools to eliminate toil (work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows).
  • Administer and participate in, as needed, a postmortem program for all significant incidents. This program should include an investigation to establish what happened in detail, find all root causes of the event, and assign actions (via User Stories) to correct the problem or improve how it is addressed next time.
  • Work with the appropriate service owners/stakeholders to develop playbooks containing best practices for troubleshooting steps and tips for the most likely and impactful failure modes. Exercise the playbooks through group tabletop exercises.
  • Collaborate with the Systems Monitoring Lead to ensure monitoring systems, including the corresponding people, processes, and tools, are appropriately defined and implemented.
  • Identify and facilitate individual development plans (IDPs), to include both formal and informal development opportunities, for direct reports.
  • Assist in the development of annual budgets for assigned area of responsibility and monitor spend and performance to optimize organizational profitability.
  • Provide evening and weekend support to the team as needed.


REQUIREMENTS: (Equivalent combinations of education, licenses, certifications and/or experience may be considered)

Education

  • A four-year degree in Computer Science, Management Information Systems, or Computer Engineering; or a four-year degree in another field of study which includes courses in computer programming, systems analysis, system development, or systems engineering; or relevant work experience is required.

Experience

  • 6 years of applicable experience in a technology environment, preferably with time spent in an engineering capacity, is required.
  • 2 years of multi-person team management experience, including task assignment, performance coaching and reviews, hiring and firing, and conflict management is required.
  • 2+ years working in a cloud environment, with AWS preferred.
  • 4+ years working with performance monitoring tools such as Datadog or Dynatrace.
  • Coding experience beyond simple scripts is required.
  • Ability to program (structured and OO) with one or more high-level languages, such as Python, Java, C/C++, or JavaScript, is preferred.
  • Experience with distributed storage technologies such as NFS, HDFS, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN), is preferred.

Licenses/Certifications

  • SRE Practitioner, SRECP or equivalent is preferred.

Tools & Equipment

  • General Office Equipment

If you have this experience, feel you are a fit for this position, and are interested, please answer the questions below:

    Your Email (required)

    Position You Are Applying For?

    What is your availability to start?

    Are you open to a contract-to-hire position?

    What is your current salary or pay rate?

    Are you currently eligible to work for any employer in the US?

    When is the best time to contact you and what # can you be reached at for this opportunity?

    Upload Your Resume

    Mike Hanes
    ProVisionTech

    Contract AWS FedRAMP DevOps Engineer/SRE Opportunity

    This is a contract position that is responsible for deploying software and infrastructure to a secure FedRAMP environment.  You will be joining an existing team.  You bring your experience and knowledge of how to build, deploy, and maintain a FedRAMP environment.

    Responsibilities

    • Design and develop CI/CD pipelines using industry-standard CI/CD technologies
    • Develop shared libraries using GitLab or AWS CodeCommit to enable build and deploy
    • Build and deploy applications to secure government cloud (FedRAMP) infrastructure on AWS
    • Work with cross-functional teams to deliver infrastructure-as-code solutions for infrastructure configuration management
    • Troubleshoot build and deployment related issues on cloud platforms
    • Collaborate with application teams, infrastructure teams, tools teams, and operations teams to develop integrated solutions that enable CI/CD pipelines (both on-premises and public cloud)
    • Act as Scrum Leader or Lead when needed

    Required Skills 

    • 10+ years of overall deployment/build/support/architecture experience, with a minimum of 2 years in DevOps or cloud technologies
    • General knowledge of UNIX/Linux & Windows 
    • Experience working with AWS in a solution and deployment role
    • Experience with scripting (Shell, Python, etc.)
    • Experience with build and deployment processes for applications built on different technologies, such as Java, .NET, PHP, Node.js, Angular, and Python
    • Experience with at least one CI/CD build and deployment system (Jenkins, GitLab, GitHub, Azure DevOps, Maven)
    • Experience on a Scrum team in an SRE or senior DevOps role, from start to finish (build/deploy/support)
    • Strong understanding of cloud-native, container-based distributed systems such as Kubernetes
    • Ability to collaborate with Engineering, Architecture, Infrastructure, and Operations teams to develop, design, and deliver solutions to drive infrastructure provisioning on AWS using Terraform, Helm, and Ansible
    • Ability to work with AppDev/Engineering to develop release mechanisms that improve the product release cycle (enable hooks for APM, develop DR and automated redeployment strategies)

    Desired Skills

    • Experience deploying and configuring APM and Application Observability tools
    • Exposure to open-source technologies
    • Experience with Docker, Docker Compose, and Dockerfiles
    • Experience with container orchestration technologies on cloud platforms (AKS/EKS/PKS/GKE)
    • Infrastructure automation using Ansible/Terraform/Helm
    • Ability to provision infrastructure using Terraform both in a pipeline and outside a pipeline
    • Working experience provisioning and managing container clusters
    • Experience managing container-based applications
    • Good understanding of Docker networking, volumes, and registries

    If you have this experience, feel you are a fit for this position, and are interested, please email your answers to the questions below:

      Your Email (required)

      Position You Are Applying For?

      What is your availability to start?

      Are you open to a contract position?

      What is your current salary or pay rate?

      Are you currently eligible to work for any employer in the US?

      When is the best time to contact you and what # can you be reached at for this opportunity?

      Upload Your Resume

      Regards,

      Mike Hanes

      ProVisionTech

      mhanes@provisiontechgroup.com