
Lead Site Reliability Engineer with Java :: San Antonio, TX

Company: TOPSYSIT
Location: San Antonio
Posted on: April 11, 2025

Job Description:

Role: Lead Site Reliability Engineer with Java

Interested in this role? You can find all the relevant information in the description below.

Location: San Antonio, Texas

Project Tenure: 18 Months

Customer: Banking Client

Relevant Experience: 14+ Years

Job Description & Key Responsibilities:

As a Lead Site Reliability Engineer (SRE), you will leverage your extensive experience in SRE practices to maintain and enhance the reliability, performance, and scalability of mission-critical systems. You will play a crucial role in ensuring the continuous availability and optimal functioning of our services.

Key Responsibilities:

--- Senior-Level SRE Expertise: Apply your deep understanding of SRE principles to lead efforts in improving system reliability and operational efficiency.

--- Incident Management: Provide expert-level support during incidents, ensuring swift resolution with minimal service disruption. Lead post-incident reviews to drive continuous improvement.

--- Monitoring & Alerting: Design, implement, and optimize monitoring, alerting, and incident response processes. Ensure these systems are effective at surfacing potential issues proactively.

--- Automation: Drive the automation of manual processes to enhance operational efficiency, reduce human error, and increase overall system resilience.

--- CI/CD Pipeline Management: Develop, maintain, and improve automated CI/CD pipelines using tools such as GitLab CI/CD and Jenkins, ensuring seamless and reliable deployment processes.

--- Cross-Functional Collaboration: Work closely with cross-functional teams to ensure the reliability, performance, and scalability of our infrastructure. Foster a culture of collaboration and knowledge sharing.

--- Support Across Time Zones: Provide support across all U.S. time zones, with the flexibility to work weekends, rotational shifts, and overtime as required to maintain service continuity.

Required Skills & Qualifications:

--- Java Programming: Advanced proficiency in Java, with a deep understanding of contemporary software development practices.

--- Kubernetes & Containerization: Extensive hands-on experience with Kubernetes and containerization technologies such as Docker, as well as Kubernetes storage solutions such as Portworx.

--- Linux/Unix Systems: Strong command of Linux/Unix operating systems and shell scripting (Bash), with a focus on system reliability and automation.

--- Functional Programming: Proficiency in functional programming languages such as Haskell and OCaml, or in logic programming languages such as Prolog.

--- Scripting & Automation: Experience with Python or Go, particularly for scripting and automation tasks.

--- Virtualization: In-depth knowledge of VMware and other virtualization platforms, with a focus on optimizing virtual environments for reliability and performance.

--- Streaming Technologies: Expertise with Kafka Stream Generator, ksqlDB, cluster federation, and Spark Streaming, including experience managing and optimizing streaming data architectures.

--- Service Mesh & Networking: Familiarity with Istio and Anthos Service Mesh, with the ability to manage and optimize service meshes in complex environments.

--- Performance Monitoring & Debugging: Proficiency in using eBPF (extended Berkeley Packet Filter) for performance monitoring and debugging.

--- Monitoring & Logging Tools: Experience with industry-standard monitoring and logging tools such as Splunk, Prometheus, Datadog, and Kiali.

--- Load Balancing: Familiarity with NGINX Controller and Seesaw for effective load balancing and traffic management.

--- Infrastructure-as-Code (IaC): Competence in using Terraform to manage cloud infrastructure, ensuring consistency and scalability across environments.

Additional Requirements:

--- Flexibility: Willingness to work weekends and rotational shifts, and to provide 24/7 support as necessary to maintain service reliability and meet project deadlines.

--- Certifications Required:

o Kubernetes

o Azure



Thanks,

Prem Kusuma
