Senior Engineer - Site Reliability Job Details | G Forty Two General Trading LLC

Apply now »

Senior Site Reliability Engineer, Core42 – United States - Remote

About Us

Core42, a leader in AI-powered cloud and digital infrastructure, is driving transformative technology solutions globally. Leveraging advanced resources and partnerships, Core42 empowers clients to harness sovereign AI infrastructure, especially in sectors with stringent regulatory needs. With a mission to redefine digital transformation, we combine sovereign capabilities with scalable, high-performance compute infrastructure, positioning ourselves at the forefront of AI innovation in the Middle East and beyond.

The Opportunity

As a Senior Site Reliability Engineer, you will be responsible for designing, implementing, and operating scalable, reliable, and secure infrastructure to support large-scale AI and HPC workloads. You will play a key role in building and maintaining CI/CD pipelines, Kubernetes-based environments, and observability systems that ensure high availability and performance across globally distributed platforms.

Working closely with engineering, product, and operations teams, you will drive automation, enforce SRE best practices, and contribute to a resilient and efficient infrastructure ecosystem that supports mission-critical applications.

Your Key Responsibilities

CI/CD & Automation: Design, build, and maintain robust CI/CD pipelines using tools such as GitLab CI, Azure DevOps, and/or Jenkins to enable rapid and secure software delivery
Kubernetes Operations: Operate, manage, and optimize Kubernetes clusters, ensuring scalability, performance, and resilience
Infrastructure as Code: Develop and maintain infrastructure using Terraform, Helm, Ansible, or similar tools to automate provisioning and configuration
Observability & Monitoring: Implement and manage monitoring solutions using Prometheus, VictoriaMetrics, Grafana, and ELK/EFK to ensure system health and performance
Incident Management: Lead root cause analysis (RCA), post-mortems, and continuous improvement initiatives to enhance system reliability
Reliability Engineering: Define and implement SRE best practices, including SLAs, SLOs, and error budgets
Logging & Alerting: Build and maintain logging, alerting, and tracing systems for proactive issue detection and rapid troubleshooting
Security & Compliance: Enforce security best practices and compliance standards across CI/CD pipelines and runtime environments; support audit readiness
Collaboration: Work cross-functionally with engineering, product, and infrastructure teams to align platform capabilities with business needs
Mentorship: Provide guidance and mentorship to junior engineers and contribute to knowledge sharing across teams
On-call Support: Participate in on-call rotations to support critical platform services

What We're Looking For

Required Skills/Qualifications

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field
5+ years of experience in DevOps, Site Reliability Engineering, or platform engineering roles in production environments
Proven experience managing Kubernetes clusters (e.g., GKE, EKS, AKS, or self-managed)
Hands-on experience with CI/CD tools and automation frameworks
Strong experience with infrastructure-as-code tools such as Terraform, Helm, or Ansible
Proficiency in container technologies (Docker, containerd) and orchestration with Kubernetes
Strong scripting/programming skills (e.g., Python, Bash, Go)
Experience with observability and monitoring stacks (Prometheus, Grafana, ELK/EFK)
Solid understanding of Linux systems, networking concepts, and cloud-native security best practices

Preferred Skills/Qualifications

Experience supporting AI/ML or HPC workloads in production environments
Knowledge of GPU resource management, workload schedulers, and performance tuning
Familiarity with distributed systems and large-scale infrastructure environments
Experience with incident management frameworks and reliability engineering practices
Strong collaboration and communication skills across cross-functional teams

Compensation

The U.S. base salary range for this full-time role is $109,600 to $164,400, with bonus and benefits on top. Salary ranges are set according to the role, level, and location. The range listed represents the minimum and maximum target salary for new hires across all U.S. locations. Actual pay within this range will depend on factors such as work location, job-related skills, experience, and relevant education or training.

What Working at Core42 Offers

With a diverse team of 1,100+ employees from 68 nationalities, we foster an inclusive, innovative, and collaborative environment and a culture grounded in trust, accountability, and high performance.

We are united by our values:

Grit – overcoming challenges with resilience and determination
Passion – striving for excellence in everything we do
Impact – driving meaningful change and progress

Our team members thrive in an environment where each contribution matters, and together, we achieve extraordinary results.
