Senior Systems Engineer

 

Core42, a leader in AI-powered cloud and digital infrastructure, is driving transformative technology solutions globally. Leveraging advanced resources and partnerships, Core42 empowers clients to harness sovereign AI infrastructure, especially in sectors with stringent regulatory needs. With a mission to redefine digital transformation, we combine sovereign capabilities with scalable, high-performance compute infrastructure, positioning ourselves at the forefront of AI innovation in the Middle East and beyond.

 

With a diverse team of 1,100+ employees globally from ~70 nationalities, we foster an inclusive, innovative, and collaborative environment. Our culture is grounded in trust, accountability, and high performance. We are united by our values: Grit, where we overcome challenges with resilience and determination; Passion, which drives us to pursue excellence in everything we do; and Impact, as we aim to inspire progress and create meaningful change. Our team members thrive in an environment where each person’s contributions propel us forward, and together, we commit to achieving extraordinary results.

 

The Opportunity

 

The Senior Systems Engineer will be responsible for the provisioning, rollout, and maintenance of large-scale GPU-based AI clusters. This role will focus on deploying and managing AI computing infrastructure, troubleshooting system issues, and coordinating with onsite personnel and vendors to ensure platform reliability and efficiency.

 

The ideal candidate will have experience in AI and HPC environments, strong Linux system administration skills, and expertise in hardware, software, and networking troubleshooting.

 

Key Responsibilities

 

  • Deploy and configure GPU-based AI compute nodes, storage systems, and networking components.
  • Manage firmware, BIOS, and driver updates to maintain system stability and performance.
  • Work with automation tools to streamline infrastructure provisioning and configuration.
  • Ensure high availability, reliability, and scalability of AI workloads.
  • Diagnose and resolve hardware, software, and network issues in collaboration with onsite teams.
  • Perform root cause analysis (RCA) and corrective actions to prevent recurring failures.
  • Work with vendors (NVIDIA, AMD, Intel, Dell, HPE, etc.) to escalate and resolve technical issues.
  • Provide hands-on troubleshooting support for compute, management, and storage fabrics.
  • Implement and maintain monitoring and alerting tools to track system health and performance.
  • Actively monitor GPU utilization, memory usage, and workload distribution.
  • Assist in capacity planning and scaling to support growing AI workloads.
  • Work closely with networking, storage, and DevOps teams to ensure seamless integration of AI workloads.
  • Document procedures, system configurations, and troubleshooting guides.
  • Assist in developing best practices and SOPs for AI infrastructure operations.

 

Required Qualifications

 

  • Bachelor’s degree (BA/BS) or higher in Computer Science, or equivalent.
  • 3+ years of experience in systems engineering, HPC, or AI infrastructure management.
  • Proficiency in Linux system administration (RHEL, Ubuntu, Rocky Linux, etc.).
  • Experience with GPU-based AI clusters and workload orchestration.
  • Strong troubleshooting skills in compute, storage, and networking environments.
  • Familiarity with automation tools (Ansible, Terraform, Bash, Python, etc.).
  • Experience working with on-premises AI infrastructure and cloud-based AI platforms.
  • Ability to collaborate with onsite personnel and vendors for issue resolution.

 

Preferred Qualifications

 

  • Experience with containerized AI workloads (Kubernetes, Docker, Singularity).
  • Knowledge of high-speed Ethernet networking and distributed storage.
  • Familiarity with monitoring tools (Prometheus, Grafana, ELK stack, etc.).
  • Certifications such as RHCSA, NVIDIA DLI, or Kubernetes CKA.

 

Compensation & Benefits

 

The base salary for this full-time position ranges from $109,950 in our lowest geographic market to $164,900 in our highest geographic market. The actual base salary will be determined by various factors, including the position’s location, job-related skills, knowledge, experience, and relevant education or training.

 

Certain roles are eligible for additional rewards, such as merit-based salary increases, annual bonuses, and long-term incentive plans, which are contingent on individual and company performance. Additionally, some positions offer the opportunity to earn sales incentives based on revenue or utilization targets.

 

As a full-time employee, you will also have access to comprehensive benefits, including leading healthcare options (medical, dental, and vision insurance), a 401(k) plan with company matching, company-sponsored short-term and long-term disability coverage, life insurance, paid time off, and various well-being benefits, among others.

 

Equal Employment Opportunity

 

Core42 is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances.

 

If you need assistance and/or a reasonable accommodation to participate in the job application or interview process, or to perform the essential functions of the position, please contact us at USA-ExternalCandidates@core42.ai.