Overview:

About Us

Core42, a leader in AI-powered cloud and digital infrastructure, is driving transformative technology solutions globally. Leveraging advanced resources and partnerships, Core42 empowers clients to harness sovereign AI infrastructure, especially in sectors with stringent regulatory needs. With a mission to redefine digital transformation, we combine sovereign capabilities with scalable, high-performance compute infrastructure, positioning ourselves at the forefront of AI innovation in the Middle East and beyond.


The opportunity

We are seeking a highly skilled Lead Engineer – Network Operations to oversee the daily operations and support of the network infrastructure underpinning our global high-performance computing (HPC) environments. This role is responsible for ensuring high availability, security, and optimal performance of switches, firewalls, and network fabrics that support large-scale AI and ML workloads across geographically distributed data centers. The ideal candidate brings deep hands-on experience with enterprise-grade network technologies, low-latency HPC fabrics (e.g., InfiniBand), and automation of network operations.

Responsibilities:

Your key responsibilities

  • Lead the daily operational support of HPC network infrastructure, including Layer 2/3 switches, routers, firewalls, and RDMA-based fabrics (e.g., InfiniBand, RoCE), ensuring network performance and reliability.
  • Troubleshoot and resolve complex network issues affecting HPC workloads and services, minimizing downtime and maximizing throughput.
  • Configure, upgrade, and maintain enterprise-grade firewalls, VPNs, ACLs, and routing protocols (e.g., BGP, OSPF), ensuring network security and performance.
  • Provide network integration support for HPC platforms, including Slurm, Kubernetes, and bare-metal provisioning systems.
  • Design and manage IP address planning, VLAN configurations, network segmentation, and security zones in alignment with operational and compliance requirements.
  • Develop and maintain network automation scripts and infrastructure-as-code solutions (e.g., Ansible, Python, Terraform) to optimize processes and reduce human error.
  • Collaborate closely with compute, storage, security, and site reliability teams to design and implement scalable, resilient, and high-performance network solutions for AI workloads.
  • Document network architecture, configurations, runbooks, and change management procedures in accordance with ITIL/ISO standards.
  • Participate in on-call rotations, providing support for incident response, change management, and root cause analysis (RCA) processes.
  • Lead RCA for operational network issues, contributing to post-mortem documentation and driving continuous improvement efforts.
  • Provide mentorship and technical guidance to junior engineers, helping to build skills and foster a collaborative environment.
  • Ensure strict adherence to security and operational policies and assist with audits and documentation related to change and incident management processes.

Qualifications:

What we’re looking for

(a) Required skills / qualifications

  • Bachelor’s degree in Network Engineering, Computer Science, or a related field; or equivalent hands-on experience.
  • Minimum of 8 years of experience in enterprise network operations or engineering roles, with at least 2 years in a lead or ownership capacity.
  • Extensive hands-on experience with data center networking equipment (e.g., Cisco, Arista, Juniper, Mellanox, or NVIDIA Networking).
  • Deep understanding of Layer 2/3 protocols, the TCP/IP stack, multicast, QoS, and VLAN/VXLAN/EVPN technologies.
  • Proficiency in configuring and managing firewalls (e.g., Palo Alto, Fortinet, Cisco ASA) and VPN solutions to ensure secure network operations.
  • Proven experience in supporting low-latency, high-throughput networks in HPC, AI/ML, or cloud-scale environments.
  • Hands-on experience with InfiniBand or RoCE technologies for HPC network environments.
  • Familiarity with Kubernetes networking (e.g., CNI plugins, network policies, service meshes) for cloud-native networking.
  • Exposure to CI/CD, Git, and modern DevNet practices for automating and optimizing network infrastructure.

Compensation

The U.S. base salary range for this full-time role is US$133,200 to US$199,800 per year, with bonus and benefits on top. Salary ranges are determined by role, level, and location. The range listed represents the minimum and maximum target salary for new hires across all U.S. locations. Actual compensation within this range will depend on factors such as work location, job-related skills, experience, and relevant education or training.

What Working at Core42 Offers

With a diverse team of 1,100+ employees from 68 nationalities, we foster an inclusive, innovative, and collaborative environment. At Core42, we are grounded in trust, accountability, and high performance. We are united by our values: Grit, Passion, and Impact—driving resilience, excellence, and meaningful progress across everything we do.

Core42 is committed to building a diverse and inclusive workplace. As an equal opportunity employer, Core42 does not discriminate based on race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or any other legally protected status. In compliance with the Americans with Disabilities Act (ADA), we provide reasonable accommodations to qualified individuals with disabilities throughout the application and employment process. If you need assistance or a reasonable accommodation, please contact reasonableaccommodations@core42.com, including the role you are applying for and the accommodation required.