Overview:

About Us

Core42 is the UAE’s national-scale enabler for cloud and generative AI, combining the G42 Group’s deep expertise across multiple technology disciplines into one unified platform for digital transformation. Building on our capabilities as a sovereign cloud and HPC specialist, we bring world-class innovation in generative AI, cybersecurity, professional and managed services to empower large-scale enterprise and public sector transformations across industries.

The Opportunity

We are seeking a Major Incident & Problem Manager: a decisive, calm leader who thrives under pressure, commands the bridge during live incidents, and drives the recovery of critical services across complex, multi-vendor environments.

Beyond crisis leadership, this role focuses on continuous improvement and preventive insight: analysing incident trends and alert data to identify root causes, reduce noise, and drive sustainable service stability.

If you are driven by clarity in chaos, have the curiosity to grasp complex systems quickly, and can translate incident data into meaningful action, this is your opportunity to make a national-scale impact.

Responsibilities:

Incident Leadership & Crisis Management

  • Lead live or potential Major Incidents (P1/P2), assembling cross-functional technical teams and maintaining clear communication on business impact, actions, and resolution progress.

  • Manage incident bridges in high-pressure, multi-vendor environments, maintaining composure and resolving conflicts as they arise.

  • Conduct structured post-incident reviews, driving actionable preventive measures with clearly defined ownership and accountability.

  • Deliver transparent, audience-specific updates to internal leadership and stakeholders.

  • Provide on-call leadership for critical incidents outside business hours.

Problem Management

  • Manage the full Problem Management lifecycle, from identification to root cause analysis (RCA) and closure.

  • Ensure RCA outcomes address systemic issues, not just symptoms.

  • Translate lessons from incidents into preventive process and configuration improvements across operations and delivery teams.

  • Analyse historical incident and alert data to identify recurring issues, reduce alert noise, and recommend automation and tuning opportunities.

  • Produce periodic trend reports and insights that highlight key offenders, noise-reduction progress, and improvement metrics.

Qualifications:

  • 6–10 years of experience in IT Service Management (ITSM) or Operations.

  • 5+ years leading Major Incidents and Problem Management within a Managed Service Provider (MSP) environment.

  • Proven experience leading data-driven stability or noise-reduction programs (preferred).

  • Proficient in ITSM platforms such as ServiceNow or SMAX.

  • Familiarity with monitoring and alerting tools and their underlying mechanisms.

  • Experience with data reporting or ETL tools (e.g., Power BI, Power Query) is advantageous.

  • Strong understanding of Infrastructure, Cloud, Application, and Network dependencies.

  • Exposure to multi-cloud or hybrid infrastructure environments (particularly Azure) preferred.

  • Exceptional crisis leadership, cross-functional coordination, and stakeholder communication skills.

  • Analytical mindset with a focus on service reliability and operational excellence.

  • ITIL v4 Foundation certification required.

  • ITIL Managing Professional (MP), ISO 20000, or equivalent process specialization certifications preferred.