Senior Engineer - Site Reliability
Overview:
About AIQ:
AIQ is an Abu Dhabi-based joint venture between Presight and ADNOC that focuses on developing artificial intelligence technologies. AIQ develops and commercializes AI products and applications for the energy sector. It aims to provide end-to-end solutions, using its data, cloud, and talent to develop AI solutions that reduce costs and generate revenue for its clients. AIQ embodies an innovative and entrepreneurial spirit that embraces challenges to push boundaries, and it welcomes professionals who share the desire to make meaningful and impactful contributions to its mission. Always on the cutting edge of technology, AIQ provides its talent with every opportunity to thrive and excel. Working at AIQ includes dealing with massive data sets, an AI infrastructure powered by the latest NVIDIA GPU cloud computing platform, and access to limitless computing, storage, and network resources.
About the role:
AIQ is looking for a Senior Site Reliability Engineer to enhance the reliability and performance of our platforms. In this role, you will lead key reliability projects and root cause analysis, improve observability, respond to complex production incidents, and collaborate across engineering teams to improve service performance and stability.
Responsibilities:
- Maintain and evolve monitoring, alerting, and incident response systems.
- Proactively find and fix performance bottlenecks and failure points.
- Contribute to infrastructure automation and deployment pipelines.
- Drive SLO/SLI adoption in collaboration with engineering teams.
- Lead root cause analysis and build preventive solutions.
- Mentor junior engineers and help scale operational excellence.
Qualifications:
- 5-8 years of relevant experience.
- Solid experience with containerized environments (Docker, Kubernetes).
- Hands-on experience with CI/CD pipelines and automation tools.
- Proficiency in scripting languages (Python, Bash).
- Strong grasp of observability tools (Prometheus, Grafana, ELK, Sentry).
- Good knowledge of cloud platforms (Huawei Cloud, Azure preferred).