Lead Data Engineer

Inception, a G42 company, is the region’s leading innovator of AI-powered domain-specific as well as industry-agnostic products, built on a rich heritage of research and development. Within the G42 ecosystem, Inception functions as the core intelligence layer – transforming data and compute infrastructure into real-world, applied AI solutions. Beyond its commercial endeavors, Inception is committed to creating positive societal impact. For more information, please visit www.inceptionai.ai

Overview:

Inception is seeking a highly skilled Lead Data Engineer to architect and build scalable, cloud-native data and AI pipelines that power enterprise LLM, RAG, and retrieval systems.

 

Responsibilities:

• Design, build, and optimize scalable data pipelines for AI/LLM workloads, including vectorization and embedding processing.

• Develop and maintain ETL/ELT workflows for structured, unstructured, and streaming data.

• Create and manage vector database indexing and similarity search pipelines using tools such as FAISS, Pinecone, Weaviate, Qdrant, or Chroma.

• Build retrieval systems for RAG, semantic search, and enterprise knowledge retrieval.

• Develop robust, reusable data orchestration pipelines using Airflow, Spark, or similar tools.

• Architect and manage data pipelines across Azure (primary), AWS, and GCP environments.

• Integrate and optimize storage and processing across SQL, NoSQL, and vector databases.

• Contribute to the design and implementation of event-driven architectures.

• Collaborate with AI teams to enable embedding generation, LLM integration, and model-serving pipelines.

• Ensure end-to-end data quality, monitoring, reliability, and observability.

• Lead or participate in system design for large-scale, distributed data and AI systems.

 

Required Skills:

 

Programming & Data

• Strong expertise in Python for data processing, APIs, automation, or distributed workloads.

• Strong proficiency in SQL and knowledge of NoSQL databases (MongoDB, DynamoDB, Cosmos DB, etc.).

• Experience with vector databases such as FAISS, Pinecone, Weaviate, Qdrant, or Chroma.

• Strong knowledge of data modeling, pipeline development, and ETL/ELT frameworks.

AI/LLM Infrastructure

• Solid understanding of vectorization, embeddings, and similarity search techniques.

• Familiarity with LLMs, embedding models, and RAG pipeline concepts.

• Experience integrating embedding-generation pipelines via Hugging Face, OpenAI, or other model providers.

Cloud & Distributed Systems

• Proficiency with Azure (primary) and familiarity with AWS and GCP.

• Experience with Docker and containerized development.

• Understanding of Kubernetes is a strong plus.

Orchestration & Big Data

• Expertise in Apache Airflow for scheduling and orchestration.

• Experience with Apache Spark or equivalent distributed processing frameworks.

Architecture & Engineering Fundamentals

• Strong system design fundamentals for scalable and distributed systems.

• Knowledge of event-driven architecture and modern data platforms.

• Strong understanding of DevOps, CI/CD, version control, and observability best practices.

Qualifications:

• 8+ years of progressive experience in data engineering, distributed systems, or AI/ML data infrastructure.

• Experience building RAG pipelines in production.

• Knowledge of graph databases or hybrid search systems.

• Understanding of model deployment, inference optimization, and caching techniques for LLM workloads.

• Familiarity with data governance, IAM, and security patterns across cloud ecosystems.

What We Look For
If you are a performance-driven, inquisitive mind with the agility to adapt to ambiguity, you will fit right in. You should be eager to build meaningful collaborations with stakeholders and aspire to create unique, customer-centric solutions. A bias for action and a passion for conquering new frontiers in the AI space are at the heart of the Inception community.
 
What Working At Inception Offers
 
Culture: An open, diverse and inclusive environment with a global vision that encourages personal growth and focuses on ground-breaking, industry-first innovations.
Career: Outstanding learning, development & growth opportunities via structured training programs and innovative, high-tech projects.
Rewards: A competitive remuneration package with a host of perks including healthcare, education support, leave benefits and more.
 
If you can confidently demonstrate that you meet the criteria above, please contact us as soon as possible.