Role: Senior Data Engineer - AI Platform
Function: Data Engineering
Location: Mumbai, India
Type: Full-time
Industry: AI/Technology
About Company
The company is building the AI layer for Bharat at India-scale. Backed by partnerships with global tech leaders like Meta and Google, the team is creating AI that serves the entire Indian user base—across languages, contexts, and daily needs. This is AI designed for real adoption, not experiments.
The company brings a rare combination of deep India-first AI capability and unmatched India-scale distribution. The focus is a platform-and-product stack that makes AI useful, reliable, and safe for everyday consumers. It is engineered from day one for massive scale: 100M+ users early on, with 1B-ready constraints on latency, cost, reliability, and safety.
If you want to be part of a fast-moving, high-ambition team building technology with real-world reach, this is that opportunity. The culture emphasises engineering excellence, strong collaboration, and tangible impact across sectors that matter to India—while building toward a category-defining consumer AI experience.
Position Overview
You'll design and implement core data pipelines and semantic data models that power AI solutions at massive scale. You'll be instrumental in building the data infrastructure that enables high-quality retrieval, grounding, and reduced hallucination for multi-domain LLM applications. You'll work alongside global tech leaders to create meaningful impact across language, public sector, enterprise technology, and deep learning domains.
Role & Responsibilities
- Design and build scalable data pipelines for multi-domain LLM applications
- Implement semantic data models that enable high-quality information retrieval
- Develop and optimize vector database solutions for embedding storage and retrieval
- Create robust ETL/ELT systems for processing large-scale datasets
- Build metadata enrichment pipelines to improve retrieval accuracy
- Design GraphQL schemas for efficient data access patterns
- Optimize chunking strategies and embedding pipelines to reduce hallucination
Must Have Criteria
- 5-10 years of hands-on experience in data engineering or data science
- Expert-level proficiency in Python and SQL for data processing
- Experience building data systems supporting 100+ million users or equivalent scale
- Proven experience building and maintaining ETL/ELT systems at scale
- Hands-on experience with GraphQL schema design and implementation
- Working experience with vector databases (Pinecone, Milvus, or Weaviate)
- Experience with workflow orchestration and distributed processing tools (Airflow, Spark, or Flink)
Nice to Have
- Experience in the hyperscaler domain (cloud platforms, distributed systems at scale)
- Experience with LLM applications and retrieval-augmented generation (RAG)
- Background in machine learning or NLP projects
- Knowledge of semantic search and information retrieval systems
- Experience with multi-tenant, globally distributed data architectures
What We Offer
- Opportunity to build AI infrastructure at gigawatt scale
- Work with cutting-edge LLM and vector database technologies
- Collaborate with global tech leaders and industry pioneers
- Impact millions of users through the company's AI platform
- Competitive compensation and growth opportunities