Lead Data Scientist

Straive · Karnataka, India

Full-time · Senior · Posted 17 days ago

Job Title: Data Scientist
Job Type: Permanent
Years of Experience: 7+ Years

About Straive:
Straive is a market-leading Content and Data Technology company providing data services, subject matter expertise, and technology solutions to multiple domains. Data Analytics & AI Solutions, Data AI Powered Operations, and Education & Learning form the core pillars of the company’s long-term vision. The company is a specialized solutions provider to business information providers in finance, insurance, legal, real estate, life sciences, and logistics. Straive continues to be the leading content services provider to research and education publishers.
Data Analytics & AI Services: Our Data Solutions business has become critical to our clients' success. We use technology and AI with human experts in the loop to create data assets that our clients use to power their data products and their end customers' workflows. As our clients expect us to become their future-fit Analytics and AI partner, they look to us for help in building data analytics and AI enterprise capabilities for them. With a client base spanning 30 countries worldwide, Straive’s multi-geographical resource pool is strategically located in eight countries: India, Philippines, USA, Nicaragua, Vietnam, United Kingdom, and the company headquarters in Singapore.
Website: https://www.straive.com/

Senior Data Analyst - Generative AI & Automation

Experience: 7+ years in the AI/ML space (2+ years specifically in Generative AI/LLMs)

About the Role
We are looking for a high-impact Senior Data Analyst with a strong engineering foundation to join our AI team. You will go beyond standard analytics, building and optimizing production-grade Generative AI solutions. You will be responsible for creating RAG systems, integrating multimodal models, and implementing agentic workflows that turn unstructured data into business insights and automated actions.

Responsibilities
Production RAG Development: Build and optimize end-to-end RAG systems, implementing Hybrid Search (dense vector embeddings + BM25) and advanced reranking strategies to maximize retrieval precision.
Multimodal AI Workflows: Integrate Vision-Language Models (e.g., GPT-4o, Claude 3.5, Idefics) to analyze visual data, documents, and images for automated insight extraction.
Advanced Prompt Engineering: Develop, test, and maintain complex prompt strategies (Chain-of-Thought, ReAct) using OpenAI/Anthropic APIs and the Hugging Face ecosystem.
Vector Database Management: Design indexing strategies and manage high-dimensional data at scale within vector stores such as Pinecone, Weaviate, Milvus, or pgvector.
High-Performance Python Engineering: Write scalable, asynchronous Python code (FastAPI/asyncio) to handle high-throughput AI API calls and API-based data processing.
Agentic Orchestration & Deployment: Use LangChain or LlamaIndex to orchestrate LLM workflows and deploy containerized applications using Docker and Kubernetes on cloud platforms (AWS/Azure/GCP).
Data Intelligence: Perform advanced NLP preprocessing (semantic chunking, metadata filtering) to improve context quality for LLMs.

Technical Requirements
Experience: 7+ years in AI/ML, with 2+ years of hands-on experience building/deploying Generative AI applications.
Languages: Expert-level Python programming (asynchronous, object-oriented, high-performance).
GenAI/LLM Ecosystem: Deep proficiency with OpenAI, Anthropic, LangChain, LlamaIndex, and Hugging Face. Text Analytics experience is mandatory.
RAG & Search: Hands-on experience with vector databases (Pinecone, etc.) and Hybrid Search implementation.
Multimodal: Experience with VLMs for document intelligence or image understanding.
Cloud & DevOps: Containerization (Docker) and Cloud AI services (AWS/Azure/GCP).