Byteridge - Machine Learning Engineer - Infrastructure & Optimisation
Byteridge · Pune, Maharashtra, India
Full-time · Senior · Posted 1 month ago
Description
About the Role
Byteridge is seeking a Rapid Prototyping Engineer specializing in AI Infrastructure & Optimization to work with our most strategic customers on deploying, fine-tuning, and optimizing large language models at scale. You will be at the forefront of Byteridge's AI infrastructure capabilities, helping customers unlock the full potential of foundation models through expert-level deployment on GPU infrastructure.
This highly technical role requires deep expertise in machine learning infrastructure, GPU optimization, and production ML systems, combined with the ability to turn complex technical work into successful customer outcomes.
What You'll Do
Model Deployment & Optimization
Lead end-to-end deployments of large language models on AWS infrastructure for strategic customers
Design and implement training, fine-tuning, and inference pipelines using Amazon SageMaker AI
Optimize model performance through GPU-level tuning, kernel optimization, and infrastructure configuration
Deploy models on diverse accelerator architectures, including NVIDIA GPUs and AWS custom silicon (Trainium, Inferentia)
Infrastructure Architecture & Performance
Architect scalable ML infrastructure using SageMaker AI Inference, HyperPod, and distributed training frameworks
Implement CUDA-level optimizations and custom kernels for improved model performance
Design storage and networking architectures optimized for high-throughput ML workloads
Troubleshoot and resolve complex performance bottlenecks at the GPU driver and kernel level
Customer Engagement & Technical Leadership
Partner with AWS AI Specialist Solution Architects and customer ML teams to understand model requirements and deployment constraints
Provide technical guidance on model selection, fine-tuning strategies, and production best practices
Conduct performance benchmarking and cost optimization analysis for ML workloads
Share field insights with AWS product teams to influence infrastructure and service roadmaps
What We're Looking For
Core Qualifications
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience (Master's or PhD preferred)
5+ years of experience in machine learning infrastructure, model deployment, or GPU computing
Strong programming skills in Python and experience with ML frameworks (PyTorch, TensorFlow, JAX)
Deep understanding of LLM architectures, training methodologies, and inference optimization
Technical Expertise (High-Level Alignment)
Hands-on experience training, fine-tuning, or deploying large language models in production
Proficiency with GPU programming, CUDA, and kernel-level optimization techniques
Experience with distributed training frameworks and multi-GPU/multi-node orchestration
Strong knowledge of AWS core services: EC2 (GPU instances), S3, EFS, VPC, and networking
Preferred Experience
Direct experience with Amazon SageMaker AI (Training, Inference, HyperPod) or equivalent ML platforms
Understanding of GPU architectures (NVIDIA A100, H100) and AWS custom silicon (Trainium, Inferentia)
Experience with model compression techniques (quantization, pruning, distillation)
Knowledge of MLOps practices, model monitoring, and production ML system design
Background in high-performance computing, distributed systems, or systems programming
Essential Attributes
Ability to dive deep into technical problems and debug complex infrastructure issues
Strong analytical skills with a data-driven approach to optimization
Excellent communication skills to explain complex technical concepts to diverse audiences
Comfortable working in ambiguous, fast-paced environments with evolving requirements
Ownership mindset with the ability to drive projects from architecture to production
(ref:hirist.tech)