Data Engineer
Mphasis · Chennai, Tamil Nadu, India
Full-time · Staff · Posted 15 days ago
Key Responsibilities
Technical Leadership & Ownership
Own the end-to-end data engineering architecture for large-scale AWS data platforms
Define and enforce data engineering standards, best practices, and governance frameworks
Lead design reviews, code reviews, and technical decision-making across teams
Act as the primary technical escalation point for complex data pipeline issues
ETL/ELT Design & Development
Design, build, and optimize scalable ETL/ELT pipelines using:
AWS Glue (Jobs, Workflows, Crawlers)
PySpark / Spark SQL, Snowflake, SnowSQL
Python-based data processing frameworks
Implement incremental processing, CDC, and data partitioning strategies
Develop reusable and modular data pipeline frameworks for enterprise use
Data Lake & Storage Management
Design and manage data lake architecture on AWS (S3 + Apache Iceberg)
Implement ACID-compliant data layers using Iceberg
Optimize storage formats (Parquet, ORC) and data layouts for performance
Define and enforce data lifecycle, retention, and archival policies
Performance Optimization & Cost Efficiency
Tune Spark/Glue jobs for performance (memory, partitioning, caching)
Optimize workloads for cost efficiency in AWS (compute, storage, I/O)
Monitor and improve pipeline SLAs, throughput, and latency metrics
Data Governance & Quality
Implement data quality frameworks, validations, and reconciliation checks
Ensure compliance with data governance, lineage, and security standards
Work with cataloging tools (AWS Glue Data Catalog, etc.) for metadata management
Integration & Orchestration
Design and manage end-to-end orchestration workflows (Glue Workflows, Step Functions, and Airflow, if applicable)
Integrate data across multiple sources (RDBMS, APIs, streaming platforms, files)
Enable reliable, fault-tolerant, and restartable pipeline execution
Stakeholder Collaboration
Partner with business, analytics, and AI teams to understand data requirements
Collaborate with architects and DevOps teams for environment setup and automation
Provide technical guidance to junior engineers and team members
Team Leadership & Mentoring
Lead and mentor a team of data engineers
Drive skill development in Spark, AWS, and modern data architectures
Ensure adherence to Agile practices and timely delivery of milestones
Required Skills & Experience
Core Technical Skills
Strong experience with the AWS data engineering stack:
AWS Glue, S3, Lambda, IAM, CloudWatch
Advanced proficiency in:
PySpark / Apache Spark
Spark SQL
Python
Hands-on experience with Apache Iceberg / modern table formats
Deep understanding of ETL/ELT design patterns and data pipelines
Data Engineering Expertise
Experience with data lake and lakehouse architectures
Strong knowledge of data modeling (star/snowflake schemas)
Experience with batch and near real-time processing
Familiarity with file formats (Parquet, ORC, Avro)
Performance & Optimization
Proven experience in large-scale data processing (TB/PB scale)
Strong expertise in query optimization, partitioning, and indexing strategies
DevOps & Automation
Experience with CI/CD pipelines for data workflows
Knowledge of infrastructure as code (CloudFormation/Terraform) is a plus
Familiarity with version control (Git) and deployment strategies
Preferred Skills (Good to Have)
Experience with data orchestration tools (Airflow, Step Functions)
Exposure to streaming frameworks (Kafka, Kinesis)
Knowledge of data security (encryption, masking, access control)
Experience supporting AI/ML data pipelines
Exposure to BI tools (Power BI, Tableau, Sigma)
Qualifications
Bachelor’s/Master’s degree in Computer Science, Engineering, or related field
8–12+ years of experience in data engineering, with 3+ years in a technical leadership role