Python, PySpark Developer
Infosys · Hyderabad, Telangana, India
Full-time · Senior · Posted 10 days ago
Skills: Technology > Analytics - Packages > Python - Big Data; Technology > Big Data - Data Processing > PySpark
Design, develop, and maintain scalable batch/stream data pipelines using Python and PySpark in distributed environments.
Implement efficient transformations, aggregations, and joins on large datasets while optimizing for performance and cost.
Write optimized SQL for data extraction, validation, and reconciliation across multiple sources.
Build reusable, testable modules and follow engineering best practices (code reviews, unit testing, documentation).
Troubleshoot production issues, perform root-cause analysis, and implement long-term fixes and monitoring improvements.
Collaborate with stakeholders to translate requirements into technical designs, delivery plans, and measurable outcomes.
Ensure data quality through validation checks, anomaly detection patterns, and consistent schema management.
Contribute to the continuous improvement of development standards, performance benchmarks, and pipeline reliability.
Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
5–9 years of hands-on experience in software development and/or data engineering roles.
Strong proficiency in Python with experience building production-grade applications or data workflows.
Strong proficiency in PySpark, including DataFrame APIs, optimization techniques, and distributed processing concepts.
Working knowledge of SQL for complex queries, data analysis, and validation.
Experience delivering reliable solutions with attention to performance, scalability, and maintainability.