GenAquarius Careers
AWS Data Engineer (Glue, Kafka, Airflow)
Location: Noida
Remote Type: Hybrid
Employment Type: Permanent Full-Time
Job Description
We are seeking a highly skilled Senior AWS Data Engineer with strong expertise in modern data lakehouse architectures, streaming platforms, and enterprise data modeling. The ideal candidate has hands-on experience with AWS Glue, PySpark, Kafka/MSK, Apache Iceberg/Delta Lake, and Airflow-based orchestration.
Experience with banking domain concepts such as BIAN is preferred.
The role involves designing scalable, metadata-driven, cloud-native data platforms on AWS while ensuring high performance, schema consistency, and support for both batch and real-time processing.
Key Responsibilities
Lakehouse Data Modeling on Amazon S3
- Design and implement Medallion Architecture (Bronze / Silver / Gold layers)
- Build scalable lakehouse data models optimized for partitioning and domain-based access
- Support schema evolution and time-travel capabilities
- Design efficient storage and querying strategies on Amazon S3
AWS Glue + PySpark (ETL Modeling)
- Develop scalable ETL pipelines using AWS Glue and PySpark
- Translate logical and physical data models into optimized PySpark transformations
- Optimize joins, partition pruning, and pushdown predicates for performance
- Manage schemas and metadata using AWS Glue Data Catalog
Schema Design & Metadata Management
- Define canonical schemas and enterprise data contracts
- Maintain centralized metadata repositories using AWS Glue Data Catalog
- Implement schema versioning and backward compatibility strategies
- Ensure governance and consistency across data domains
Modern Table Formats (Apache Iceberg / Delta Lake)
- Implement ACID-compliant table architectures on Amazon S3
- Design incremental load, CDC, and snapshot-based querying solutions
- Optimize compaction strategies and partition management
- Support scalable analytics and historical data tracking
Streaming & CDC Data Modeling (Kafka / MSK)
- Design event-driven schemas aligned with enterprise domain models
- Build streaming and CDC ingestion pipelines using Kafka/MSK
- Ensure consistency between streaming and batch processing layers
- Support near real-time data integration use cases
Required Skills
- AWS Glue
- PySpark
- Apache Kafka / Amazon MSK
- Apache Iceberg / Delta Lake
- Amazon S3
- AWS Glue Data Catalog
- Apache Airflow
- Data Vault 2.0
- Dimensional Data Modeling
- CDC (Change Data Capture)
- Lakehouse Architecture
- Schema Design & Metadata Management
Preferred Qualifications
- Experience in the banking and financial services domain
- Strong understanding of BIAN (Banking Industry Architecture Network) architecture and CIF (Customer Information File) concepts
- Experience designing enterprise-scale cloud data platforms
- Strong analytical and problem-solving skills
- Excellent communication and collaboration abilities