AWS Data Engineer (Glue, Kafka, Airflow)

Location: Noida

Remote Type: Hybrid

Employment Type: Permanent Full-Time

Job Description

We are seeking a highly skilled Senior AWS Data Engineer with strong expertise in modern data lakehouse architectures, streaming platforms, and enterprise data modeling. The ideal candidate should have hands-on experience with AWS Glue, PySpark, Kafka/MSK, Apache Iceberg/Delta Lake, and Airflow-based orchestration.

Experience with Banking domain concepts such as BIAN is preferred.

The role involves designing scalable, metadata-driven, cloud-native data platforms on AWS while ensuring high performance, schema consistency, and support for both batch and real-time processing.

Key Responsibilities

Lakehouse Data Modeling on Amazon S3

Design and implement Medallion Architecture (Bronze / Silver / Gold layers)
Build scalable lakehouse data models optimized for partitioning and domain-based access
Support schema evolution and time-travel capabilities
Design efficient storage and querying strategies on Amazon S3

AWS Glue + PySpark (ETL Modeling)

Develop scalable ETL pipelines using AWS Glue and PySpark
Translate logical and physical data models into optimized PySpark transformations
Optimize joins, partition pruning, and predicate pushdown for performance
Manage schemas and metadata using AWS Glue Data Catalog

Schema Design & Metadata Management

Define canonical schemas and enterprise data contracts
Maintain centralized metadata repositories using AWS Glue Data Catalog
Implement schema versioning and backward compatibility strategies
Ensure governance and consistency across data domains

Modern Table Formats (Apache Iceberg / Delta Lake)

Implement ACID-compliant table architectures on Amazon S3
Design incremental load, CDC, and snapshot-based querying solutions
Optimize compaction strategies and partition management
Support scalable analytics and historical data tracking

Streaming & CDC Data Modeling (Kafka / MSK)

Design event-driven schemas aligned with enterprise domain models
Build streaming and CDC ingestion pipelines using Kafka/MSK
Ensure consistency between streaming and batch processing layers
Support near real-time data integration use cases

Required Skills

AWS Glue
PySpark
Apache Kafka / Amazon MSK
Apache Iceberg / Delta Lake
Amazon S3
AWS Glue Data Catalog
Apache Airflow
Data Vault 2.0
Dimensional Data Modeling
CDC (Change Data Capture)
Lakehouse Architecture
Schema Design & Metadata Management

Preferred Qualifications

Experience in Banking and Financial Services domain
Strong understanding of BIAN architecture and CIF (Customer Information File) concepts
Experience designing enterprise-scale cloud data platforms
Strong analytical and problem-solving skills
Excellent communication and collaboration abilities