AWS Data Engineer (Glue, Kafka, Airflow)

Location:

Noida

Remote Type:

Hybrid

Employment Type:

Permanent Full-Time

Job Description

We are seeking a highly skilled Senior AWS Data Engineer with strong expertise in modern data lakehouse architectures, streaming platforms, and enterprise data modeling. The ideal candidate should have hands-on experience with AWS Glue, PySpark, Kafka/MSK, Apache Iceberg/Delta Lake, and Airflow-based orchestration.

Experience with Banking domain concepts such as BIAN is preferred.

The role involves designing scalable, metadata-driven, cloud-native data platforms on AWS while ensuring high performance, schema consistency, and support for both batch and real-time processing.

Key Responsibilities

Lakehouse Data Modeling on Amazon S3

  • Design and implement Medallion Architecture (Bronze / Silver / Gold layers)
  • Build scalable lakehouse data models optimized for partitioning and domain-based access
  • Support schema evolution and time-travel capabilities
  • Design efficient storage and querying strategies on Amazon S3

AWS Glue + PySpark (ETL Modeling)

  • Develop scalable ETL pipelines using AWS Glue and PySpark
  • Translate logical and physical data models into optimized PySpark transformations
  • Optimize joins, partition pruning, and pushdown predicates for performance
  • Manage schemas and metadata using AWS Glue Data Catalog

Schema Design & Metadata Management

  • Define canonical schemas and enterprise data contracts
  • Maintain centralized metadata repositories using Glue Catalog
  • Implement schema versioning and backward compatibility strategies
  • Ensure governance and consistency across data domains

Modern Table Formats (Apache Iceberg / Delta Lake)

  • Implement ACID-compliant table architectures on Amazon S3
  • Design incremental load, CDC, and snapshot-based querying solutions
  • Optimize compaction strategies and partition management
  • Support scalable analytics and historical data tracking

Streaming & CDC Data Modeling (Kafka / MSK)

  • Design event-driven schemas aligned with enterprise domain models
  • Build streaming and CDC ingestion pipelines using Kafka/MSK
  • Ensure consistency between streaming and batch processing layers
  • Support near real-time data integration use cases

Required Skills

  • AWS Glue
  • PySpark
  • Apache Kafka / Amazon MSK
  • Apache Iceberg / Delta Lake
  • Amazon S3
  • AWS Glue Data Catalog
  • Apache Airflow
  • Data Vault 2.0
  • Dimensional Data Modeling
  • CDC (Change Data Capture)
  • Lakehouse Architecture
  • Schema Design & Metadata Management

Preferred Qualifications

  • Experience in Banking and Financial Services domain
  • Strong understanding of BIAN architecture and CIF concepts
  • Experience designing enterprise-scale cloud data platforms
  • Strong analytical and problem-solving skills
  • Excellent communication and collaboration abilities
