Data Engineering

Kafka

  • Development

    • Built a Kafka-based event processing pipeline for third-party integrations and SDK events

    • Implemented GDPR compliance on Kafka for the storage and retrieval of PII and sensitive information

  • Deployment

    • Manually deployed a multi-node Kafka cluster on vanilla VMs

    • Set up a Kafka cluster on Azure with HDInsight

Apache Spark

  • Development

    • Designed and implemented batch processing pipelines for:

      • Data ingestion into the AI subsystem

      • Business metrics and reporting

      • Full Stack Microservices

    • Built a Scalatra-based API microservice, running on top of Spark, that lets AI systems process data with expressive SQL queries

    • Built a multi-node Spark cluster deployment service with notebook access (GitHub repo here)

  • Deployment

    • Set up a Databricks Spark cluster on Azure

    • Manually deployed a Spark cluster on vanilla VMs

Cassandra

  • Design

    • Designed Cassandra data schemas for:

      • AI subsystem consumption

      • Transformed and processed data storage

      • Time series data storage

      • Reporting and business metrics calculation

  • Deployment

    • Manually deployed the multi-node Cassandra cluster used by all production systems (blog post here)

Streaming

  • Prototyped and designed a streaming platform for real-time data processing requirements

    • Kafka

    • Kafka Streams

    • Faust

    • Spark Streaming

Platform Integrations

Led the building of external data integrations with the following platforms:

  • mParticle

  • CleverTap

  • Shopify Apps

  • SDK

System Design

  • Designed and implemented data pipelines for batch and streaming data processing

  • Tech stack included

    • Spark

    • Kafka

    • Cassandra

    • Scala

    • Faust

    • Kafka Streams
