Data Engineering
Kafka
Development
Built a Kafka-based event processing pipeline for third-party integrations and SDK events
Implemented GDPR compliance on Kafka for storing and retrieving PII and other sensitive information
Deployment
Manual deployment of a multi-node Kafka cluster on vanilla VMs
Set up a Kafka cluster on Azure with HDInsight
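As an illustration of the GDPR item above, here is a minimal sketch of one common pattern: pseudonymizing PII fields with a per-subject key before an event is produced to Kafka, so that deleting the key later makes the stored tokens unlinkable. This is an assumption about how such a requirement can be met, not a description of the implementation referenced here. The field names, the in-memory key store, and all function names are hypothetical, and the sketch covers only the pseudonymization side, not retrieval.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical in-memory key store: one secret key per data subject.
# A real deployment would keep these in a secured, durable store.
subject_keys = {}

# Assumed PII field names, for illustration only.
PII_FIELDS = {"email", "phone"}


def key_for(subject_id):
    """Return the subject's key, creating one on first use."""
    if subject_id not in subject_keys:
        subject_keys[subject_id] = secrets.token_bytes(32)
    return subject_keys[subject_id]


def pseudonymize(event):
    """Replace PII fields with keyed HMAC-SHA256 tokens; other fields pass through."""
    key = key_for(event["user_id"])
    safe = dict(event)
    for field in PII_FIELDS.intersection(event):
        digest = hmac.new(key, str(event[field]).encode("utf-8"), hashlib.sha256)
        safe[field] = digest.hexdigest()
    return safe


def forget(subject_id):
    """Erasure request: drop the key so existing tokens can no longer be re-derived."""
    subject_keys.pop(subject_id, None)


def to_kafka_value(event):
    """Serialize the sanitized event as the bytes payload of a Kafka record."""
    return json.dumps(pseudonymize(event), sort_keys=True).encode("utf-8")
```

The bytes returned by `to_kafka_value` can be handed to any Kafka producer client as the record value; the raw PII never reaches the topic.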
Apache Spark
Development
Batch processing pipeline design and implementation for
Ingestion into the AI subsystem
Business metrics and reporting
Full Stack Microservices
Scalatra-based API microservice that lets AI systems process data with expressive SQL queries running on top of Spark
Built a multi-node Spark cluster deployment service with notebook access (GitHub repo here)
Deployment
Set up a Databricks Spark cluster on Azure
Manual deployment of a Spark cluster on vanilla VMs
Cassandra
Design
Data schema design of Cassandra storage for
AI subsystem consumption
Transformed and processed data storage
Time series data storage
Reporting and business metrics calculation
Deployment
Manual deployment of a multi-node Cassandra cluster used by all production systems (blog post here)
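To make the time series storage item concrete, the sketch below shows a widely used Cassandra modeling pattern: bucketing a series by day so that no partition grows without bound. The table definition, column names, and helper functions are hypothetical and are not the production schema; they only illustrate how a partition key can be derived from an event timestamp.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical CQL table, kept as a string for reference only. Table and
# column names are assumptions, not the schema used in production.
EVENTS_BY_DAY_CQL = """
CREATE TABLE IF NOT EXISTS events_by_day (
    source_id  text,
    day        text,
    event_time timestamp,
    payload    text,
    PRIMARY KEY ((source_id, day), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
"""


def partition_key(source_id, event_time):
    """Return the (source_id, day) partition for a timezone-aware datetime, using UTC days."""
    day = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (source_id, day)


def partitions_for_range(source_id, start, end):
    """List every partition a reader must query to cover start..end inclusive."""
    current = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    keys = []
    while current <= last:
        keys.append((source_id, current.isoformat()))
        current += timedelta(days=1)
    return keys
```

With this layout a range read becomes one query per day bucket, and rows inside each bucket come back newest first because of the clustering order.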
Streaming
Streaming platform prototyping and design for real-time data processing requirements
Kafka
Kafka Streams
Faust
Spark Streaming
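A typical workload to prototype when comparing stream processors like these is a windowed aggregation. The sketch below is a framework-free illustration of a tumbling (fixed, non-overlapping) window count in plain Python. It is not code from the prototypes, and the function name and event shape are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) for fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Align the timestamp down to the start of its window.
        window_start = ts - (ts % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)
```

For example, with 60-second windows, events at seconds 0 and 30 fall in the window starting at 0, while an event at second 61 falls in the window starting at 60.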
Platform Integrations
Led the building of external data integrations with the following platforms
mParticle
CleverTap
Shopify Apps
SDK
System Design
Designed and implemented data pipelines for batch and streaming data processing
Tech stack included
Spark
Kafka
Cassandra
Scala
Faust
Kafka Streams