# Data Engineering

## Kafka

* **Development**
  * Built a Kafka-based event-processing pipeline for third-party integrations and SDK events
  * Implemented GDPR compliance on Kafka for storing and retrieving PII and other sensitive information
* **Deployment**
  * Manually deployed a multi-node Kafka cluster on vanilla VMs
  * Set up a Kafka cluster on Azure with HDInsight
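
As a rough illustration of one common building block for handling PII in event streams, the sketch below replaces sensitive fields with keyed hashes before an event is serialized for a Kafka producer. It is not the implementation described above: the field names in `PII_FIELDS`, the `pseudonymize` and `to_kafka_value` helpers, and the choice of HMAC-SHA256 are all assumptions made for the example.

```python
import hashlib
import hmac
import json

# Hypothetical field names; the real event schema is not described on this page.
PII_FIELDS = {"email", "phone", "name"}


def pseudonymize(event: dict, secret: bytes) -> dict:
    """Return a copy of the event with PII values replaced by keyed hashes."""
    cleaned = {}
    for key, value in event.items():
        if key in PII_FIELDS and value is not None:
            # HMAC-SHA256 keeps the mapping stable for joins while hiding the raw value.
            digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[key] = digest.hexdigest()
        else:
            cleaned[key] = value
    return cleaned


def to_kafka_value(event: dict) -> bytes:
    """Serialize an event to the bytes payload a Kafka producer would send."""
    return json.dumps(event, sort_keys=True).encode("utf-8")
```

Because the hash is keyed, the same input always maps to the same token for a given secret, so downstream consumers can still group by user without seeing the raw value.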

## Apache Spark

* **Development**
  * Batch processing pipeline design and implementation for
    * Ingestion into the AI subsystem
    * Business metrics and reporting
    * Full-stack microservices
  * Scalatra-based API microservice that lets AI systems process data through expressive SQL queries running on top of Spark
  * Built a multi-node Spark cluster deployment service with notebook access (GitHub repo here)
* **Deployment**
  * Set up a Databricks Spark cluster on Azure
  * Manually deployed a Spark cluster on vanilla VMs

## Cassandra

* **Design**
  * Designed Cassandra data schemas for
    * AI subsystem consumption
    * Transformed and processed data storage
    * Time series data storage
    * Reporting and business metrics calculation
* **Deployment**
  * Manually deployed the multi-node Cassandra cluster used by all production systems (Blog post here)
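
To make the time series item above more concrete, here is a minimal sketch of a widely used Cassandra pattern: bucketing rows by day so that no single partition grows without bound. The keyspace, table, and column names, as well as the `day_bucket` and `partition_key` helpers, are hypothetical and only illustrate the idea; the actual schemas are not shown on this page.

```python
from datetime import datetime, timezone

# Hypothetical table; keyspace, table, and column names are illustrative only.
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS metrics.events_by_day (
    source_id text,
    day text,
    event_time timestamp,
    payload text,
    PRIMARY KEY ((source_id, day), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
"""


def day_bucket(event_time: datetime) -> str:
    """Return the UTC day bucket (YYYY-MM-DD) for a timezone-aware timestamp."""
    return event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partition_key(source_id: str, event_time: datetime) -> tuple:
    """Build the (source_id, day) partition key for a single event."""
    return (source_id, day_bucket(event_time))
```

With this layout, a query for one source and one day reads a single partition, and the clustering order returns the newest events first.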

## Streaming

* Prototyped and designed streaming platforms to meet real-time data processing requirements
  * Kafka
  * Kafka Streams
  * Faust
  * Spark Streaming

## Platform Integrations

Led the building of external data integrations with the following platforms:

* mParticle
* CleverTap
* Shopify Apps
* SDK

## System Design

* Designed and implemented data pipelines for batch and streaming data processing
* Tech stack included
  * Spark
  * Kafka
  * Cassandra
  * Scala
  * Faust
  * Kafka Streams

