Shrey Patel
Portfolio 2026
Open to work
Data & ML Engineer

Building the infrastructure for intelligence

Streaming lakehouses · Real-time fraud pipelines · LLM safety systems

Where I've built

2 roles · 4+ years

AI Data Engineer

Nov 2024 — Present
Walmart · USA
  • Architected enterprise-scale Azure Lakehouse (ADLS Gen2, Delta Lake, Databricks, Synapse) processing 5+ TB daily across $500M+ annual transaction ecosystems
  • Reduced data latency by 65% and cut the reporting cycle from 48h to under 6h
  • Built real-time fraud detection with Event Hubs + Spark Structured Streaming, monitoring $500M+ in transactions and cutting threat response time by 70%
  • AI recommendation systems (PyTorch, Azure ML, MLflow) influencing 1M+ interactions, +22% engagement, -40% model deployment cycles
  • Delta Lake partitioning & Z-order indexing — BI performance +55%
  • Terraform IaC with PCI-DSS/SOC2 compliance
Azure · Delta Lake · Databricks · PySpark · Synapse · Event Hubs · MLflow · PyTorch · Terraform

Software Data Engineer

Jan 2021 — Jul 2023
IBM · India
  • Architected AWS Glue + PySpark ETL processing 100M+ financial transactions annually — 99.9% accuracy, 45% faster, 30% cost reduction
  • Designed Redshift warehouse integrating 15+ enterprise systems, delivering $120K+ annual savings
  • Serverless ingestion (Lambda, S3, Step Functions) eliminating 500+ manual hours at 99.95% SLA
  • Real-time monitoring with Kinesis + Spark Streaming on $400M+ portfolio assets, fraud response time -35%
  • SageMaker anomaly detection improving fraud precision 18%
  • Terraform IaC across multi-environment architectures
AWS Glue · PySpark · Redshift · Kinesis · Lambda · SageMaker · Terraform · Spark Streaming

Selected Works

01

Rust HFT Risk Engine

Real-time risk engine that streams synthetic venue data through pre-trade credit and price collar checks. Events are PTP-timestamped and written to ClickHouse with deterministic replay.

Rust · Tokio · ClickHouse · Prometheus · Grafana
<100µs p50 Latency
<500µs p99 Latency
10M+ Zero Loss
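
As an illustration of the two pre-trade checks named above, here is a minimal sketch. It is written in Python for brevity and is not the project's Rust code. The `Order` fields, the function names, the percentage-based collar, and the notional-based credit rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    # Hypothetical order shape, assumed for this sketch only.
    account: str
    side: str       # "buy" or "sell"
    price: float
    quantity: int


def check_price_collar(order: Order, reference_price: float, collar_pct: float) -> bool:
    """Accept only prices within +/- collar_pct of the reference price."""
    lower = reference_price * (1 - collar_pct)
    upper = reference_price * (1 + collar_pct)
    return lower <= order.price <= upper


def check_credit(order: Order, used_credit: float, credit_limit: float) -> bool:
    """Accept only if the order's notional fits in the remaining credit."""
    notional = order.price * order.quantity
    return used_credit + notional <= credit_limit


def pre_trade_check(order: Order, reference_price: float, collar_pct: float,
                    used_credit: float, credit_limit: float) -> tuple[bool, str]:
    """Run both checks in order and report the first one that fails."""
    if not check_price_collar(order, reference_price, collar_pct):
        return (False, "price_collar")
    if not check_credit(order, used_credit, credit_limit):
        return (False, "credit_limit")
    return (True, "accepted")
```

A call such as `pre_trade_check(Order("acct-1", "buy", 101.0, 100), 100.0, 0.05, 0.0, 20_000.0)` passes both checks under these assumed rules. A production engine would more likely use fixed-point prices than floats.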
02

Latent Diffusion Engine

Latent-diffusion pipeline with point-in-time content embeddings, trained on 10 H100 nodes with ArtBench + Open Images data.

PyTorch · U-Net · VAE · CLIP · ONNX · TensorRT
−20% FID
1.84s Inference
10×H100 HPC Scale
03

Notion Data Spine Clone

CRDT collaborative editing on Kafka, Postgres, and gRPC with backpressure handling and idempotent writes.

Kafka · Postgres · gRPC · CRDTs · Docker · Prometheus
3K Ops/sec
<30ms Median Lat
1000+ Concurrent
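
To make "idempotent writes" and CRDT convergence concrete, here is a minimal sketch using a last-writer-wins register per block. This is one of the simplest CRDTs and is not necessarily the design used in the project. The `Op` and `Document` types, their fields, and the tie-break on replica ID are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Op:
    # Hypothetical operation shape, assumed for this sketch only.
    op_id: str       # unique ID, used to drop redelivered operations
    block_id: str
    text: str
    timestamp: int   # logical clock value
    replica: str     # breaks ties between equal timestamps


@dataclass
class Document:
    # block_id -> (timestamp, replica, text)
    blocks: dict = field(default_factory=dict)
    seen: set = field(default_factory=set)

    def apply(self, op: Op) -> bool:
        """Apply an op once; return False if it was already applied."""
        if op.op_id in self.seen:
            return False  # idempotent: a redelivered op changes nothing
        self.seen.add(op.op_id)
        current = self.blocks.get(op.block_id)
        # Last writer wins: compare (timestamp, replica) pairs.
        if current is None or (op.timestamp, op.replica) > current[:2]:
            self.blocks[op.block_id] = (op.timestamp, op.replica, op.text)
        return True
```

Because the winner depends only on the `(timestamp, replica)` pair, two replicas that receive the same operations in different orders, or receive some of them twice, end up with the same block contents.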
04

HPC Image Captioning

Memory-efficient DDP training of CNN-RNN captioning models with gradient checkpointing and crash recovery, across DenseNet169, InceptionV3, and ResNet50.

PyTorch DDP · Slurm · MPI · CNN-RNN
+15% BLEU
−40% Memory
−50% Train Time

How I see

Credentials

Certifications

OCI Multicloud Architect Professional
Oracle · 2025

Enterprise multi-cloud architecture design across AWS, Azure, and OCI platforms.

OCI Cloud Database Services Professional
Oracle · 2025

Cloud database design, migration patterns, and autonomous database management.

OCI Data Science Professional
Oracle · 2025

Machine learning model lifecycle, feature stores, and model deployment on OCI.

OCI Generative AI Professional
Oracle · 2025

LLM integration, RAG architecture, and prompt engineering on Oracle Cloud.

NVIDIA Certified Professional GenAI/LLMs
NVIDIA · 2026

GPU-accelerated LLM inference, TensorRT optimization, and multi-node training.

Databricks ML Professional
Databricks · 2026

MLflow lifecycle, feature engineering, AutoML, and production MLOps on Unity Catalog.

McKinsey Forward Program
McKinsey.org · 2025

Strategic problem-solving, structured communication, and leadership development.

AWS Academy Cloud Foundations
AWS · 2021

Core AWS services, security, architecture, and cloud economics fundamentals.

AWS Academy Machine Learning
AWS · 2021

SageMaker model training, deployment pipelines, and ML infrastructure on AWS.


Education

Sep 2023 — May 2025
Northeastern University
M.S. Computer Software Engineering
GPA: 3.4/4.0

Generative AI, HPC with Deep Learning, Big Data & Indexing. Co-founder at CareWallet.

Jun 2018 — May 2022
Ganpat University
B.Tech Computer Science · Big Data
GPA: 3.8/4.0

Probability & Statistics, Cloud Computing. 2× GCP Quest Leader. Google Code-in Mentor.

Contributions

ShreyPatel4
@ShreyPatel4
249 Commits · 42 PRs · 19 Repos
Python 47% Shell 29% JavaScript 24%

Word of mouth

"Shrey is rare in that he combines strong engineering depth with clear product thinking. He's thoughtful about schemas, data quality, and monitoring..."

Harsh Kakadiya
Lead Data Engineer, Kenexai

"He managed end-to-end data workflows — handling everything from source extraction and complex transformations to delivering well-structured data marts."

Chandan Kamal
Lead Data Engineer, Kenexai

"His proactive stance on identifying and mitigating potential challenges underscores his critical role in our risk management strategies."

Grant Chau
Graduate Research Assistant, Dartmouth

"Shrey offered insights that added strength to our technological foundation, paving the way for operations that scale."

Evan Smith
Co-Founder, Eden

Let's talk

Or let Spectra set up a call (Voice coming soon)

Or reach me directly: patelshrey77@gmail.com