~/gurunath · senior ml · data · inference engineer · chennai, in

// GENERALIST ENGINEER · 9+ YEARS · DATA → ML → PLATFORM → INFERENCE

Gurunath L V

I build the machinery that makes data and intelligence move — from billion-row pipelines to the inside of an LLM's KV cache.

Data Engineering · ML & MLOps · Platform & Distributed Systems · AI & LLM Systems · Inference Engineering · Web3 & DePIN

Read recent writing → New writing on Substack ↗

The short version

01 / WHO

I'm a generalist by instinct. Over the last nine years I've moved up and down the stack — wrangling billions of records a day through data lakehouses, shipping ML systems to production, building the platforms that hold them up, and lately optimizing the guts of LLM inference. I'm drawn to the hard, ambiguous problems that live between disciplines — the ones where nobody's quite sure whose job it is. Right now that means inference infrastructure at IO.net, and a growing fascination with web3, DePIN, and what happens when compute itself becomes a marketplace.

What I work on

02 / DOMAINS

Inference Engineering

Squeezing latency and cost out of LLM serving — KV cache optimization, distributed cache offloading, disaggregated prefill / decode.

vLLM · Aibrix · LMCache

Data Engineering

Lakehouses and streaming frameworks moving 1–5B records/day at sub-second query latency, benchmarked to 100B.

Spark · ClickHouse · Iceberg · Trino

ML & MLOps

End-to-end ML systems following MLOps best practices — research-to-production pipelines, retraining, model promotion.

PyTorch · XGBoost · MLflow · Ray

AI & LLM Systems

RAG pipelines, MCP servers, and production AI agents with end-to-end observability, tool use, and fault tolerance.

LangChain · RAG · MCP · OpenRouter

Platform & Distributed

Managed compute platforms on EMR, EKS and bare metal — Ray clusters, container-as-a-service, orchestration with Temporal.

Kubernetes · Ray · SkyPilot · AWS

Web3 & DePIN

Block-reward systems for DePIN GPU suppliers — designing and A/B testing distribution formulas behind a token launch on Solana.

DePIN · Solana · token economics

The toolkit · things I reach for

▹vLLM ▹Aibrix ▹LMCache ▹Ray ▹PyTorch ▹Spark ▹ClickHouse ▹Iceberg ▹Trino ▹Kubernetes ▹SkyPilot ▹MLflow

▹LangChain ▹MCP ▹RAG ▹XGBoost ▹Temporal ▹Delta Lake ▹Dask ▹OpenRouter ▹Airflow ▹AWS ▹Solana ▹Streamlit

Currently

03 / NOW

ACTIVE · Senior ML Software Engineer & Data Research Analyst · IO.net · Feb 2024 → now

Driving inference infrastructure for a decentralized GPU cloud.

▸Hosted & fine-tuned open LLMs (Qwen, GLM, DeepSeek, LLaMA) on vLLM + Aibrix for low-latency enterprise inference.

▸KV cache optimization & distributed offloading to cut memory footprint and latency at scale.

▸Disaggregated prefill–decode cluster deployments improving TTFT and token throughput.

▸Integrated io-intelligence as a provider in OpenRouter, expanding ecosystem reach.

▸Built a ChatGPT-style unified interface — web search, image & video gen, RAG.

▸Shipped an MCP server letting agents create & manage GPU clusters programmatically.

▸Pioneered a block-rewards system for DePIN GPU suppliers behind the IO-COIN launch on Solana.

▸Contributed to the cloud platform — Ray clusters, CaaS, bare metal, SkyPilot marketplace integrations.

The track record

04 / PATH

2024 — now

IO.net · Inference & DePIN

LLM inference infra, AI agents, and GPU-supplier reward systems for a decentralized compute network.

2022 — 2025

Chargebee · Data Platform

Enterprise lakehouse & streaming frameworks processing 1–5B records/day with sub-second latency.

2021 — 2022

Nike · Platform Team

Managed big-data service on AWS EMR + Spark; org-wide job orchestration on EKS.

2020 — 2021

Mercedes-Benz · Analytics

Built an analytics platform end to end — ingestion through interactive dashboards and PDF reporting.

2017 — 2020

Prodapt · ML & Data

Airflow ETL, streaming + batch systems, anomaly detection & time-series forecasting in production.

Out in the open

05 / OSS

Contributions and published packages across distributed computing and LLM tooling.

✦ LiteLLM provider ✦ SkyPilot IO Cloud ✦ dask-sql maintainer ✦ Delta Lake CI/CD ✦ Dask & Ray ✦ Trino LLM plugin ✦ dask-deltalake ✦ streamlit-reactflow ✦ mlvajra ✦ lakehouse-sharing

Recent writing

06 / DEEP DIVES

Three hands-on studies of Preferred Networks' PLaMo model family — speculative decoding, LoRA fine-tuning for function calling, and prefill/decode batch-scaling — all measured on a single laptop.

Inference · llama.cpp

Speculative Decoding for PLaMo 2

A measured negative result: draft-model and n-gram speculation for the Mamba-hybrid plamo-2-8b on Apple Silicon — plus a novel quantization bug found along the way.

Fine-Tuning · LoRA

Teaching PLaMo 3 to Call Functions

The first open-weights PLaMo with function calling: a LoRA on PFN's hidden control tokens, and two evaluation traps that, once fixed, took argument accuracy from 55% to 100%.

Serving · Batching

Prefill vs Decode

Where the time goes on a laptop: prefill is compute-bound, decode is bandwidth-bound (19× slower/token), and what continuous batching buys each architecture.

The archive

07 / ARCHIVE

HEADS UP: These are older notes from my early ML days, kept here for the curious. Fresh, deeper writing is brewing over on guruengineering.substack.com. New posts coming soon.

MLOps

MLVajra

Airflow, Under the Hood

Learning Algorithms

Random Forest

Gradient Boosting

Types of GBMs

Recommendation Systems