
Feb 28 – Mar 31, 2026 · 32 days · V2

How We Built
Varaksha in a Sprint

Two workstreams converging on a single system. Each entry is labelled by owner: Security (security expert), ML (ML expert), or Together (shared decisions).

01

The Problem

India's Unified Payments Interface processes over 500 million transactions daily. Legacy fraud detection operates on batch cycles, introducing delays that allow mule networks to execute and disperse before a single alert is raised. Real-time classification at the transaction layer is a structural necessity, not an optimisation.

02

The Architecture

Varaksha is a five-layer detection pipeline: a Rust privacy gateway that hashes identifiers at ingress, a Random Forest ML engine trained on 111K real transactions, a graph topology analyser for network-pattern fraud, a multilingual alert agent covering 8 Indian languages (all 22 scheduled languages are planned), and a real-time operations dashboard.

03

The Outcome

96.52% RF accuracy, 0.9952 ROC-AUC (LightGBM fusion). Sub-10ms P99 gateway latency. Four BIS money-mule typologies detected via streaming graph. Fraud alerts in 8 Indian languages. Three deployment tiers: Cloud (Railway), Enterprise (API + graph), Embedded SDK (ONNX Mobile). Built and deployed in 32 days.

Together
Feb 28

Defining the Architecture

Demonstrability is a first-class design constraint.

Five-layer architecture scoped: Rust privacy gateway, ML classifier, graph topology analyser, multilingual alert agent, and ops dashboard. The system is designed so that an evaluator under time pressure can understand it in under a minute.

5-Layer Design · Architecture · Core Pipeline
Security
Mar 1

Privacy Gateway in Rust

Sensitive identifiers must not persist beyond the perimeter. Everything downstream operates on hashes.

The Actix-Web 4 gateway is the sole component that handles raw Virtual Payment Addresses — SHA-256 hashing is applied at ingress so all downstream services receive only derived identifiers. DashMap provides a lock-free concurrent risk cache across the Actix worker pool; score_to_verdict() threshold logic determines ALLOW, FLAG, and BLOCK classifications.

Rust · SHA-256 · DashMap · Actix-Web 4
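
The hash-at-ingress and verdict steps in this entry can be sketched as follows. This is a minimal Python illustration, not the Rust gateway code: the two thresholds, the lowercase normalisation, and the example VPA are assumptions, and a plain dict stands in for DashMap.

```python
import hashlib

# Hypothetical cut-offs: the text names score_to_verdict() but not its thresholds.
FLAG_THRESHOLD = 0.5
BLOCK_THRESHOLD = 0.8


def hash_vpa(vpa: str) -> str:
    """Return the SHA-256 hex digest of a VPA (normalisation is an assumption)."""
    normalised = vpa.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def score_to_verdict(score: float) -> str:
    """Map a risk score in [0, 1] to ALLOW, FLAG, or BLOCK."""
    if score >= BLOCK_THRESHOLD:
        return "BLOCK"
    if score >= FLAG_THRESHOLD:
        return "FLAG"
    return "ALLOW"


# The cache is keyed by the derived identifier only; the raw VPA is never stored.
risk_cache = {}
key = hash_vpa("alice@examplebank")
risk_cache[key] = 0.83
print(key[:12], score_to_verdict(risk_cache[key]))
```

Keeping the hash function at the single entry point means every downstream lookup uses the same derived key.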
ML
Mar 1

ML Baseline Established

A working baseline yields insights that an unimplemented optimal architecture cannot.

Random Forest + XGBoost soft-vote ensemble on transaction velocity, round-amount flag, network out-degree, and time-of-day encoding. Stratified 50K PaySim sample with SMOTE rebalancing. Reference point established for subsequent iterations.

RF + XGBoost · SMOTE · PaySim
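
Three of the baseline features named in this entry could be derived roughly as below. The exact definitions are not given in the text, so everything here is an assumption: the 100-rupee step for the round-amount flag, the sine/cosine form of the time-of-day encoding, and the one-hour velocity window.

```python
import math
from datetime import datetime, timedelta


def round_amount_flag(amount: float, step: int = 100) -> int:
    """1 if the amount is an exact multiple of `step` (assumed definition)."""
    return int(amount % step == 0)


def time_of_day_encoding(ts: datetime) -> tuple:
    """Cyclical (sin, cos) encoding so 23:59 and 00:00 end up close together."""
    fraction = (ts.hour * 3600 + ts.minute * 60 + ts.second) / 86400
    angle = 2 * math.pi * fraction
    return math.sin(angle), math.cos(angle)


def velocity(timestamps, now: datetime, window: timedelta = timedelta(hours=1)) -> int:
    """Number of transactions by one sender inside the trailing window."""
    return sum(1 for ts in timestamps if now - window < ts <= now)


now = datetime(2026, 3, 1, 12, 0, 0)
history = [now - timedelta(minutes=m) for m in (5, 20, 90)]
print(round_amount_flag(5000), velocity(history, now), time_of_day_encoding(now))
```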
Security
Mar 2

Graph-Based Mule Detection

Network fan-out is a consistent topological signature across all known money-mule architectures.

A NetworkX graph agent runs asynchronously outside the payment critical path, detecting all four BIS Project Hertha mule typologies: fan-out, fan-in, directed cycles, and scatter patterns. Score aggregation uses the maximum across detected patterns to prevent false positives on legitimate high-volume merchants; results push to the Rust risk cache via HMAC-SHA256-signed webhooks.

NetworkX · Fan-out · Directed Cycles · Async · HMAC-SHA256
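
A minimal sketch of three of the typologies and the max-based aggregation, written with plain dictionaries rather than NetworkX so it runs without dependencies. The fan threshold and the 0-to-1 score scaling are assumptions, and the scatter pattern is omitted because the text does not define it.

```python
from collections import defaultdict

FAN_THRESHOLD = 3  # hypothetical: distinct counterparties that saturate the score


def build_adjacency(edges):
    """Build successor and predecessor sets from (src, dst) pairs."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)
    return out_nbrs, in_nbrs


def on_cycle(node, out_nbrs):
    """True if some directed path leads from node back to itself."""
    stack, seen = list(out_nbrs.get(node, ())), set()
    while stack:
        cur = stack.pop()
        if cur == node:
            return True
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(out_nbrs.get(cur, ()))
    return False


def pattern_scores(node, out_nbrs, in_nbrs):
    """One score in [0, 1] per pattern."""
    return {
        "fan_out": min(len(out_nbrs.get(node, ())) / FAN_THRESHOLD, 1.0),
        "fan_in": min(len(in_nbrs.get(node, ())) / FAN_THRESHOLD, 1.0),
        "cycle": 1.0 if on_cycle(node, out_nbrs) else 0.0,
    }


def risk_score(node, out_nbrs, in_nbrs):
    """Aggregate with max, as the entry describes, rather than a sum."""
    return max(pattern_scores(node, out_nbrs, in_nbrs).values())


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "e"), ("e", "b")]
out_nbrs, in_nbrs = build_adjacency(edges)
for node in ("a", "b", "c"):
    print(node, pattern_scores(node, out_nbrs, in_nbrs))
```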
Security
Mar 2–3

Multilingual Alert Delivery

A fraud alert has no utility if the recipient cannot read the language in which it is issued.

Alerts synthesised in 8 Indian languages via Microsoft Neural TTS (edge-tts) embed the transaction ID, blocked amount, and risk score in the recipient’s preferred language. BLOCK verdicts cite IT Act 2000 §66D and BNS §318(4) verbatim; the template engine is swappable for IndicTrans2 at production time.

8 languages · Neural TTS · IT Act 2000 §66D · edge-tts
Together
Mar 3

Integration Proof-of-Concept

End-to-end verdicts validated — from Rust ingress to multilingual alert.

A live operations dashboard confirmed verdicts flowing through all five layers: transaction ingress, hashing, ML scoring, graph analysis, and multilingual alert dispatch. Force-directed network visualization, Hindi alert panel, and 50-event audit log. All data is synthetic—no real PII processed.

5-Layer Pipeline · Live Dashboard · Audit Log · Synthetic
ML
Mar 5–7

Model Architecture Overhaul

At 450 MB combined, the ensemble consumed nearly the entire memory budget for a sub-0.005 accuracy gain.

XGBoost was removed from the serving stack: RF-300 achieves ROC-AUC 0.9869 in isolation and the marginal ensemble gain was insufficient to justify 450 MB combined weight. Feature engineering expanded from 8 to 16 variables, incorporating balance_drain_ratio, account_age_days, previous_failed_attempts, and transfer_cashout_flag; the output artefact became varaksha_rf_model.onnx.

RF-300 only · 16 features · 75K rows · ONNX · ROC-AUC 0.9869
Together
Mar 9–10

Production Deployment

Static export to a global edge network eliminates cold starts and infrastructure overhead from the demonstration path entirely.

Next.js 15 is configured with static export and deployed to Cloudflare Pages, so no Node.js server runs in the demonstration path. The frontend ships three routes: a live stats landing page, an animated architecture walkthrough, and a real-time transaction feed with Security Arena and Cache Visualizer panels.

Next.js 15 · Cloudflare Pages · Static Export · framer-motion
ML
Mar 11 AM

Dataset Coverage Audit

Model timestamps revealed the training pipeline had never ingested the complete dataset.

Three missing dataset files discovered: supervised_dataset.csv, remaining_behavior_ext.csv, and ton-iot.csv. All loaders written, validated against schema, and integrated into the merge pipeline. 54,142 rows recovered.

Dataset Audit · 54K Rows · 3 Loaders
ML
Mar 11 PM

85.24% (V1)

V1 retraining on the leakage-corrected dataset reached 85.24% accuracy (later superseded by V2).

The expanded 111,499-row dataset rebalanced by SMOTE to 51,735/51,735 yielded: RF Accuracy 85.24%, ROC-AUC 0.9546, Precision 0.7709, Recall 0.9229, F1 0.8401. Stale artefacts — lightgbm, xgboost, voting ensemble — were removed from the repository.

111K rows · V1 baseline · ROC-AUC 0.9546 · Artefact cleanup
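
As a consistency check on the V1 figures, F1 is the harmonic mean of precision and recall, so the reported value can be recomputed from the other two. The snippet uses only the numbers quoted in this entry.

```python
precision, recall = 0.7709, 0.9229  # V1 values quoted above

# F1 = 2PR / (P + R)
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8401
```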
Together
Mar 11

V1 Finalisation and Deployment

A deployable system is defined by finishing details—texture, colour, and interactive feedback.

Frontend polish: dot-grid body texture, surface-gradient card utility, amber token separated from saffron for distinct FLAG verdict rendering. Next.js static export deployed to Cloudflare Pages. Core pipeline hardened and ready for production integration.

Next.js 15 · Static Export · Polish · Production-Ready
Security
Mar 12–14

Gateway Hardening — Rate Limiting & Auth

A gateway without rate limiting is a door without a lock.

Production-grade security hardening: per-VPA rate limiter enforcing NPCI OC-215/2025-26 caps (100 req/24h), mTLS mutual authentication layer, HMAC-SHA256 webhook signing for all graph agent push events, and audit log ring-buffer in DashMap. CORS policy tightened to allowlist PSP bank origins only.

Rate Limiter · mTLS · HMAC-SHA256 · CORS · Audit Log
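
The per-VPA cap and the webhook signing in this entry can be illustrated in a few lines of Python. This is a sketch under assumptions, not the gateway's Rust code: the sliding-window strategy, the class and function names, and the example secret are all hypothetical; only the 100 requests per 24 hours figure and the use of HMAC-SHA256 come from the text.

```python
import hashlib
import hmac
import time
from collections import defaultdict, deque

MAX_REQUESTS = 100          # cap quoted in the entry
WINDOW_SECONDS = 24 * 3600  # 24 hours


class SlidingWindowLimiter:
    """Allow at most max_requests per key inside a trailing time window."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, vpa_hash, now=None):
        now = time.time() if now is None else now
        hits = self._hits[vpa_hash]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 signature of a webhook body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time comparison avoids leaking how many characters matched."""
    return hmac.compare_digest(sign(secret, body), signature)


limiter = SlidingWindowLimiter(max_requests=2, window=10)
print([limiter.allow("abc", now=t) for t in (0, 1, 2, 10)])
body = b'{"vpa_hash": "abc", "score": 0.91}'
print(verify(b"demo-secret", body, sign(b"demo-secret", body)))
```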
ML
Mar 14–16

LightGBM Secondary Model & Feature Expansion

The ensemble gap that justified removing XGBoost does not apply to a lightweight gradient-boosted model.

LightGBM trained as a secondary scorer on the 111K-row corpus. Feature set expanded to 18 variables, adding merchant_risk_freq and amount_log. Both models exported to ONNX; inference pipeline updated to fuse RF and LightGBM scores via weighted average (0.7 / 0.3). Sweep artefacts saved as lgbm_sweeper.onnx.

LightGBM · 18 features · ONNX fusion · Weighted 0.7/0.3
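
The score fusion reduces to a single weighted average. The sketch below assumes the weights follow the order in which the models are named (0.7 for RF, 0.3 for LightGBM) and that both models emit a fraud probability in [0, 1]; the function name is hypothetical.

```python
RF_WEIGHT = 0.7    # assumed to belong to the Random Forest
LGBM_WEIGHT = 0.3  # assumed to belong to LightGBM


def fuse_scores(rf_prob: float, lgbm_prob: float) -> float:
    """Weighted average of the two model probabilities."""
    return RF_WEIGHT * rf_prob + LGBM_WEIGHT * lgbm_prob


print(round(fuse_scores(0.9, 0.5), 2))  # 0.78
```

Because the weights sum to 1, the fused score stays in the same [0, 1] range as its inputs and can feed the existing verdict thresholds unchanged.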
Together
Mar 17–19

Three-Tier Deployment Architecture

Cloud, enterprise, and edge are not three products — they are one system at three integration depths.

Formalised the three-tier deployment model: Cloud (hosted Rust gateway on Railway + Cloudflare Pages), Enterprise (API-first with HMAC webhooks, graph topology streaming, PSP bank integration), Embedded SDK (quantized ONNX < 5MB, ONNX Runtime Mobile, zero round-trip on-device scoring). Each tier shares the same model artefacts and scoring logic.

Cloud Tier · Enterprise API · Embedded SDK · ONNX Mobile
ML
Mar 20–22

IsolationForest Calibration & Sweep

An anomaly detector miscalibrated at 5% contamination flags legitimate high-value merchants every hour.

IsolationForest contamination tuned from 5% to 2% after simulation revealed excessive false positives on recurring high-value UTILITY payments. Bayesian sweep across n_estimators (100–400) and max_samples: optimal 300 trees, 256 max_samples. False positive rate halved without recall loss. Saved as isolation_forest.onnx v2.

Contamination 2% · Bayesian sweep · FP reduction · v2
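
To see why the contamination setting matters, it helps to treat it as a quantile cut on anomaly scores: roughly that fraction of the scored points ends up flagged. The sketch below is a simplified stand-in for that behaviour, not the IsolationForest implementation itself; the helper names and the synthetic scores are assumptions.

```python
def flag_threshold(scores, contamination):
    """Score at or above which roughly `contamination` of the points fall."""
    ranked = sorted(scores, reverse=True)  # higher score = more anomalous
    n_flagged = max(1, round(len(ranked) * contamination))
    return ranked[n_flagged - 1]


def count_flagged(scores, contamination):
    """How many points the cut would flag."""
    cut = flag_threshold(scores, contamination)
    return sum(1 for s in scores if s >= cut)


# Synthetic, evenly spread scores for 1,000 transactions.
scores = [i / 1000 for i in range(1000)]
print(count_flagged(scores, 0.05), count_flagged(scores, 0.02))
```

On the same 1,000 scores, moving the cut from 5% to 2% shrinks the flagged set from 50 to 20 transactions, which is the lever the calibration above adjusts.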
Security
Mar 24–28

Graph Agent Streaming & Consortium Layer

Batch topology analysis finds yesterday's mules. Streaming graph analytics finds today's.

Graph agent migrated from batch NetworkX snapshots to event-driven incremental updates: each transaction edge appended and fan-out/fan-in/cycle metrics recomputed in O(k) per event. Consortium risk-sharing prototype implemented: anonymised score deltas federated via HMAC-signed shared registry — zero PII exposure.

Streaming graph · Incremental O(k) · Consortium · No PII
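
The difference from batch snapshots is that each event touches only the new edge's neighbourhood. A minimal sketch of that idea follows; the class name, the hop limit, and the reading of k as the size of the locally explored neighbourhood are assumptions, since the text does not define k.

```python
from collections import defaultdict

MAX_HOPS = 4  # hypothetical bound on the cycle search per event


class StreamingGraph:
    """Keeps adjacency sets and refreshes metrics for one edge at a time."""

    def __init__(self):
        self.out_nbrs = defaultdict(set)
        self.in_nbrs = defaultdict(set)

    def add_edge(self, src, dst):
        """Append one edge; return metrics for its two endpoints only."""
        self.out_nbrs[src].add(dst)
        self.in_nbrs[dst].add(src)
        return {
            "fan_out": len(self.out_nbrs[src]),
            "fan_in": len(self.in_nbrs[dst]),
            # The new edge closes a cycle if dst can already reach src.
            "closes_cycle": self._reaches(dst, src, MAX_HOPS),
        }

    def _reaches(self, start, target, max_hops):
        """Breadth-first search limited to max_hops steps from start."""
        frontier, seen = {start}, {start}
        for _ in range(max_hops + 1):
            if target in frontier:
                return True
            frontier = {
                nxt
                for cur in frontier
                for nxt in self.out_nbrs.get(cur, ())
                if nxt not in seen
            }
            seen |= frontier
            if not frontier:
                return False
        return False


graph = StreamingGraph()
for src, dst in [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]:
    print(src, dst, graph.add_edge(src, dst))
```

Bounding the search depth keeps the per-event cost tied to the local neighbourhood rather than to the size of the whole graph, at the price of missing cycles longer than the bound.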
Together
Mar 29–31

V2 Three-Tier Launch

One codebase. Three deployment surfaces. One fraud intelligence system.

Varaksha V2 deployed across all three tiers. Cloud: Rust gateway on Railway + Cloudflare Pages CDN, live SSE stream, <10ms P99. Enterprise: graph topology network monitor, SecurityArena attack simulations, webhook delivery. Embedded SDK: on-device ONNX scoring simulation. All tiers share model artefacts from the 111K-row corpus.

V2 Launch · All Three Tiers · Cloudflare · Railway · Live

On the Horizon

What We Build Next

Next steps: features we consciously set down to meet the deadline.

All 22 Scheduled Languages

Expand from 8 to all constitutionally scheduled Indian languages via IndicTrans2 — swap a single function call in agent03.

Accessibility

Mobile SDK Packaging

Package the ONNX inference layer as an Android / iOS SDK so PSPs can embed sub-1ms on-device scoring without a network call.

Distribution

On-Device Edge Inference

Ship varaksha_rf_model.onnx to handsets via ONNX Runtime Mobile. Scores computed locally — zero round-trip latency, works offline.

Performance

Streaming Graph Analytics

Move the event-driven graph agent onto Apache Flink or Kafka Streams so fan-out and cycle detection keep updating continuously as edges arrive.

Architecture

Live LLM Legal Summaries

Replace the mock LLM in agent03 with GPT-4o-mini or Groq to generate dynamic, context-aware legal citations per transaction.

AI

NPCI Consortium Risk Sharing

Federate anonymised risk scores across participating PSP banks via a shared NPCI registry — consortium intelligence without PII exposure.

Ecosystem

Automated Regulatory Reporting

Auto-generate FIU-IND Suspicious Transaction Reports for PMLA §3 triggers and maintain a DPDP Act 2023 audit trail per blocked VPA.

Compliance

Open-Source Release

Publish the five-layer pipeline as an open library — plug in your own dataset, retrain in one command, deploy to any cloud with azd.

Community

32 days · 2 people · 3 tiers · shipped.