Dashboard / Screens Loading Very Slowly or Not Loading At All

FrequentlyAsked Question / Exact Problem

  1. “RFM segments are taking 12 seconds to load. RFM movement takes 2 minutes.”
  2. “The Product Frequency dashboard does not load at all.”
  3. “Cohort analysis is giving a network error.”
  4. “The LMS (leads) screen isn’t loading for iPlanet.”
  5. “All dashboards taking 30 seconds — multiple email alerts received.”

Affected Tenant / Context
V2 Retail (most severe — very large bill and product volume), iPlanet, Whitemart. Any highvolume tenant can trigger this.

Root Cause

  1. Unoptimized ClickHouse (CH) queries — Large tenants like V2 Retail generate millions of bills. Dashboard queries lacked proper indexing and partition strategy, causing full table scans. RFM movement query was taking 210 seconds before optimization.
  2. Postgres overload from heavy reports — Running large loyalty liability reports (e.g., ~1.5 lakh rows for Ibaco) caused Postgres to spawn 200+ concurrent query instances, exhausting the connection pool. This slowed all other tenants.
  3. Lead queries using temp files — A SELECT DISTINCT query on the leads table was generating 11MB temp files in PostgreSQL due to missing indexes. This caused cascading slowness on the LMS screen for iPlanet and others.
  4. Third-party media provider slowness (Robeta/Eywa) — When SMS providers respond slowly, CASA’s thread is blocked waiting for them, causing queues to back up. Sathya ran a 2-lakh customer journey through Eywa Media, blocking the queue for 8+ hours

Common Debugging Steps (For CS Team)

  1. Check if slowness is affecting one specific page (e.g., only RFM) or all pages — if all pages, likely a server-level issue. Post immediately in prod-issues.
  2. Note the exact time slowness started — this helps devs correlate with server logs.
  3. Ask:“Is anyone running a large journey (lakh+ customers) right now?” — If yes, that may be the cause.
  4. Try reloading after 5 minutes — transient Postgres overloads usually self-resolve once the heavy query completes.
  5. For V2 Retail or any large tenant: do not run reports for large date ranges during business hours.

Fix Made & Developer Conclusion

  1. V2 Retail Dashboard: Multiple CH query optimizations done by Dharshini. RFM movement: 210s → 21s. Infrastructure scaled: RAM 64GB → 128GB, CPU 16 → 32 cores (Ragul handled infra). Cohort query: 30–60s → ~5s.
  2. Postgres overload: Nisar/Dineshr killed runaway queries. Heavy loyalty liability report query flagged for optimization. Alert set for every 15 minutes if Postgres load exceeds threshold.
  3. Lead query temp file: SELECT DISTINCT on leads table flagged for index optimization and rewrite.
  4. Temporary fix: offloaded to CH. Robeta/Eywa slowness: Sathya was moved to a separate WhatsApp queue to prevent blocking other tenants. Sathya team informed to coordinate with Eywa Media on their side.

Exact Dev to Reach Out To
Dharshini — CH query optimizations, dashboard performance
Suriya — Overall prod load, Postgres tuning, architecture decisions
Ragul — Infrastructure scaling (VM resources, ClickHouse vCPU/RAM)
Dineshr — Immediate query killing, emergency response
Mai gnana ganapathy M Real-time service optimizations