Revolutionizing Communication at Scale

At Meesho, seamless communication is at the heart of our user experience. Whether it’s an OTP for login, an order update, or a personalized shopping recommendation, our platform ensures timely and reliable messaging across multiple channels—WhatsApp, SMS, push notifications, and email.

With over 10 billion messages sent daily, our internal communication platform, Meecom, serves millions of users while catering to the diverse needs of 100+ campaign creators and 20+ microservices. Scaling to this magnitude while maintaining speed, accuracy, and reliability required us to rethink our approach to multi-tenancy and system architecture.


The Core Challenges We Set Out to Solve

  1. Multi-Tenancy at Scale – Different teams and microservices need distinct templates, data sources, and preferences.
  2. Ultra-Fast, High-Throughput Communication – OTPs and transactional messages demand low-latency delivery.
  3. Vendor Flexibility & Failover Mechanism – Supporting multiple third-party vendors for each channel with auto-switching in case of failures.
  4. Granular Reporting & Insights – Real-time dashboards tailored for different teams, campaigns, and communication channels.
  5. Effortless Template & Receiver Management – Ensuring a self-serve system that allows seamless message customization and personalization.


How We Built a Scalable, Multi-Tenant Communication Platform

To tackle these challenges, we designed Meecom as a modular, loosely coupled system made up of well-defined subsystems, each with a dedicated function. This structure lets Meecom serve different teams without interference, stay reliable under load, and offer real-time monitoring and failover support.

Each subsystem is optimized for performance and ownership, ensuring minimal dependencies while allowing seamless collaboration. Here’s how we structured Meecom to meet our ambitious goals:

1. Communication Delivery: Ensuring Messages Reach the Right Users

To handle billions of messages daily, our communication delivery subsystem is designed for speed, reliability, and failover resilience. It acts as the backbone of Meecom, ensuring messages are sent through the right channels with intelligent routing and prioritization.

Capabilities:

  • Integration with multiple third-party vendors
  • Smart sender ID selection and dynamic load balancing
  • Auto-failover for vendor outages
  • Prioritization of critical communications over promotional ones
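To illustrate the last capability, here is a minimal Python sketch of prioritizing critical messages over promotional ones with a priority queue. The three tiers, the `DispatchQueue` class, and the sample messages are assumptions made for illustration; the post does not describe how Meecom implements prioritization.

```python
import heapq
import itertools
from enum import IntEnum
from typing import List, Tuple


class Priority(IntEnum):
    # Hypothetical tiers; a lower value is dispatched first.
    OTP = 0
    TRANSACTIONAL = 1
    PROMOTIONAL = 2


class DispatchQueue:
    """In-memory priority queue: critical messages leave before promotional ones."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        # Monotonic counter keeps first-in, first-out order within one priority.
        self._counter = itertools.count()

    def enqueue(self, priority: Priority, message: str) -> None:
        heapq.heappush(self._heap, (int(priority), next(self._counter), message))

    def dequeue(self) -> str:
        _, _, message = heapq.heappop(self._heap)
        return message

    def __len__(self) -> int:
        return len(self._heap)


queue = DispatchQueue()
queue.enqueue(Priority.PROMOTIONAL, "Sale starts today")
queue.enqueue(Priority.TRANSACTIONAL, "Order A1001 shipped")
queue.enqueue(Priority.OTP, "Your OTP is 123456")
queue.enqueue(Priority.PROMOTIONAL, "New arrivals this week")

drained = [queue.dequeue() for _ in range(len(queue))]
print(drained)
```

The OTP leaves first even though it was enqueued third, and the two promotional messages keep their arrival order. A single in-memory heap is only a stand-in here; separate queues or topics per tier would be another way to get the same effect.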

Scalability Benchmarks:

  • 10 billion push notifications/day
  • 200 million WhatsApp messages/day
  • 50 million SMS/day
  • 40 million emails/day
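For a sense of what these daily volumes mean per second, the arithmetic below converts them to average rates. These are plain averages over 24 hours, computed only from the figures listed above; the post does not give peak rates.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes as listed in the benchmarks above.
daily_volumes = {
    "notifications": 10_000_000_000,
    "whatsapp": 200_000_000,
    "sms": 50_000_000,
    "email": 40_000_000,
}

average_per_second = {
    channel: volume / SECONDS_PER_DAY for channel, volume in daily_volumes.items()
}

for channel, rate in average_per_second.items():
    print(f"{channel}: ~{rate:,.0f} messages/second on average")
# notifications: ~115,741 messages/second on average
# whatsapp: ~2,315 messages/second on average
# sms: ~579 messages/second on average
# email: ~463 messages/second on average
```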


2. Receiver Info Management & Template Personalization

Before messages are dispatched, they need to be personalized and enriched to ensure relevance. This subsystem dynamically merges templates with user data to create highly contextual messages.

  • Receiver identifiers include mobile numbers, FCM tokens, and email addresses.
  • Templates merge with real-time user data.
  • Fun Fact: We fetch receiver data and personalize messages for 170 million recipients in under 10 minutes.
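The template-merge step can be pictured with a small Python sketch built on the standard library's `string.Template`. The `$placeholder` syntax, the `personalize` helper, and the sample users are assumptions for illustration; the post does not specify Meecom's template format.

```python
from string import Template
from typing import Dict, List


def personalize(template_text: str, user: Dict[str, str]) -> str:
    """Fill $placeholders from user data; raises KeyError if a field is missing."""
    return Template(template_text).substitute(user)


# Hypothetical template and receiver data.
order_update = "Hi $first_name, your order $order_id is out for delivery."
users: List[Dict[str, str]] = [
    {"first_name": "Asha", "order_id": "A1001"},
    {"first_name": "Ravi", "order_id": "A1002"},
]

messages = [personalize(order_update, user) for user in users]
for message in messages:
    print(message)
# Hi Asha, your order A1001 is out for delivery.
# Hi Ravi, your order A1002 is out for delivery.
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a missing field fails loudly instead of sending a message with a raw placeholder in it. For scale, 170 million recipients in under 10 minutes (600 seconds) works out to at least roughly 283,000 recipients per second.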


3. Template Management: A Unified Dashboard

Managing communication templates across multiple channels can be complex. This subsystem ensures teams have a centralized interface for creating, updating, and tracking templates.

  • Supports WhatsApp, SMS, Push, and Email templates
  • Automatic syncing with external vendor platforms
  • Real-time alerts on template failures or deactivations


4. Multi-Tenancy & Client Configuration

This layer ensures that different teams and microservices can configure and use Meecom independently while maintaining system-wide consistency and security.

  • Self-serve onboarding for teams
  • API and Kafka-based real-time push support
  • Granular role-based access control
  • Per-client rate limits and failure isolation
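Per-client rate limiting is commonly done with one token bucket per client, so that a burst from one tenant cannot consume another tenant's budget. The sketch below shows that idea in Python; the class names, client IDs, and limits are hypothetical, and the post does not say which algorithm Meecom uses.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Refills at `rate_per_sec` tokens per second, up to `capacity`."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last_refill = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ClientRateLimiter:
    """One bucket per client; unknown clients are rejected."""

    def __init__(self, limits: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.buckets = {
            client: TokenBucket(rate, capacity=rate, clock=clock)
            for client, rate in limits.items()
        }

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        return bucket is not None and bucket.allow()


# A fake clock makes the demo deterministic.
fake_now = [0.0]
limiter = ClientRateLimiter(
    {"orders-service": 2, "marketing-campaigns": 1},  # hypothetical clients, msgs/sec
    clock=lambda: fake_now[0],
)

burst = [limiter.allow("orders-service") for _ in range(3)]
other_client = limiter.allow("marketing-campaigns")
fake_now[0] += 1.0  # one second passes, buckets refill
after_refill = limiter.allow("orders-service")
print(burst, other_client, after_refill)  # [True, True, False] True True
```

The third request from `orders-service` is rejected, while `marketing-campaigns` is unaffected, which is the isolation property the bullet above describes.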


5. Advanced Reporting & Insights

To measure the effectiveness of our communication system, we built a robust reporting layer that provides real-time analytics and insights.

  • Funnel tracking: Sent → Delivered → Clicked → Read
  • Campaign-specific insights
  • Asynchronous event tracking for minimal system overhead
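As a rough picture of funnel tracking, the Python sketch below counts distinct messages per stage from a stream of status events and reports each stage as a share of sent messages. The event shape `(message_id, stage)` and the function names are assumptions for illustration, not Meecom's actual schema.

```python
from typing import Dict, Iterable, Set, Tuple

# Stage names taken from the funnel described above.
STAGES = ("sent", "delivered", "clicked", "read")


def funnel_counts(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct message IDs per stage; duplicate callbacks count once."""
    seen: Dict[str, Set[str]] = {stage: set() for stage in STAGES}
    for message_id, stage in events:
        if stage in seen:  # ignore unknown stages
            seen[stage].add(message_id)
    return {stage: len(ids) for stage, ids in seen.items()}


def share_of_sent(counts: Dict[str, int]) -> Dict[str, float]:
    """Express each stage as a fraction of sent messages."""
    sent = counts["sent"]
    return {stage: (count / sent if sent else 0.0) for stage, count in counts.items()}


# Hypothetical events for four messages.
events = [
    ("m1", "sent"), ("m1", "delivered"), ("m1", "clicked"), ("m1", "read"),
    ("m2", "sent"), ("m2", "delivered"), ("m2", "delivered"),  # duplicate callback
    ("m3", "sent"),
    ("m4", "sent"), ("m4", "delivered"),
]

counts = funnel_counts(events)
shares = share_of_sent(counts)
print(counts)  # {'sent': 4, 'delivered': 3, 'clicked': 1, 'read': 1}
print(shares)  # {'sent': 1.0, 'delivered': 0.75, 'clicked': 0.25, 'read': 0.25}
```

Deduplicating by message ID matters because vendor callbacks can arrive more than once; grouping the same counts by campaign would give the campaign-specific view.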

Developer-Focused Tools for Easy Integration

To simplify adoption and reduce engineering effort, we built developer-friendly SDKs and services:

  • comms-client (Java & Go SDKs): Handles caching, template validation, and real-time push payload validation.
  • comms-core: Manages template storage, client configs, and callback handling.
  • comms-serving: Optimized for large-scale deployment, handling traffic ingestion, admin APIs, and vendor response processing.
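To give a feel for the kind of payload validation a client SDK can do before a message leaves the caller, here is a small Python sketch. It is not the comms-client API (the real SDKs are in Java and Go, and their interfaces are not shown in this post); the function names and the `$placeholder` template syntax are assumptions.

```python
from string import Template
from typing import Dict, List, Set


def required_fields(template_text: str) -> Set[str]:
    """Collect placeholder names such as $name or ${name} from a template."""
    names: Set[str] = set()
    for match in Template.pattern.finditer(template_text):
        name = match.group("named") or match.group("braced")
        if name:
            names.add(name)
    return names


def validate_payload(template_text: str, payload: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the payload is valid."""
    required = required_fields(template_text)
    provided = set(payload)
    errors = [f"missing field: {name}" for name in sorted(required - provided)]
    errors += [f"unexpected field: {name}" for name in sorted(provided - required)]
    return errors


otp_template = "Hi $first_name, your OTP is ${otp}."  # hypothetical template

errors_missing = validate_payload(otp_template, {"first_name": "Asha"})
errors_ok = validate_payload(otp_template, {"first_name": "Asha", "otp": "123456"})
errors_extra = validate_payload(
    otp_template, {"first_name": "Asha", "otp": "123456", "coupon": "SAVE10"}
)
print(errors_missing)  # ['missing field: otp']
print(errors_ok)       # []
print(errors_extra)    # ['unexpected field: coupon']
```

Catching a mismatched payload on the client side means a bad request is rejected before it reaches the platform or a vendor.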


Design Philosophies That Enable Scale

  1. Modular and Composable – Each subsystem owns a clear domain.
  2. Asynchronous by Default – Except for OTPs, we rely on event-driven workflows.
  3. Graceful Degradation – The system stays stable even when vendors fail.
  4. Alert-Driven Monitoring – Proactive visibility into template failures and delivery issues.
  5. Developer-First Approach – Faster onboarding and safer integrations.
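Graceful degradation under vendor failure can be sketched as a router that tries vendors in preference order and skips any that have failed repeatedly. The Python below is a minimal illustration under stated assumptions: the `Vendor` and `FailoverRouter` names, the failure threshold, and the simulated vendors are all hypothetical, and no real vendor API is called.

```python
from dataclasses import dataclass
from typing import Callable, List


class VendorError(Exception):
    """Raised by a vendor callable when a send attempt fails."""


@dataclass
class Vendor:
    name: str                        # hypothetical vendor label
    send: Callable[[str, str], str]  # (receiver, body) -> vendor message id
    consecutive_failures: int = 0


class FailoverRouter:
    """Tries vendors in order and skips ones that look unhealthy."""

    def __init__(self, vendors: List[Vendor], max_failures: int = 3) -> None:
        self.vendors = vendors
        self.max_failures = max_failures  # assumed threshold, not from the post

    def send(self, receiver: str, body: str) -> str:
        for vendor in self.vendors:
            if vendor.consecutive_failures >= self.max_failures:
                continue  # treat the vendor as down and move on
            try:
                message_id = vendor.send(receiver, body)
            except VendorError:
                vendor.consecutive_failures += 1
                continue
            vendor.consecutive_failures = 0  # a success resets the counter
            return f"{vendor.name}:{message_id}"
        raise VendorError("all vendors failed or are marked unhealthy")


def flaky_send(receiver: str, body: str) -> str:
    raise VendorError("simulated outage")


def healthy_send(receiver: str, body: str) -> str:
    return "msg-1"


router = FailoverRouter([Vendor("vendor_a", flaky_send), Vendor("vendor_b", healthy_send)])
result = router.send("+910000000000", "Your order has shipped")
print(result)  # vendor_b:msg-1
```

This sketch never retries a vendor once it crosses the threshold; a production circuit breaker would typically add a cooldown after which the vendor is probed again.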


What’s Next for Meecom?

We are working on several improvements to communication efficiency, reliability, and intelligence:

  • AI-Based Vendor Selection – Implementing real-time, AI-driven routing that picks the best-performing vendor for each message based on latency, success rate, and cost.
  • Automated Template Adaptation – Enabling dynamic content reformatting across different vendors and channels to ensure maximum deliverability and consistency.
  • LLM-Powered Personalization – Leveraging large language models to enhance message relevance, crafting more engaging, user-specific content to improve open and click-through rates.
  • Predictive Communication Insights – Using machine learning to anticipate user behavior and optimize send times for maximum engagement.


Key Takeaways

Building a high-scale, multi-tenant communication platform isn’t just about handling billions of messages—it’s about ensuring:

  • Reliability: Auto-failover and vendor redundancy
  • Scalability: Supporting 100+ campaign creators and 20+ microservices
  • Flexibility: Multi-channel, multi-vendor, multi-tenant support
  • Self-Serve Capabilities: Empowering teams to move fast without engineering bottlenecks

If you’re tackling similar challenges, we’d love to hear from you! How are you solving communication at scale?